blambert/learnings – Don’t Base64 encode PartitionKey and RowKey values in Azure Table Storage.

Azure Table Storage is a “typeless” entity storage system.

If you’re an RDBMS person who learned about data storage by reading ISBN 0-201-14192-2, Azure Table Storage calls for a different way of thinking about your data.

In Azure Table Storage, an entity is simply a “bag of properties” with a certain minimum “shape”: a small set of required properties.

These are:

  • Timestamp.
  • PartitionKey.
  • RowKey.

Timestamp
This is not the property you’re looking for; move along.  This property is managed by Azure.  Don’t put anything into it, don’t use it for anything, and don’t infer anything from its contents.  (Think of it as though it were named “NotYourProperty”.)

PartitionKey
The first part of an entity’s primary key.  Entities are organized (partitioned for load balancing, etc.) in Azure by their PartitionKey.

RowKey
The second part of an entity’s primary key, which serves to uniquely identify an entity within a given partition.
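
To make the “bag of properties” idea concrete, here is a minimal sketch in Python that models two entities as plain dictionaries.  Everything other than the three required property names is made up for illustration (the Name, Total, and Shipped properties, the key values, and the primary_key helper), and this is not how any Azure SDK represents entities.

```python
# Hypothetical entities modeled as plain dicts; not an Azure SDK type.
REQUIRED = {"PartitionKey", "RowKey", "Timestamp"}

customer = {
    "PartitionKey": "customers-us",  # entities are partitioned by this value
    "RowKey": "0001",                # unique within the partition
    "Timestamp": None,               # managed by Azure; never set it yourself
    "Name": "Contoso",
}

order = {
    "PartitionKey": "customers-us",
    "RowKey": "0002",
    "Timestamp": None,
    "Total": 42.5,                   # a different set of properties,
    "Shipped": False,                # yet it could live in the same table
}


def primary_key(entity):
    """Return the (PartitionKey, RowKey) pair that identifies an entity."""
    return (entity["PartitionKey"], entity["RowKey"])


for entity in (customer, order):
    # Every entity carries at least the minimum "shape".
    assert REQUIRED <= entity.keys()
```

The two dictionaries share nothing beyond the required properties, which is the point: the table does not impose a schema on them.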

Read https://msdn.microsoft.com/en-us/library/dd179338.aspx to more fully grok the model.

As you design your storage in Azure Table Storage, you will notice that PartitionKey and RowKey are string values, each of which can be up to 1 KB in size.

But there are limitations on what can be stored in these properties!

They can’t contain the following characters:

  • The forward slash (/) character.
  • The backslash (\) character.
  • The number sign (#) character.
  • The question mark (?) character.

So you must be careful about what data you store in your PartitionKey and RowKey properties.
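
One way to be careful is to check a candidate key before you try to store it.  The sketch below, in Python, covers only the four characters listed above; the function names are my own, and it does not check length or any other rule the service may enforce.

```python
# The four characters listed above as disallowed in PartitionKey and RowKey.
FORBIDDEN_KEY_CHARS = frozenset("/\\#?")


def forbidden_chars_in(key):
    """Return a sorted list of the disallowed characters that appear in key."""
    return sorted(set(key) & FORBIDDEN_KEY_CHARS)


def is_safe_key(key):
    """True if key contains none of the four disallowed characters."""
    return not forbidden_chars_in(key)


print(is_safe_key("orders-2010"))   # True
print(is_safe_key("orders/2010"))   # False
```

Returning the offending characters, rather than just a boolean, makes for a more useful error message when a key is rejected.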

One of the simple things you need to know is this:

Do not Base64 encode binary data and use it as a PartitionKey or RowKey.

A lot of developers use Base64 encoding to turn binary data into strings.  It’s fast, reversible, well understood, and easy to find in the developer tool bag.  Especially in .NET, where you can simply call:

Convert.ToBase64String
Convert.FromBase64String

(If you have never read up on what Base64 is, I suggest reading the Wikipedia Base64 entry at this point.)

The Base64 “alphabet” (see https://www.rfc-editor.org/rfc/rfc4648.txt) contains these characters, the last of which (“=”) is used only for padding:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=

And the forward slash (“/”) cannot be used in Azure Table Storage PartitionKey or RowKey values.
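
It takes very little binary data to hit the problem.  Here is a small Python illustration (Python’s base64 module standing in for the .NET Convert calls) with an input chosen so that every output character is a slash.

```python
import base64

# Three 0xFF bytes are 24 one-bits, which Base64 splits into four 6-bit
# groups of value 63.  In the standard alphabet, 63 maps to "/".
encoded = base64.b64encode(b"\xff\xff\xff").decode("ascii")

print(encoded)          # ////
print("/" in encoded)   # True
```

Real data will rarely be all slashes, but any 6-bit group with the value 63 produces one, so a key that works in testing can fail later with different input.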

So what should you do?  More on this in a later post.  At the moment, I am simply turning binary data into hex strings.  There are smarter, more efficient approaches…
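
For reference, the hex approach can be sketched in a few lines of Python.  The helper names are hypothetical, and this is only an illustration of the idea, not the code I actually run.

```python
def to_hex_key(data: bytes) -> str:
    """Encode binary data as a lowercase hex string (digits 0-9 and a-f only)."""
    return data.hex()


def from_hex_key(key: str) -> bytes:
    """Reverse to_hex_key, recovering the original bytes."""
    return bytes.fromhex(key)


key = to_hex_key(b"\xff\xff\xff")
print(key)                  # ffffff
print(from_hex_key(key))    # b'\xff\xff\xff'
```

The output uses only the digits and the letters a through f, so none of the disallowed characters can appear.  The cost is size: hex needs two characters per byte, where Base64 needs four characters per three bytes.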

Brian