Windows Azure

I have been looking into Windows Azure a lot more lately and wanted to post a little bit about some of the things that I have found out so far.  It is certainly a new way of thinking of things and will take some careful thought.

Azure Tables

The first big difference is if you decide to use Azure Tables to store your data.  Here is the official verbiage on them:

Windows Azure Table – Programming Table Storage
Windows Azure Table provides scalable, available, and durable structured storage in the form of tables. The tables contain entities, and the entities contain properties. The tables are scalable to billions of entities and terabytes of data, and may be partitioned across thousands of servers. The tables support ACID transactions over single entities and rich queries over the entire table. Simple and familiar .NET and REST programming interfaces are provided via ADO.NET Data Services.

So what does this all mean to you?  Well, when you create a table, there are two important “columns” that you need to concern yourself with: the PartitionKey and the RowKey.

PartitionKey

The PartitionKey is a string that we use to split up the data.  For example, if we were creating a table that would store news articles, the PartitionKey might be the section header (Top News, Sports, Weather, etc).  Each of these groupings can then be placed on a different server.  This is part of how we can scale.
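
To make the grouping concrete, here is a small Python sketch.  It is only an in-memory illustration and not the Azure API; the sample articles and the group_by_partition name are made up for the example.

```python
from collections import defaultdict

# Hypothetical news entities; only the PartitionKey field comes from the post.
articles = [
    {"PartitionKey": "Sports", "Title": "Home team wins"},
    {"PartitionKey": "Weather", "Title": "Rain expected"},
    {"PartitionKey": "Sports", "Title": "Trade rumors"},
]


def group_by_partition(entities):
    """Group entities by PartitionKey, the unit that could live on its own server."""
    partitions = defaultdict(list)
    for entity in entities:
        partitions[entity["PartitionKey"]].append(entity)
    return dict(partitions)


partitions = group_by_partition(articles)
for key, rows in partitions.items():
    print(key, len(rows))
```

Everything that shares a PartitionKey stays together, so picking the key is really a decision about which rows you want kept in the same group.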

RowKey

The RowKey is also a string, and it is what we use to sort the data.  It needs to be unique for each row within a partition.  The most common thing to use so far has been some form of date entry, and this is what determines the default sort order.  So if you use the ticks of the current timestamp, then all your data will show up with the oldest entry first.  If you want them to show up the other way around, one easy way to make that happen is to subtract the current timestamp from DateTime.MaxValue and then get the ticks from that.  Because the RowKey is a string and is compared as text, pad the ticks with leading zeros to a fixed width so that the text order matches the numeric order.  Just please remember to use UTC times!
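
Here is a rough Python sketch of both key styles.  It reproduces the .NET tick arithmetic by hand (ticks are 100-nanosecond intervals counted from 0001-01-01), so the function names and the 19-digit padding are my own choices for the illustration rather than anything from the Azure libraries.

```python
from datetime import datetime, timezone

DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)
MAX_TICKS = 3155378975999999999  # value of DateTime.MaxValue.Ticks in .NET
TICKS_PER_SECOND = 10_000_000    # one tick is 100 nanoseconds


def to_ticks(moment):
    """Convert a timezone-aware datetime to .NET-style ticks in UTC."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware (UTC) datetime")
    delta = moment.astimezone(timezone.utc) - DOTNET_EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * TICKS_PER_SECOND + delta.microseconds * 10


def oldest_first_key(moment):
    """RowKey that sorts the oldest entry first (zero-padded ticks)."""
    return f"{to_ticks(moment):019d}"


def newest_first_key(moment):
    """RowKey that sorts the newest entry first (max ticks minus ticks)."""
    return f"{MAX_TICKS - to_ticks(moment):019d}"


print(newest_first_key(datetime.now(timezone.utc)))
```

The zero padding matters because the keys are compared as strings, and without it a shorter number could sort after a longer one.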

Azure Queues

When you get data from a user and want to enter it into the data backend, there are a few ways to handle it.  The major reason that I see for using a queue is to cope with something going wrong while the data is being entered into the back end.  If you just enter the data directly from the web site’s code, you can get into a situation where some tables are updated and others are not.  If you instead place the data into a queue and have a Worker Role process pull it out of the queue, then when there is a problem the message will still be in the queue and the next Worker Role process can pull it out and process it.
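
The idea of only removing a message once it has been handled can be sketched in a few lines of Python.  This is an in-memory stand-in using a deque, not the Azure queue API, and process_queue and flaky_handler are hypothetical names for the example.

```python
from collections import deque


def process_queue(queue, handler):
    """Handle each waiting message once, deleting it only if the handler succeeds."""
    processed = []
    for _ in range(len(queue)):
        message = queue[0]        # look at the message without removing it
        try:
            handler(message)
        except Exception:
            queue.rotate(-1)      # leave it in the queue for a later attempt
            continue
        queue.popleft()           # success, so now it is safe to delete
        processed.append(message)
    return processed


failed_once = set()


def flaky_handler(message):
    """Pretend back end that fails the first time it sees the message 'b'."""
    if message == "b" and message not in failed_once:
        failed_once.add(message)
        raise RuntimeError("back end unavailable")


queue = deque(["a", "b", "c"])
first_pass = process_queue(queue, flaky_handler)   # "b" stays in the queue
second_pass = process_queue(queue, flaky_handler)  # "b" succeeds this time
```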

This isn’t without its own set of problems, however.

  • If you have updated some data, when you try to insert into a table you can get an error that the data already exists.  So you need to check for that and call update instead.
  • You then have a problem where the data may have existed when you attempted to insert it again, but it was then deleted before you tried to update so that also fails.  So to handle that, you need to put the insert/update attempt into a loop to ensure it happens.
  • What if the reason that a Worker Role crashes or doesn’t complete is that the message itself is causing the problem (corrupt data)?  Then you need a way, when you are processing a message, to check whether the timestamp from when the message was entered into the queue is more than a certain time in the past.  If that is the case, then you know that the message is bad, so just delete it.
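
The three points above can be sketched together in Python.  Everything here is a stand-in under my own assumptions: InMemoryTable is not the Azure table API, the retry limit of 5 is a guard I added so the loop cannot spin forever, and the one-hour cutoff for a bad message is an arbitrary number.

```python
from datetime import datetime, timedelta, timezone


class AlreadyExists(Exception):
    pass


class NotFound(Exception):
    pass


class InMemoryTable:
    """Hypothetical stand-in for a table, just enough to show the errors."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, value):
        if key in self.rows:
            raise AlreadyExists(key)
        self.rows[key] = value

    def update(self, key, value):
        if key not in self.rows:
            raise NotFound(key)
        self.rows[key] = value


def insert_or_update(table, key, value, max_attempts=5):
    """Try insert, fall back to update, and loop if the row vanished in between."""
    for _ in range(max_attempts):
        try:
            table.insert(key, value)
            return "inserted"
        except AlreadyExists:
            pass
        try:
            table.update(key, value)
            return "updated"
        except NotFound:
            continue  # row was deleted after the failed insert, so try again
    raise RuntimeError("could not insert or update " + repr(key))


def is_stale(enqueued_at, now, max_age=timedelta(hours=1)):
    """Treat a message that has been waiting longer than max_age as bad."""
    return now - enqueued_at > max_age


table = InMemoryTable()
first = insert_or_update(table, "row1", "draft")
second = insert_or_update(table, "row1", "final")
```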

Azure Blobs

Blobs are a really interesting concept where you can store named files in the cloud.  You can also store metadata for these files.  One way that this can be used is to store the images used on your website.  Typically they are stored in a subdirectory of your website.  The problem with doing that is that the package that you use to deploy your website to Azure will then contain these images.  This means if you want to update your images, you have to redeploy the entire package.  If you instead store them in blob storage, you can update them whenever you want and it doesn’t affect the website at all.

You can include a \ in the name of the file and treat the parts before it like folders.  So if you had music files you were storing, you could give one the file name “rock\myrocksong.mp3”.  You could then do queries to pull out all the “rock” files.
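
As a quick illustration of the folder trick, here is a Python sketch that uses a plain dictionary in place of blob storage.  The blob names other than the one above and the list_by_prefix helper are made up, and a real query would go through the storage service rather than a dictionary.

```python
# Hypothetical blob store: names map to file contents.
blobs = {
    "rock\\myrocksong.mp3": b"...",
    "rock\\anothersong.mp3": b"...",
    "jazz\\slowtune.mp3": b"...",
}


def list_by_prefix(store, prefix):
    """Return the blob names that start with the given folder-style prefix."""
    return sorted(name for name in store if name.startswith(prefix))


rock_files = list_by_prefix(blobs, "rock\\")
print(rock_files)
```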

How Do I Videos

There are some great “How Do I” videos available for Windows Azure.  Be sure to check them out.

Going forward

I am going to continue to look into Windows Azure and some of the other pieces of Azure and will post things as I figure them out.  I already have a few tips that I have been looking at, but I want to get a few more before I start posting them.  Let me know what you think of Azure and any of these things.