Windows Azure Storage and Concurrent Access

A lot of the examples of using Windows Azure Storage that I run across are pretty simple and just demonstrate the basics of reading and writing. Knowing how to read and write usually isn't sufficient for a multi-instance, distributed application. For applications that run more than one instance (i.e., pretty much everything you'd run in the cloud), handling concurrent writes to a shared resource is usually important.

Luckily, Windows Azure Tables and Blobs have support for concurrent access through ETags. ETags are part of the HTTP 1.1 specification, and work like this:

  1. You request a resource (table entity or blob) and when you get the resource you also get an ETag value. This value is a unique value for the current version of the resource; the version you were just handed.

  2. You make some modifications to the data and try to store it back to the cloud. As part of the request, you specify a conditional HTTP request header such as If-Match, along with the value of the ETag you got in step 1.

  3. If the ETag you specify matches the current value of the resource in the cloud, the save happens. If the value in the cloud is different, someone has changed the data since you read it in step 1, and an error is returned.

    Note: If you don't care and want to always write the data, you can specify an '*' value for the ETag in the If-Match header.
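The three steps above can be sketched with a small in-memory simulation. This is not the Azure service or its API: FakeStore, get and put are hypothetical names invented for illustration, and the '*' handling is simplified to "always write".

```javascript
// Minimal in-memory simulation of ETag-based optimistic concurrency.
// FakeStore is a hypothetical stand-in, not part of any Azure library.
class FakeStore {
  constructor() {
    this.items = new Map(); // key -> { data, etag }
    this.version = 0;       // counter used to mint a new ETag on every write
  }

  // Step 1: a read returns the data together with its current ETag.
  get(key) {
    const item = this.items.get(key);
    return item ? { data: item.data, etag: item.etag } : undefined;
  }

  // Steps 2 and 3: write only if ifMatch equals the stored ETag, or is '*'.
  put(key, data, ifMatch) {
    const current = this.items.get(key);
    const currentEtag = current ? current.etag : undefined;
    if (ifMatch !== '*' && ifMatch !== currentEtag) {
      return { ok: false, status: 412 }; // HTTP 412 Precondition Failed
    }
    this.version += 1;
    const etag = 'v' + this.version;
    this.items.set(key, { data: data, etag: etag });
    return { ok: true, etag: etag };
  }
}

const store = new FakeStore();
store.put('task1', { done: false }, '*'); // unconditional write

// Two clients read the same version of the resource.
const clientA = store.get('task1');
const clientB = store.get('task1');

// Client A writes first and succeeds; client B's ETag is now stale.
const first = store.put('task1', { done: true }, clientA.etag);
const second = store.put('task1', { done: false }, clientB.etag);
```

Here first.ok is true, while second is rejected because the stored ETag changed when client A wrote. Client B would need to read the resource again before retrying.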

Unfortunately there's no centralized ETag document I can find for Windows Azure Storage. Instead it's discussed for the specific APIs that use it. See Windows Azure Storage Services REST API Reference as a starting point for further reading.

This is all low-level HTTP though, and most people would rather use a wrapper around the Azure APIs to make them easier to use.

Wrappers

So how does this work with a wrapper? Well, it really depends on the wrapper you're using. For example, the Azure module for Node.js allows you to specify optional parameters that work on ETags. For instance, when storing table entities you can specify checkEtag: true. This translates into an HTTP Request Header of 'If-Match', which means "only perform the operation if the ETag I've specified matches the one on this resource in the cloud". If the parameter isn't present, the default is to use an ETag value of '*' to overwrite. Here's an example of using checkEtag:

 tableService.updateEntity('tasktable', serverEntity, {checkEtag: true}, function(error, updateResponse) {
     if(!error){
         console.log('success');
     } else {
         console.log(error);
     }
 });

Note that I don't specify an ETag value anywhere above. This is because it's part of serverEntity, which I previously read from the server. You can see the value by looking at serverEntity['etag']. If the ETag value in serverEntity doesn't match the value on the server, the operation fails and you'll receive an error similar to the following:

 { code: 'UpdateConditionNotSatisfied', 
      message: 'The update condition specified in the request was not satisfied.\nRequestId:a5243266-ac68-4c64-bc55-650da40bfba0\nTime:2012-02-14T15:04:43.9702840Z' }
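When you get this error, the usual response is to read the entity again, reapply your change, and retry. Here's a minimal sketch of that loop. The helper updateWithRetry and its read, modify and write parameters are hypothetical: read and write stand for whatever functions you wrap around your library's query and conditional update calls. Only the 'UpdateConditionNotSatisfied' code comes from the error shown above.

```javascript
// Hypothetical helper: retries a read-modify-write cycle when the
// conditional update is rejected because the ETag no longer matches.
//   read(callback)          - fetches a fresh copy: callback(error, entity)
//   modify(entity)          - applies your change to the entity in place
//   write(entity, callback) - conditional update: callback(error)
function updateWithRetry(read, modify, write, retriesLeft, done) {
  read(function (readError, entity) {
    if (readError) {
      return done(readError);
    }
    modify(entity);
    write(entity, function (writeError) {
      if (!writeError) {
        return done(null, entity);
      }
      var conflict = writeError.code === 'UpdateConditionNotSatisfied';
      if (conflict && retriesLeft > 0) {
        // Someone else changed the entity; re-read to pick up the new ETag.
        return updateWithRetry(read, modify, write, retriesLeft - 1, done);
      }
      // Either a different kind of error, or we ran out of retries.
      done(writeError);
    });
  });
}
```

The retry limit matters: without it, a heavily contended entity could keep a single instance looping indefinitely.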

Blobs are slightly different in that they can use more conditions than If-Match, and can combine conditionals. Specifying Conditional Headers for Blob Service Operations has a list of the supported conditionals; note that you can use DateTime conditionals as well as ETags. Because conditions can be combined, the syntax is slightly different: you specify the conditions as part of accessConditions. For example:

 var options = { accessConditions: { 'If-None-Match': '*' } };
 blobService.createBlockBlobFromText('taskcontainer', 'blah.txt', 'random text', options, function(error){
     if(!error){
         console.log('success');
     } else {
         console.log(error);
     }
 });

For this I just used one condition - If-None-Match - but I could have also added another to the accessConditions collection if needed. Note that I used a wildcard instead of an actual ETag value. What this does is only create the 'blah.txt' blob if it doesn't already exist.
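As a rough illustration of combining conditions, the sketch below builds an options object that pairs an ETag condition with a date condition. The helper buildAccessConditions is hypothetical and the ETag string is a made-up placeholder. If-Match and If-Unmodified-Since are standard HTTP conditional headers, but I'm assuming the wrapper passes a second key through in the same way as the single-condition example above, so check your library before relying on it.

```javascript
// Hypothetical helper: combines an ETag condition with a date condition
// so a write only proceeds if the blob is still the version we last saw.
function buildAccessConditions(etag, lastSeen) {
  return {
    accessConditions: {
      'If-Match': etag,
      // HTTP dates use the RFC 1123 format, which toUTCString() produces.
      'If-Unmodified-Since': lastSeen.toUTCString()
    }
  };
}

// Placeholder values; a real ETag would come from a previous read.
var combinedOptions = buildAccessConditions(
  '"0x8CDA1BF0593B660"',
  new Date(Date.UTC(2012, 1, 14, 15, 4, 43))
);
```

The resulting combinedOptions object would then be passed wherever the options argument goes in the call above.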

Summary

For cloud applications that need to handle concurrent access to shared resources such as blobs and table entities, Windows Azure Storage provides this functionality through ETags. If you're not coding directly against the Windows Azure Storage REST API, you should ensure that the wrapper/convenience library you are using exposes this functionality if you plan on using it.