Windows Azure Storage (WAS) Internals – Achieving Consistency

Windows Azure Storage has three primary components - a Queue, a Binary Large Object (BLOB) store (of which there are two types), and Table Storage.

Storage of data on-premises is fairly well understood - but there are components of it that you may not consider. When you move to a distributed architecture, certain factors should be taken into account, such as consistency. Consistency means that when you store a datum, every later call for it should return exactly the same bits, no matter which part of the system answers the call. In other words, if you store a picture with a certain name, whenever you call that name that particular picture should show up. That might sound obvious - but when you begin to scale horizontally, it’s a big consideration. Systems are spread out over multiple physical racks, which are further separated into “fault domains”, each with its own power, networking and so on. In Windows Azure, the storage is also replicated to ensure high availability.

Some “cloud” systems relax the consistency target to allow for the highest throughput. This might allow inconsistent reads, meaning that a datum recorded in the naming system might not be available yet, or that an older version of the datum might be read. In Windows Azure, we took the position that consistency is of the highest importance. We achieved this through constructs such as the Location Service (LS), the Stream, Partition and Front-End layers, and separate replication engines. Of key importance in a system that provides high consistency are the naming and object access protocols - in fact, these turn out to be some of the most pivotal pieces.
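
To make the difference concrete, here is a minimal toy sketch in Python. It is not how Windows Azure Storage is implemented; the class `ReplicatedStore`, its methods and the file names are all hypothetical, made up for illustration only. It shows a store with three replicas where a "strong" write is acknowledged only after every replica has the data, while a "relaxed" write is acknowledged after only the first replica has it.

```python
class ReplicatedStore:
    """Toy in-memory store with several replicas (illustrative only)."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica_index, name, value) not yet applied

    def write_strong(self, name, value):
        # Acknowledge only after every replica holds the new value.
        for replica in self.replicas:
            replica[name] = value

    def write_relaxed(self, name, value):
        # Acknowledge after the first replica; the rest catch up later.
        self.replicas[0][name] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, name, value))

    def replicate(self):
        # Background step that brings the lagging replicas up to date.
        for index, name, value in self.pending:
            self.replicas[index][name] = value
        self.pending.clear()

    def read(self, name, replica_index):
        # Returns None if this replica has never seen the name.
        return self.replicas[replica_index].get(name)


store = ReplicatedStore(replica_count=3)

# Strong write: every replica returns the same bits.
store.write_strong("photo.jpg", b"v1")
strong_reads = [store.read("photo.jpg", i) for i in range(3)]

# Relaxed write: replicas 1 and 2 still return the older version.
store.write_relaxed("photo.jpg", b"v2")
relaxed_reads = [store.read("photo.jpg", i) for i in range(3)]

# Relaxed write of a new name: it is not available yet on replica 2.
store.write_relaxed("new.jpg", b"n1")
missing_read = store.read("new.jpg", 2)

# Only after background replication do all replicas agree again.
store.replicate()
converged_reads = [store.read("photo.jpg", i) for i in range(3)]
```

In this sketch, `relaxed_reads` contains both the old and the new version depending on which replica answers, and `missing_read` is `None`. Those are the two kinds of inconsistent read described above, and they are what a strongly consistent design rules out, at the cost of doing more work before a write is acknowledged.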

Windows Azure Storage has a complex arrangement to ensure this high consistency. You can read some very deep internals here, and a video of the talk held at an ACM conference is here.
