Geographically Distributed High Availability

This is the last post in the series on business continuity. It briefly describes how the service developer can provide geographically distributed high availability on the Azure platform.

The Azure platform currently does not directly support highly available, geographically distributed services. Customers can deploy services to multiple data centers, but the platform does nothing to help them integrate the separate deployments, or to achieve high availability with the resulting composite.

Geographically distributed high availability is essentially continuous disaster recovery. Each tenant’s data is replicated to multiple physical service instances. Load is distributed dynamically across those service instances to provide an optimal user experience. The two primary challenges are:

  • integrating physical service instances to create a single virtual service, and
  • minimizing data inconsistencies as client requests are routed among the different physical service instances.

Creating a Virtual Service

To create a single virtual service from a collection of physical service instances, clients of the service must use logical URIs that can be bound to different physical URIs. A global load balancer (GLB) can then apply a variety of policies to select the physical service instance that services a given client request. For example, many GLBs support the following types of routing:

  • ratio, which routes some fraction of client requests to each physical service instance,
  • geographical, which routes client requests to physical service instances in a given geography,
  • proximity, which routes client requests to the nearest physical service instances, and
  • round-robin, which routes client requests to physical service instances in a round-robin fashion.
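
The four policies listed above can be illustrated with a minimal Python sketch. It is not how any particular GLB is implemented; the Instance type, its fields, the example URIs, and the function names are all assumptions made for illustration.

```python
import itertools
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """A physical service instance. All fields are illustrative assumptions."""
    uri: str
    geography: str
    lat: float
    lon: float
    weight: float = 1.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def route_ratio(instances, rng=random):
    """Ratio: choose an instance with probability proportional to its weight."""
    weights = [i.weight for i in instances]
    return rng.choices(instances, weights=weights, k=1)[0]


def route_geographical(instances, client_geography):
    """Geographical: choose the first instance in the client's geography, if any."""
    matches = [i for i in instances if i.geography == client_geography]
    return matches[0] if matches else None


def route_proximity(instances, client_lat, client_lon):
    """Proximity: choose the instance closest to the client."""
    return min(instances, key=lambda i: haversine_km(client_lat, client_lon, i.lat, i.lon))


def make_round_robin(instances):
    """Round-robin: return a function that cycles through the instances in order."""
    cycle = itertools.cycle(instances)
    return lambda: next(cycle)


# Hypothetical deployments; the URIs and coordinates are placeholders.
US = Instance("https://svc-us.example.net", "north-america", 41.9, -87.6)
EU = Instance("https://svc-eu.example.net", "europe", 53.3, -6.3)
```

A real GLB typically makes these decisions at the DNS level and also takes instance health into account, which this sketch leaves out.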

Geographical routing is usually based on the requested domain name, but some services may require routing decisions to be based on other components of the request, such as path segments, query string parameters, or payload contents. In these cases, the service may use proximity routing to route requests to the nearest physical instance, which then proxies them to appropriate locations using a custom routing strategy.
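
As a rough illustration of such a custom routing strategy, the sketch below shows how the nearest instance might decide where to proxy a request based on a path segment or a query string parameter. The /tenants/<name>/ path layout, the tenant query parameter, the tenant-to-instance table, and all URIs are hypothetical; a real service would substitute its own conventions.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical table mapping each tenant to the instance that should serve it.
TENANT_HOME = {
    "contoso": "https://svc-eu.example.net",
    "fabrikam": "https://svc-us.example.net",
}


def resolve_target(request_url, local_instance, tenant_home):
    """Return the physical instance URI that should handle the request.

    The tenant is read from a /tenants/<name>/... path segment if present,
    otherwise from a ?tenant=<name> query parameter. Requests for unknown
    tenants, or with no tenant at all, stay on the local instance.
    """
    parts = urlsplit(request_url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "tenants":
        tenant = segments[1]
    else:
        tenant = parse_qs(parts.query).get("tenant", [None])[0]
    return tenant_home.get(tenant, local_instance)


def needs_proxy(request_url, local_instance, tenant_home):
    """True when the request must be forwarded to a different instance."""
    return resolve_target(request_url, local_instance, tenant_home) != local_instance
```

For example, an instance at https://svc-us.example.net that receives a request for /tenants/contoso/orders would forward it to https://svc-eu.example.net under the table above, while a request for an unknown tenant would be served locally.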

Minimizing Data Inconsistencies

If the service data changes frequently, clients may see data inconsistencies when their requests are routed to different service instances, such as recently inserted data disappearing and recently deleted data reappearing. One way to mitigate, but not completely eliminate, these inconsistencies is to affinitize clients to specific service instances and to change affinities infrequently. For some data, this level of consistency may be acceptable. For other data, however, the service may have to use synchronous replication, which was described briefly in the previous post in this series, or block a client's writes until all of its outstanding changes have propagated before routing its requests to a different service instance.
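
The last option, blocking writes until outstanding changes have propagated, can be sketched as a small in-memory model. The AffinityManager class and its method names are assumptions made for illustration; a real service would need durable state, per-change tracking, and timeouts.

```python
class AffinityManager:
    """Tracks which instance each client is bound to (illustrative only).

    A client can be moved to another instance only once all of its writes
    have been replicated. While a move is pending, its writes are blocked.
    """

    def __init__(self, default_instance):
        self._default = default_instance
        self._affinity = {}   # client id -> current instance
        self._pending = {}    # client id -> number of unreplicated writes
        self._moving_to = {}  # client id -> requested target instance

    def instance_for(self, client_id):
        """Return the instance that should serve this client's requests."""
        return self._affinity.setdefault(client_id, self._default)

    def can_write(self, client_id):
        """Writes are allowed unless a move is waiting on replication."""
        return client_id not in self._moving_to

    def record_write(self, client_id):
        """Note that the client made a change that has not yet propagated."""
        if not self.can_write(client_id):
            raise RuntimeError("writes are blocked while the client is being moved")
        self._pending[client_id] = self._pending.get(client_id, 0) + 1

    def record_replicated(self, client_id):
        """Note that one of the client's changes has propagated."""
        self._pending[client_id] = max(self._pending.get(client_id, 0) - 1, 0)
        self._try_complete_move(client_id)

    def request_move(self, client_id, new_instance):
        """Ask to rebind the client; takes effect once nothing is outstanding."""
        self._moving_to[client_id] = new_instance
        self._try_complete_move(client_id)

    def _try_complete_move(self, client_id):
        target = self._moving_to.get(client_id)
        if target is not None and self._pending.get(client_id, 0) == 0:
            self._affinity[client_id] = target
            del self._moving_to[client_id]
```

In this model a client with one unreplicated write stays bound to its original instance after request_move is called, and is rebound only when record_replicated brings its outstanding count to zero. Reads continue to be served by the original instance in the meantime.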