Dealing with Windows Azure Storage transient faults

Preamble

When an application uses a service, errors can occur because of temporary conditions such as intermittent service, infrastructure-level faults, network issues, or explicit throttling by the service. These errors occur more frequently with cloud-based services, but can also occur in on-premises solutions. Often, if you retry the operation a short time later (maybe only a few milliseconds later), it succeeds. These error conditions are referred to as transient faults. Transient faults typically occur infrequently, and in most cases only a few retries are needed for the operation to succeed.
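
To make the idea concrete, here is a minimal hand-rolled sketch of a retry loop. Everything in it is hypothetical: the RetrySketch class, the Retry helper, the choice of TimeoutException to stand in for "a transient fault", and the delay values. It only illustrates the retry-after-a-short-wait idea and does not show how any particular library implements it.

```csharp
using System;
using System.Threading;

public static class RetrySketch
{
    // Hypothetical helper: runs the operation and retries it up to maxRetries
    // times when it throws TimeoutException (our stand-in for a transient fault).
    public static T Retry<T>(Func<T> operation, int maxRetries, TimeSpan delay)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (TimeoutException)
            {
                // Out of retries: let the exception reach the caller.
                if (attempt >= maxRetries) throw;

                // Wait a little longer before each new attempt.
                Thread.Sleep(TimeSpan.FromMilliseconds(
                    delay.TotalMilliseconds * (attempt + 1)));
            }
        }
    }

    public static void Main()
    {
        int calls = 0;

        // Simulated flaky operation: fails twice, then succeeds.
        string result = Retry(() =>
        {
            calls++;
            if (calls < 3) throw new TimeoutException("simulated transient fault");
            return "ok";
        }, maxRetries: 3, delay: TimeSpan.FromMilliseconds(10));

        Console.WriteLine(result + " after " + calls + " attempts");
    }
}
```

A real implementation also has to decide which exceptions count as transient for a given service, which is the knowledge a retry library packages for you.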

The Transient Fault Handling Application Block (Topaz) lets developers make their applications more resilient by adding robust transient fault handling logic. It encapsulates information about the transient faults that can occur when you use the following services in your application:

  • SQL Database
  • Windows Azure Service Bus
  • Windows Azure Storage
  • Windows Azure Caching Service

This blog post focuses on Windows Azure Storage.

Retries in Windows Azure Storage Client

The Windows Azure Storage Client Library has supported retries for a while as an integral part of the API. Instead of wrapping each API call with a call to one of Topaz's RetryPolicy methods, you set the storage client's RetryPolicy property and execute the action directly. In fact, you get retries by default, so in order to use Topaz's retries you have to explicitly override the default retry policy in the storage client.

Using built-in support for retries, if such support is available, is usually more appealing than using external support like Topaz's: the code is terser, and the API has more information about the actual operations it retries, so it can be more effective. With external retry support an entire operation needs to be retried, while retries managed by the API can retry specific requests within a single operation. Also, when the API might perform additional work after it has returned a result (for example, if it returns an IEnumerable that issues new requests to a service to pull more results while it is being enumerated), an external retry needs to include not only the API call but also the processing of the results.
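
That last point is easy to miss, so here is a small self-contained sketch of it. FakeService, ListItems, and Retry are hypothetical stand-ins, not Topaz or storage client APIs. The fake listing is a lazy iterator whose first enumeration fails partway through, which mimics a follow-up request failing while results are being enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredEnumerationSketch
{
    // Hypothetical stand-in for a service whose first listing attempt fails
    // while the second "page" of results is being fetched.
    public class FakeService
    {
        private int enumerations;

        // Lazy iterator: nothing here runs until the caller starts enumerating.
        public IEnumerable<string> ListItems()
        {
            enumerations++;
            yield return "item-1";
            if (enumerations == 1)
            {
                throw new TimeoutException("simulated transient fault on page 2");
            }
            yield return "item-2";
        }
    }

    // Hypothetical external retry wrapper: retries the delegate on TimeoutException.
    public static T Retry<T>(Func<T> operation, int maxRetries)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (TimeoutException)
            {
                if (attempt >= maxRetries) throw;
            }
        }
    }

    public static void Main()
    {
        // Only the call itself is wrapped. The fault happens later, inside the
        // foreach, outside the retry wrapper, so it is never retried.
        try
        {
            foreach (var item in Retry(() => new FakeService().ListItems(), 3))
            {
                Console.WriteLine("lazy: " + item);
            }
        }
        catch (TimeoutException e)
        {
            Console.WriteLine("lazy enumeration failed: " + e.Message);
        }

        // ToArray() forces the whole enumeration to run inside the retried
        // delegate, so the fault is caught by the wrapper and the listing retried.
        var service = new FakeService();
        foreach (var item in Retry(() => service.ListItems().ToArray(), 3))
        {
            Console.WriteLine("materialized: " + item);
        }
    }
}
```

In this sketch the first loop prints one item and then fails, while the second loop prints both items because the failed enumeration is repeated inside the wrapper. The cost is that all results are buffered before any of them is processed.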

Why would you want to use Topaz's retries instead of the built-in retry support from the Windows Azure Storage Client Library?

The built-in retry support prior to Windows Azure Storage Client v2 had two shortcomings:

  • It was relatively hard to extend, since it was based on delegates rather than classes.
  • It only allowed you to specify how long to wait, but not which exceptions should be retried; that decision was still made by the API. You could refuse a retry for exceptions that the client library considered retryable, but you couldn't force a retry for exceptions that the client library had already vetoed.

With v2, the support for retries was overhauled to address these two shortcomings: it is now class-based, and the new IRetryPolicy implementations are responsible for determining which exceptions should be retried. Because of this change, our recommendation now is that you use the built-in support for retries if you're using v2 of the Windows Azure Storage Client Library.

Does this mean that Topaz does not support Windows Azure Storage Client Library v2?

Absolutely not. Topaz supports retries for the new versions of the client library, as well as previous versions. You can use it if you wish, particularly if you want to benefit from Topaz's configuration support. It's just that, as the Windows Azure Storage Client Library evolved, Topaz's support became less necessary, so the recommendation changed.

Can you show me the difference in usage?

Here are examples of what using Topaz and the built-in retries look like. In both cases the initialization of retry policies can be separated from performing the requests, and many requests might share the same retry configuration.

Using Topaz

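// Set up the Topaz retry policy: up to 3 retries with exponential back-off
// (minimum back-off 10 ms, maximum back-off 10 s, delta 200 ms).
var retryPolicy =
  new RetryPolicy<StorageTransientErrorDetectionStrategy>(
    new ExponentialBackoff(
      3,
      TimeSpan.FromMilliseconds(10),
      TimeSpan.FromMilliseconds(10000),
      TimeSpan.FromMilliseconds(200)));

// Turn off the storage client's built-in retries so that only Topaz retries.
client.RetryPolicy = new NoRetry();

// Execute the operation with retries. ToArray() (from System.Linq) forces the
// whole listing to run inside the retried delegate.
try
{
  foreach (var container in retryPolicy.ExecuteAction(
    () => client.ListContainers().ToArray()))
  {
    Console.WriteLine(container.Name);
  }
}
catch (Exception e)
{
  Console.WriteLine("Exception occurred: " + e.Message);
}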

Note how the IEnumerable of containers is converted to an array as part of the operation to retry (which might result in some overhead) to ensure that all requests to the Windows Azure Storage services are retried.

Using built-in retry policies

// Set up the built-in retry policy: exponential back-off with a 200 ms delta,
// up to 3 attempts.
client.RetryPolicy = new ExponentialRetry(TimeSpan.FromMilliseconds(200), 3);

// execute operation with retries
try
{
  foreach (var container in client.ListContainers())
  {
    Console.WriteLine(container.Name);
  }
}
catch (Exception e)
{
  Console.WriteLine("Exception occurred: " + e.Message);
}

Many thanks to Fernando Simonazzi for contributing to this post.