Implementing Geographic Redundancy

[This article was contributed by the SQL Azure team.]

clip_image002Geographic redundancy is a fancy word for distributing your data across multiple data centers globally. One way to accomplish this is to spread your data across many SQL Azure data centers using Data Sync Service for SQL Azure to synchronize a hub database to many global member databases. SQL Azure has six data centers worldwide

In the future I will blog about a more complicated technique that provides real time writes and geographic redundancy with Data Sync Services.

Considerations

The technique that I will discuss in this blog post is best for databases with low writes and high reads where the write data doesn’t need to be synchronized in real time. One scenario that fits well with this technique is a magazine publishing web site that I discussed in this blog post.

The current CTP of Data Sync Service for SQL Azure allows the most frequent synchronization to be an hour. This limitation reduces the usefulness of this specific technique to a limited number of applications. The good news is that you will have an option for more frequent synchronization in upcoming releases of Data Sync Services.

Setting up Data Sync Service for SQL Azure

In order to get geographic redundancy, you will need to have multiple SQL Azure service accounts with SQL Azure servers that are in different data centers. Currently, you are only allowed to have one server per account, which means to have resided in multiple data centers, you need multiple account. The location of the data centers and the count of the member databases depend on the needs of your application. Having two databases, in two different data centers is the minimum needed for geographic redundancy.

The next step is to set up Data Sync Service for SQL Azure. With this technique, the hub database is the database that you can read and write from, and the member databases are the ones you only read (SELECT) from. Having the member databases read-only simplifies the issues of data integrity and synchronization conflicts.

Modifying the Data Layer

The final step is to add some intelligence to your data layer code that will pick a local datacenter relative to the Windows Azure web role first. If the local database is offline, and the code fails over to a remote SQL Azure datacenter. This code expands on the connection handling code that was discussed in this blog post.

Connection Strings

Using the setup above, you now have multiple read-only member databases distributed globally. This means that you have multiple connection strings that your application can use. As a design exercise, let’s assume these applications are Windows Azure web roles and that they could exist in the same data centers as the SQL Azure database.

In this technique I am going to dynamically construct the connection string every time a read query is executed against the database. I want my code to have these goals:

  • Return a connection string where the primary server is in the same datacenter as the Windows Azure web role is running. This will increase performance and reduce data transfer charges.
  • Return a connection string where the failover partner is in a different data center than the primary server. This will give me geographic redundancy.
  • Build on earlier code examples to try again at the local datacenter for transient errors, instead of calling the remote data center.
  • Have one server per datacenter that has matching data.
  • All the databases have the same name; this keeps the code simpler for this example, since we don’t have to maintain a list of databases name for each server.
  • Only try one remote data center. If the local data center and the random failover partner fail, abort the operation.

ConnectionStringManager

ConnectionStringManager class, shown below, knows about the data centers and servers that hold the data. It also knows how to read the user id and password from the configuration file. From this information, it can construct a connection string to a local SQL Azure database, or return a random failover partner to connect to remotely.

The ConnectionStringManager class code can be found in the download at the end of the blog post. The ConnectionStringManager class code looks like this:

 internal class ConnectionStringManager
{
    /// <summary>
    /// Enumeration of SQL Azure Data Centers
    /// </summary>
    internal enum DataCenterEnum
    {
        EastAsia,
        NorthCentralUS,
        NorthEurope,
        SouthCentralUS,
        SoutheastAsia,
        WestEurope
    }

    protected Dictionary<DataCenterEnum, String> _datacenters =
        new Dictionary<DataCenterEnum, string>();
    Random _random = new Random();

    internal ConnectionStringManager()
    {
        // WWB: List of Servers With The Same Data, One Per DataCenter
        _datacenters[DataCenterEnum.SouthCentralUS] 
            = "n50rqzztlj";
        _datacenters[DataCenterEnum.NorthCentralUS] 
            = "iiyzhlnz4s";
    }

    /// <summary>
    // Overall performance could be improved by changing or
    // caching this algorithm, however for simplicity of 
    // the example, I am leaving it this way.
    /// </summary>
    /// <param name="localDataCenter"></param>
    /// <returns></returns>
    internal DataCenterEnum? FailOverPartner(
        DataCenterEnum localDataCenter)
    {
        // WWB: Generate a List Of FailOver Partners Not In
        // The Local Data Center
        List<DataCenterEnum> nonLocalServers = new List<DataCenterEnum>();
        foreach (DataCenterEnum dataCenter in _datacenters.Keys)
            if (dataCenter != localDataCenter)
                nonLocalServers.Add(dataCenter);

        // WWB: There Are No Valid FailOver Partners
        if (nonLocalServers.Count == 0)
            return (null);

        // WWB: Choose a Random Remote DataCenter
        int index = _random.Next(nonLocalServers.Count);

        return (nonLocalServers[index]);
    }

    internal String Database
    {
        get
        {
            return (ConfigurationManager.AppSettings["Database"]);
        }
    }

    internal String UserId
    {
        get
        {
            return (ConfigurationManager.AppSettings["UserId"]);
        }
    }

    internal String Password
    {
        get
        {
            return (ConfigurationManager.AppSettings["password"]);
        }
    }

    /// <summary>
    /// Return the server name for the datacenter.
    /// </summary>
    /// <param name="localDataCenter"></param>
    /// <returns></returns>
    internal String Server(DataCenterEnum localDataCenter)
    {
        return (_datacenters[localDataCenter]);
    }

    /// <summary>
    /// Returns the Connection String For the DataCenter
    /// Performance Could be Improved By Caching the Results
    /// And Not Constructing the String Everytime.
    /// </summary>
    /// <param name="localDataCenter"></param>
    /// <returns></returns>
    internal String ConnectionString(DataCenterEnum localDataCenter)
    {
        StringBuilder connectionString = new StringBuilder();

        connectionString.AppendFormat("Server=tcp:{0}.database.windows.net;",
            Server(localDataCenter));
        connectionString.AppendFormat("Database=AdventureWorksLTAZ2008R2;", 
            Database);
        connectionString.AppendFormat("User ID={0}@{1};", UserId,
            Server(localDataCenter));
        connectionString.AppendFormat("Password={0};", Password);
        connectionString.AppendFormat("Trusted_Connection=False;");
        connectionString.AppendFormat("Encrypt=True;");

        return (connectionString.ToString());
    }
}

The calling code builds on the connection handling code that was discussed in this blog post. Currently there is no way in Windows Azure to detect what data center you are in from your C# code. So we have to hard code the local data center in the code. This will need to be changed as you deploy the same code to other Windows Azure data centers. You could also move it to the Windows Azure configuration file and have you code read it from there.

 static private ConnectionStringManager 
    ConnectionStringManager = new ConnectionStringManager();
static private ConnectionStringManager.DataCenterEnum 
    LocalDataCenter = ConnectionStringManager.DataCenterEnum.SouthCentralUS;

/// <summary>
/// Generic Code that takes and input and executes 
/// a statement against the SQL Azure
/// </summary>
/// <param name="companyId"></param>
static void DoSomething(Int32 customerId)
{
    // Always Start Locally, It Is Less Expensive and Faster
    ConnectionStringManager.DataCenterEnum currentDataCenter 
        = LocalDataCenter;

    // This is the retry loop, handling the retries session
    // is done in the catch for performance reasons
    for (Int32 attempt = 1; ; )
    {
        // Create a SqlConnection Class, the connection isn't established 
        // until the Open() method is called
        using (SqlConnection sqlConnection =
            new SqlConnection(
                ConnectionStringManager.ConnectionString(currentDataCenter)))
        {
            try
            {
                // Open the connection
                sqlConnection.Open();

                // Statement To Call
                String sql = @"SELECT CompanyName FROM [SalesLT].[Customer]" + 
                    @" WHERE CustomerId = @CustomerId";

                SqlCommand sqlCommand = new SqlCommand(sql, sqlConnection);

                sqlCommand.Parameters.AddWithValue("@CustomerId", customerId);

                using (SqlDataReader sqlDataReader = sqlCommand.ExecuteReader())
                {
                    while (sqlDataReader.Read())
                    {
                        /// Do Something Wtih CompanyName
                    }

                    return;
                }
            }
            catch (SqlException sqlException)
            {
                // Increment attempts
                attempt++;

                // Find Maximum attempts
                Int32 maxRetryCount = Int32.Parse(
                    ConfigurationManager.AppSettings["ConnectionRetrys"]);

                // Throw Error if we have reach the maximum number of attempts
                if (attempt == maxRetryCount)
                    throw;

                // Determine if we should retry or abort.
                if (RetryLitmus(sqlException))
                {
                    // Transient Error, Lets Wait And Try Again
                    // In This Data Center
                    Thread.Sleep(ConnectionRetryWaitSeconds(attempt));
                }
                else if (currentDataCenter == LocalDataCenter)
                {
                    // Non Transient Error, Lets Try In a Remote DataCenter
                    currentDataCenter = ConnectionStringManager.FailOverPartner
                        (currentDataCenter).Value;
                }
                else
                {
                    // We have tried a remote datacenter, now it is time to error.
                    throw;
                }
            }
        }
    }
}

Summary

Do you have questions, concerns, comments? Post them below and we will try to address them.