How to Sync Large SQL Server Databases to SQL Azure

Article
09/24/2010

Over the past few days I have seen a number of posts from people who have been looking to synchronize large databases to SQL Azure and have been having issues. In many of these posts, people have looked to use tools like SSIS, BCP and Sync Framework and have run into issues such as SQL Azure closing the transaction due to throttling of the connection (because it took to apply the data) and occasionally local out-of-memory issues as the data is sorted.

For today’s post I wanted to spend some time discussing the subject of synchronizing large data sets from SQL Server to SQL Azure using Sync Framework. By large database I mean databases that are larger than 500MB in size. If you are synchronizing smaller databases you may still find some of these techniques useful, but for the most part you should be just fine with the simple method I explained here.

For this situation, I believe there are three very useful capabilities within the Sync Framework:

1) MemoryDataCacheSize: This helps to limit the amount of memory that is allocated to Sync Framework during the creation of the data batches and data sorting. This typically helps to fix any out-of-memory issues. In general I typically allocate 100MB (100000) to this parameter as the best place to start, but if you have larger or smaller amounts of free memory, or if you still run out-of-memory, you can play with this number a bit.

RemoteProvider.MemoryDataCacheSize = 100000;

2) ApplicationTransactionSize (MATS): This tells the Sync Framework how much data to apply to the destination database (SQL Azure in this case) at one time. We typically call this Batching. Batching helps us to work around the issue where SQL Azure starts to throttle (or disconnect) us if it takes too long to apply the large set of data changes. MATS also has the advantage of allowing me to tell sync to pick up where it left off in case a connection drops off (I will talk more about this in a future post) and has the advantage that it provides me the ability to get add progress events to help me track how much data has been applied. Best of all it does not seem to affect performance of sync. I typically set this parameter to 50MB (50000) as it is a good amount of data that SQL Azure can commit easily, yet is small enough that if I need to resume sync during a network disconnect I do not have too much data to resend.

RemoteProvider.ApplicationTransactionSize = 50000;

3) ChangesApplied Progress Event: The Sync Framework database providers have an event called ChangesApplied. Although this does not help to improve performance, it does help in the case where I am synchronizing a large data set. When used with ApplicationTransactionSize I can tell my application to output whenever a batch (or chunk of data) has been applied. This helps me to track the progress of the amount of data that has been sent to SQL Azure and also how much data is left.

RemoteProvider.ChangesApplied += new EventHandler<DbChangesAppliedEventArgs>(RemoteProvider_ChangesApplied);

When I combine all of this together, I get the following new code that I can use to create a command line application to sync data from SQL Server to SQL Azure. Please make sure to update the connection strings and the tables to be synchronized.

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

using System.Data.SqlClient;

using Microsoft.Synchronization.Data.SqlServer;

using Microsoft.Synchronization.Data;

using Microsoft.Synchronization;

namespace SQLAzureDataSync

{

class Program

{

public static string sqlazureConnectionString = "Server=[Your SQL Azure Server].database.windows.net;Database=AdventureWorksLTSQLAzure;User ID=[Your SQL Azure User Name]@[Your SQL Azure Server];Password=[Your SQL Azure Password];Trusted_Connection=False;Encrypt=True;";

public static string sqllocalConnectionString = "Server=(local);Database=AdventureWorksLT2008;Trusted_Connection=True";

public static readonly string scopeName = "AllTablesSyncGroup";

public static int transactionCount;

public static uint BatchSize = 50000;

public static uint MemorySize = 100000;

static void Main(string[] args)

{

// Test if input arguments were supplied:

if (args.Length == 0)

{

System.Console.WriteLine("Please enter an argument.");

System.Console.WriteLine("Usage: SyncTest.exe -setup");

System.Console.WriteLine(" SyncTest.exe -sync");

}

else if (args[0] == "-setup")

Setup();

else if (args[0] == "-sync")

Sync();

}

public static void Setup()

{

try

{

SqlConnection sqlServerConn = new SqlConnection(sqllocalConnectionString);

SqlConnection sqlAzureConn = new SqlConnection(sqlazureConnectionString);

DbSyncScopeDescription myScope = new DbSyncScopeDescription(scopeName);

DbSyncTableDescription TestSchema1 = SqlSyncDescriptionBuilder.GetDescriptionForTable("TestSchema1", sqlServerConn);

// Add the tables from above to the scope

myScope.Tables.Add(TestSchema1);

// Setup SQL Server for sync

SqlSyncScopeProvisioning sqlServerProv = new SqlSyncScopeProvisioning(sqlServerConn, myScope);

sqlServerProv.CommandTimeout = 60 * 30;

if (!sqlServerProv.ScopeExists(scopeName))

{

// Apply the scope provisioning.

Console.WriteLine("Provisioning SQL Server for sync " + DateTime.Now);

sqlServerProv.Apply();

Console.WriteLine("Done Provisioning SQL Server for sync " + DateTime.Now);

}

else

Console.WriteLine("SQL Server Database server already provisioned for sync " + DateTime.Now);

// Setup SQL Azure for sync

SqlSyncScopeProvisioning sqlAzureProv = new SqlSyncScopeProvisioning(sqlAzureConn, myScope);

if (!sqlAzureProv.ScopeExists(scopeName))

{

// Apply the scope provisioning.

Console.WriteLine("Provisioning SQL Azure for sync " + DateTime.Now);

sqlAzureProv.Apply();

Console.WriteLine("Done Provisioning SQL Azure for sync " + DateTime.Now);

}

else

Console.WriteLine("SQL Azure Database server already provisioned for sync " + DateTime.Now);

sqlAzureConn.Close();

sqlServerConn.Close();

}

catch (Exception ex)

{

Console.WriteLine(ex);

}

public static void Sync()

{

try

{

SqlConnection sqlServerConn = new SqlConnection(sqllocalConnectionString);

SqlConnection sqlAzureConn = new SqlConnection(sqlazureConnectionString);

SqlSyncProvider RemoteProvider = new SqlSyncProvider(scopeName, sqlAzureConn);

SqlSyncProvider LocalProvider = new SqlSyncProvider(scopeName, sqlServerConn);

//Set memory allocation to the database providers

RemoteProvider.MemoryDataCacheSize = MemorySize;

LocalProvider.MemoryDataCacheSize = MemorySize;

//Set application transaction size on destination provider.

RemoteProvider.ApplicationTransactionSize = BatchSize;

//Count transactions

RemoteProvider.ChangesApplied += new EventHandler<DbChangesAppliedEventArgs>(RemoteProvider_ChangesApplied);

SyncOrchestrator orch = new SyncOrchestrator();

orch.RemoteProvider = RemoteProvider;

orch.LocalProvider = LocalProvider;

orch.Direction = SyncDirectionOrder.UploadAndDownload;

Console.WriteLine("ScopeName={0} ", scopeName.ToUpper());

Console.WriteLine("Starting Sync " + DateTime.Now);

ShowStatistics(orch.Synchronize());

sqlAzureConn.Close();

sqlServerConn.Close();

}

catch (Exception ex)

{

Console.WriteLine(ex);

}

public static void RemoteProvider_ChangesApplied(object sender, DbChangesAppliedEventArgs e)

{

transactionCount += 1;

int totalBytes = transactionCount * Convert.ToInt32(BatchSize);

Console.WriteLine("Changes Applied event fired: Transaction " + totalBytes+" bytes.");

}

public static void ShowStatistics(SyncOperationStatistics syncStats)

{

string message;

message = "\tSync Start Time :" + syncStats.SyncStartTime.ToString();

Console.WriteLine(message);

message = "\tSync End Time :" + syncStats.SyncEndTime.ToString();

Console.WriteLine(message);

message = "\tUpload Changes Applied :" + syncStats.UploadChangesApplied.ToString();

Console.WriteLine(message);

message = "\tUpload Changes Failed :" + syncStats.UploadChangesFailed.ToString();

Console.WriteLine(message);

message = "\tUpload Changes Total :" + syncStats.UploadChangesTotal.ToString();

Console.WriteLine(message);

message = "\tDownload Changes Applied :" + syncStats.DownloadChangesApplied.ToString();

Console.WriteLine(message);

message = "\tDownload Changes Failed :" + syncStats.DownloadChangesFailed.ToString();

Console.WriteLine(message);

message = "\tDownload Changes Total :" + syncStats.DownloadChangesTotal.ToString();

Console.WriteLine(message);

}

Liam Cavanagh

How to Sync Large SQL Server Databases to SQL Azure

Additional resources