Initializing Azure SQL Database with SQL Server Transactional Replication


Introduction

For migrating databases from an on-premises SQL Server to Azure SQL Database, Transactional Replication has become increasingly popular. This approach has some important advantages: you can maintain a live copy of your production data in Azure SQL Database and reduce the downtime for the actual switch to the cloud to a minimum. Ideally, you only have to stop activity on the on-premises database, wait until the replication agents have transferred the last changes, and then redirect your applications to the cloud database. You can also test your applications against Azure SQL Database on live production data to ensure that they will continue to work after the switch.

The article Migration from SQL Server to Azure SQL Database Using Transactional Replication gives you an overview of the major steps and benefits, and additional links to topics that will help you with setting up the replication topology.

This article provides practical advice that we have collected from previous support cases and internal testing:

Considerations for setting up Azure SQL Database subscriptions

Make sure that your Publisher and Distributor are running a supported version of SQL Server

The Publisher and Distributor must be at least at one of the following versions:
SQL Server 2016
SQL Server 2014 SP1 CU3
SQL Server 2014 RTM CU10
SQL Server 2012 SP3
SQL Server 2012 SP2 CU8
Attempting to configure replication with an older version may cause the Distribution Agent to fail with the following errors:
MSSQL_REPL20084: The process could not connect to Subscriber.
MSSQL_REPL40532: Cannot open server <name> requested by the login. The login failed.
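You can verify the build before configuring replication by querying the version properties on both the Publisher and the Distributor, for example:

```sql
-- Run on the Publisher and on the Distributor, then compare the
-- ProductVersion build number against the minimum versions listed above.
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('ProductLevel')   AS ProductLevel,  -- RTM, SP1, SP2, ...
       SERVERPROPERTY('Edition')        AS Edition;
```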

Applying Snapshots

With Azure SQL Database subscriptions, you are limited to initializing the subscriber database with regular snapshots. It is not possible to restore a backup from the publisher database at the subscriber, so you cannot use sp_addsubscription with @sync_type = 'initialize with backup' or 'initialize from lsn'.
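A new Azure SQL Database subscription therefore has to use snapshot initialization and must be a push subscription. A minimal sketch, run in the publication database at the Publisher (publication, server, and database names are placeholders):

```sql
-- Azure SQL Database subscriptions must be push subscriptions,
-- and only snapshot initialization (@sync_type = 'automatic') is supported.
EXEC sp_addsubscription
    @publication       = N'MyPublication',
    @subscriber        = N'myserver.database.windows.net',  -- Azure SQL logical server
    @destination_db    = N'MySubscriberDb',
    @subscription_type = N'push',
    @sync_type         = N'automatic';
```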

Do not use manual subscriber initialization

In theory, you could load the subscriber data through alternate methods, like importing the database from a BACPAC file, or loading the data with SSIS or BCP. But this runs a very high risk that your replication will fail later on, because the publisher and subscriber data might no longer match. This may happen if the publisher data changes while or after exporting the data, and before the import into the subscriber has completed.

Typical errors would be:

20598 The row was not found at the Subscriber when applying the replicated command
2601 Cannot insert duplicate key row in object 'xxx' with unique index 'zzz'
2627 Violation of xxx constraint 'zzz'. Cannot insert duplicate key in object

Choosing the DTU Performance level of the Subscriber database

The initial synchronization depends directly on the performance level of the subscriber database. Applying the snapshot is usually very I/O-intensive and will quickly overwhelm the lower scaling levels. Don't even think about Basic. Standard S3 may work for smaller databases. For anything meaningful, go directly to the highest Premium level you can afford. Choose a P2 as a minimum, and a P6 for best performance. Premium RS might be an option as well.

The idea is to transfer the snapshot as quickly as possible. This reduces the risk of transient errors, e.g. because of an internet hiccup or a reconfiguration of the subscriber database. Any retry or resend of the snapshot is usually more expensive than getting it right on the first attempt.

After the snapshot has been applied, you may scale down again to the intended DTU level.
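Scaling up for the snapshot and back down afterwards can be done with ALTER DATABASE; the database name and service objectives below are examples:

```sql
-- Before applying the snapshot: scale the subscriber up.
ALTER DATABASE [MySubscriberDb] MODIFY (SERVICE_OBJECTIVE = 'P6');

-- After the snapshot has been applied: scale back to the intended level.
ALTER DATABASE [MySubscriberDb] MODIFY (SERVICE_OBJECTIVE = 'S3');
```

Note that the scale operation runs asynchronously and ends with a short reconfiguration of the database, so perform it before starting the snapshot transfer, not during it.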

Include all necessary article options to replicate the required objects

The schema of the subscriber tables mainly depends on the article's schema_option parameter. It defines which of the table details are scripted and moved to the subscribers. The recommendation is to create the publication from a script, so that you can check the sp_addarticle commands and see which options have been enabled in the schema_option parameter. If you have used SQL Server Management Studio, you may script out the publication now for comparison (right-click the publication -> Generate Scripts…).

The following schema_option values are relevant for configuring the table details:

0x08 Replicate timestamp columns. If not set, timestamp columns are replicated as binary.
0x10 Generates a corresponding clustered index. Even if this option is not set, indexes related to primary keys and unique constraints are generated if they are already defined on a published table.
0x20 Converts user-defined data types (UDT) to base data types at the Subscriber.
0x40 Generates corresponding nonclustered indexes. Even if this option is not set, indexes related to primary keys and unique constraints are generated if they are already defined on a published table.
0x80 Replicates primary key constraints. Any indexes related to the constraint are also replicated, even if options 0x10 and 0x40 are not enabled.
0x100 Replicates user triggers on a table article, if defined.
0x200 Replicates foreign key constraints. If the referenced table is not part of a publication, all foreign key constraints on a published table are not replicated.
0x400 Replicates check constraints.
0x800 Replicates defaults.
0x1000 Replicates column-level collation.
0x4000 Replicates UNIQUE constraints. Any indexes related to the constraint are also replicated, even if options 0x10 and 0x40 are not enabled.

If any of these options is not set, you may be missing the triggers, defaults, or indexes that go with it. For example, if the last four hexadecimal digits of schema_option are 0x5FF8, you have included all indexes, constraints, triggers, etc. If it reads e.g. 0x509F, then you are missing the UDT conversion, non-clustered indexes, triggers, FK constraints, check constraints, and defaults.
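If you do not have the creation script at hand, you can also inspect the configured values directly through the replication system tables in the publication database, for example:

```sql
-- Run in the publication database at the Publisher.
SELECT p.name AS publication,
       a.name AS article,
       a.schema_option
FROM dbo.syspublications AS p
JOIN dbo.sysarticles     AS a
  ON a.pubid = p.pubid
ORDER BY p.name, a.name;
```

Alternatively, EXEC sp_helparticle @publication = N'MyPublication' returns the article properties, including schema_option, per article.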

If you have set one of the following values, then some of your data types will get converted to basic data types:

0x10000000 Converts xml columns to ntext on the Subscriber.
0x20000000 Converts large object data types (nvarchar(max), varchar(max), and varbinary(max)) introduced in SQL Server 2005 to data types that are supported on SQL Server 2000.
0x200000000 Converts date and time data types (date, time, datetimeoffset, and datetime2) introduced in SQL Server 2008 to data types that are supported on earlier versions of SQL Server.
0x2000000000 Converts the hierarchyid data type to varbinary(max) so that columns of type hierarchyid can be replicated to Subscribers that are running SQL Server 2005.
0x8000000000 Converts the geography and geometry data types to varbinary(max) so that columns of these types can be replicated to Subscribers that are running SQL Server 2005.

Consider creating several publications instead of a single large publication

If your publisher database contains one or more large tables, consider splitting them into separate publications. This has the advantage that you can (re-)initialize a subset of your data and thus get better control over the subscriber load. It also avoids the issue that the transfer of larger objects may delay or prevent the transfer of all other objects.

Consider data compression

If some of your tables are really large, it might be a good time to think about data compression. If you compress your large tables, e.g. with:

ALTER TABLE [dbo].[tablename] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ROW)

you may then maintain the compression while replicating the tables to the subscriber. You simply have to configure the corresponding schema_option for your article, e.g.:

exec sp_addarticle ... @schema_option = 0x000000040803509F

This schema_option relates to:

0x400000000 Replicates the compression option for data and indexes. For more information, see Data Compression.

After applying the snapshot: Seed your Identity values at the subscriber

If your database contains tables with Identity columns, make sure to enable schema_option 0x04: "Identity columns are scripted using the IDENTITY property". This will maintain the Identity columns in your migrated database.

This has one drawback though: the Identity property will be created with the "NOT FOR REPLICATION" flag. This means that even though the Distribution agent is inserting all your data, the Identity seed will not be moved forward accordingly.

So before redirecting your applications to the migrated Azure SQL Database, you need to reseed your Identity values using DBCC CHECKIDENT. Please refer to the blog article Fix identities after migrating through SQL Replica for a script solution.
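The reseeding can also be scripted generically. A minimal sketch that reseeds every identity column based on the current maximum value in its table (run on the subscriber after the final synchronization, before going live):

```sql
-- Build one DBCC CHECKIDENT statement per table that has an identity column.
-- DBCC CHECKIDENT('table', RESEED) without an explicit value corrects the
-- current identity value using the maximum value stored in the column.
DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'DBCC CHECKIDENT(''' + SCHEMA_NAME(t.schema_id)
            + N'.' + t.name + N''', RESEED);' + NCHAR(13)
FROM sys.tables AS t
WHERE EXISTS (SELECT 1 FROM sys.identity_columns AS ic
              WHERE ic.object_id = t.object_id);

EXEC sys.sp_executesql @sql;
```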
