Using MaxCmdsInTran with “sync with backup” breaks Transactional Replication

Share this Post

Holger Linke
Microsoft SQL Server Escalation Services

Abstract:

This article describes an issue with SQL Server Transactional Replication that was detected recently. If you have configured the Log Reader agent to use the MaxCmdsInTran parameter, and also have configured the Distribution database to use the “sync with backup” option, you may experience a failure of the Log Reader Agent, or find that not all changes have been replicated to your subscribers.

 Configuration

Consider a scenario where you encounter all of the following conditions:

  • You are using Transactional Replication on SQL Server 2008, 2008 R2, or 2012
  • The Log Reader Agent has been configured with the MaxCmdsInTran parameter
  • The “sync with backup” option has been enabled on the Distribution database
  • On one or more connections, you execute several smaller transactions on published tables, consisting of a few commands each
  • In parallel, on a separate connection, you execute a larger transaction on a published table; the transaction consists of more commands than you have configured for the MaxCmdsInTran parameter

The issue might then occur after the next start of the Log Reader Agent, when the agent will process the pending transactions, and will attempt to split the large transaction according to the MaxCmdsInTran value.

 Symptoms

If you are affected by this issue, you will see one or more of the following symptoms:

  • The Log Reader agent starts to split the large transaction according to the value configured for MaxCmdsInTran. It commits the first sub-transaction to the distribution database, but skips the remaining commands from the large transactions. For example, if MaxCmdsInTran is set to 10.000, only the first 10.000 commands of the transaction will be replicated. All other commands are not transferred, leading to non-convergence with the subscribers.
  • The Log Reader agent attempts to transfer the first sub-transaction of the large transaction database several times. These attempts are failing with “duplicate key” errors for the MSrepl_transactions or MSrepl_commands system tables in the Distribution database.
  • The Log Reader agent might first fail with a “duplicate key” error for the MSrepl_transactions or MSrepl_commands system tables, but appear to recover itself from that. It continues to work and will transfer newer transactions. But a data validation will show that some parts of the large transaction are missing at the subscribers.
  • If you back up the distribution database while the Log Reader agent is performing the MaxCmdsInTran operation, the Log Reader might detect a possible inconsistent state in the distribution database and fail.

 Typically, from the large transaction, only the first MaxCmdsInTran sub-transaction would be picked up by the Log Reader; all other commands from the large transaction would be skipped. The smaller transactions will always remain unaffected, and all their data will be replicated.

Once you are affected by this issue, the Publisher and its Subscribers will remain out of sync. This usually requires a reinitialization of your topology.

 Associated Error Messages:

Log Reader Agent errors:

  • The process could not execute ‘sp_MSadd_replcmds’ on ‘<name of distributor>’.
  • Cannot insert duplicate key row in object ‘dbo.MSrepl_transactions’ with unique index ‘ucMSrepl_transactions’.
  • Cannot insert duplicate key row in object ‘dbo.MSrepl_commands’ with unique index ‘ucMSrepl_commands’.
  • Error 18846:
    Possible inconsistent state in the distribution database: dist_backup_lsn {…}, dist_last_lsn {…}.
    Execute “sp_repldone NULL, NULL, 0, 0, 1”, and then executesp_replflush.
    Reinitialize all subscriptions to the publication.

 Distribution Agent error:

  • Error20598:
    The row was not found at the Subscriber when applying the replicated command
  • Error 2627:
    Violation of %ls constraint ‘%.*ls’. Cannot insert duplicate key in object
    ‘%.*ls’. The duplicate key value is %ls.

These  errors might occur much later if any of the skipped rows from the large transaction is updated or deleted; or re-inserted if the skipped command was a Delete.

 Recommendation:

  • Never useMaxCmdsInTran if “sync with backup” has been set for the Distribution database.
  • Avoid using MaxCmdsInTran if possible at all. Instead, investigate ways to reduce the size of the transactions.
  • If you have to decide between using either MaxCmdsInTran or “sync with backup”, choose “sync with backup”.

 Status:

Possible solutions for this issue are investigated, but it is still open if and when a solution will become available.

 More Information:

  • MaxCmdsInTran:

This Log Reader parameter specifies the maximum number of statements grouped into a transaction, as the Log Reader writes commands to the distribution database. By default, this parameter is not set so that the
transaction boundaries of the Publisher are preserved. Using this parameter allows the Log Reader Agent to divide large transactions at the Publisher into several smaller transactions. Specifying this parameter can reduce contention at the Distributor and reduce latency between the Publisher and Subscriber. Because the original transaction is applied in smaller units, the Subscriber can access rows of a large logical Publisher transaction prior to the end of the original transaction, breaking strict transactional atomicity.

Originally, MaxCmdsInTran was not designed to be always turned on. It exists in the product to work around cases where a large number of DML operations were performed in a single transaction by accident. If you are routinely falling into this situation, you should seriously look into ways to reduce the size of your transactions.

 

  • sync with backup:

Setting this option on the distribution database ensures that transactions in the log of the publication database will not be truncated until they have been backed up at the distribution database. The distribution database  can be restored to the last backup, and any missing transactions are delivered from the publication database to the distribution database. Replication continues unaffected. Setting this option on the distribution database does not affect replication latency. However, the option will delay the truncation of the log on the publication database until the corresponding transactions in the distribution database have been backed up. (This can create a larger transaction log in the publication database.).

We recommend that you always set this option on the distribution database.

 Documentation

The previous documentation specifically advertised MaxCmdsInTran to enhance replication performance, for example in the following paragraph:

Enhance Transactional Replication Performance

http://technet.microsoft.com/en-us/library/ms151762.aspx

Distribution Agent and Log Reader Agent Parameters

Use the MaxCmdsInTran parameter for the Log Reader Agent.

This recommendation is no longer correct and has been updated in the documentation now. You should not use the MaxCmdsInTran parameter as a measure to improve the overall Log Reader performance. It should not remain activated permanently, e.g. in order to resolve ongoing, day-to-day performance bottlenecks of the Log Reader.


Share this Post

About: ReplTalk


Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.