How Replication setting Immediate_sync may cause Transactional Replication Distribution database growth and long running cleanup.
Distribution Cleanup for highly active servers with 60-100+ millions rows cached, database growth and cleanup will be an ongoing issue. If we can reduce the number of transactions/commands cached in the distribution database we can improve overall replication performance.
For these environments we recommend disabling “immediate_sync” if not needed for business reasons. When enabled, this setting maintains the replicated transactions in the Distribution database, just-in-case, a new subscriber is added or an existing subscriber needs to reinitialize. If new subscribers are not being added or existing not being reinitialized, this feature can be disabled. When disabled, the transactions are deleted as soon as they are replicated to the subscriber thereby reducing replication metadata overhead.
This job is responsible for deleting data from MSrepl_transactions and MSrepl_commands tables in the Distribution Database. Two parameters are passed to the cleanup job. The first one is minimum number of hours to keep transactions and second is to delete if transaction are older.
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 4, @max_distretention = 72
When old data gets purged is determined by "immediate sync" setting of the Publication.
select immediate_sync,* from distribution.dbo.MSpublications
select immediate_sync, * from syspublications
In the above example:
If "immediate sync" = TRUE, then keep transaction for at least 4 hours EVEN IF they have been replicated to the subscriber but delete any transaction older the 72 hours. Metadata that is younger than 72 hours, but older than 4 hours will be kept. This is conservative.
If "immediate sync" = FALSE, then keep transaction for at least 4 hours but if the transaction has already been replicated (distributed) to the subscriber go ahead and delete EVEN IF not older then 72 hours. Metadata that is younger than 72 hours, but older than 4 hours may be deleted quite aggressively.
If the amount of metadata within the retention period is a problem, you could potentially change the immediate_sync flag with sp_changepublication, without reinit required, but there are some differences that you should be aware of. Read BOL for sp_addpublication about immediate_sync. There are potentially negative side effects for no-sync subptions if they have been marked expired or inactive, and also adding new subscribers and adding new articles will require more manual intervention.
Microsoft SQL Server Escalation Services
... post followup:
If you're going to turn off immediate_sync to reduce Distribution DB overhead, keep @min_distretention = 4 setting in the Distribution Cleanup Agent. This allow the Distribution database to maintain x amount of rows preventing potential blocking. As the Log Reader Agent is writing at one end of clustered index and the Cleanup Agent is deleting at the other, having some rows in between keeps the inserts and deletes from getting in the way of each other.