Inside Distribution Cleanup Watermark

Share this Post

The Distribution database is a cache of undistributed commands that the Distribution SQL Agent jobs are responsible to deliver to their subscribers.  This cache is populated by the Publisher LogReader Agent job as new data changes are detected. The primary tables holding the cache of undistributed data is MSrepl_transactions and MSrepl_commands.

Distribution Cleanup

Like any cache, the distribution tables required purging of commands already delivered to subscribers.  This task is performed by SQL Agent job “Distribution clean up: distribution” running every ten minutes deleting transactions older than 72 hours.

clip_image002

EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 72

Two related SQL Agent jobs “Expired subscription clean up” running “EXEC sys.sp_expired_subscription_cleanup”  and “Agent history clean up: distribution” running “EXEC dbo.sp_MShistory_cleanup @history_retention = 48” are also created when Replication is enabled.  As the names imply, they cleanup Distribution History tables and remove metadata for subscribers not yet synchronized within the retention period and are now expired.  The “Expired subscription clean up” job is resource intense, scheduled to run daily at 1am and can be adjusted under Job Properties.

AdventureWorks Publication

To demonstrate cleanup watermark, a Snapshot publication was created on “Customers” and a Transactional publication on “SalesOrderDetail” using AdventureWorksLT2012.  From Replication Monitor, both publications were created, and Snapshot delivered by their Distribution Agents on 3/1.

clip_image004

Snapshot Publications are used to refresh target table at regular intervals or on-demand.  The Snapshot Publication has 2 SQL Agent Jobs, first a Snapshot Agent job to export out tables and schema, i.e. the snapshot files, and second Distribution Agent responsible for delivering the newly exported data to the subscriber.  In this example; Snapshot Job runs 1st of every month at midnight followed by the Distribution Agent at 1am.

clip_image006

The Transactional publication also has a Snapshot Agent, run once to preload the Subscriber tables.  It also has a SQL Agent job called Log Reader Agent scanning the published database’s transaction log continuously looking for committed data changes, writing those changes into MSrepl_transactions and MSrepl_commands in the distribution database.  The Transactional publication has its own Distribution Agent continuously applying data in distribution database cache to the subscriber.

Querying MSrepl_transactions in the Distribution database, entries for both snapshots created on 3/1 are recorded when snapshot were created.  The transaction watermark (xact_seqno) and time recorded (entry_time) are used later during cleanup.

SELECT [publisher_database_id],[xact_id],[xact_seqno],CONVERT(char(10), [entry_time],126) [entry_time]
FROM [distribution].[dbo].[MSrepl_transactions]

clip_image008

These watermarks represent Snapshot processes since no other data changes have been made.  To confirm these are Snapshot delivery commands, execute sp_browsereplcmds with or without parameters.  This command displays the distribution cache in readable format displaying “sync” commands pointing to snapshot files for each publication.

sp_browsereplcmds '0x0000004000000023000800000001','0x00000040000000670049'

clip_image010

Another key table for Distribution cleanup in MSdistribution_history containing comments and watermarks for last successful synchronization and displayed in Replication Monitor. Notice Agent_id 1 delivering the Snapshot Publication then stopping as configured while Agent_id 2, the Transactional Publication’s Distribution agent, runs continuously looking for new transaction in the distribution database cache.

SELECT [agent_id],[runstatus],[start_time],[comments]
,[xact_seqno],[delivered_transactions],[delivered_commands]
FROM [distribution].[dbo].[MSdistribution_history]
order by agent_id, time
agent_id  comments                                           xact_seqno             
--------- -------------------------------------------------- ----------------------
1         Bulk copied data into table 'Customer' (847 rows)  0x00000000000000000000
1         Delivered snapshot from the '\\ReplData\unc...     0x00000000000000000000
1         The process was successfully stopped.              0x00000040000000880008

2         Bulk copied data into table 'SalesOrderDetail' (54 0x00000000000000000000
2         Delivered snapshot from the '\\ReplData\unc...     0x00000000000000000000
2         No replicated transactions are available.          0x00000040000000880008

Snapshot delivered, inserting new data.

For demonstration, the Distribution Cleanup Agent was turned off and 3 UPDATEs are made in the SalesOrderDetail table on 3/1, 3/8, and 3/15, all of which were delivered to the subscriber.  The watermark and entry_time recorded for each transactions are shown below.

SELECT [xact_seqno],CONVERT(char(10), [entry_time],126) [entry_time]
FROM [distribution].[dbo].[MSrepl_transactions]

clip_image012

Distribution Cleanup

On 3/15, the Distribution Cleanup Job executing command below will be run removing transaction older than 72 hours.  From the MSrepl_transaction entry_time values above, cleanup should remove transactions recorded on 3/1 and 3/8 as these are outside the 72 hour cutoff.

EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 72

Wait, what, why did the Distribution Cleanup leave 3/1 and 3/8 transactions, those are clearly older than 72 hours?

clip_image014

Cleanup Watermark

The cleanup logic calculates the cutoff point, called Max_Cleanup_Xact_Seqno, by examining Distribution tables MSsubscriptions and MSdistribution_history progress for ALL publication.  Transactions below this value have been delivered and can be deleted, values above this “may” be undelivered and can’t be deleted.

To display watermark for ALL publication, execute Max_Cleanup_Xact_Seqno.sql now located on GitHub.

This script returns SQL Replication Distribution Max_Cleanup_Xact_Seqno watermark.  Only transactions\commands below this watermark will be DELETED when Distribution Cleanup Agent Job runs.  The maximum cleanup is one value calculated across all publications on the Published database.  Distribution Agents with Max_Agent_Hist_Xact_Seqno 0x00 indicate failed agent. Examine MSdistribution_history or Replication Monitor to learn why.  Those with older Xact_Seqno indicate agents which have not run for days, resulting in low watermark.  The Max_Cleanup_Xact_Seqno watermark advances when all Distribution Agent successfully execute.

clip_image016

You can query MSrepl_transactions for each Xact_Seqno to show date when transactions were recorded in the Distribution database.

SELECT [publisher_database_id],[xact_id],[xact_seqno],CONVERT(char(10), [entry_time],126) [entry_time]
FROM [distribution].[dbo].[MSrepl_transactions]
WHERE xact_seqno = 0x00000040000000880008000000000000

clip_image018

In my example, the Transactional Publication has data > 72 hours (3/1 and 3/8), however, the Snapshot Publication’s Distribution agent last ran on 3/1.  The distribution cleanup can’t remove the older Transactional Publication activity from 3/1 and 3/8 when Snapshot Publication’s watermark is still back on 3/1.  All activity from Transactional publication will remain in the Distribution database until watermark for ALL publications is advanced.

Advancing Watermark

For cleanup to remove older Transactional publication data I’ll need to advance watermark for ALL my publications.  For my example, this means executing the Snapshot Publication’s Distribution Agent.  Once this older Distribution Agent executes, its watermark is advanced allowing cleanup to delete data for ALL publications. Since this is a “snapshot” publication which has already been delivered, executing the Snapshot’s Distribution Agent simply records “No replicated transactions are available.” in MSdistribution_history while advancing the watermark.

clip_image020

Checking SQL Replication Distribution Max_Cleanup_Xact_Seqno watermark I see it is now advanced allowing Distribution Cleanup to remove all transaction > 72 hours.

clip_image022

New Distribution Agent Strategy

Understanding cleanup watermark is calculated across ALL publications for a given published database, requires changing Snapshot Publication’s Distribution Agent to run at frequent intervals advancing its watermark allowing cleanup for ALL publications. Re-running the Distribution Agent will not re-deliver the Snapshot files until Snapshot Agent is re-run, recreating Snapshot files to refresh the Subscriber.

Observe Transactional Publication’s Distribution Agents for those with high latency or in failed state and these too lead to low cleanup watermarks preventing other publications from being cleaned up.

By ensuring all Distribution Agent are running at regular intervals, Snapshot or Transactional, it confirms the cleanup watermark is advanced allowing Distribution Cleanup to delete transactions for ALL publications on the published database.

Chris Skorlinski
Microsoft SQL Server Escalation Services


Share this Post

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.