Re-configure Replication after restoring publication database from Backup - Replication Scenario

Sharing one of the interesting scenario I worked today.

Somebody messed up the data at publication database (few rows updated with a value which they aren't supposed to) and publication database was restored from a older backup to rollback the change. Now when log reader agent was started it starting failing with an error:

2012-03-14 01:08:14.360 Publisher: {call sp_repldone ( 0x00159ebb000031250001, 0x00159ebb000031250001, 0, 0)}
2012-03-14 01:08:14.376 Status: 0, code: 20011, text: 'The process could not execute 'sp_repldone/sp_replcounters' on 'SQLPROD'.'.
2012-03-14 01:08:14.423 The process could not execute 'sp_repldone/sp_replcounters' on 'SQLPROD'.
2012-03-14 01:08:14.470 Repl Agent Status: 6
2012-03-14 01:08:14.517 Status: 0, code: 9003, text: 'The log scan number (1416891:12581:1) passed to log scan in database 'DBNAME' is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup. '.
2012-03-14 01:08:14.579 Status: 0, code: 22017, text: 'The process could not set the last distributed transaction.'.
2012-03-14 01:08:14.610 Status: 0, code: 22037, text: 'The process could not execute 'sp_repldone/sp_replcounters' on 'SQLPROD'.'.

-- DBCC OPENTRAN in the publication database shows:

Replicated Transaction Information:
        Oldest distributed LSN     : (1416699:339145:57)
        Oldest non-distributed LSN : (1416699:341745:3)

-- So distribution database has more information about the publication database than what publication database has now because it was restored from a older backup. To be precise:
Publication database says it has sent this LSN                                -->         1416699:339145:57 (Oldest distributed LSN)
Publication database says it has to start sending from this LSN      -->         1416699:341745:3 (Oldest non-distributed LSN)

But Log reader agent says I have to start from here                        -->         1416891:12581:1 (High value comapred to what publication database says)

So how do you handle this situation? Well here is the guide:

Option 1:

The Log Reader Agent can be recovered by using the sp_replrestart stored procedure, which re-synchronizes the metadata between the Publisher and the Distributor. This can result in Subscribers potentially having more data than the Publisher. In the event that these changes are made to the publication database again and subsequently replicated, the Distribution Agent will fail with data consistency errors. To override these errors use the Distribution Agent profile “Continue on data consistency errors.”
Note:  Care should be taken when overriding these errors. For more information, see Skipping Errors in Transactional Replication in SQL Server Books Online.

How if works? sp_replrestart will fill NO-OP (No-Operation) transaction which will cause the LSN's to increase until the log LSN matches the LSN as per Distribution database. So from there-on, Logreader agent will be able to read the LSN, it is expecting.

Note: Depending on how old the backup of the Published database which was restored, it may take hours for this operation and may make the transaction log to grow big.
and grow by gigs, until they match.

Caution:   This option (Option 1) can lead to inconsistencies between data at the Publisher and data at the Subscribers!!!

Option 2:
--> Stop all applications accessing the publication database.

--> Run the Distribution Agent until all Subscribers are synchronized with the outstanding commands in the distribution database. Verify that all commands are delivered to Subscribers by using the Undistributed Commands tab in Replication Monitor or by querying the MSdistribution_status view in the distribution database.

--> Use the tablediff Utility or another tool to manually synchronize the Publisher with the Subscriber, which allows you to recover data from the subscription database that was not contained in the publication database backup.

--> Drop the subscription(s).

--> At the Publisher on the publication database, execute sp_addsubscription. Specify the name of the database at the Subscriber containing the published data for @destination_db, a value of push for @subscription_type, and a value of 'replication support only' for @sync_type.

--> At the Publisher on the publication database, execute sp_addpushsubscription_agent or sp_addpullsubscription_agent and start the distribution agent.

--> Now you can allow changes to the publication database.

 How it works? 'replication support only' in the @sync_type says that we already have data and schema at the subscriber and we are asking replication to just create stored procedures and other replication objects and start synchoronization.

Option 3: (Recommended)

Script the Publication -> Drop Subscriptions -> Drop Publication -> Re-create them

You can use "Sync with Backup" to avoid these scenarios but at the expense of high log reader latency depending on the frequency of the log backup at the publication database. Read more about "sync with backup" option from https://msdn.microsoft.com/en-us/library/ms152560.aspx

 

Please comment if you tried this and whether it helped you or you faced some issues.!

 

Sakthivel Chidambaram,

SQL Server Support, Microsoft