INF : How does Logical Record Feature Works in Merge Replication

By default, merge replication processes data changes on a row-by-row basis. In many circumstances this is appropriate, but for some applications, it is essential that related rows be processed as a unit. The logical records feature of merge replication allows you to define a relationship between related rows in different tables so that the rows are processed as a unit.

Grouping Changes to Related Rows with Logical Records

https://msdn.microsoft.com/en-us/library/ms152507(v=SQL.100).aspx

Though this feature will be removed in future versions as documented in the above link, we do not have the exact version of SQL yet when this feature will be removed. But as per books online, for backward compatibility it is still there in SQL 2008 R2 and Denali. Thus I guess it’s going to stay there for some time.

You will notice that it is mentioned in the above article, that “If merge processing is interrupted, such as in the case of a dropped connection, the partially completed set of related replicated changes is rolled back if logical records are used”

We will now try to understand how does the rows get processed as a logical Unit and how do they rollback in case of a merge agent failure.

1. Let us take an example of 3 tables CUSTOMERS, ORDERS and ORDERITEMS

2. CUSTOMERS and ORDERS tables share parent/child relationship on CUSTOMER_ID column. CUSTOMERS table -> Primary key(CUSTOMER_ID ) on CUSTOMERS table, Foreign Key(CUSTOMER_ID ) on ORDERS table.

3. ORDERS and ORDERITEMS tables share parent/child relationship on Order_ID column. ORDERS table -> Primary key(Order_ID) on ORDERS table, Foreign Key(Order_ID) on ORDERITEMS table.

4. Now let us take an example where a user named “MSFT” reviewed all the customers and orders which belonged to his region (location_id = 5) and modified the “Last_Modified_By” column. Below are the updates that he will run.

--On Publisher

Update CUSTOMERS set Last_Modified_By ='MSFT'where location_ID ='5'

Update ORDERS set Last_Modified_By ='MSFT'

where CUSTOMER_ID in(select CUSTOMER_ID from CUSTOMERS

where location_ID ='5')

Update ORDERITEMS set Last_Modified_By ='MSFT'

where CUSTOMER_ID in(select CUSTOMER_ID from CUSTOMERS

where location_ID ='5')

( 10000 row(s) affected)

( 22500 row(s) affected)

( 47500 row(s) affected)

 

5. Start the Sync for Merge Agent.

6. Now consider that Merge agent fails in between the sync with some error like Network failures etc . It stops/fails in between before it is able to push *all* the changes made above, to the subscriber.

7. Run the below select statements at the subscriber to track the changes made on behalf of the above UPDATE statements(which were made at the publisher)

--On Subscriber

selectcount(*)from CUSTOMERS where Last_Modified_By ='MSFT'and location_ID ='5'

selectcount(*)from ORDERS where Last_Modified_By ='MSFT'

and CUSTOMER_ID in(select CUSTOMER_ID from CUSTOMERS

where location_ID ='5')

selectcount(*)from ORDERITEMS where Last_Modified_By ='MSFT'

and CUSTOMER_ID in(select CUSTOMER_ID from CUSTOMERS

where location_ID ='5')

(1616 row(s) affected)

(18000 row(s) affected)

(20000 row(s) affected)

8. You might wonder why only partial rows were updated for all the 3 tables. , either the entire updates should have been committed or they should have rolled back.
Grouping Changes to Related Rows with Logical Records
https://msdn.microsoft.com/en-us/library/ms152507.aspx

9. It works as per the logic mentioned in the article, but only thing is we need to understand how it works.

How it works:

==============

1. So consider in the CUSTOMERS table, we have 10000 Distinct CUSTOMER_ID values.

2. We *may or may not* have corresponding child records for all the 10000 distinct CUSTOMER_ID in the ORDERS table and the ORDERITEMS table.

3. Take for example we only have rows for 3 CUSTOMER_ID s in ORDERS and ORDERITEMS tables, out of 10000 distinct CUSTOMER_ID s from the CUSTOMERS table.

4. Out of the 3 CUSTOMER_ID s, let say 1st CUSTOMER_ID has 12000 child rows in ORDERS table and 15000 child rows in ORDERITEMS table.

5. Now that 1st CUSTOMER_ID with all those child record changes(12000 child rows in ORDERS table and 15000 child rows in ORDERITEMS table) is replicated as a single logical unit.

6. Out of the 3 CUSTOMER_ID s, Let us say that the 2nd Customer_ID has 6000 child rows in ORDERS and some 5000 child rows in ORDERITEMS table. This again is replicated as a single logical unit.

7. Consider that while processing 3rd Customer_ID with X child rows in ORDERS and some X child rows in ORDERITEMS table, we stopped the synch or connection drops.

8. Then the rollback happens only for the 3rd Customer_ID along with X child rows in ORDERS and some X child rows in ORDERITEMS table(which is still considered as a logical unit).

9. It does not affect the 1st CUSTOMER_ID and 2nd Customer_ID which are already processed(which in turn has already affected 18000 rows(12000+6000) in ORDERS table and 20000 rows(15000+5000) in ORDERITEMS table.

So only 2 CUSTOMER_IDs from CUSTOMERS table are replicated as a part of logical unit and remaining 1614 rows which were updated on CUSTOMERS table never were replicated as a part of logical records because they never had a corresponding child row in the other two tables (they got replicated to the subscriber but not as a part of logical unit) .

Thanks

Durai Murugan
Support Engineer, Microsoft SQL Server Support

Reviewed by:
Rishi Maini
Support Escalation Engineer, Microsoft SQL Server Support