How It Works: Shapshot Database (Replica) Dirty Page Copy Behavior (NewPage)

I ran into an issue last week that I found interesting and thought you might find it interesting as well.

The primary database had a replica (snapshot database) established and the issue reported was that the replica was acquiring additional space when only new data was added to the primary database.  The understanding was that only data that existed at the time of the snapshot would need to be copied to the replica if changed (dirtied) on the primary database.

The design of the replica is to establish copy on write behavior.   When a page is going to be modified(dirtied) the current image is written to the replica before the primary data page can be modified.  

Among the work to acquire more space (a new page) for an object SQL Server uses the NewPage routine.  The job of NewPage is to format a page that has never been used in the database or one that was used and is being reallocated (delete and insert).   

The first action that takes place in NewPage is to check to see if this is a primary database file that has replica(s).  If a replica is present the data is read from disk (as is) and written to the replica.

New Page - Reallocated

When a table is truncated or other de-allocation occurs the page itself often does not need a physical change.   The physical changes required occurs on the allocation pages (GAM. SGAM, IAM ...).  

If you truncate a table the allocation pages are modified and handled with the copy on write behavior.  The data pages are not modified and remain intact.   At this point the primary database shows the pages as unallocated and the replica(s) show the pages as allocated.

Selecting data from the table via the snapshot database retrieves these pages from the primary files, as the data has not been changed and remains intact on disk for the primary file.   This avoids the need to copy an entire table's data when an action such as a truncate occurs.

When the page is reallocated the NewPage method moves the current copy of the data (from disk of the primary) to the replica(s) before the page is formatted and reused.

New Page - Never Used

If the page was never used by any object since the database was created it is copied to the replica(s).   Simply stated the current design is not optimized to understand if the page was allocated at the time the replica(s) was created and if not avoid the copy.   This is a way that adding new data to the primary can copy pages to the replica(s) that you may not have expected.

In the future this behavior may be modified to use the allocation information of the replica(s) to avoid a unallocated copy action.

Keys:   Allocation of a page requires a read of the page from the primary and a write of that data to one or more replica(s).

Bob Dorr
SQL Server Senior Escalation Engineer