Performing Transaction Log Backups using AlwaysOn Availability Group Read-Only Secondary Replicas - Part 1

This is the first post in a planned series of articles relating to SQL Server Transaction Log Backups of databases using a read-only secondary replica (ROSR).  With the introduction of ROSRs in SQL Server 2012, administrators can now perform full database (copy only) backups as well as transaction log backups on one of the secondary replicas within the same Availability Group.  This allows the administrator to reduce or eliminate resource contention between production activity and backups.  This series of posts will address some of the aspects of transaction log backups that may not be obvious, especially as it pertains to how the log truncation LSN is maintained across all of the replicas (primary and secondary), as well as what can actually be backed up.    For example, if a transaction log backup is taken on an asynchronous replica that is significantly behind the primary replica, what will actually be backed up and how will that affect the log chain?   In addition, since the log backups can be taken from either the primary replica or any secondary replica, how will the log chain be maintained?   Can log backups be taken on more than one database replica at the same time?    This series of posts will attempt to address these and other questions and full explain how this new feature will enhance the DBAs capabilities in maintaining databases in an AlwaysOn environment.

Major themes in this series

  • Explain what is a Log Sequence Number (LSN)
  • Describe the log backup process on a ROSR
  • Various Transaction Log Backup Scenarios
  • Communication must exist between primary &
    secondary to take backups on an ROSR
  • Some common backup errors on ROSR

What is a Log Sequence Number (LSN)?

When troubleshooting AlwaysOn Availability Groups, there are several DMVs that refer to various LSNs (log sequence numbers).  In addition, backup and restore operations frequently refer to and output LSN values.

So what is an LSN?  An LSN, is a sequence number that is used to maintain order and track operations within a database.  An LSN can be in either decimal form or hex depending on where it is referenced.  Most DMVs reference the LSN in decimal format.  The decimal format of the LSN is composed of three concatenated
pieces of information. Those three pieces are (from left to right):

 

  • VLF Sequence Number:  Up to 10 digit number (4 bytes) without leading zeros.  (VLF = virtual log file.  Refer to Transaction Log Physical Architecture for more information).
  • Log Block Number:  10 digit decimal number (4 bytes) padded with leading zeros.
  • Log Record Number: 5 digit decimal number (2 bytes) with leading zeros

 

The hexadecimal format is simply the hexadecimal representation of each portion (VLF, Log Block number, and Log Record number) separated by colons (:).    For example:  34 in decimal is 22 in Hex, 156 in decimal is 9C in hex, and 65 in decimal is 41 in hex.   So the LSN represented by 34000000015600065 in decimal would appear as 22:9C:41 in hexadecimal.

 

What is a Log Block?

In the previous section, the definition of an LSN was mention as a combination of the VLF (virtual log file), the Log Block number, and the Log Record number.  What is a “Lob Block” and what is a “Log Record”?  For the purposes of this article, a log block is a collection of log records that are written to disk together.  A “log record” is a single atomic record of activity within the SQL engine.   The relationship between “log record”, “log block”, and Virtual Log File (VLF) is as follows:

  • A VLF is comprised of 1 or more log blocks each of which is an integer multiple of 512 bytes, but no more than 60KB total in size.
  • A log block is comprised 1 or more log records.  A log block can be compared with a data page within the data file, containing a header with meta-data information, slot arrays, the “data” (the log records), and beginning with Microsoft SQL Server 2005, a checksum.
  • Finally, a log record is the smallest unit within the transaction log file, and contains the information recorded for activity within the database.   The next diagram shows the relationship between the physical log file, the Virtual Log File(s), Log Block(s), and Log Record(s).

 

 

Now it is time to look at one important DMV for reviewing the current status of a database replica.  How this pertains to database backups will be more clear in part two of this series, “Describe the Log Backup Process on a ROSR”.  Right now, there is plenty to look at in this DMV.

 

sys.dm_hadr_database_replica_states

The DMV, sys.dm_hadr_database_replica_states, has several columns that have “LSN” as part of their name that are helpful in troubleshooting issues dealing with AlwaysOn replication.  Even though these fields have “lsn” in their name, in fact, they are not all true LSNs.  Of the ones listed in the table above, only the last_redone_lsn and end_of_log_lsn are true LSNs.  The others are actually Log Block IDs.

The following diagram depicts the relationship between these “LSNs” values when looking at the Primary Replica and a corresponding Secondary replica.   Points “B”, “C”, and “E” point to the log blocks themselves representing the fact those values are Log Block IDs.  In contrast, points “A” and “D” point to log records and represent true LSNs.

 

 

As noted in the blog, AlwaysON - HADRON Learning Series: - How does AlwaysON Process a Synchronous Commit Request, AlwaysOn does not ship individual log records, it ships log blocks.  The log block is used as a container for log records so by virtue of the log block tracking the individual LSN tracking also occurs.  Additionally, it is important to note that last_hardened_lsn is a “flush LSN” meaning that it represents the START of the NEXT block past the blocks already stored on stable media.   In other words, an LSN value that is less than last_hardened_lsn has been saved to stable media.   Any LSN greater than or equal to the last_hardened_lsn value has not been flushed.

Combining everything that has been discussed so far, we can now query sys.dm_hadr_database_replica_states and begin to analyze the information obtained.

On an instance that is acting as a primary replica,execute the following query:

 SELECT ar.replica_server_name as ServerName,
 drs.synchronization_state_desc as SyncState,
 drs.last_hardened_lsn, drs.last_redone_lsn
FROM sys.dm_hadr_database_replica_states drs
LEFT JOIN sys.availability_replicas ar 
 ON drs.replica_id = ar.replica_id
ORDER BY ServerName
 

 

You should see results similar to the following:

In this results set, there are several factors that can be determined.

First, LC2SQLAOLB1 is acting as the Primary Replica.   We know this because the last_redone_lsn is NULL.   REDO operations only take place on secondary replicas.   (We could have also determined this by joining with other DMVs to show its role clearly defined.

Second, looking at the last_hardened_lsn values for all three servers, we see that they are identical.  In order for the Sync State to be “SYNCHRONIZED”, both the primary and secondary must be configured for “SYNCHRONOUS COMMIT” and the commit policy is in a “WaitForHarden” meaning the secondary has notified the primary that it has hardened to the end of the log and is “caught up”.  Visually from DMVs however, we typically look to see that last_hardened_lsn is identical – although depending on how much activity there is and whether the DMV has all of the latest acknowledgments from the secondary, it is possible for the secondary to not be “exactly” the same when querying the DMV.

On the other hand, LC2SQLAOLB3 is configured for asynchronous commit, so even though its last_hardened_lsn is identical to the primary replica, its state will never be SYNCHRONIZED, but SYNCHRONIZING.  Put in other words, even though LC2SQLAOLB3 has all of the log records as the primary, it will never be in a SYNCHRONIZED state.

Finally, comparing the values last_redone_lsn and last_hardened_lsn we see that the last_redone_lsn is less than the last_hardened_lsn.  From our earlier discussions on the various pieces of an LSN, we can see that the last_hardened_lsn (block ID: 0000066325) is one block ahead of the last_redone_lsn ( 0000066324).
Remember, by definition, the last_hardened_lsn is a “flush lsn” and points to the next block following what has been written to stable media.

 

In Summary

This article has attempted to explain several foundation points that will make subsequent study of log backups on secondary replicas easier to understand.  The LSN is comprised of three pieces concatenated together:

  • Virtual Log File sequence number
  • Log Block ID number
  • Log Record number

 

When executed on the primary, the DMV, sys.dm_hadr_database_replica_states can be used to show the status of the various secondary replicas.   In addition, in order for a secondary replica to be SYNCHRONIZED it must be configured for synchronous commit, and the last_hardened_lsn must be the

same as the primary.   And for a secondary replica that is configured for asynchronous commit, it will never be in a SYNCHRONIZED state even if it has the identical last_hardened_lsn value.

Finally it was demonstrated that even though the secondary replica LC2SQLAOLB2 was SYNCHRONIZED, the last_redone_lsn was one log block ID less than the last_hardened_lsn.   This is because the last_hardened_lsn is not a true LSN, but rather a “Flush LSN” which by definition, points to the next log block ID following those log records that have been written to stable media.

 

Coming Up: Part 2 - Describe the log backup process on an ROSR

The next post in this series goes into detail the process of taking log backups on an ROSR.  It will discuss the communication process between the primary and secondary (the messages sent back and forth), the farthest point that can be backed up on a secondary, the effects of the redo queue size on log backups as well as other factors relating to sync status.