Tracking database recovery progress using information from DMV

Technorati Tags: Engine,SQL Server 2008

You must be very familiar with the database recovery related messages printed to the SQL Server Error log. These come in very handy when troubleshooting issues that are related to long recovery. These messages provide information about the stage of the recovery process and approximate time for completion.

2010-12-29 12:02:10.43 spid25s Starting up database 'testdb'.
2010-12-29 12:02:31.23 spid25s Recovery of database 'testdb' (11) is 0% complete (approximately 1725 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2010-12-29 12:03:08.94 spid25s Recovery of database 'testdb' (11) is 1% complete (approximately 1887 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2010-12-29 12:03:08.95 spid25s Recovery of database 'testdb' (11) is 1% complete (approximately 1887 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2010-12-29 12:04:57.97 spid25s Recovery of database 'testdb' (11) is 43% complete (approximately 192 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2010-12-29 12:04:57.97 spid25s 580056 transactions rolled forward in database 'testdb' (11). This is an informational message only. No user action is required.
2010-12-29 12:04:57.97 spid25s Recovery of database 'testdb' (11) is 43% complete (approximately 192 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2010-12-29 12:06:32.05 spid25s Recovery of database 'testdb' (11) is 99% complete (approximately 2 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2010-12-29 12:06:35.09 spid25s 1 transactions rolled back in database 'testdb' (11). This is an informational message only. No user action is required.
2010-12-29 12:06:35.09 spid25s Recovery is writing a checkpoint in database 'testdb' (11). This is an informational message only. No user action is required.
2010-12-29 12:06:35.44 spid25s Recovery completed for database testdb (database ID 11) in 244 second(s) (analysis 37849 ms, redo 109038 ms, undo 97146 ms.) This is an informational message only. No user action is required.

Starting with SQL Server 2008, you do not need to repeatedly open/refresh the SQL Server error log or execute the stored procedure sp_readerrorlog to get up to date information about the progress of the database recovery. Most of the information is readily available in the dynamic management views [DMV]. The 2 DMV’s that offer insights into the progress of database recovery are: sys.dm_exec_requests and sys.dm_tran_database_transactions. The information presented in these DMV’s vary depending upon the situation: recovery of databases during server startup, recovery of database after a attach operation, recovery of database after a restore operation.

Here is a view of sys.dm_exec_requests showing recovery related information: [I repeatedly queried this DMV and loaded it into a temporary table for later analysis]

RollBack status information from requests dmv

The key information here is the command type [DB STARTUP] and the session_id that indicates this is a system task performing the startup recovery. The percent_complete shows the same value that the error log messages indicate about the progress within the stage of recovery. You can use the information from other columns [wait information] to understand if the recovery is taking much longer due to an I/O issue or other problems.

Since the redo and undo portions can actually happen on different tasks in fast recovery scenarios in Enterprise Edition, you could actually see 2 different session_id’s for the same database recovery process.

In the case of attaching and restoring the database, this information will be reflected under the same session_id as the user session which executes those commands.

The DMV sys.dm_tran_database_transactions exposes information that can be useful to track the progress of the undo stage of the recovery. Here is a snapshot that corresponds to the above mentioned progress information from sys.dm_exec_requests.

RollBack status information from transactions dmv

The key information here is how the database_transaction_log_bytes_reserved keeps coming down as undo progresses. Also in the case where there are several transactions to undo, you will be able to see their progress using the database_transaction_next_undo_lsn.

Next time you encounter a really long recovery problem, try using some of these information points to understand if the recovery is progressing very slow or there is a lot of work to be done or if it completely stuck.

Suresh B. Kandoth

SQL Server Escalation Services

Other references:

https://blogs.msdn.com/b/psssql/archive/2008/01/23/how-it-works-what-is-restore-backup-doing.aspx

https://blogs.msdn.com/b/psssql/archive/2009/05/21/how-a-log-file-structure-can-affect-database-recovery-time.aspx

Here are some sample scripts I used to gather and report out the information I showed earlier:

-- create the tables to store the information you collect about recovery

USE <db_name>

GO

DROP TABLE [dbo].[tbl_recovery_tracking]

GO

DROP TABLE [dbo].[tbl_dm_tran_database_transactions]

GO

CREATE TABLE [dbo].[tbl_recovery_tracking](

      [runtime] [datetime] NOT NULL,

      [command] [nvarchar](256) NOT NULL,

      [session_id] [smallint] NOT NULL,

      [database_id] [smallint] NOT NULL,

      [total_elapsed_time] [int] NOT NULL,

      [percent_complete] [real] NOT NULL,

      [estimated_completion_time] [bigint] NOT NULL,

      [wait_resource] [nvarchar](256) NOT NULL,

      [wait_time] [int] NOT NULL,

      [wait_type] [nvarchar](60) NULL,

      [blocking_session_id] [smallint] NULL,

      [reads] [bigint] NOT NULL,

      [writes] [bigint] NOT NULL,

      [cpu_time] [int] NOT NULL

) ON [PRIMARY]

GO

CREATE TABLE [dbo].[tbl_dm_tran_database_transactions](

      [runtime] [datetime] NOT NULL,

      [transaction_id] [bigint] NOT NULL,

      [database_id] [int] NOT NULL,

      [database_transaction_log_record_count] [bigint] NOT NULL,

      [database_transaction_log_bytes_used] [bigint] NOT NULL,

      [database_transaction_log_bytes_reserved] [bigint] NOT NULL,

      [database_transaction_next_undo_lsn] [numeric](25, 0) NULL

) ON [PRIMARY]

GO

 

-- collect the information in a loop

WHILE 1 = 1

BEGIN

      INSERT INTO [dbo].[tbl_recovery_tracking]

      SELECT GETDATE() as runtime, command,

      session_id, database_id, total_elapsed_time,

      percent_complete, estimated_completion_time,

      wait_resource, wait_time, wait_type, blocking_session_id,

      reads, writes, cpu_time

      FROM sys.dm_exec_requests

      WHERE command = 'DB STARTUP' -- may need to change this if troubleshooting recovery as part of attach database or restore

 

      INSERT INTO tbl_dm_tran_database_transactions

      SELECT GETDATE() as runtime,

      transaction_id, database_id,

      database_transaction_log_record_count, database_transaction_log_bytes_used,

      database_transaction_log_bytes_reserved, database_transaction_next_undo_lsn

      FROM sys.dm_tran_database_transactions

     

      WAITFOR DELAY '00:00:01' -- change this capture interval

END

GO

-- after you collect information for some time, you can analyze the information to understand the progress of recovery

SELECT runtime, command,

      session_id, database_id, total_elapsed_time,

      percent_complete, estimated_completion_time,

      wait_resource, wait_time, wait_type, blocking_session_id,

      reads, writes, cpu_time

FROM [dbo].[tbl_recovery_tracking]

WHERE session_id = 25 -- change this

ORDER BY runtime

GO

SELECT

runtime, transaction_id, database_id,

database_transaction_log_record_count,

database_transaction_log_bytes_used, database_transaction_log_bytes_reserved,

database_transaction_next_undo_lsn

FROM tbl_dm_tran_database_transactions

WHERE database_id = 11 and transaction_id = 1452239 -- change this

ORDER BY runtime

GO