Point-in-time restore and knowing where to stop

 

In some disaster recovery scenarios, a database needs to be restored to a point in time just before a particular operation occurred, e.g. before a user mistakenly dropped a table, ran a DELETE or UPDATE with no WHERE clause, etc. If the database is in the full recovery model and the log backup chain is not broken, this can be achieved by using the RESTORE LOG command with the STOPAT clause. The STOPAT clause specifies the point in time at which recovery should stop during a log restore; in this case, that would be the time immediately before the transaction containing the erroneous DROP TABLE command was committed. The problem is that frequently you do not know the specific point in time when the user made the mistake. On one hand, to minimize data loss, you need to stop recovery immediately before the transaction commit. On the other hand, if you overshoot that point in time when restoring the log, you need to start the restore sequence from scratch: restore the latest full backup, the differential backups (if any), and the set of transaction log backups. This may have to be repeated several times, until you arrive at the exact STOPAT time by trial and error. Needless to say, this approach may not be realistic, particularly for large databases.
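
For reference, a single iteration of that trial-and-error sequence with STOPAT might look like the sketch below; the backup file paths and the timestamp are placeholders, not values taken from the script later in this post.

-- Restore the full backup without recovery, so that log backups can be applied
RESTORE DATABASE PointInTimeRestore
FROM DISK = 'C:\Backup\PointInTimeRestore.bak'  -- placeholder path
WITH REPLACE, NORECOVERY;
GO

-- Apply the log backup, stopping at a guessed point in time
RESTORE LOG PointInTimeRestore
FROM DISK = 'C:\Backup\PointInTimeRestore.trn'  -- placeholder path
WITH STOPAT = '2019-01-01 12:34:56', RECOVERY;
GO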

Luckily, in many cases there is a way to determine the precise point where the restore should stop. This approach may be difficult to implement in some cases, particularly on systems with a lot of write activity, but when it does work, it can save you a lot of time during disaster recovery, while minimizing data loss.

I am going to assume that the transaction log containing the erroneous transaction has already been backed up; if that is not the case, you will need to take a tail-log backup before starting with the steps below.
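
A tail-log backup is simply a log backup taken after the failure, so that the log records up to and including the erroneous transaction are captured. A minimal sketch, assuming the database files are intact and you intend to restore over the same database (the path is a placeholder; NORECOVERY leaves the database in the restoring state, ready for the restore sequence):

-- Back up the tail of the log so the erroneous transaction is captured
BACKUP LOG PointInTimeRestore
TO DISK = 'C:\Backup\PointInTimeRestore_tail.trn'  -- placeholder path
WITH NORECOVERY;
GO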

The approach is to use the undocumented fn_dump_dblog() table-valued function, which provides a rowset over the contents of a transaction log backup file (I described the function in an earlier blog post). Each row in the function’s result set represents a log record in the log backup file specified as a function argument. The first column in the result set is named Current LSN, and contains the Log Sequence Number of each log record in the backup file. The RESTORE LOG command has the STOPBEFOREMARK clause, which, similarly to the STOPAT clause, allows recovery to be stopped immediately before a particular LSN (in SQL Server 2005 and later). Therefore, if we can determine the LSN associated with the commit of the transaction that included the erroneous command, we can specify that value in the STOPBEFOREMARK clause, and thus restore the database to a point in time immediately before the disaster occurred. The difficulty here is that it may not be easy, or even possible, to find the erroneous command in the large result set returned by the fn_dump_dblog() function, particularly if the database had a lot of write activity when the erroneous command was run.

The sample script below demonstrates the approach.

-- Create test database, switch it to full recovery model

CREATE DATABASE PointInTimeRestore;
GO

USE PointInTimeRestore;
GO

ALTER DATABASE PointInTimeRestore SET RECOVERY FULL;
GO

-- Create the initial full backup
BACKUP DATABASE PointInTimeRestore
TO DISK = 'C:\MSSQLData\MSSQL10.SS2008\MSSQL\Backup\PointInTimeRestore.bak';
GO

-- Create test table, insert sample data

CREATE TABLE Table1
(
Col1 INT PRIMARY KEY
);
GO

INSERT INTO Table1 (Col1)
VALUES (1), (2), (3);
GO

-- Simulate user error and drop the table
DROP TABLE Table1;
GO

-- Back up the log containing the erroneous command
BACKUP LOG PointInTimeRestore
TO DISK = 'C:\MSSQLData\MSSQL10.SS2008\MSSQL\Backup\PointInTimeRestore.trn';
GO

-- Change database context to allow restore of the test database
USE tempdb;
GO

-- Restore the database, but do not run recovery, so that transaction logs can be restored
RESTORE DATABASE PointInTimeRestore
FROM DISK = 'C:\MSSQLData\MSSQL10.SS2008\MSSQL\Backup\PointInTimeRestore.bak'
WITH REPLACE, NORECOVERY;
GO

-- Examine log backup contents to find the LSN corresponding to the DROP TABLE transaction
SELECT *
FROM fn_dump_dblog
(
DEFAULT, DEFAULT, DEFAULT, DEFAULT,
'C:\MSSQLData\MSSQL10.SS2008\MSSQL\Backup\PointInTimeRestore.trn',
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT
);
GO

/*
In this case, the log record corresponding to the commit of the transaction
that dropped the Table1 table can be easily found in the result set – the
Operation column for the row is LOP_COMMIT_XACT, and the value in the
Transaction ID column corresponds to the value in the Transaction ID column for
an earlier row with Operation LOP_BEGIN_XACT and Transaction Name DROPOBJ.
We also happen to know that this was the only table dropped around the time
shown in the End Time column.
The LSN in the Current LSN column is 0000002a:00000074:0036.
*/
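
/*
On a busy system it can be hard to spot the right rows in the full output.
A possible shortcut (a sketch, assuming the standard fn_dump_dblog column
names) is to filter for the LOP_BEGIN_XACT record whose Transaction Name is
DROPOBJ, note its Transaction ID, and then find the matching LOP_COMMIT_XACT
record with the same Transaction ID; its Current LSN is the value to use with
STOPBEFOREMARK.
*/
SELECT [Current LSN], Operation, [Transaction ID], [Transaction Name], [Begin Time]
FROM fn_dump_dblog
(
DEFAULT, DEFAULT, DEFAULT, DEFAULT,
'C:\MSSQLData\MSSQL10.SS2008\MSSQL\Backup\PointInTimeRestore.trn',
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT
)
WHERE Operation = 'LOP_BEGIN_XACT'
AND [Transaction Name] = 'DROPOBJ';
GO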

/*
Restore the log to the point in time just before the table was dropped.
Note that we need to prepend 0x to the LSN value; otherwise we will get
the "The named mark does not identify a valid LSN" error.
*/
RESTORE LOG PointInTimeRestore
FROM DISK = 'C:\MSSQLData\MSSQL10.SS2008\MSSQL\Backup\PointInTimeRestore.trn'
WITH STOPBEFOREMARK = 'lsn:0x0000002a:00000074:0036';
GO

-- Confirm that the table is again present in the database
USE PointInTimeRestore;
GO

SELECT *
FROM Table1;


Adding the IDENTITY property to a column of an existing table

 

It is not possible to alter a column in an existing table to add the IDENTITY property. Before SQL Server 2005, the only option was to create a new table with an IDENTITY column and move the data into that table. For large tables, this could be problematic.

With the introduction of table partitioning in SQL Server 2005, there is a neat solution to this problem. As described in Transferring Data Efficiently by Using Partition Switching, the schemas of the source and destination tables in the ALTER TABLE … SWITCH statement have to match exactly (loosely speaking – the specific requirements are quite detailed). However, and that’s the key part here, the topic says that “The IDENTITY property is not considered.” In other words, the target table can have the IDENTITY property on a column, even though the source table does not have it. Therefore, we can switch the data from the source table into the target table, which is a very fast metadata-only operation, and gain the IDENTITY property without having to physically move data from one table to another. The target table and associated constraints can then be renamed to match the original source table.

Here’s a script that demonstrates this approach:

/*
A table without an IDENTITY column.
We want to add the IDENTITY property to the Col1 column
*/
CREATE TABLE AddIdentity (
Col1 INT NOT NULL,
Col2 VARCHAR(10) NOT NULL,
CONSTRAINT pkAddIdentity PRIMARY KEY (Col1)
);

/*
A temporary table, with the schema identical to the AddIdentity table,
except that the Col1 column has the IDENTITY property
*/
CREATE TABLE AddIdentityTemp (
Col1 INT NOT NULL IDENTITY(1,1),
Col2 VARCHAR(10) NOT NULL,
CONSTRAINT pkAddIdentityTemp PRIMARY KEY (Col1)
);

-- Insert test data
INSERT INTO AddIdentity (Col1, Col2)
VALUES (1, 'a');

-- Switch data into temporary table
ALTER TABLE AddIdentity SWITCH TO AddIdentityTemp;

-- Look at the switched data
SELECT Col1, Col2
FROM AddIdentityTemp;

-- Drop the original table, which is now empty
DROP TABLE AddIdentity;

-- Rename the temporary table, and all constraints, to match the original table
EXEC sp_rename 'AddIdentityTemp', 'AddIdentity', 'OBJECT';
EXEC sp_rename 'pkAddIdentityTemp', 'pkAddIdentity', 'OBJECT';

-- Reseed the IDENTITY value to match the maximum value in Col1
DBCC CHECKIDENT ('AddIdentity', RESEED);

-- Insert test data
INSERT INTO AddIdentity (Col2)
VALUES ('b');

-- Confirm that a new IDENTITY value has been generated
SELECT Col1, Col2
FROM AddIdentity;


Tempdb configuration check script

 

There are a number of well-known best practices for configuring the tempdb database on SQL Server; a configuration sketch that applies them follows the list. Specifically:

  • Multiple data files are used (some sources recommend 1 data file per CPU core, while others recommend anywhere from 1/4 to 1 file per CPU core)
  • All data files have equal size
  • All data files have equal maximum size
  • All data files have equal growth increment
  • No data files have percent growth increment
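
As a rough illustration of applying these settings, something like the following could be used. The sizes and the file path are placeholders that would need to match the instance; tempdev is the default logical name of the primary tempdb data file, and the second file name is hypothetical.

-- Resize the existing primary data file (placeholder sizes)
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 2048MB, MAXSIZE = 8192MB, FILEGROWTH = 512MB);

-- Add another data file with identical size and growth settings (placeholder name and path)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
FILENAME = 'C:\MSSQLData\tempdb2.ndf',
SIZE = 2048MB, MAXSIZE = 8192MB, FILEGROWTH = 512MB);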

Note that the multiple data file recommendation is usually presented in terms of the number of files per CPU core. So how do we determine the number of CPU cores to use in this calculation? In the general case, it would be incorrect to use the total number of cores in the machine, because not every core may be in use by SQL Server due to a non-default CPU affinity mask. Ultimately, what matters here is not so much the number of cores, but the number of scheduler threads. The reasoning behind this best practice is to provide a separate data file for each scheduler thread, so that multiple simultaneous tempdb space allocation requests can use separate data files, thus reducing space allocation contention.

Starting from SQL Server 2005, the number of schedulers for a SQL Server instance can be easily found from the scheduler_count column in the sys.dm_os_sys_info DMV. This is the value used in the query below to determine whether the multiple data file recommendation is followed. The specific rule I’m using here is between 1/2 and 1 data file per scheduler. If needed, it is trivial to change the query for a different definition of this best practice.
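
As a quick sanity check before running the full script, the scheduler count and the current number of online tempdb data files can be compared side by side (a minimal sketch):

SELECT (SELECT scheduler_count FROM sys.dm_os_sys_info) AS SchedulerCount,
       (SELECT COUNT(*) FROM tempdb.sys.database_files
        WHERE type_desc = 'ROWS' AND state_desc = 'ONLINE') AS TempdbDataFileCount;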

Here is a query that returns a single-row result set showing whether tempdb is configured according to these best practices. It can be run against multiple instances using a multi-server query in SSMS 2008, to quickly check tempdb configuration across the enterprise.

WITH
TempdbDataFile AS
(
SELECT  size,
max_size,
growth,
is_percent_growth,
AVG(CAST(size AS decimal(18,4))) OVER() AS AvgSize,
AVG(CAST(max_size AS decimal(18,4))) OVER() AS AvgMaxSize,
AVG(CAST(growth AS decimal(18,4))) OVER() AS AvgGrowth
FROM tempdb.sys.database_files
WHERE   type_desc = 'ROWS'
AND
state_desc = 'ONLINE'
)
SELECT  CASE WHEN (SELECT scheduler_count FROM sys.dm_os_sys_info)
             BETWEEN COUNT(1) AND COUNT(1) * 2
        THEN 'YES'
        ELSE 'NO'
        END AS MultipleDataFiles,
        CASE SUM(CASE size WHEN AvgSize THEN 1 ELSE 0 END)
        WHEN COUNT(1) THEN 'YES'
        ELSE 'NO'
        END AS EqualSize,
        CASE SUM(CASE max_size WHEN AvgMaxSize THEN 1 ELSE 0 END)
        WHEN COUNT(1) THEN 'YES'
        ELSE 'NO'
        END AS EqualMaxSize,
        CASE SUM(CASE growth WHEN AvgGrowth THEN 1 ELSE 0 END)
        WHEN COUNT(1) THEN 'YES'
        ELSE 'NO'
        END AS EqualGrowth,
        CASE SUM(CAST(is_percent_growth AS smallint))
        WHEN 0 THEN 'YES'
        ELSE 'NO'
        END AS NoFilesWithPercentGrowth
FROM TempdbDataFile;
