SQL Server changes behavior of the file autolocation invoked during the planning phase of a full restore with REPLACE clause

A colleague reported the following concern from one of the customers he works with:

We just ran into a small issue with our 2012 upgrades. In older versions of SQL, a RESTORE DATABASE command using the “WITH REPLACE” option worked as long as the logical file names matched even if the file paths were different. Now that doesn’t seem to work.

Example:

Database TestDB on SERVER1 has two files, whose logical names are TestDB_Data1 (for which the physical path is C:\MSSQL\TestDB_Data_1.mdf) and TestDB_log_1 (whose physical path is C:\MSSQL\TestDB_log_1.ldf)

Database TestDB on SERVER2 also has two files whose logical names match those of the database with the same name on SERVER1. However, in this second server, the physical paths do not match: D:\MSSQL\TestDB_Data_1.mdf for the data file, and D:\MSSQL\TestDB_log_1.ldf for the log file.

In SQL 2008 and before, I could restore TestDB from SERVER1 to SERVER2 and vice versa using the “with REPLACE” option.

In SQL 2012 I get an error saying that the paths can’t be found and the error references the path of the DB files where the DB was backed up not the paths of the local database I am replacing.

I don’t know if you’ve noticed, but in SQL Server (2005, 2008, and 2008 R2), when you do a full restore on top of an existing database, if you specify the REPLACE clause, and do not specify the MOVE clause to explicitly indicate the path where you want to lay down those files, SQL Server will decide where to put the files following this criteria:

  1. For those files whose logical names in the existing database (the one you are trying to overwrite with the restore) match those of the database whose full backup you are restoring, the backup will use the physical filename associated to the files of the existing database.
  2. If, on the other hand, the logical name doesn’t match, then the physical filename stored in the backupset will be used instead.

Let me illustrate it with an example (make sure paths c:\temp\data and c:\temp\backup exists in the machine where your instance of SQL is running before you try this script):

IF EXISTS (select * from sys.databases where name = 'OriginalDatabase')
DROP DATABASE OriginalDatabase
GO
CREATE DATABASE OriginalDatabase
ON PRIMARY
( NAME = N'FirstDataFile', FILENAME = N'C:\temp\data\OriginalDatabase\FDFile.mdf' , SIZE = 4096KB , FILEGROWTH = 1024KB ),
( NAME = N'SecondDataFile', FILENAME = N'C:\temp\data\OriginalDatabase\SDFile.mdf' , SIZE = 4096KB , FILEGROWTH = 1024KB )
LOG ON
( NAME = N'FirstTransactionLogFile', FILENAME = N'C:\temp\data\OriginalDatabase\FTLFile.ldf' , SIZE = 1024KB , FILEGROWTH = 10%),
( NAME = N'SecondTransactionLogFile', FILENAME = N'C:\temp\data\OriginalDatabase\STLFile.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)
GO

After running that first part of the script, this is the situation in our two working directories:

a

Now we run a full backup of that database and then we drop it:

BACKUP DATABASE OriginalDatabase TO DISK='c:\temp\backup\OriginalDatabase.bak' WITH FORMAT, INIT
GO
IF EXISTS (select * from sys.databases where name = 'OriginalDatabase')
DROP DATABASE OriginalDatabase
GO

So now, both directories are empty:

b

Now we create a new database with an even different name. One of the data files of this new database has an exactly matching logical name (SecondDataFile) with one of the data files in the original database. However, its physical filename is completely different (the location varies and the name of the file varies too). Same circumstance applies to one of the transaction log files (FirstTransactionLogFile):

CREATE DATABASE NewDatabase
ON PRIMARY
( NAME = N'StartingDataFile', FILENAME = N'C:\temp\data\NewDatabase\XXX_StDFile.mdf' , SIZE = 4096KB , FILEGROWTH = 1024KB ),
( NAME = N'AnotherDataFile', FILENAME = N'C:\temp\data\NewDatabase\XXX_ADFile.mdf' , SIZE = 4096KB , FILEGROWTH = 1024KB ),
( NAME = N'SecondDataFile', FILENAME = N'C:\temp\data\NewDatabase\XXX_SeDFile.mdf' , SIZE = 4096KB , FILEGROWTH = 1024KB )
LOG ON
( NAME = N'FirstTransactionLogFile', FILENAME = N'C:\temp\data\NewDatabase\XXX_FTLFile.ldf' , SIZE = 1024KB , FILEGROWTH = 10%),
( NAME = N'AnotherTransactionLogFile', FILENAME = N'C:\temp\data\NewDatabase\XXX_ATLFile.ldf' , SIZE = 1024KB , FILEGROWTH = 10%),
( NAME = N'LastTransactionLogFile', FILENAME = N'C:\temp\data\NewDatabase\XXX_LTLFile.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)
GO

After having created this database, this is the aspect of our two working directories:

c

And now we try to restore the backup of our original database on top of the new one we just created. Since the family guid of both databases do not match because they come from different families (i.e. one was not created from a backup of the other, neither one was an attached copy of the other), we are forced to use the REPLACE clause so that SQL doesn’t fail.

RESTORE DATABASE NewDatabase FROM DISK = 'c:\temp\backup\OriginalDatabase.bak' WITH REPLACE
GO

After the restore command completes, this is how our directories look like:

d

Notice that for those files whose logical names matched between the existing database and the ones present in the original database at the time the full backup was taken, the physical file location, name and extension SQL has chosen is that specified in the existing database (NewDatabase). While for those files in the backupset whose logical filenames didn’t have a match in the existing database, the physical location, file name and extension was extracted from the backupset.

Now, if we repeat these steps against an instance of SQL Server 2012 the results after having completed every step are identical, except after completing the RESTORE DATABASE, in which case the layout of the files looks like this:

g

It seems (and that’s the case actually) that the physical location, name and extension of the restored files were all extracted from the backupset.

The problem you may encounter, as the customer who reported this did, is that you could be relying on the way this undocumented file autolocation functionality worked when restoring full database backups using the REPLACE clause, to repeatedly restore backups of many databases from production servers onto preproduction servers whose disk layouts didn’t match. That would save you from remapping the physical properties of each file through the MOVE clause of the backup over and over.

Beginning with SQL Server 2012, this starts throwing errors in cases where the paths that existed in production didn’t even exist in preproduction. Therefore, when SQL tries to create the file in the physical location stored in the backupset (i.e. the original location in the production environment), the restore fails because it is not able to encounter such path.

But, what is the reason why this changed in SQL Server 2012? Well, let me describe another scenario and see how it worked in pre-2012 version of SQL Server. To do so, we will use the following script:

if exists(select * from sysdatabases where name = 'OriginalDatabase')
drop database OriginalDatabase
go
CREATE DATABASE OriginalDatabase ON
(name=OriginalDatabase0, filename='C:\temp\data\OriginalDatabase\OriginalDatabase0.mdf'),
(name=OriginalDatabase1, filename='C:\temp\data\OriginalDatabase\OriginalDatabase1.mdf'),
(name=OriginalDatabase2, filename='C:\temp\data\OriginalDatabase\OriginalDatabase2.mdf')
LOG ON (name=OriginalDatabase3, filename='C:\temp\data\OriginalDatabase\OriginalDatabase3.ldf')
go
BACKUP DATABASE OriginalDatabase TO DISK='OriginalDatabase.bak' WITH INIT, FORMAT
go
ALTER DATABASE OriginalDatabase REMOVE FILE OriginalDatabase1
go
BACKUP LOG OriginalDatabase TO DISK='OriginalDatabase.bak'
go
ALTER DATABASE OriginalDatabase ADD FILE
(NAME = OriginalDatabase1, FILENAME = 'C:\temp\data\OriginalDatabase\OriginalDatabase1_new.mdf') TO FILEGROUP [PRIMARY]
go
BACKUP LOG OriginalDatabase TO DISK='OriginalDatabase.bak'
go
ALTER DATABASE OriginalDatabase REMOVE FILE OriginalDatabase2
go
BACKUP LOG OriginalDatabase TO DISK='OriginalDatabase.bak'
go
ALTER DATABASE OriginalDatabase ADD FILE
(NAME = OriginalDatabase2, FILENAME = 'C:\temp\data\OriginalDatabase\OriginalDatabase2_new.mdf') TO FILEGROUP [PRIMARY]
go
BACKUP LOG OriginalDatabase TO DISK='OriginalDatabase.bak'
go

After having created the database, there were the following four files:

C:\temp\data\OriginalDatabase\OriginalDatabase0.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase1.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase2.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase3.ldf

By the time the second BACKUP LOG is run, the layout of the database files is like this:

C:\temp\data\OriginalDatabase\OriginalDatabase0.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase1_new.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase2.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase3.ldf

And, by the time the fourth BACKUP LOG is taken and the script completed, this is the layout:

C:\temp\data\OriginalDatabase\OriginalDatabase0.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase1_new.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase2_new.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase3.ldf

If, on top of the existing database, we restore the first backupset (the one containing the full database backup), followed by the second and third backupsets (corresponding to first and second log backups):

RESTORE DATABASE OriginalDatabase FROM DISK='OriginalDatabase.bak' WITH REPLACE, FILE=1, NORECOVERY
go
RESTORE log OriginalDatabase FROM DISK='OriginalDatabase.bak' WITH FILE=2, NORECOVERY
go
RESTORE log OriginalDatabase FROM DISK='OriginalDatabase.bak' WITH FILE=3, NORECOVERY
go
RESTORE DATABASE OriginalDatabase WITH RECOVERY
go

One expects that the resulting file layout would be like this:

C:\temp\data\OriginalDatabase\OriginalDatabase0.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase1_new.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase2.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase3.ldf

However, because of the way the file autolocation functionality used to work in pre-2012 versions of SQL Server, it ends up leaving the following set of files:

C:\temp\data\OriginalDatabase\OriginalDatabase0.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase1_new.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase2_new.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase3.ldf

If we repeat the exercise from the beginning, but before running the restore commands we drop the database, the result is quite different. And it is exactly what we would expect it to be, even when we don’t drop the database:

C:\temp\data\OriginalDatabase\OriginalDatabase0.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase1_new.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase2.mdf
C:\temp\data\OriginalDatabase\OriginalDatabase3.ldf

In SQL Server 2012, during the restore of a full backup without MOVE clauses that indicate the target physical location of each file, on top of an existing database, the file autolocation algorithm requires: not only a logical name match between the file declared in the backupset and any given file in the existing database, but also their file GUID, file ID, and create LSN must match. If any of those attributes don’t match, the physical location used for that logical file is the one stored in the backupset. This additional requirement was implemented to fix this last scenario I just exposed which was considered a defect.

I wonder how many of you could also be negatively affected by this change. If that’s your case, please let me know through a comment to this post.