SQL Server Resource database corruption–yes, it’s possible

It’s very rare that I run into an issue with the Resource database, and the one I ran into recently was rarer still. But before I get into the nitty-gritty of the issue, let us begin by outlining a few details about the resource database:

The Resource database

The resource database is a hidden system database, and cannot be accessed explicitly by the users. Also, as documented here, there is no way inside SQL Server to back up the resource DB. The only way to take a backup of the resource db is to make file level copies. This is something that you can do either manually or through VSS (disk level) backups.

Now, it’s not without reason that we do not have any way to take backups of the Resource database. A few salient points:

  • The resource DB is a read-only database
  • Outside of a hardware issue, there is no way for the resource db to get corrupted.

But what if there is a hardware problem, say, god forbid, your SAN crashes, or if there’s some sort of a “scribbler” issue with one of the hardware drivers (more details on that in a different post), and you end up with your resource database corrupted, what do you do? Here are the options, in order:

  1. The ideal way to get out of this situation is to restore the resource db files from file level backups. So if you’re reading about this database for the first time, the first thing you should do is to make file-level copies of the resource db files (or add them to the set of files you back-up using VSS backups). I would recommend taking backups of the resource db files immediately after the successful application of a hotfix/Service Pack/CU/Security Update.
  2. If you are in this situation already, and do not have a backup of your resource db files, do not despair. Simply take another server, install an instance with the same instance id and instance name as the target instance, and bring it to the same build as well. Once this is done, stop the SQL Service, copy the resource db files, and use them to replace the corrupted resource db files on the problem instance. Your SQL server service should come online now. I’ve tested this extensively on SQL 2008 and 2008 R2, and it indeed works.
  3. If this is a cluster, and you’re on SQL 2008 or later, you can try bringing SQL up on the second node. If the second node’s copy of the resource db files are not corrupted, you should be successful.

Now, allow me to explain why this special case described in bullet 3 exists:
In SQL 2005, the resource db was tied to the master database, and the resource db mdf and ldf files had to be in the same folder as the master db files, else your SQL Service would fail to start. In case of a cluster, the resource db resided on a clustered drive, and when the failover happened, the ownership of the resource database was passed to the second node. Since we had only one copy of the resource database to patch, we were able to patch all the nodes on the cluster in a single run in case of SQL 2005.

This behaviour changed from SQL 2008 onwards. In SQL 2008 and 2008 R2, the resource database is no longer tied to the master database, and exists in the Binn folder instead. So basically, the resource database is a part of the instance binaries from SQL 2008 onwards. This is why, in case of SQL 2008 and 2008 R2, you need to patch both the nodes separately (one by one). Makes sense? This is why I mentioned in point 3 above that if you are on a cluster and SQL is 2008 or later, there is a good chance you might be able to get SQL up on the other node, even if the resource db files on one node are corrupted.

As a last word, if you’re not sure how your resource db files came to be corrupted, please take it as a top priority to find the root cause behind the corruption, as this is definitely something that warrants further investigation.

If you have any interesting incidents to share w.r.t the resource database, please feel free to do so in the comments section.