Checksum in SQL2005

Page checksum is new feature in SQL2005 that provides you a stronger mechanism than torn-page to detect any corruptions in IO path. Here are some details and scenarios

 

PAGE CHECKSUM:

 

Between the time a database page is written to the disk and subsequently read by SQL Server, it may get corrupted for reasons that are outside the control of SQL Server. Some example reasons are a faulty I/O device and/or driver and an incomplete write of the page due to power loss. Checksum mechanism allows SQL Server to detect the database page corruption under such situations. Please note, checksum mechanism allows you to isolate I/O subsystem as the cause of page corruption and it does not provide any way to automatically fix it. As an administrator, you can use this information to identify and remove the faulty I/O device. Also, you can fix the existing page corruption(s) either by recreating the index(s) affected (if applicable) or by recovering the affected file(s).

Checksum can be enabled or disabled on a per database basis by using Alter Database command. When enabled, SQL Server computes the checksum of the page both when a page is written and when a page is read, assuming it has checksum on it. The SQL Server computes the checksum based on the bit pattern on the page, stores it in the page header and then issues an I/O to write the page. When the SQL Server reads the page, it re-computes the checksum using the same logic and then compares it with the value available in the page header. If the checksum value matches then it is assumes the page did not get corrupted during the write-read cycle. The algorithm to compute the checksum is complex and therefore expensive to compute. Since the cost of computing the checksum is incurred on each page read and write, it can add to the CPU overhead and can possibly impact the throughput of your workload. Another thing to keep in mind is that the checksum is not unique for a specific bit pattern on the page. Two pages can possibly map to the same checksum value. So there is remote possibility that page corruption may go undetected.

Since you can enable/disable checksum at a database level, following scenarios need further clarifications

· A database is enabled for checksum after it has been created:

The checksum are computed on the pages that are written only after the database has been enabled for checksum and re-computed when such pages are read back. Since the database was not initially created with checksum enabled, there will be pages in the database that have no checksums. When reading a page that has no checksum, there is no benefit of computing the checksum as there is no checksum to verify it against. So it is not done. Currently, there is no utility or command that you can use to force computing of checksums on all the pages in the database that don’t have one. The benefit of doing this is limited. Clearly, if the database page is already corrupt, the checksum will be computed based on this corrupt data and there will be no way to detect it. However, if such command/utility is available, we can detect the page corruption(s) due to bit rot (i.e. a valid page getting corrupted after it has been written to the disk) or can use this detect corruptions in database files when they are transferred or copied onto another media.

· Disabling database checksums:

You may want to disable checksum computation if you don’t want to pay the CPU overhead of computing checksums and/or you have determined that your I/O system is reliable. When you disable the checksum, SQL Server does not compute and write the checksum when the page is written or subsequently read. In this case, you are exposing SQL Server to potential un-detected external page corruption(s). An un-detected corruption(s) may lead

o To data-loss unless you are running under full recovery (assuming log backups are not corrupted).

o To potential incorrect application behavior if the corruption is such that the page is valid structurally but some value got changed.

o Reduced availability of your database as you may need to perform media recovery to correct the corruption(s).

· How do I verify checksum for all pages in my databases?

You can also run DBCC CHECKDB command, preferably with PHYSICAL_ONLY option for the low overhead consistency check, to verify the page checksum on all pages, eligible for such computation.

· Database has torn page detection enabled:

You can only enable a database either for checksum or for torn page detection but not both at the same time. Since the checksum is a stricter validation of possible page corruptions than torn page detection, so you automatically get the torn page detection barring the remote possibility of a corrupt torn page producing the same checksum. However, if you plan to turn off the page checksum, it is recommended that you turn on the torn page detection as it is a low cost mechanism to eliminate a subset of the I/O based page corruptions as detected by checksum.

 

BACKUP CHECKSUM:

This has been documented well in BOL. The following provides some additional clarifications:

 

Backup checksum uses same algorithm as used by page checksum. It is computed for each page that is backed up, but the page checksums of all pages is combined into one value (i.e. backup checksum). Backup checksum is stored on the backup media, not on database page(s). During backup, you can optionally choose to generate backup checksum which can then later be used during restore to validate that the backup is not corrupt. Additionally, the page checksum, if available, can be verified during backup and recovery