Backup Compression and Checksum

Note that when I wrote this repro and captured the screenshots below, I didn’t think they would be rendered in this blog. Unfortunately, you will have to click on each image to read the descriptions of every relevant chunk of data highlighted in the raw data stream.

A while ago Paul asked me about this topic and the confusion it tends to cause for many people. Since I recently received the same question again from another PFE at Microsoft, I decided to share this info through my blog so that more people can benefit from it.

The question was this: backup compression does a checksum over the whole backup (but doesn't test page checksums). So where is the checksum over the whole backup stored? And how can it be tested? Or is BOL incorrect?

And this is what I found:

The checksums calculated to satisfy the presence of the CHECKSUM clause are enabled on a per backupset basis and are persisted in this way:

clip_image002

clip_image002[6]

For the computation of that checksum, page checksums (when present) are leveraged so that the operation completes faster. If the header of a particular page reveals that no page checksum was calculated for it, the backup operation computes one itself, but it does not modify the page header at all.
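As a minimal sketch of how such a backup set is produced, the statements below take a backup with the CHECKSUM clause and then look at the backup history to confirm that the backup set carries backup checksums. The database name `SampleDb` and the file path are hypothetical and only there for illustration.

```sql
-- SampleDb and the path below are hypothetical; substitute your own.
BACKUP DATABASE [SampleDb]
TO DISK = N'C:\Backups\SampleDb_checksum.bak'
WITH CHECKSUM, INIT;

-- has_backup_checksums = 1 means the backup set was written WITH CHECKSUM.
SELECT TOP (1)
       database_name,
       backup_finish_date,
       has_backup_checksums
FROM msdb.dbo.backupset
WHERE database_name = N'SampleDb'
ORDER BY backup_finish_date DESC;
```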

On the other hand, backup compression is only possible if the mediaset is formatted to support compression, which means it affects everything stored in that mediaset. The checksums used in compression are calculated differently, over a different data set, and are also stored in a different way.
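To illustrate the mediaset-level nature of compression, here is a small sketch that creates a new mediaset with a compressed backup and then reads the backup header. Again, `SampleDb` and the path are hypothetical names; note that FORMAT overwrites whatever media header already exists on the target.

```sql
-- SampleDb and the path below are hypothetical; substitute your own.
-- FORMAT writes a new media header, so the mediaset is created as compressed.
BACKUP DATABASE [SampleDb]
TO DISK = N'C:\Backups\SampleDb_compressed.bak'
WITH COMPRESSION, FORMAT;

-- The Compressed and HasBackupChecksums columns of the result set show,
-- per backup set, whether compression and WITH CHECKSUM were used.
RESTORE HEADERONLY
FROM DISK = N'C:\Backups\SampleDb_compressed.bak';
```

In this sketch the backup is taken without WITH CHECKSUM, so HasBackupChecksums should come back as 0 even though the compressed blocks still carry their own checksums.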

clip_image002[8]

With that explained, the next question was: how can the checksums in a compressed backup (one taken without WITH CHECKSUM) actually be checked without restoring?

The answer is that, to check the checksums created for the compressed blocks, you just have to issue a RESTORE VERIFYONLY. That will check the checksums of every compressed block from the beginning of the mediaset to the end of the backupset being verified.
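A minimal sketch of that verification, assuming the hypothetical compressed backup file used for illustration here and that the backup set of interest is the first one on the media:

```sql
-- The path is hypothetical; FILE = 1 selects the first backup set on the media.
RESTORE VERIFYONLY
FROM DISK = N'C:\Backups\SampleDb_compressed.bak'
WITH FILE = 1;
```

To verify a later backup set in the same mediaset, change the FILE number accordingly.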