In the previous posts of this series, I covered the system table checks that have to be done before anything else can be checked by CHECKDB, and the allocation checks that have to be done before any of the logical checks. Now I want to cover the meat of the functionality in CHECKDB – the logical checks. Note – this is a description of what happens for SQL Server 2005 – it’s pretty similar in SQL Server 2000.
Primitive checks of critical system tables
Logical checks of all tables
Although this section has two titles, the actual logical checks performed in each stage are the same – i.e. everything I’m going to describe below. If any errors are found in the critical system tables (remember these from Part 1?) and:
- repair is not specified; or,
- repair is specified, but not all the errors can be repaired

then CHECKDB cannot continue and terminates, because all of the later checks depend on the metadata in those critical system tables. Otherwise, the full set of logical checks runs over every table in the database. The high-level steps for each table are:
- Validate each table’s storage engine metadata
- Read and check all data, index and text pages, depending on the page type
- Check all inter-page relationships
- Check the page header counts
- Perform any necessary repairs
Validating each table’s storage engine metadata involves the following:
- Loop through all the indexes, and then all the rowsets of each index, building a list of known allocation units for the index (including all the DATA, LOB, and SLOB allocation units). This allows reverse lookups – when we read a page, the information stamped on the page header is the allocation unit ID, so we need a very fast way to convert from an allocation unit back to an index/object ID.
- Make sure we skip any indexes that are in the middle of being built/rebuilt online, because they will not have a complete set of data.
- Build a list of computed columns and generate the necessary code to enable the column values to be recomputed.
- Build a list of columns that are used in non-clustered indexes
- Make sure the various USED, DATA, RSVD counts are not negative (see the BOL entry for DBCC CHECKDB for an explanation of how this can happen – yes, I’m deliberately not explaining here so that you read through the new BOL entry 🙂)
- Figure out what each kind of column is (e.g. a regular column, a generated uniquifier, a dropped column)
- Build a series of mappings to allow conversion between column IDs at different levels of storage abstraction
- Check that the relational and storage-engine nullability flags for a column agree
- Make sure the column counters in metadata match what we’ve just seen
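The allocation-unit reverse lookup described above amounts to building a hash map once, up front, so that every page read afterwards is a constant-time lookup. Here’s a minimal sketch in Python (purely illustrative – the metadata shapes and the IDs in the example are made up, not SQL Server internals):

```python
def build_alloc_unit_map(indexes):
    """indexes: iterable of (object_id, index_id, alloc_unit_ids)."""
    lookup = {}
    for object_id, index_id, alloc_unit_ids in indexes:
        for au in alloc_unit_ids:  # DATA, LOB and SLOB units alike
            lookup[au] = (object_id, index_id)
    return lookup

# Hypothetical metadata: object 101 has a clustered index (1) with a DATA
# and a LOB allocation unit, and a non-clustered index (2) with one unit.
metadata = [
    (101, 1, [0x7000, 0x7001]),
    (101, 2, [0x7002]),
]
lookup = build_alloc_unit_map(metadata)
print(lookup[0x7001])  # (101, 1)
```

Building the map once per table keeps the per-page cost constant, which matters when pages are being streamed in allocation order rather than index order.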
As each page is read, the records on it are checked:
- Records in a page must be strictly ordered by the defined keys of the index (although the records themselves aren’t necessarily stored in sorted order in the data portion of the page, accessing the records through the slot array must yield them in the correct order)
- No two records can have duplicate key values (remember that non-unique indexes have a hidden, automatically-generated uniquifier column added to the key – to make or extend the composite key – so that record uniqueness is guaranteed)
- If the index is partitioned, each record is run through the partitioning function to ensure it’s stored in the correct partition
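The ordering and uniqueness rules above can be sketched with a toy page model (plain Python, purely illustrative – real pages hold binary records, not a dict of offsets): walking the slot array must yield strictly ascending keys, which catches mis-ordering and duplicates in a single pass.

```python
def check_slot_order(records, slot_array):
    """records: dict of page offset -> key tuple;
    slot_array: record offsets in slot order."""
    prev_key = None
    for slot, offset in enumerate(slot_array):
        key = records[offset]
        if prev_key is not None and key <= prev_key:
            return False, slot  # out of order, or a duplicate key
        prev_key = key
    return True, None

# Keys stored out of physical order, but the slot array presents them sorted.
page_records = {96: (1, 0), 200: (3, 0), 150: (2, 0)}
print(check_slot_order(page_records, [96, 150, 200]))  # (True, None)
print(check_slot_order(page_records, [96, 200, 150]))  # (False, 2)
```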
- All the complex columns in each record are checked:
- Complex columns are those storing legacy text or LOB values (text, ntext, image, XML, nvarchar(max), varchar(max), varbinary(max)) or in-row pointers to variable length columns that have been pushed off-row in rows that are longer than 8060 bytes
- The column is checked to make sure it’s storing the right kind of data – either the value itself, or a text pointer, or some kind of in-row root containing pointers to portions of an off-row value.
- The linkages between what is stored in-row and the off-row values stored in other pages are eventually checked too
- Check computed columns
- If the column is persisted (either because it’s defined as a persisted computed column or because it’s used as a non-clustered index key), its value is recomputed and checked against the persisted value
- This is also important when we come to do the non-clustered index cross-checks – as any discrepancy in the stored column values will cause mismatches.
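A hedged sketch of the recompute-and-compare idea (illustrative Python, not SQL Server source – the row shape, column names, and expression are all hypothetical): each persisted value is compared against a freshly computed one, and any mismatching rows are reported.

```python
def check_computed(rows, column, compute):
    """rows: list of dicts of column values; column: the persisted
    computed column's name; compute: a function that regenerates its
    value from the other columns. Returns indexes of mismatching rows."""
    return [i for i, row in enumerate(rows) if row[column] != compute(row)]

rows = [
    {'price': 10, 'qty': 3, 'total': 30},
    {'price': 10, 'qty': 4, 'total': 44},  # stale persisted value
]
print(check_computed(rows, 'total', lambda r: r['price'] * r['qty']))  # [1]
```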
- Data purity checks
- The column value is checked to ensure it’s within the bounds for its data-type (e.g. the minutes-past-midnight portion of the internal representation of a datetime value must be less than 1440 – 24 hours x 60 minutes)
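As an illustration of the kind of bounds check involved, here’s a sketch for an encoding that stores the time portion as minutes past midnight (a simplified stand-in, not the actual CHECKDB code or the exact on-disk format):

```python
MINUTES_PER_DAY = 24 * 60  # 1440

def is_pure_minutes_encoding(days, minutes):
    """days: unsigned day count since the epoch;
    minutes: minutes past midnight stored in the time portion."""
    # A data-purity check: the stored bits must decode to a legal value.
    return days >= 0 and 0 <= minutes < MINUTES_PER_DAY

print(is_pure_minutes_encoding(38000, 725))   # True  -- a valid value
print(is_pure_minutes_encoding(38000, 1500))  # False -- 25 hours past midnight
```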
- Non-clustered index cross-checks
- This is probably my favorite part of CHECKDB and is one of the most complicated bits of code.
- What CHECKDB is trying to do is make sure that each record in a heap or clustered index has exactly one matching record in each non-clustered index, and vice-versa. The brute-force (n-squared complexity) way to do this is to do a physical lookup of all the matching rows – but that’s incredibly time consuming so instead we have a fast algorithm to detect problems.
- Imagine a table defined by the following DDL:
CREATE TABLE t (c1 INT, c2 char(10), c3 varchar(max))
CREATE INDEX index1 ON t (c1)
CREATE INDEX index2 ON t (c2) INCLUDE (c3)
- Each row in the heap has to have two matching non-clustered index rows, one in each of index1 and index2. But how can we tell without doing the direct lookup or having to store tremendous amounts of state? We use a hash-bitmap algorithm.
- Imagine a large bitmap – say half a million bits. Initially all the bits are zero.
- For each record in the heap that we come across, we generate the matching non-clustered index records and hash them to a value. This is very quick because we have all the column values we need. We then take the hash value, map it directly to a bit in the bitmap and flip the bit. So in our example above, each record in the heap will produce two hash values (one for the record in each non-clustered index) and so will cause two bits to be flipped.
- For each record in each non-clustered index, just hash the record and flip the corresponding bit.
- The idea is that the bit-flips should cancel each other out and the bitmap should be left with all zeroes at the end of the checks.
- Taking the view that corrupt databases are orders of magnitude rarer than clean ones, we went a step further and allowed the checks for multiple tables to use the same bitmap. If you think about it, this won’t cause any problems, even if two sets of records map to the same bit in the bitmap – as long as each bit is flipped an even number of times (i.e. each record really does have its correct matching record) then there should be no problem.
- Here’s the catch – what happens when there’s a bit left on in the bitmap? Well, this is where the trade-off comes into play. If there’s a bit left on, we can’t tell which records in which table or index mapped to it, so we have to re-scan the tables and indexes that used the bitmap to see which records map to the bit. For every one we find, we actually do the physical lookup of the matching record and then do a comparison of all the columns, including any LOB columns used as INCLUDE’d columns in non-clustered indexes. This process is called the deep-dive and can add a significant amount to the run-time of CHECKDB if it occurs.
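The whole cross-check can be sketched end-to-end in a few lines (illustrative Python only – the hash function, bitmap size, and the index definitions for the example table t are stand-ins, not the real implementation):

```python
# Every expected index record (generated from the heap rows) and every actual
# index record toggles one bit; matched pairs cancel out, so any bit still
# set at the end flags a mismatch that a deep-dive must then resolve.
BITMAP_BITS = 1 << 19  # ~half a million bits, as in the text

def bit_for(index_id, key, bits=BITMAP_BITS):
    return hash((index_id, key)) % bits

def cross_check(heap_rows, index_rows):
    """heap_rows: list of dicts of column values;
    index_rows: {index_id: list of keys actually found in that index}.
    Hypothetical index definitions: index 1 on c1, index 2 on c2."""
    bitmap = bytearray(BITMAP_BITS // 8)
    def flip(b):
        bitmap[b // 8] ^= 1 << (b % 8)
    for row in heap_rows:            # expected non-clustered index records
        flip(bit_for(1, row['c1']))
        flip(bit_for(2, row['c2']))
    for index_id, keys in index_rows.items():  # actual index records
        for key in keys:
            flip(bit_for(index_id, key))
    return all(b == 0 for b in bitmap)  # clean iff every flip cancelled

heap = [{'c1': 1, 'c2': 'a'}, {'c1': 2, 'c2': 'b'}]
good = {1: [1, 2], 2: ['a', 'b']}
bad  = {1: [1, 2], 2: ['a']}  # index 2 is missing a record
print(cross_check(heap, good))  # True
print(cross_check(heap, bad))   # False
```

When the function returns False, nothing here says which record caused it – that is exactly the point at which the deep-dive rescan described above kicks in.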
These inter-page relationship checks apply to:
- pages in heaps that have forwarding or forwarded records
- pages in indexes
- pages that have records with LOB columns that have their data stored off-row (these can be heap pages, clustered index data pages or non-clustered index leaf pages in SQL Server 2005)
- text pages that have records with child nodes on other pages
- All pages in an index level should be pointed to by a page in the next level higher in the b-tree, and also by the left- and right-hand neighboring pages in the same level of the b-tree. Exceptions are made for the left-most and right-most pages in a level (where the m_prevPage and m_nextPage fields are (0:0)) and for the root page of the b-tree, where the parent page link comes from the storage-engine metadata. The 8976 error message that I referenced above is generated from this set of checks.
- Key ranges in neighboring pages in a level should not overlap
- Key ranges of pages should be correctly defined by the parent pages in the next level up in the b-tree (the parent pages contain the minimum key value that can exist on a child page)
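A minimal sketch of the key-range checks (illustrative Python – real parent pages store page references and compressed keys, not plain lists): each child page’s lowest key must not fall below the minimum its parent records for it, and consecutive pages’ key ranges must not overlap.

```python
def check_level(parent_entries, pages):
    """parent_entries: (child_page_id, min_key) pairs in key order;
    pages: {page_id: sorted list of keys actually on that page}."""
    prev_max = None
    for page_id, min_key in parent_entries:
        keys = pages[page_id]
        if keys[0] < min_key:  # key below the parent's stated minimum
            return False
        if prev_max is not None and keys[0] <= prev_max:
            return False       # overlaps the previous page's key range
        prev_max = keys[-1]
    return True

print(check_level([(10, 1), (11, 8)], {10: [1, 5], 11: [8, 12]}))  # True
print(check_level([(10, 1), (11, 8)], {10: [1, 9], 11: [8, 12]}))  # False
```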
The page header counts that are checked against the actual page contents are the:
- slot count
- ghost record count
- free space count
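These counts can be audited by simply recomputing them from the records actually found on the page and comparing against the header. Here’s a simplified sketch (illustrative Python – the free-space arithmetic below ignores several real page-layout details):

```python
PAGE_SIZE = 8192
HEADER_SIZE = 96

def audit_counts(header, records):
    """header: dict with slot_count / ghost_count / free_space;
    records: list of (length_in_bytes, is_ghost) found on the page."""
    slot_count = len(records)
    ghost_count = sum(1 for _, ghost in records if ghost)
    # Simplified: header + record bytes + 2 bytes per slot-array entry.
    used = HEADER_SIZE + sum(length for length, _ in records) + 2 * slot_count
    free_space = PAGE_SIZE - used
    return (header['slot_count'] == slot_count
            and header['ghost_count'] == ghost_count
            and header['free_space'] == free_space)

recs = [(100, False), (120, True)]
hdr = {'slot_count': 2, 'ghost_count': 1, 'free_space': 7872}
print(audit_counts(hdr, recs))  # True
```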
Indexed view and XML index checks