Under the covers: IAM chains and allocation units in SQL Server 2005

(I'm sitting here in Seattle airport at 7am on Sunday waiting to catch the same flight to Boston that I caught two weeks ago. Instead of TechEd, this time I'm going to a training course at MIT. I'd enjoy the air travel a lot more with a bigger gap in between the flights. At least I got an aisle seat and another 5000 air miles...)

Anyway, time for part two of the discussion of IAM chains.

Yesterday I explained what IAM chains are and how they correspond to indexes in SQL Server 2000. A table can have a heap or clustered index, plus up to 249 non-clustered indexes, plus storage for LOB values (commonly called the text index). This means that in SQL Server 2000, there can be a maximum of 251 IAM chains per table.

In SQL Server 2005, IAM chains and IAM pages are exactly the same, but what they correspond to is different. A table can now have up to 750000 IAM chains! Wow! What on earth did we do?...

There are now three things that IAM chains map space allocations for:

  1. heaps and b-trees (a b-tree is the internal structure used to store an index)
  2. LOB data
  3. row-overflow data

and we now call these units of space allocation tracking allocation units. The internal names for these three types of allocation unit are (respectively):

  1. hobt allocation unit (Heap Or B-Tree, pronounced 'hobbit', yes, as in Lord Of The Rings)
  2. LOB allocation unit
  3. SLOB allocation unit (Small-LOB)

[Edit]And the external names are, respectively:

  1. IN_ROW_DATA allocation unit
  2. LOB_DATA allocation unit
  3. ROW_OVERFLOW_DATA allocation unit

Let's look at three new features in SQL Server 2005 that made these changes are necessary and boosted the number of potential IAM chains per table.

Included Columns
This is the ability for non-clustered indexes to include non-key columns at the leaf-level. This is useful for three reasons:

  1. it allows a non-clustered index to truly cover a query where the query results include more than 16 columns or the combination of column lengths in the query results is greater than 900 bytes (remember that a non-clustered index key is limited to 16 columns and 900 bytes)
  2. it allows columns to be include in the non-clustered index that have data types that cannot be part of an index key (e.g. text or XML)
  3. it allows a non-clustered index to cover a query without having to have all the columns included in the index key. As the index key is included in rows at all levels of the b-tree, this allows the index to be smaller.

An example of space saving: imagine a 100 million row index, with a key length of 900 bytes, but only the first two integer keys are really needed as the index key, the other 4 fixed-length columns could be included.

With the 900 byte key, 8 rows can fit per database page (i.e. the fanout is 8). This means there will be 12500000 pages at the leaf level, 1562500 pages at the next level up in the b-tree and so on, giving a total of 12500000 + 1562500 + 195313 + 24415 + 3052 + 382 + 48 + 6 + 1 = 14285717 pages (including 1785717 to store the upper levels of the b-tree).

If we go with the included columns method then the key size shrinks to 8 bytes, and with the row overhead we can get the row length in the upper levels of the b-tree down to 15 bytes (giving a fanout of approx. 537). Note that the fanout at the leaf-level is still going to be 8,  because the amount of data stored in each row at the leaf-level is the same. So, this means there will be 12500000 pages at the leaf level, 23278 pages at the next level up and so on, giving a total of 12500000 + 23278 + 44 + 1 = 12523323 pages (including 23323 to store the upper levels of the b-tree). Compared to the full-size 900-byte key, this is a 12% saving of 1762394 pages, or 13.6GB! Obviously this is a contrived case but you can see how the savings can occur.

Anyway, I digress. The main reason for adding this feature it to enable true covering queries. A covering query is one where the query optimizer knows it can get all the query results from the non-clustered index and so the query can be satisfied without having to incur the extra IOs of looking up data in the base table - a significant performance saving.

Now that non-clustered indexes can have included columns, and those columns can be LOB data types. This means that having a single LOB allocation unit (as in the case of the single text index in SQL Server 2000) isn't possible any more because each index may have its own set of LOBs. Now, you may ask why we didn't just have a single set of LOBs with multiple references from various indexes plus the base table. We considered that but it would have made things a lot more complicated.

So, with this new feature, each index needs two allocation units - one for the data or index rows (the hobt allocation unit) and one for any LOB data.

Large Rows

One of the things that has plagued schema designers for a long time is the 8060 byte limit on table row sizes so we've removed this restriction in SQL Server 2005. The way we've done this is to allow variable-length columns (e.g. varchar, sqlvariant) to get pushed off-row when the row size gets too big to fit on a single page.

But where do these column values get pushed to? We effectively turn them into mini LOB columns. The column value in the row is replaced with a 16-byte pointer to the off-row column value, which is stored as if its a LOB value in a seperate allocation unit - the row-overflow (or SLOB) allocation unit. These values are stored in text pages in exactly the same way as regular LOB values are, just using a seperate allocation unit. The SLOB allocation unit is only created when the first column value is pushed off-row.

This feature works for non-clustered indexes too - if you consider the ability to have included columns in non-clustered indexes then you could easily have non-clustered index rows that won't fit on a page. It would have been short-sighted of us to get rid of the 900-byte limit and replace it with an 8060-byte limit by not extending the row-overflow feature to non-clustered indexes too.

Now with the addition of this new feature, each index can have up to three allocation units - hobt, LOB, and SLOB. Even with this, that only makes a maximum of 750 IAM chains per table (remember an IAM chain now maps the storage allocations for an allocation unit, so 250 indexes * 3 allocation units = 750 IAM chains). But I mentioned 750 thousand IAM chains per table earlier - where do all the rest come from?

Partitioning

This is what gives us the 1000x multiplier. As you may already know, partitioning is the new feature that allows tables and indexes to be split into a series of ranges, with each range stored separately (most commonly in seperate filegroups). Partitioning is a topic for a separate post.

If each range or partition of the table or index is stored seperately, then each is going to need its own hobt allocation unit. Of course, the LOB values associated with each partition need to be stored with it, and so each partition also needs a LOB allocation unit. Also, the row-overflow feature is per-row, and so rows in each partition will overflow into SLOB allocation units just as for un-partitioned tables and indexes. Thus each partition of a table or index can have up to three allocation units (and hence three IAM chains).

Still, where does that 1000 come in? Each table or index can have up to 1000 partitions. This gives us 250 indexes x 1000 partitions x 3 allocation units = 750000 IAM chains. Realistically this probably won't happen, but it is possible.

Now you know some more of how things are organized in SQL Server 2000 and SQL Server 2005 and hopefully this will help understanding some of my future posts.

Ah - here comes the breakfast cart...