Fragmentation (part 3): What are extents?

(Ok - another flight - another blog post. This time its Boston back to Seattle. The three of us who'd come over for the training course upgraded to 1st Class on Alaska for the flight back (great value at $100 for a 6-hour flight) and so there's oodles of room for laptops, newspapers, and long legs)

In the previous posts I explained about database pages - their structure and some page types. Now I'd like to explain how we group pages into units called extents.

An extent is a group of eight physically consecutive pages in a database data file. Extents are always aligned on 64KB boundaries (i.e. 8-page boundaries), starting at the beginning of each data file. Extents, and all their properties, are exactly the same in SQL Server 2000 and 2005.

There are two types of extents: mixed extents and dedicated (or uniform) extents.

Mixed extents

The first 8 pages that are allocated to any IAM chain (either for an index in SQL Server 2000 or allocation unit in SQL Server 2005) are single-page allocations, which we called mixed pages. This means that only a single page is allocated to the IAM chain at a time, rather than a whole extent. The rationale behind this is to allow very small tables to take up the minimum amount of space.

These mixed pages are allocated from mixed extents that are not allocated to any particular IAM chain. The extent is tracked as being allocated globally, so no IAM chain can allocate it.

As the mixed extent is not allocated to any particular IAM chain, this means that it may hold pages allocated to 8 separate IAM chains. IAM pages themselves are always mixed pages, regardless of how many pages have been allocated to an IAM chain. This means a mixed extent may hold a variety of page types too,including IAM, data, index, or text pages.

Two interesting facts:

  • once an IAM chain has passed the 8-page threshold and switched to allocating dedicated extents, it will never go back to allocating mixed pages again.
  • for the purposes of fragmentation, we completely ignore mixed pages and extents as there are so few of them in each IAM chain and it complicates the various algorithms involved.

Dedicated/Uniform extents

Once the magic 8-page threshold is passed, all further allocations are from dedicated extents. This means that an extent at a time is allocated to an IAM chain (and marked as such in one of the IAM pages in the IAM chain). This is also tracked globally.

All pages from a dedicated extent must be allocated to the same IAM chain, and they will all be the same type except in the case of clustered indexes, where there could be a mixture of data pages (from the leaf level) and index pages (from the upper b-tree levels).

Two more interesting facts:

  • just because an extent is allocated to an IAM chain, that doesn't mean that all the pages are. The pages are allocated as needed, so initially only one page will be allocated. There are some exceptions to this rule, including during an offline index build operation, but I'm not going to go into the algorithm details.
  • dedicated extents can be deallocated from an IAM chain if all the pages in the extent become deallocated.

You may ask "how do you know which pages are allocated in an extent?" and "how do you track the global allocation state of extents, especially mixed extents?" . The answers are by using the PFS pages, and by using the GAM + SGAM pages respectively. I'll cover these in the next post.

(Passing over Montana now at 8pm - pretty clear skies and the sun's at an angle to throw the landscape into sharp relief - very cool. First class is great - actual crockery rather than plastic plates, cups and food...)