Here’s a topic that’s cropped up several times during Q&As at TechEd this year – what happens to non-clustered indexes when changes are made to the underlying table? Are they always rebuilt or not?
Before we get into that discussion, I’ll give you a little background.
One way to describe a non-clustered index is whether it is a covering index or not. A covering index is one which has all the table columns necessary to satisfy a query, and so there is no need to go back to the underlying table (which I’ll call the base table) to fetch additional columns. Most non-clustered indexes are not covering indexes and so it is necessary for each non-clustered index record to include a link back to the corresponding base table record. This link takes various forms, depending on the structure of the base table:
Heap: A heap does not impose any logical ordering on the records within it, so the only way to find a particular record within a heap is to do a full table scan or to know the physical location of the record (i.e. which record slot on which data page) – called the physical record ID or physical RID. Clearly doing a full table scan every time a query needs to go from a non-clustered index record back to a heap is not feasible, so each non-clustered index record stores the physical RID of the corresponding heap record.
Clustered index: A clustered index does impose a logical ordering on the records within it and the ordering exactly matches the definition of the clustered index keys – known as the cluster key. The fastest way to find a particular record in a clustered index is to use the cluster key to navigate through the clustered index b-tree to find the exact record. So, non-clustered indexes over a clustered index include the cluster key of the base table record in each non-clustered index record. This is known as the logical RID. There is a twist though, depending on whether the clustered index is unique or not.
Non-unique clustered index: Every record in a clustered index HAS to be unique, otherwise there would be no way to deterministically navigate to a particular record using the index b-tree. In a non-unique clustered index, SQL Server has to add a special column, called the uniquifier, to each record, so that if multiple records have the same cluster key values, the uniquifier column will be the tie-breaker. This uniquifier column is added as part of the cluster key, and so it is also present in all non-clustered index records as part of the logical RID.
Unique clustered index: Every record in a unique clustered index is already unique and so no extra column is required in the cluster key.
Ok – background out of the way. Now let’s discuss what happens to non-clustered indexes when various actions are performed to the base table.
SQL Server 2000
- Going from a heap to a clustered index: Changing the fundamental structure of a table from a heap to a clustered index means you’re changing the way that records can be looked up in the table, from physical to logical. This means that all the physical RIDs that are stored in the non-clustered index records need to be changed to logical RIDs. So – in this case, all non-clustered indexes are rebuilt after the clustered index has been built.
- Going from a clustered index to a heap: This is the opposite of the case above – you’re changing the way records are looked up from logical to physical. This means that all the logical RIDs that are stored in the non-clustered index records need to be changed to physical RIDs. So – in this case, all non-clustered indexes are rebuilt after the clustered index has been dropped.
- Rebuilding a unique clustered index: This operation has no effect on the cluster key, so there is no need for the non-clustered indexes to be rebuilt. Various early Service Packs of SQL Server 2000 had bugs that caused this behavior to be broken – and this is the cause of much of the confusion around this behavior.
- Rebuilding a non-unique clustered index: In SQL Server 2000, when a non-unique clustered index (which contains uniquifier columns) is rebuilt, all the uniquifier values are regenerated. This essentially means that the cluster keys have changed and so all the non-clustered indexes are rebuilt after the clustered index rebuild has completed.
- Changing the clustered index schema: This is simple – any change to the clustered index schema that changes the cluster keys will cause all the non-clustered indexes to be rebuilt.
SQL Server 2005
Going from a heap to a clustered index: This is the same as SQL Server 2000 – all non-clustered indexes are rebuilt, regardless of whether the clustered index was created online or offline.
Going from a clustered index to a heap: Again, all non-clustered indexes are rebuilt, regardless of whether the clustered index is dropped online or offline.
Rebuilding a unique clustered index: This is the same as SQL Server 2000 – the cluster keys aren’t changing and so the non-clustered indexes are not rebuilt.
Rebuilding a non-unique clustered index: Aha! This is different from SQL Server 2000. SQL Server 2005 will RE-USE the old uniquifier values so the cluster keys don’t change. This means that non-clustered indexes are NOT rebuilt in this case – that’s very cool!
Changing the clustered index schema:
Changing the cluster key: This behavior has to be the same as SQL Server 2000 – any change to the cluster keys forces all the non-clustered indexes to be rebuilt after the new clustered index has been created.
Changing anything else (e.g. partitioning it or moving it): This is one of the cases that confuses people in SQL Server 2005. Applying a partitioning scheme to a clustered index, or moving it to a different filegroup, doesn’t change the cluster key at all – so there’s no need to rebuild the non-clustered indexes.
So the rule of thumb to remember is: if the physical or logical RID contained in the non-clustered index records needs to change then all the non-clustered indexes need to be rebuilt. Its as simple as that.
Hopefully this will help to clear up some confusion – let me know of any questions!