Some things I learned about SQL Indexing Tuning recently (posted by Paul)



Where to put the clustered index


Last month there was a heated discussion in our group about whether to move the clustered index of a table from the primary key (int identity) to two columns that are heavily referenced in queries.


Many people would think that the best place to put the key would be on the columns that are heavily queried with the thinking that this would translate into the best performance, e.g. a clustered index is always faster than a non-clustered index.


About the same time I also witnessed an architect rip a database schema apart because said developer was using synthetic keys instead of natural ones, and how one should always use a natural key instead of a synthetic and put the clustered key on that.


Pretty convincing arguments until I came across Kimberly Tripp's blog on the topic. It seems that the above two arguments are not always - or even in most cases - valid. In fact she asserts that the best place (usually) to put a clustered index would be on a synthetic key. The criteria she looks for in a clustering key are:


1. Unique
2. Narrow
3. Static (not changed)


You can read the full explanation at http://www.sqlskills.com/blogs/kimberly/default.aspx#abdaee3f7-1e15-414b-b75f-a290db645159. I haven't checked out her webcasts yet so am not sure it's there, but I've seen her present these arguments mathematically, walking through how the indexes are actually used and calculating out the IO involved with each step. She also goes into a fair amount of explanation as to why tuning practices that were developed for earlier versions of SQL have been superceded.


In the military we were taught to not to memorize the maintenance manuals because procedures change, and if we tried to go from memory we could find ourselves doing unsafe things, such as applying the wrong amount of torque to a bolt.


In the same vein there is no silver bullet to SQL server index tuning; if you want to get the most out of your database you have to take the time to tune your indexes and not rely on the strategy of "we always did it this way" or "because I just don't like [some] practice".


In the where clause, put the most restrictive (selective) index at the beginning


To be honest I never thought much about how I ordered my AND statements in the where clause. But apparently SQL filters data progressively and if you have indexes the order is important. So the first statement will run faster than the second.


SELECT m.Member_No, m.FirstName, m.Region_No
FROM dbo.Member AS m WITH (INDEX (MemberFirstName, member_region_link))
WHERE m.FirstName LIKE 'K%'  
        AND m.Region_No > 6
        AND m.Member_No < 5000 


SELECT m.Member_No, m.FirstName, m.Region_No
FROM dbo.Member AS m WITH (INDEX (member_region_link, MemberFirstName))
WHERE m.FirstName LIKE 'K%'  
        AND m.Region_No > 6
        AND m.Member_No < 5000 



Such a simple thing and yet it seems missed from a lot of SQL performance tip sheets.


Paul


Comments (2)
  1. Gabe says:

    I think the proper thing to cluster on is the key that is most often used to reference the rest of the record.

    A table that is most often used as a foreign key that is joined on its identity column will most often be looked up by that column, so that table should be clustered on its identity.

    On the other hand, a table that is often searched by range (perhaps order dates) should most likely be clustered on the range key.

    Of course, if all you are doing is searching for the key itself (customer names, perhaps), you are probably better off to build an index on that key and let the optimizer cover the query with that index instead of scanning the table.

  2. MSDN Archive says:

    Regarding:

    "On the other hand, a table that is often searched by range (perhaps order dates) should most likely be clustered on the range key. "

    Actually she points out that this was the case back in SQL6.5, but is not a valid strategy on SQL7 and up. The reason was that there was no row level locking in 6.5 so you wanted to avoid locking pages, e.g. you tried to group your data by range. But in 7.0 was introduced row level locking and it’s no longer a recommended strategy. She covers all this in detail on her blog.

    One of the key take-aways is that what you are really doing with indexes is providing information to SQL Server so that SQL can make the best decisions.

    The second take away is that index tuning strategies not only change based on your underlying data, but also on which version of SQL Server you are trying to tune against.

Comments are closed.

Skip to main content