Yes, I’m alive.
I have been chatting with my wife about the times I need to “go dark” – in this case, I am working on features that I can’t discuss yet, other than to say that I am *very* excited to go to work every day.. (well, since I work out of my house, I really just walk up the stairs, but give me some artisitic liberty). That doesn’t mean that I can’t talk about other topics, of course, so feel free to mail me if you have a pressing one. I just can’t blog about my regular work each day.
I’ve gotten some questions about OPTIMIZE FOR UNKNOWN, which was added in 2008. It does seem a bit daft to try to reason about the unknown, but in this case there is a good reason.
For a primer on parameters and parameter sniffing, I’ll refer you to my previous post on the subject:
The UNKNOWN extension is intended to help customers trying to get a good “default” parameterized plan. OPTIMIZE FOR is able to let you specify a particular paramter to use when creating a plan. Internally, it uses the cardinality estimate for a specific value to help determine the plan through the normal costing formulas. The main reason to do this is that the plans are cached, and if you cache a plan for a very common parameter value, it may not be the best plan for average use. Alternatively, if you cache the plan for the average predicate selectivity, you also may be suboptimal when you cache this plan for use in your workload.
OPTIMIZE FOR UNKNOWN lets you say “I want to select a plan that is based on the average selectivity for the predicate (the not common parameter value), but I don’t know a good value to use”.
Statistics in SQL Server have a histogram to track the distribution of values and their frequency. It also contains an average frequency over the whole table, which is stored inverted as 1/frequency in a field that is called the “density”). This value is used in different places. One place where it is used is in cardinality estimation for the GROUP BY operator (if you have 3 occurannces per value, grouping on that value should reduce your cardinality to 1/3 of the previous value). This same value happens to also be used for cardinality estimation for parameters with some predicates. If I have “WHERE col1 = @p”, you can estimate that, for the average case, that predicate will return whatever the average number of duplicate values in the table is (assuming there is a match, which is generally the case).
So, OPTIMIZE FOR UNKNOWN is a nice way to tell the optimizer that you want a parameterized plan for the average parameter frequency distribution case. The nice benefit is that if you add a lot of data to the table and then update the statistics, the definition of “average” is defined based on the current statistics, so you don’t have to worry about pegging a constant in your plan that is not representative in the future as your data changes.
My almost-in-print book chapter in the Inside SQL Server series for 2008 has a section on the math behind cardinality estimation with statistics. (Update: apparently Kalen changed the name of the book series – it appears to be called Microsoft SQL Server 2008 Internals now: http://sqlblog.com/blogs/kalen_delaney/archive/2008/10/23/SQL-Server-2008-internals-is-orderable.aspx)
I hope this gives you some insight into the optimizer’s thought process when generating a query plan.