Conor vs. Misbehaving Parameterized Queries + OPTIMIZE FOR hints

In my previous post, I debugged a customer problem for you, including the various guesses I had to make and why they matter.

The specific problem was likely related to parameter sensitivity, an issue in query optimization where the optimizer generates a plan that is optimal for one parameter value but perhaps not optimal over all parameter values likely to be used in the future calling pattern of this query.  I had recommended using the optimizer hint OPTIMIZE FOR (@p = <value>) to nudge the optimizer to use a value that was better for the average workload.

Another user wants to better understand the difference between OPTIMIZE FOR (@p = <value>) and OPTIMIZE FOR (@p UNKNOWN), a new hint that was added in SQL 2008.  This new hint optimizes for an “average” value rather than a specific one.  It is probably a good time to review a few background details of parameterized query plans so that I can use some of the vocabulary we use internally to discuss specific customer cases.
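For reference, here is the shape of the two hints side by side; the table T, column col, and the literal 42 are placeholders rather than a real schema:

DECLARE @p int;
SET @p = 1000;  -- hypothetical runtime value

-- Optimize for one specific, representative value:
SELECT * FROM T WHERE col = @p
OPTION (OPTIMIZE FOR (@p = 42));

-- Optimize for an "average" value derived from statistics (SQL 2008 and later):
SELECT * FROM T WHERE col = @p
OPTION (OPTIMIZE FOR (@p UNKNOWN));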

So, the abbreviated summary (which you can find in earlier blog posts if you have not seen it before):

1. Queries in SQL Server (all current versions) are compiled, and the resulting plan is usually cached.  That plan is used for all subsequent executions as long as it does not recompile.

2. Queries with parameter values _can_ (but do not always) have their values “sniffed”, meaning that the optimizer takes a peek at the parameter’s current value in the containing T-SQL batch and uses that value to generate a more optimal plan.  You can tell whether a particular parameter value was sniffed by looking at the showplan XML output (a sketch of one way to check follows this list), but please understand that the current system is imperfect in terms of sniffing.  For the rest of this post, I will assume that values ARE sniffed, for simplicity in describing the behavior that happens in the system.
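A minimal sketch of both points, assuming a hypothetical table T with an int column col.  The first sp_executesql call has its value sniffed and compiled into the cached plan; the second call reuses that plan.  The DMV query then pulls the cached showplan XML, whose ParameterList element records the compiled (sniffed) value next to the runtime value:

-- First execution: @p = 1 is sniffed, and the plan compiled for it is cached.
EXEC sp_executesql N'SELECT * FROM T WHERE col = @p', N'@p int', @p = 1;

-- Subsequent executions reuse that plan, even for very different values.
EXEC sp_executesql N'SELECT * FROM T WHERE col = @p', N'@p int', @p = 5001;

-- Pull the cached showplan XML; look for ParameterCompiledValue inside it.
SELECT qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE N'%WHERE col = @p%';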


When the optimizer knows a specific value to sniff, it can determine things like whether a table scan is likely to be faster than a seek + bookmark lookup for a given filter.  For example, “SELECT * FROM T WHERE col = @p” would likely generate a scan plan if col = @p is non-selective.  If it is highly selective, then the seek may be optimal.  This seek/scan plan flip is one common case where parameter sniffing can yield a more optimal plan for a specific parameter value.  Whether this plan is optimal for the workload is another matter entirely.  Unfortunately, the optimizer is in a tricky spot: sniffing the value often leads to a much better plan, so much so that customers complain if they don’t get that plan.  So, there is an interaction between the initial sniffed value (and the plan generated from it) and the average calling pattern for that parameter.  (It gets more complicated when you have many parameters.)
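To make the flip concrete with literals standing in for the sniffed parameter (and assuming, hypothetically, that col = 1 matches a single row of a large table while col = 5001 matches half of it):

-- Highly selective: a seek on an index over col plus bookmark lookups is likely.
SELECT * FROM T WHERE col = 1;

-- Non-selective: one pass over the table likely beats thousands of lookups.
SELECT * FROM T WHERE col = 5001;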

There are a number of common “plan flips” where the optimizer generates different plans based on what is sniffed.  In addition to seek/scan flips, I often see index flips, where the optimizer chooses a seek on index 1 vs. a seek on index 2.  I also see join order flips in OLTP applications.  The main ingredient here is that your workload is repetitive enough for these patterns to emerge.

Even if you do have plan flips, this does not necessarily mean that your application will have problems.  The relative performance of those plans may be very close, meaning that there is no substantial benefit in picking one over the other.  Even if the performance isn’t great, it might be that the query is just unimportant to the application: perhaps it is not really visible, and the whole job takes a while anyway, so some variability in one query is insignificant.  The key point is that this problem includes the perception of the user and is not just some arbitrary metric defined in the optimizer.

The most common class of parameter sensitive problem (PSP) we see is one where sniffing one value yields “good” performance (from the perspective of the customer) for all parameter values, while sniffing and caching the plan for another parameter value leads to “bad” performance.

Common Case (Atypical parameter):

                    Executing value 1    Executing value 2
  Sniffed value 1   “good”               “bad”
  Sniffed value 2   “good”               “good”


Less Common (Distinct Sets of Optimal Plans/Values): 

                    Executing value 1    Executing value 2
  Sniffed value 1   “good”               “bad”
  Sniffed value 2   “bad”                “good”


90% of the cases I see (where I can also get enough information to categorize them) relate to having an uncommon value sniffed.  To take our earlier example: if @p is a very frequent value in the data distribution, the filter WHERE col = @p is highly unselective and the scan plan may be generated.  However, this value may be very infrequently used as a parameter.  As a result, most of the time the seek plan is generated, but occasionally the scan plan might be generated instead if this infrequent value happens to be passed when the query needs to be recompiled.  Understanding the frequency of calling values vs. the frequency of the data distribution is necessary to fully understand which case you have.  Note that I could construct an example that is the opposite of this, where picking the scan plan is the right answer in most cases but sniffing a very infrequent value with low representation in the data set could cache a seek plan.  The metric is overall workload time.

The other kind of problem we see is that plans are really only optimal for certain sets of parameters.  If you use the wrong plan with the wrong parameter, then the performance is “bad” (again, from the perspective of the customer).  This is harder because you actually don’t want to cache plans like this, so adding OPTION (RECOMPILE) is usually needed if you want consistent performance; a sketch of its use follows.  (You can also split the query into N copies and call each separate copy based on the values you see via T-SQL; there is a sketch of that approach after the next paragraph.)
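In its simplest form, the hint just goes at the end of the statement (T and @p are the hypothetical names used earlier):

SELECT * FROM T WHERE col = @p
OPTION (RECOMPILE);  -- a fresh plan is compiled for the actual @p on every execution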

To optimize overall workload throughput, the other piece of the equation is compilation time.  If you have a query that is very expensive to run but cheap to compile, OPTION (RECOMPILE) fixes all PSP problems easily :).  If you have a PSP problem where the performance difference between the “good” and “bad” cases is larger than the total compilation time, you might very well be able to use OPTION (RECOMPILE) to bound that query’s performance.  If you have a critical code path that needs faster performance than compiling each time allows, then you should consider the OPTIMIZE FOR hints.  If you are in the common case, you can hint a regularly used parameter value, and that will avoid the “bad” performance case.  If you are in the less common case, you may need to revert to OPTION (RECOMPILE) or perhaps split your queries into different copies so each set of parameters can get its own plan.
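Here is a hedged sketch of the splitting approach.  The procedure names and the routing test are hypothetical; the point is only that each module gets its own cached plan, compiled for the class of values routed to it:

CREATE PROCEDURE dbo.GetRows_Common @p int
AS
    -- First execution sniffs a typical value; the plan is shaped for selective lookups.
    SELECT * FROM T WHERE col = @p;
GO

CREATE PROCEDURE dbo.GetRows_Outlier @p int
AS
    -- A separate cache entry; the first execution here sniffs a skewed value.
    SELECT * FROM T WHERE col = @p;
GO

-- The caller routes on the value it sees (9999 stands in for a known outlier):
DECLARE @p int;
SET @p = 9999;
IF @p = 9999
    EXEC dbo.GetRows_Outlier @p = @p;
ELSE
    EXEC dbo.GetRows_Common @p = @p;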

Now I’ve given enough background to explain OPTIMIZE FOR … UNKNOWN.  Some customers have a common PSP problem but do not know a common value to pass as a hint.  Perhaps their data distribution changes over time.  Perhaps they don’t know their data set that well.  OPTIMIZE FOR … UNKNOWN tells the optimizer to use an “average” value when optimizing the query, and this often (but not always) avoids the “bad” squares in the taxonomy I described.

Here’s an example of how this works:

SELECT * FROM T WHERE col = @p

Let’s say that T has 10000 rows: the values 1-5000 each appear in the table once, and the value 5001 appears 5000 times.  Furthermore, the column “col” is in an index, and the table also has a lot of other columns that are not in that index.  If the optimizer sniffs any of the values 1-5000, it is likely to generate the seek + fetch plan.  For the value 5001, the scan plan is optimal.
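If you want to reproduce that distribution, here is a minimal script; the index name, the padding column, and the use of sys.all_columns as a row source are my own placeholders:

CREATE TABLE T
(
    col     int       NOT NULL,
    padding char(200) NOT NULL DEFAULT 'x'  -- stand-in for the many non-indexed columns
);
CREATE NONCLUSTERED INDEX IX_T_col ON T (col);

-- The values 1-5000, once each.
INSERT INTO T (col)
SELECT TOP (5000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

-- The value 5001, 5000 times.
INSERT INTO T (col)
SELECT TOP (5000) 5001
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

UPDATE STATISTICS T WITH FULLSCAN;  -- make sure the statistics see the skew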

When the optimizer is reasoning about parameters, it might not have a specific value to use.  So, it reverts to the density/frequency information stored in our statistics.  That stores a representation of how common an “average” value is in the data distribution.  In this case, the first 5000 values exist once and the last value is there 5000 times, so an equality predicate against an “average” value would be estimated to return ~2 rows (10000 rows / 5001 distinct values).  2 may be a low enough cardinality that the seek + fetch plan is still optimal and the customer is happy.  Note that running the query with the parameter 5001 under that plan may still be very slow, but perhaps that is never or rarely done.  That parameter will get suboptimal performance.
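Applied to this table, OPTIMIZE FOR UNKNOWN should therefore settle on the seek + fetch plan no matter which value is actually passed, since the predicate is costed at the ~2-row average:

DECLARE @p int;
SET @p = 5001;  -- even the skewed value gets the seek plan under this hint

SELECT * FROM T WHERE col = @p
OPTION (OPTIMIZE FOR (@p UNKNOWN));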

The use of the frequency/density information varies a bit based on the scalar predicate or relational operator.  GROUP BY col would use the density, but an arbitrary scalar condition in a WHERE clause may or may not (the conditions are too complex to explain here, and we don’t document them all publicly anyway).
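If you want to see the density the optimizer falls back on, DBCC SHOW_STATISTICS exposes it; the statistics name below assumes the index from the sketch above:

DBCC SHOW_STATISTICS ('T', 'IX_T_col') WITH DENSITY_VECTOR;
-- "All density" for col should be roughly 1/5001 ≈ 0.0002; multiplied by
-- 10000 rows, that is the ~2-row estimate discussed above.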

I hope this gives a bit more insight into parameter sensitivity problems and how to solve them.

Happy Querying!

Conor