A technique for ensuring plan stability in SQL Server 2000


One of the great features in SQL Server is its optimizing query processor. The Sybase version of SQL Server pioneered on-the-fly query optimization, and, even to this day, intrinsic query optimization is one of the most useful features of the product. By automatically optimizing end-user queries, SQL Server further separates the logical work requested from the physical means of doing it, providing greater plan reuse and flexibility for most users.


There are, however, times when some of the advanced features of the optimizer can cause problems. Sometimes it simply helps too much. One of those is when it automatically uses the parameter values passed into a stored procedure to tailor the execution plan it will construct for a query. This is called “parameter sniffing” and is normally a good thing. It allows the optimizer to take into account the actual values of stored procedure parameters as it builds plans to service queries within the procedure that make use of them. IOW, if you pass a value of “bar” for a parameter named @foo, then use @foo in the WHERE clause of a query within the procedure, the optimizer is smart enough to use “bar” when it estimates how many rows will match the WHERE clause as it compiles a plan for the query.
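To make that concrete, here's a minimal sketch (the procedure, table, and column names are made up for illustration):

CREATE PROC prFindWidgets
@foo varchar(10)
AS
-- When this plan is compiled, the optimizer estimates the number of matching rows
-- using the actual value passed for @foo (e.g., 'bar')
SELECT * FROM widgets WHERE name=@foo
GO

EXEC prFindWidgets 'bar'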


While parameter sniffing is certainly a powerful feature, it can cause problems when a procedure’s plan happens to have been kicked out of the procedure cache (or was never in it) just prior to the procedure being called with atypical parameter values. This can result in a plan that is skewed toward atypical use, one that is suboptimal (or even downright slow) when called with typical values. Since, once cached, a query plan can be reused for parameter values that vary widely, the ideal situation is to have a plan in the cache that covers the typical usage of the procedure as much as possible. If a plan makes it into the cache that is oriented toward atypical parameter values, it can have a devastating effect on performance.


An example would probably help here. Suppose we had a stored procedure that returns sales data by country. In our case, three-fourths of our sales occur in the U.S. The procedure takes a single parameter, @country, indicating the country for which to return sales info. It uses this parameter to filter a simple SELECT statement that returns the requested sales data.
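Before any workaround is applied, such a procedure might look something like this (a minimal sketch; the sales table and country column are the same illustrative names used later in this post):

CREATE PROC prSalesByCountry
@country char(3)
AS
SELECT * FROM sales WHERE country=@country
GO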


Typically, the optimizer would choose to do a clustered index scan when servicing this query because (given that “USA” would normally be passed in for @country) so much of the table would be traversed anyway that scanning it would require less I/O and be faster than repeated nonclustered index lookups. However, what happens if the plan happens to have been kicked out of the cache (let’s say due to an auto-statistics update) just prior to a user calling it with, say, “BEL” (Belgium), where we have almost no sales? Assuming a suitable index exists, the optimizer may decide to use a nonclustered index seek in the new query plan. Subsequent executions of the procedure would reuse this plan, even if they passed in “USA” for @country. This could result in performance that is orders of magnitude slower than the scan-based plan.
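If you want to watch this happen on a test system, one way (just a sketch, assuming the procedure above and a nonclustered index on sales.country; DBCC FREEPROCCACHE stands in for whatever would normally evict the plan) is:

DBCC FREEPROCCACHE            -- flush the procedure cache (test systems only)
SET STATISTICS PROFILE ON
EXEC prSalesByCountry 'BEL'   -- first call after the flush: the plan is compiled around 'BEL' (likely an index seek)
EXEC prSalesByCountry 'USA'   -- reuses the seek-based plan even though a scan would be far cheaper
SET STATISTICS PROFILE OFF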


Plans can be kicked out of the procedure cache for lots of reasons having nothing to do with the end user. This, coupled with the aforementioned downside of parameter sniffing, can lead to unpredictable performance from stored procedures that are called with widely varying values for their parameters.


SQL Server 2005 has some features to help combat this, but there are no built-in features for dealing with it in prior releases. There is, however, a simple trick that you can use to control what parameter values the optimizer uses when formulating a query plan. This trick works on every release of SQL Server since 7.0. It involves fooling the optimizer into using values you specify in the procedure itself to effectively disable parameter sniffing. Let’s look at some code:


CREATE PROC prSalesByCountry
@country char(3),
@template_country char(3)='USA'
AS
SET @template_country=@country
SELECT * FROM sales WHERE country=@template_country
GO


This technique is really very simple. We filter the query on a parameter, @template_country, that we never actually pass into the procedure. We always pass in the filter value we want to use via @country and let @template_country retain its default. The optimizer notices that we’re filtering the query against the sales table using @template_country and sniffs its value (which will always be “USA” at compilation time – again, because we never actually pass it, and “USA” is its default value) when creating the query plan for it.
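In practice, callers only ever supply @country and leave @template_country to its default, so every plan that gets compiled is built around 'USA'. A minimal usage sketch:

-- Both calls reuse (or rebuild) a plan compiled as though the filter value were 'USA'
EXEC prSalesByCountry @country = 'USA'
EXEC prSalesByCountry @country = 'BEL'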


Note the assignment of @country to @template_country. Because this statement doesn’t take effect until the plan is actually executed, the optimizer can’t take it into account during plan compilation. This is a key point to grasp. This technique works because the optimizer cannot factor the assignment of @country to @template_country into the compilation process. As far as the optimizer is concerned, @template_country will always have a value of “USA” when the SELECT statement is processed.
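If you want to convince yourself of this, one quick sketch is to force a recompile and then call the procedure with an atypical value; the estimated row counts in the resulting plan still reflect 'USA':

EXEC sp_recompile 'prSalesByCountry'   -- mark the procedure so its plan is rebuilt on the next call
SET STATISTICS PROFILE ON
EXEC prSalesByCountry @country = 'BEL' -- recompiled, yet still optimized as though the filter were 'USA'
SET STATISTICS PROFILE OFF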


So, this effectively defeats parameter sniffing and allows you to ensure plan stability against parameter sniffing quirks regardless of what happens to a cached plan over time.


Of course, this is great for the typical case, but what if you genuinely wanted a different plan for atypical parameters? You have several options. If compilation isn’t too expensive, you could create or execute the procedure using the WITH RECOMPILE option (see the sketch below). You could also split the typical and atypical cases into two different procedures so that each would get its own plan. The technique above is really intended to address the not-uncommon scenario where overzealous parameter sniffing and plan reuse combine to cause a procedure to perform unpredictably.
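For instance, a minimal WITH RECOMPILE sketch (the procedure name is made up; you can also say EXEC ... WITH RECOMPILE to force a one-off recompile of a procedure that filters directly on its parameter):

CREATE PROC prSalesByCountry_Recompile
@country char(3)
WITH RECOMPILE   -- a fresh plan is compiled on every execution, tailored to that call's @country
AS
SELECT * FROM sales WHERE country=@country
GO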


This approach was first documented in my last book, The Guru’s Guide to SQL Server Architecture and Internals. See the “Parameter Sniffing” section in the Query Processor chapter in that book for more info.


 


Comments (19)

  1. Dis4ea says:

    Hi,

    Great article. We are having similar problems with some of our stored procedures. I guess we’ll have to investigate this solution.

    Great book too btw 😉

  2. Eric Hanson says:

    This is a great posting. Thanks Ken. The feature in SQL Server 2005 to help work around this issue is the OPTIMIZE FOR query hint. For example, you can say

    SELECT * FROM sales WHERE country=@country
    OPTION (OPTIMIZE FOR (@country='USA'))

    to accomplish the same thing. Also check out the "plan guides" feature (sp_create_plan_guide) which allows you to associate query hints with a query even when you can’t change the application. This is great when you’re trying to tune ISV applications you’ve deployed but for which you don’t have the source code. Another plan stability feature in SQL Server 2005 is the USE PLAN <xmlplan> query hint, which lets you force use of a specific plan every time the query is run (if you know that is really what you need).
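    A minimal plan guide sketch along those lines (the names are illustrative, and it assumes a procedure that filters directly on @country) might look like:

    EXEC sp_create_plan_guide
        @name = N'SalesByCountryGuide',
        @stmt = N'SELECT * FROM sales WHERE country=@country',
        @type = N'OBJECT',
        @module_or_batch = N'dbo.prSalesByCountry',
        @params = NULL,
        @hints = N'OPTION (OPTIMIZE FOR (@country = ''USA''))'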

  3. Eric Hanson says:

    P.S. I’m a program manager in the query optimizer team for SQL Server at Microsoft.

  4. chirs stephenson says:

    Good article,

    I have a similar problem; however, I don't think this approach will help me, so I'm stuck with the WITH RECOMPILE option.

    In my situation my query is running against a partitioned view. The cached plan points the query to run against a specified partition, but in reality the next query that tries to use the plan could run against any other partition. The result for me is that the query then can't use the plan, but instead of deriving a new plan it gives up and runs against all partitions (veeery painful).

    It seems to me that this isn't dealt with well by SQL Server (2000 in my case); I wonder if 2005 will fare better?

    Chris

  5. Bob says:

    Ken,

    Great trick, definitely helps us in a couple of trouble spots.

    Would this same trick work for more complicated procs?

    The problem we are facing is a proc with a variable-length XML string as a parameter. We then use sp_xml_preparedocument to dump the data from it into a "@" table. From there we join off that table for inserts/updates/deletes.
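    For reference, a stripped-down sketch of that pattern (the column names and XML shape are made up) looks something like:

    CREATE PROC prImportSales
    @xml varchar(8000)
    AS
    DECLARE @hdoc int
    -- Parse the XML parameter and shred it into a table variable
    EXEC sp_xml_preparedocument @hdoc OUTPUT, @xml

    DECLARE @rows TABLE (id int, amount money)
    INSERT @rows (id, amount)
    SELECT id, amount
    FROM OPENXML(@hdoc, '/sales/row', 1)
    WITH (id int, amount money)

    EXEC sp_xml_removedocument @hdoc
    -- ...join @rows to the permanent tables for the inserts/updates/deletes...
    GO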

    Wasn’t sure how the optimization engine would handle such a procedure.

    Bob

  6. Lee Everest says:

    Good read for sure. I think Ken England described this first in his SQL Server 2000 performance tuning handbook (sorry, I forget the exact name) but didn't title it as such.

    I have been testing this some more and find it difficult to see if a database application is even experiencing this phenomenon. On the one hand, when you think about it, any stored procedure that accepts input parameters could be subject to parameter sniffing. On the other hand, detecting bad plan use seems trivial. The first thought that comes to mind to see if this is even remotely occurring is to just look at syscacheobjects to see if multiple plans exist. Interestingly, the ones that I found are limited to only the procs that shell out (call) other procs:

    create proc a
    as
    exec dbo.b
    exec dbo.c
    go
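    On SQL Server 2000, one quick way to eyeball the cache (just a sketch; add whatever filters you need) is something like:

    SELECT objid, objtype, cacheobjtype, usecounts, sql
    FROM master.dbo.syscacheobjects
    WHERE objtype = 'Proc'
    ORDER BY objid
    -- more than one 'Compiled Plan' row for the same objid means multiple plans are cached for that proc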

    Thoughts?

    Lee

    PS: we've already chatted about this; just wanted to share with the thread 😉

  7. Your site is very informational for me. Nice work.

  8. don says:

    The situation described here is related to one that I run into frequently, and for which I’ve never found a reliable approach.

    Assume a transaction table (cumulative, not updated except as noted in a moment), where each transaction needs to pass through a series of states identified by a State column, CHAR(1). An example might be 'N' for new, 'P' for processing, and 'C' for complete. There's an index on State. Pretty soon the statistics tend toward .01% state 'N' or 'P', 99.99% 'C'.

    So the cardinality is 3. In my experience, the query optimizer avoids the index on State like the plague, even though one always queries for state values of ‘N’ or ‘P’.

  9. Hot Rats says:

    Great description – thanks Ken!

    If what is needed is known (such as a State column), one reliable approach is as Eric had suggested (in pseudo-T-SQL):

    IF @ThisState = 'N' OR @ThisState = 'P' … USE PLAN …
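    In SQL Server 2005, a bare-bones sketch of that first approach (the table name is made up) could be:

    SELECT * FROM transactions
    WHERE State = @ThisState
    OPTION (OPTIMIZE FOR (@ThisState = 'N'))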

    Another option is to partition or archive "completed" data (since nobody is apparently looking for it anyway…).

  10. don says:

    Yup, but I'm puzzled why the optimizer can't handle it. I try to use structural solutions as a last resort, because they are less maintainable and are subject to Unintended Consequences.

  11. Hot Rats says:

    The optimizer will discover no useful purpose for an index when an indexed column has a "common" value. For example, let's say that 20 rows are persisted (at the data level), and that 10 index entries fit on one index page. That index's B-tree thus has one root page and two leaf-level index pages, and the table itself occupies two data pages. If the Optimizer knows (from statistics) that more than one (of those two) data pages will be required (because that value is common), then the Optimizer has two choices: use the index or don't. If the index were to be used, then the Optimizer knows that the root page, plus both leaf-level pages, plus the two data pages, must be read (5 pages total). On the other hand, if a table scan were to be used, then the Optimizer knows that just two pages (the data pages) must be read.

    For the other scenario (where the parameter value is the more selective one), the Optimizer still knows that more than one data page will be read, and thus use of the index's B-tree will still be avoided (5 to 2).

    Thus, when the Optimizer faces such choices, it will avoid an index like the plague (because the index is actually more expensive). Similarly, this is why one often hears that an index on a bit (flag) column is not too useful :). And this is why (if you think about it) most phone books do not go overboard when indexing variations upon common names (such as Smith); it is typically faster for a phone book reader to simply go directly to the 'S' chapter and start reading.

    Even so, if a column's cardinality is fairly selective (where cardinality ~= rows, more or less), and yet the distribution of values is skewed (i.e., not what one might consider a classic bell-curve histogram), then parameter sniffing can cause the Optimizer to choose the 'wrong' (more accurately: 'a less commonly useful') execution plan. In such a situation, I think other options discussed by Ken (such as WITH RECOMPILE) should be considered as a viable solution.

    For me, partitioning is a structural solution that is based upon business needs. For example, I use an archival table (or a column, and the choice of which depends upon those very same statistics) to mark some records as no longer being required. For my business’s most common needs, I simply move those records once or twice a day (currently I have this setup to be run by data entry people, even though they do not know they are causing that archival of previously marked rows:). The unintended consequence of my _not_ having a archival process would be database bloat, and my employer would not be happy to see the business expend management resources upon issues that are caused by my allowing such bloat (at least, eventually… :), such as the time it takes to reindex, update stats, run checkdb, restore the database, etc.