OPTIMIZE FOR UNKNOWN – a little known SQL Server 2008 feature

Using parameterized queries is a well known SQL Server Best Practice. This technique ensures caching and reuse of existing query execution plans (instead of constantly compiling new plans), as well as avoiding SQL injection by mandating that input data be data type safe.

See more about SQL Server parameterization Best Practices here: https://blogs.msdn.com/sqlprogrammability/archive/2007/01/13/6-0-best-programming-practices.aspx

An application that I work with presented me with an interesting dilemma; It wanted to utilize the benefits of plan reuse but the parameter values that the application initially sends to SQL Server are not representative of the values passed in the subsequent re-execution of the statement. SQL Server compiled and cached a ‘good’ plan for the first parameter values. Unfortunately, this had the unintended side effect of caching a poor execution plan for all subsequent parameter values. To make this clearer let’s look at the following example query;

Select * from t where col1 > @P1 or col2 > @P2 order by col1;

Let’s assume for simplicities sake that col1 is unique and is ever increasing in value, col2 has 1000 distinct values and there are 10,000,000 rows in the table, and that the clustered index consists of col1, and a nonclustered index exists on col2.

Imagine the query execution plan created for the following initially passed parameters: @P1= 1 @P2=99

These values would result in an optimal queryplan for the following statement using the substituted parameters:

Select * from t where col1 > 1 or col2 > 99 order by col1;

Now, imagine the query execution plan if the initial parameter values were: @P1 = 6,000,000 and @P2 = 550.

As before, an optimal queryplan would be created after substituting the passed parameters:

Select * from t where col1 > 6000000 or col2 > 550 order by col1;

These two identical parameterized SQL Statements would potentially create and cache very different execution plans due to the difference of the initially passed parameter values. However, since SQL Server only caches one execution plan per query, chances are very high that in the first case the query execution plan will utilize a clustered index scan because of the ‘col1 > 1’ parameter substitution. Whereas, in the second case a query execution plan using index seek would most likely be created.

Unfortunately if the initial parameter values are similar to the first example above, then a ‘table scan’ execution plan gets created and cached, even though most of the following queries would rather use a plan that contains the index seek.

There are a number of ways to work-around this issue;

· Recompile every time the query is executed using the RECOMPILE hint - This can be very CPU intensive and effectively eliminates the benefits of caching queryplans.

· Unparameterize the query – Not a viable option in most cases due to SQL injection risk.

· Hint with specific parameters using the OPTIMIZE FOR hint (However, what value(s) should the app developer use?) This is a great option if the values in the rows are static, that is; not growing in number, etc. – However in my case the rows were not static.

· Forcing the use of a specific index

· Use a plan guide – Using any of the recommendations above.

SQL Server 2008 provides another alternative: OPTIMIZE FOR UNKNOWN

SQL Server 2008 provides a different alternative; the OPTIMIZE FOR UNKNOWN optimizer hint. This hint directs the query optimizer to use the standard algorithms it has always used if no parameters values had been passed to the query at all. In this case the optimizer will look at all available statistical data to reach a determination of what the values of the local variables used to generate the queryplan should be, instead of looking at the specific parameter values that were passed to the query by the application.

Full documentation of optimizer hints can be found here:

https://msdn.microsoft.com/en-us/library/ms181714(SQL.100).aspx

Example:

@p1=1, @p2=9998,

Select * from t where col > @p1 or col2 > @p2 order by col1

option (OPTIMIZE FOR (@p1 UNKNOWN, @p2 UNKNOWN))

Using this new optimizer hint option has allowed the ISV to generate queries that result in the benefits of parameterization; such as plan reuse, while eliminating the problems caused by the caching of queryplans that were created using nontypical initially passed parameter values.

NOTE: This new optimizer hint option, like all optimizer hints, should be used only by experienced developers and database administrators in cases where SQL Server cannot create an optimal plan.

Cross Posted from https://blogs.microsoft.com/mssqlisv