Sometimes the Simplest Solution Isn't the Best Solution (The Optional Parameter Problem)

Programmers should naturally gravitate toward the simplest, most elegant solution. This is because the simplest coding solution is so often the best solution: simple solutions are cheaper to implement; easier for others to understand, maintain, and extend; and less prone to bugs. Simplicity can also be associated with efficient and consistent execution, but performance is one area where Occam's Razor doesn't apply quite as consistently. Performance requirements, more often than many other types of requirement, may demand introducing complexity into what would otherwise be a nice, tidy implementation. I’m going to discuss one such case here.

 

The All-In-One Search Query

Sometimes you need a query or a stored procedure to expose several search parameters but actually do the search only for the subset of the parameters that the caller cares about. For example, consider a [find_customer] stored procedure that can search for a @name value, a customer @id, an @address, or any combination of these. The most obvious solution to this problem looks something like this:

 

    SELECT [id], [name], [address], ...
    FROM [customers]
    WHERE (@name IS NULL OR [name] = @name)
        AND (@id IS NULL OR [id] = @id)
        AND (@address IS NULL OR [address] = @address)

 

This allows the caller to pass non-NULL values for the search criteria they care about, and to pass NULLs for any criteria that they want to be ignored. This pattern is variously referred to as "all-in-one search query", "catch-all query", "dynamic search conditions", or "optional parameters". Here’s a sample stored proc that shows a slightly simplified example with just two available search parameters:

 

    CREATE PROC find_customer
        @name nvarchar(128) = NULL,
        @id int = NULL
    AS
    BEGIN
        SELECT [id], [name], [address], [creation_date]
        FROM [customers]
        WHERE (@name IS NULL OR [name] = @name)
            AND (@id IS NULL OR [id] = @id);
    END;

 

This idiom is simple enough that it is relatively easy to understand. Despite its simplicity, it allows a single query to meet many different needs (in this case: a search on name, a search on id, or a search on both name and id). It is understandable that queries like this are so popular; it seems to be an elegant solution, and developers like elegance. Unfortunately, it's usually a bad idea for practical performance reasons. The problems are significant enough that I regard this as an anti-pattern in my own code.

 

What's Wrong With the Obvious Solution?

The first problem is that the OR operators will prevent an efficient index seek-based query plan. If the search parameters are mutually exclusive (the caller is only supposed to provide one parameter value), you can avoid this problem by changing the query to:

 

    SELECT [id], [name], [address], [creation_date]
    FROM [customers]
    WHERE (@name IS NOT NULL AND [name] = @name)
        OR (@id IS NOT NULL AND [id] = @id);

 

That may allow an index seek, but you should expect that it will still perform poorly in some cases because it suffers from other problems that I'll discuss next. Also note that the query semantics aren't quite the same as the first query's: when both parameters are non-NULL, the original form returns only rows that match both values, while this rewrite returns rows that match either one.
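
To make that semantic difference concrete, here is a minimal sketch (reusing the same [customers] table; the parameter values N'Contoso' and 2 are just hypothetical placeholders):

    -- Hypothetical example: a caller supplies BOTH parameters.
    DECLARE @name nvarchar(128), @id int;
    SET @name = N'Contoso';
    SET @id = 2;

    -- Original form: when both parameters are non-NULL, a row must match
    -- @name AND @id to qualify.
    SELECT [id], [name], [address], [creation_date]
    FROM [customers]
    WHERE (@name IS NULL OR [name] = @name)
        AND (@id IS NULL OR [id] = @id);

    -- Rewritten form: a row qualifies if it matches EITHER non-NULL parameter,
    -- so it can return strictly more rows than the original.
    SELECT [id], [name], [address], [creation_date]
    FROM [customers]
    WHERE (@name IS NOT NULL AND [name] = @name)
        OR (@id IS NOT NULL AND [id] = @id);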

 

The second problem, which affects both the original query and the attempted rewrite, is that SQL does a poor job of plan costing for queries like this. In general, you should avoid filter predicates that don't reference any column, i.e. predicates whose operands are all constants, variables, or parameters (e.g. "WHERE @var = 1", or "WHERE @parameter IS NULL"). These have a tendency to screw up the optimizer's estimates, because at plan compile time the optimizer has no idea whether the predicate will qualify all rows or disqualify all rows. You are essentially guaranteed that its blind guess will be exactly wrong for some parameter values. The bad estimate is likely to cause inaccurate costing of plan alternatives, which in turn is likely to cause selection of a suboptimal plan.
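
If you want to see the misestimate for yourself, here is a small sketch (it assumes the [customers] table from the examples above, but any table will do):

    -- Capture the actual execution plan so estimated vs. actual rows can be compared.
    SET STATISTICS XML ON;

    DECLARE @name nvarchar(128);
    SET @name = NULL;   -- at run time, every row qualifies

    SELECT [id], [name]
    FROM [customers]
    WHERE (@name IS NULL OR [name] = @name);

    SET STATISTICS XML OFF;
    -- In the returned plan, compare EstimateRows to ActualRows on the scan/seek
    -- operator: the estimate is a fixed guess made without knowing @name, so it
    -- is almost guaranteed to be badly wrong for some parameter values.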

 

The third problem is that there is no single plan that is appropriate for all of the different combinations of parameters, even if the optimizer were smart enough to predict the outcome of "variable = constant"-type predicates at compile time. For example, suppose you had a nonclustered index on the [name] column, and another nonclustered index on [id]. If SQL chose a plan that first did a lookup using the index on the [name] column, this would be a very poor choice whenever the caller passed NULL for @name. Conversely, a plan that first used the index on the [id] column would be inefficient when @id was NULL and @name was non-NULL.
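
For reference, the two indexes assumed in that scenario might be created like this (the index names are hypothetical):

    -- Hypothetical nonclustered indexes for the scenario above:
    CREATE NONCLUSTERED INDEX [ix_customers_name] ON [customers] ([name]);
    CREATE NONCLUSTERED INDEX [ix_customers_id] ON [customers] ([id]);
    -- A cached plan that drives the query off [ix_customers_name] does useless
    -- work whenever @name is NULL, and vice versa for [ix_customers_id]; no
    -- single plan shape fits every parameter combination.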

 

The only situations in which you should use queries like either of those shown above are when (a) you don't care about the performance of the query, or (b) you can guarantee that the table will only ever contain a handful of rows, so that a scan-based plan will never be noticeably slower than a seek-based plan.

 

Possible Solutions

One possible solution would be to have a different query for each search parameter (or combination of search parameters, if that is legal input for the procedure). This is shown below. It may be a practical solution if you only have two or three possible search parameters. 

 

    CREATE PROC find_customer
        @name nvarchar(128) = NULL,
        @id int = NULL
    AS
    BEGIN
        IF (@name IS NULL)
        BEGIN
            SELECT [id], [name], [address], [creation_date]
            FROM [customers]
            WHERE [id] = @id;
        END
        ELSE IF (@id IS NULL)
        BEGIN
            SELECT [id], [name], [address], [creation_date]
            FROM [customers]
            WHERE [name] = @name;
        END
        ELSE
        BEGIN
            SELECT [id], [name], [address], [creation_date]
            FROM [customers]
            WHERE [name] = @name AND [id] = @id;
        END;
    END;

 

Note that parameter sniffing may cause bad plan selection even in this case if each search query is more complex than the queries shown here (if the queries include joins against large tables, for example). You can protect against this problem by assigning the parameters to local variables and using the variables in the queries, as sketched below. You can also avoid the problem by moving each search query to a different compilation scope (child stored procedures or dynamic SQL), or by adding an OPTION(RECOMPILE) query hint to each query.
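
For example, the local-variable technique applied to the @name branch of the procedure above would look something like this (just a sketch of that one branch):

    -- Inside the ELSE IF (@id IS NULL) branch of find_customer:
    DECLARE @local_name nvarchar(128);
    SET @local_name = @name;   -- copy the parameter into a local variable

    -- Because the optimizer cannot sniff the value of a local variable, the plan
    -- is costed from average density statistics rather than being tailored to
    -- (and possibly skewed by) the first caller's specific value.
    SELECT [id], [name], [address], [creation_date]
    FROM [customers]
    WHERE [name] = @local_name;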

 

The second possible solution uses dynamic SQL to construct the WHERE clause at run time, including only the predicates the caller actually needs.

 

    CREATE PROC find_customer
        @name nvarchar(128) = NULL,
        @id int = NULL
    WITH EXECUTE AS OWNER
    AS
    BEGIN
        DECLARE @search_query nvarchar(max);
        SET @search_query = N'
            SELECT [id], [name], [address], [creation_date]
            FROM [customers]
            WHERE 1 = 1';

        IF (@name IS NOT NULL)
        BEGIN
            SET @search_query = @search_query + N' AND [name] = @name';
        END;

        IF (@id IS NOT NULL)
        BEGIN
            SET @search_query = @search_query + N' AND [id] = @id';
        END;

        EXEC sp_executesql @search_query, N'@name nvarchar(128), @id int',
            @name = @name, @id = @id;
    END;

 

You should explicitly parameterize the query, as shown here, to avoid the risk of SQL injection attacks. Note the EXECUTE AS OWNER used for this procedure, which may be appropriate to avoid the need to grant the end user SELECT permission on the base table(s). (Caution: always triple-check to ensure that the proc is not vulnerable to SQL injection before enabling EXECUTE AS OWNER.) Also note that the pattern is simplified by always passing all parameters to sp_executesql, even though the query may only make use of a subset of the parameters. I generally don't like to use dynamic SQL where I can avoid it, but I make an exception for this case; the dynamic WHERE clause is actually my preferred solution when I'm faced with this problem on SQL 2005 and earlier. 
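
For illustration, a hypothetical call that supplies only @name ends up executing something equivalent to the following (reconstructed by hand from the string-building logic above):

    -- Hypothetical call:
    EXEC find_customer @name = N'Contoso';

    -- The text handed to sp_executesql in that case is effectively:
    --     SELECT [id], [name], [address], [creation_date]
    --     FROM [customers]
    --     WHERE 1 = 1 AND [name] = @name
    -- @id is still passed as a parameter, but since the statement text never
    -- references it, it has no effect on the plan that gets compiled.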

 

If you're running SQL 2008 or later, there is a simpler solution that generally works well. More recent versions of SQL are better at predicting the effect of a predicate like "@p1 IS NULL" at compile time. You still have to worry about inappropriate plan reuse (getting a compiled plan that is good for the initial set of parameters but bad for subsequent parameters), though. That problem can be mitigated with an OPTION(RECOMPILE) query hint.  

 

 

 

    CREATE PROC find_customer
        @name nvarchar(128) = NULL,
        @id int = NULL
    AS
    BEGIN
        SELECT [id], [name], [address], [creation_date]
        FROM [customers]
        WHERE (@name IS NULL OR [name] = @name)
            AND (@id IS NULL OR [id] = @id)
        OPTION(RECOMPILE);
    END;

 

This will generate a new plan on each execution that is optimized for that execution's set of parameters, and the code remains very simple. The additional per-execution compile cost can generally be tolerated if the procedure execution rate is low (say, less than a dozen executions per second). If the execution rate is higher than that, you'll need to compare the compile cost to the execution cost (SET STATISTICS TIME ON will show both) to see whether the small extra per-execution compile cost is offset by the more consistent performance that you should get with the hint.
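
Here is a quick sketch of that measurement (run from SSMS or sqlcmd; the parameter value is just a placeholder):

    SET STATISTICS TIME ON;
    EXEC find_customer @name = N'Contoso';   -- hypothetical parameter value
    SET STATISTICS TIME OFF;
    -- In the Messages output, "SQL Server parse and compile time" is roughly the
    -- cost that OPTION(RECOMPILE) adds to every execution, and "SQL Server
    -- Execution Times" is the cost of actually running the query.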

 

I’ve seen the All-In-One search query cause perf problems in code written both inside and outside of Microsoft, and I’ve been burned by the problem in my own code. If you find yourself needing this type of query, my recommendation is to first evaluate OPTION(RECOMPILE), assuming you're on SQL 2008 or later. On SQL 2005 and earlier, or if you're on a more recent version but can't tolerate the per-execution compile cost that the RECOMPILE hint introduces, the dynamic SQL approach would be my next choice in most cases.

 

 

 

UPDATE: Erland Sommarskog has a page on this very topic here: https://www.sommarskog.se/dyn-search-2005.html. He calls it the problem of "dynamic search conditions" (a less awkward phrase than "All-In-One Search Query"). The page provides a very detailed discussion of the topic, and would be a great read if you really want to drill into the details. 

 

 

UPDATE #2: Another good read on a variation of this problem: https://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/ (and, to a lesser degree, https://sqlinthewild.co.za/index.php/2009/09/15/multiple-execution-paths/), both from Gail Shaw. Gail calls this class of query "Catch-All Queries". 

 

UPDATE #3: Another discussion of the optional parameter problem from Conor Cunningham: https://blogs.msdn.com/conor_cunningham_msft/archive/2010/04/22/conor-vs-optional-parameters.aspx
