T-SQL Anti-pattern of the day: ‘all-in-one’ queries


Scenario

A common requirement for enquiry queries on an OLTP database is to have search criteria which are very specific (‘get me details for for OrderID = NNNN’) and also the occasional reports which ask for all the orders (‘get me all the orders, no questions asked’.) Here is a sample from AdventureWorks which illustrates the problem:

CREATE PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE (SalesOrderID = @OrderID OR @OrderID IS NULL)
END

What is the meaning of the underlined predicate in the above WHERE clause? It is actually a ‘special case’ where the developer intends to get back all the rows, regardless of the OrderID. This ‘special case’ is triggered by passing in a value of NULL for the @OrderID parameter.

Problem

So while this construct looks good in theory, it lends itself to very poor performance. Take a look at the 2 cases where this procedure is executed.

Case A: with specific OrderID

EXEC RptOrder 43672

Case B: asking for all records

EXEC RptOrder NULL

The plan, it turns out, is the same for both cases and a scan is used! This is despite a seekable index being present on SalesOrderID column for the SalesOrderHeader table:

image

The reason the optimizer chooses to scan the SalesOrderHeader (in this case it chooses a non-clustered index scan) is because it has no way to determine at compile and optimization time, as to what the specific value of @OrderID would be. Hence it has no way to ‘fold’ the (@OrderID IS NULL) expression and therefore has no option but to look at all the records.

Workarounds

‘IF-ELSE’ Workaround: The straightforward workaround in simple cases like the one above is to separate out the 2 cases into an IF-ELSE block:

ALTER PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    IF (@OrderID IS NOT NULL)
    BEGIN
        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE (SalesOrderID = @OrderID)
    END
    ELSE
    BEGIN
        SELECT *
        FROM Sales.SalesOrderHeader
    END
END

Now, the 2 test cases work as expected. Here are the execution plans:

EXEC RptOrder 43672

image

EXEC RptOrder NULL

image

Dynamic SQL Workaround: However, as the number of predicates in the WHERE clause increase, and if all those predicates (or most of them) have such ‘catch-all’ handling then the IF – ELSE construct becomes unviable. In those cases, a dynamic SQL construct should be considered. Of course, when dealing with dynamic SQL, we must consider security first, including the possibility of SQL Injection and also the Execution Context of the dynamic SQL statement. But that is a topic for another post. Right now, here is how we could handle something like that:

— NOTE: This code is highly simplified and does not provide for any screening

— or protection against SQL injection!!! Provided as-is, confers no warranties.

ALTER PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    DECLARE @sDynamicSQL nvarchar(4000)
    SELECT @sDynamicSQL = ‘SELECT * FROM Sales.SalesOrderHeader ‘

    IF (@OrderID IS NOT NULL)
    BEGIN
        SELECT @sDynamicSQL = @sDynamicSQL + ‘ WHERE (SalesOrderID = @OrderID)’
    END

    EXEC sp_executesql @sDynamicSQL, N’@OrderID int’, @OrderID = @OrderID
END

Different Code Paths: The cleanest way of course is to consider having separate procedures for each kind of query. For example we can have a procedure called RptSpecificOrder for the case where we are searching by specific OrderID, and another one called RptAllOrders for the ‘get-me-everything’ case. This does have the advantage of clean isolation, but it does not scale easily when the number of predicates are larger. But is does also have the advantage that if we are querying for specific orders 99% of the time, that code path is simplified and optimized accordingly.

Conclusion

Beware of this T-SQL anti-pattern as it is one of the most common ones we see and it does have a huge (negative) impact on query performance. As you can see, if they are not done with these patterns in mind, application design and reporting requirements can have a detrimental effect on OLTP query execution. Separating reporting and OLTP workloads could be the key to solving these kinds of issues. But if separation is not possible, then clever use of separate code paths and stored procedures could help ensure that the most efficient execution plan is selected for each case. For complex queries, dynamic SQL may offer the simplest way out, but due care has to be taken to ensure that permissions and SQL injection issues are kept in mind when dealing with dynamic SQL statements.

Comments (10)

  1. Scott says:

    What kind of performance would using a coalesce produce? For example:

       SELECT *

       FROM Sales.SalesOrderHeader

       WHERE (SalesOrderID =

          COALESCE(@OrderID,[SalesOrderID])

  2. Thanks Scott

    The question you raise is very natural in the above scenario. While the expression you suggest is functionally correct, it suffers from the very same problem but for a different reason: the presence of the COALESCE function in the predicate causes the optimizer to again choose a scan.

  3. andyclap says:

    I don’t use sprocs myself for this sort of thing, but I’m surprised you’ve not mentioned WITH RECOMPILE – is the performance impact of recalculating the plan each time significantly greater than using dynamic SQL?

  4. Thank you for your question and comment, andyclap. I can confirm that for SQL 2008, using OPTION(RECOMPILE) does seem to alleviate this issue. I will update my post in due course after some more research.

  5. Manish Kamath says:

    Hi Arvind,

    In our application we have multiple conditions. so how can we handle such cases…?

       SELECT *

       FROM Sales.SalesOrderHeader

       WHERE (SalesOrderID = @OrderID OR @OrderID IS NULL)

       AND (SalesOrderCode = @OrderCd OR @OrderCd IS NULL)

       AND (SalesOrderName = @OrderNm OR @OrderNm IS NULL)

       AND (SalesOrderQty = @OrderQty OR @OrderQty IS NULL)

  6. Hi Manish

    I believe in your case the efficient method might just be to consider dynamic SQL OR evaluate the usage of OPTION (RECOMPILE) if you are using SQL 2008.

  7. What about the use of the ANSI option: SET ANSI_NULLS OFF ?

    What are the performance issue of such setting?

    When set to off, it is now possible to use SalesOrderID = @OrderId regardless if @OrderId is null or not.

    On my side, I tested this feature on a uniqueidentifier column that allowed a null value (don’t ask why) and it work.

    I only set this setting on the connection context and not database wide.

  8. ANSI_NULLS being set to OFF is a deprecated setting. In future it will ALWAYS be ON and hence you should not rely on it. Please see technet.microsoft.com/…/ms143729.aspx which states:

    ANSI_NULLS, ANSI_PADDING and CONCAT_NULLS_YIELDS_NULL will always be set to ON.

  9. mahesh says:

    Thanks Arvind for the great article, is that possible for you to share the DMV to find same pattern from whole database?
    your response will be greatly appreciated..

    1. Hi Mahesh – you can’t really use a DMV (at least nothing that exists today – June 2016) to find out all instances of this pattern from code. You could however use a parser based approach to programmatically uncover these using a static code analysis approach. You can refer to https://blogs.msdn.microsoft.com/arvindsh/2013/10/30/sql-pass-appdev-virtual-chapter-meeting/ which can get you started on this path.

Skip to main content