Conversion and Arithmetic Errors: Change between SQL Server 2000 and 2005

In this post from last week, I gave an example of a query with a conversion where the optimizer pushes the conversion below a join.  The result is that the conversion may be run on rows that do not join which could lead to avoidable failures.  I ran this query on SQL Server 2005.  After I published that post, a reader pointed out to me that my example query generates a different plan on SQL Server 2000:

CREATE TABLE T1 (A INT, B CHAR(8))
INSERT T1 VALUES (0, '0')
INSERT T1 VALUES (1, '1')
INSERT T1 VALUES (99, 'Error')

CREATE TABLE T2 (X INT)
INSERT T2 VALUES (1)
INSERT T2 VALUES (2)
INSERT T2 VALUES (3)
INSERT T2 VALUES (4)
INSERT T2 VALUES (5)

SELECT T1.A, CONVERT(INT, T1.B) AS B_INT
FROM T1 JOIN T2 ON T1.A = T2.X

Here is the SQL Server 2000 plan:

**  |--Compute Scalar(DEFINE:([Expr1004]=Convert([T1].[B])))
**       |--Hash Match(Inner Join, HASH:([T1].[A])=([T2].[X]), RESIDUAL:([T2].[X]=[T1].[A]))
            |--Table Scan(OBJECT:([T1]))
            |--Table Scan(OBJECT:([T2]))

And here is the SQL Server 2005 plan:

  |--Hash Match(Inner Join, HASH:([T1].[A])=([T2].[X]), RESIDUAL:([T2].[X]=[T1].[A]))
**       |--Compute Scalar(DEFINE:([Expr1008]=CONVERT(int,[T1].[B],0)))
**       |    |--Table Scan(OBJECT:([T1]))
       |--Table Scan(OBJECT:([T2]))

The result is that the SQL Server 2000 plan runs without any errors - the "bad" row is filtered out by the join - while the SQL Server 2005 plan fails when it tries to convert the "bad" row.  It turns out that this change in behavior is documented in this Books Online entry under the Transact-SQL section and "Expressions in queries" feature.  This entry states that queries that worked on SQL Server 2000 could fail on SQL Server 2005.  It also states that the change enables two benefits: "the ability to match indexes on computed columns" and "the prevention of redundant computation of expression results."  Let's investigate these two benefits.

To understand the first benefit ("the ability to match indexes on computed columns"), suppose we create a new table with a computed column that matches the conversion in our query and then create an index on that computed column:

CREATE TABLE T3 (A INT, B CHAR(8), C AS CONVERT(INT, B)
CREATE INDEX T3C ON T3 (C, A)

INSERT T3 VALUES (0, '0')
INSERT T3 VALUES (1, '1')

SELECT T3.A, CONVERT(INT, T3.B) AS B_INT
FROM T3 JOIN T2 ON T3.A = T2.X

This computed column is sometimes referred to as a "persisted" computed column because the index stores the actual values of the computed column.  For computed columns on complex or expensive expressions, using such an index could be faster than evaluating the expression anew.  Note that, because the computed column must be evaluated to maintain the index, we cannot insert a "bad" row into T3.  Any attempt to insert a "bad" row immediately fails with a conversion error.

I've deliberately included column A as part of the index so that the index covers all of the columns in the query.  If we omit column A from the index, SQL Server cannot use the index without performing a bookmark lookup.  In this particular example, that would prevent SQL Server from choosing the plan that I'm trying to demonstrate.  Also, note that, on SQL Server 2005 we could use the INCLUDE keyword instead of making column A an index key, but I wanted this example to work on both SQL Server 2000 and SQL Server 2005.

Now let's compare the plans using the new table.  First, the SQL Server 2000 plan:

  |--Compute Scalar(DEFINE:([Expr1004]=Convert([T3].[B])))
       |--Hash Match(Inner Join, HASH:([T3].[A])=([T2].[X]), RESIDUAL:([T2].[X]=[T3].[A]))
            |--Table Scan(OBJECT:([T3]))
            |--Table Scan(OBJECT:([T2]))

Compare that to the SQL Server 2005 plan:

  |--Hash Match(Inner Join, HASH:([T3].[A])=([T2].[X]), RESIDUAL:([T2].[X]=[T3].[A]))
       |--Compute Scalar(DEFINE:([T3].[C]=[T3].[C]))
**       |    |--Index Scan(OBJECT:([T3].[T3C]))
**       |--Table Scan(OBJECT:([T2]))

Notice that SQL Server 2000 uses the same plan as we saw above while SQL Server 2005 uses the new index and eliminates the conversion.  This type of index matching is not limited to conversions and works for many other expressions and computed columns.

I researched the second benefit ("the prevention of redundant computation of expression results") but I'm unable to come up with a compelling explanation for it.  In some cases, it is certainly true that pushing the conversion below a join reduces the number of times that the conversion is evaluated.  For instance, returning to the original example above, suppose we add many duplicate values to T2 such that each T1 row joins multiple times.  In the SQL Server 2005 plan, the conversion is evaluated before the join and so it is evaluated exactly once for each T1 row regardless of how many times each row joins.  In the SQL Server 2000 plan, the conversion is evaluated after the join and so it is evaluated multiple times for each T1 row - once for each time that the row joins.  Of course, it is also possible that none of the rows from T1 join.  In this case, the SQL Server 2005 plan unnecessarily evaluates the conversion on every T1 row while the SQL Server 2000 plan does not evaluate the conversion at all.

Moreover, you may recall from my last post, that SQL Server defers the execution of most scalar expressions until they are actually evaluated.  I used the following nested loops join example to illustrate this point:

SELECT T1.A, CONVERT(INT, T1.B) AS B_INT
FROM T1 JOIN T2 ON T1.A = T2.X
OPTION (LOOP JOIN)

  |--Nested Loops(Inner Join, WHERE:([T2].[X]=[T1].[A]))
**       |--Compute Scalar(DEFINE:([Expr1008]=CONVERT(int,[T1].[B],0)))
**       |    |--Table Scan(OBJECT:([T1]))
       |--Table Scan(OBJECT:([T2]))

In this example, the compute scalar is below the join in the plan but the conversion is still executed after the join.  Thus, pushing the conversion below the join does not change the number of times that it is executed.