More on Implicit Conversions

Yesterday, a reader posted a question asking me to comment on SQL Server's algorithm for choosing implicit conversions.  When SQL Server encounters an expression with mismatched types, it has two options.  It can execute the query with an implicit conversion or it can fail with an error.  Before digging into the implicit conversion scenario, let's briefly consider the error case.

If an implicit conversion is not possible, there are two possible errors that SQL Server can generate.  If the two data types are entirely incompatible (that is, if no conversion, implicit or explicit, is permitted between the two types), SQL Server generates the following error:

DECLARE @a INT
DECLARE @b DATE
SET @a = @b

Msg 206, Level 16, State 2, Line 3
Operand type clash: date is incompatible with int

An attempt to remedy this situation with an explicit conversion, fails with the following error:

DECLARE @a INT
DECLARE @b DATE
SET @a = CONVERT(INT,@b)

Msg 529, Level 16, State 2, Line 3
Explicit conversion from data type date to int is not allowed.

On the other hand, if the two data types are compatible, but require an explicit conversion, SQL Server generates the following error:

DECLARE @a INT
DECLARE @b DATETIME
SET @a = @b

Msg 257, Level 16, State 3, Line 3
Implicit conversion from data type datetime to int is not allowed. Use the CONVERT function to run this query.

Introducing an explicit convert, resolves the problem:

DECLARE @a INT
DECLARE @b DATETIME
SET @a = CONVERT(INT,@b)

This Books Online page includes a compatibility matrix showing how SQL Server handles all of the possible data type combinations.

If SQL Server introduces an implicit conversion into an expression, it must choose which side of the expression to convert and it must choose the target data type of the conversion.  In principle, there are three options for comparison expressions:

  1. Convert the right hand side to the type of the left hand side.
  2. Convert the left hand side to the type of the right hand side.
  3. Convert both sides to a third type.

For assignments, it only makes sense to convert the right hand side to the type of the left hand side.  If this conversion is not permitted, the assignment must fail.  It does not make sense to write:

DECLARE @a INT
DECLARE @b DATETIME
SET CONVERT(DATETIME,@a) = @b

Msg 156, Level 15, State 1, Line 3
Incorrect syntax near the keyword 'convert'.

In practice, SQL Server chooses one of the first two options.  Specifically, SQL Server ranks all types and converts from the lower ranked to the higher ranked type.  For example, since INT is lower ranked than DATETIME, the following query converts the INT variable to DATETIME:

DECLARE @a INT
DECLARE @b DATETIME
SELECT 0 WHERE @a = @b

  |--Compute Scalar(DEFINE:([Expr1000]=(0)))
       |--Filter(WHERE:(STARTUP EXPR(CONVERT_IMPLICIT(datetime,[@a],0) =[@b])))
            |--Constant Scan

This approach does have some drawbacks.  Specifically, while some conversions are lossless, others are lossy and some conversions fail outright.  For example, converting a SMALLINT to an INT is lossless.  The set of values that can be expressed by an INT is a superset of the set of values than can be expressed by a SMALLINT.  Converting an INT to a REAL is lossy.  There are large INT values that cannot be expressed with a REAL:

DECLARE @a INT
DECLARE @b REAL
DECLARE @c INT
SET @a = 1000000001
SET @b = CONVERT(REAL,@a)
SET @c = CONVERT(INT,@b)
SELECT @a, @b, @c

 a                    b             c
-------------------- ------------- --------------------
1000000001           1E+09         1000000000

Converting a REAL to an INT can be lossy (an INT cannot represent a fraction) and can fail outright:

DECLARE @a REAL
DECLARE @b INT
SET @a = 1e13
SET @b = CONVERT(INT,@a)

Msg 232, Level 16, State 3, Line 4
Arithmetic overflow error for type int, value = 9999999827968.000000.

The type ranking used by SQL Server tries to minimize the risk of lossy or failed conversions, but sometimes data loss or failures are unavoidable as in the case of REAL and INT.

Conversions between string types (e.g., CHAR or VARCHAR) and numeric types (e.g., INT, REAL, or DECIMAL) are especially problematic.  SQL Server ranks string types lower than numeric types and will always convert from a string type to a numeric type.  This conversion can easily fail.  Clearly, there are many strings that contain non-numeric data and cannot be converted.  However, converting in the reverse direction is arguably worse.  There are many ways to represent a numeric value as a string.  All of the following strings represent the same number: ‘1', ‘˽1', ‘1˽ ‘, ‘+1', ‘1.0', and ‘01'.  While SQL Server could certainly choose a canonical format for numeric to string conversions, without converting the string to a numeric type, there is no easy way to compare a string that is not already in a canonical format with a numeric value.  For example, consider the results of the following comparisons:

DECLARE @a INT
DECLARE @b CHAR(4)
SET @a = 1
SET @b = @a
SELECT @a AS a, @b AS b,
    CASE WHEN @a = '1'  THEN 'True' ELSE 'False' END AS [a = '1'],
    CASE WHEN @a = '+1' THEN 'True' ELSE 'False' END AS [a = '+1'],
    CASE WHEN @b = '1'  THEN 'True' ELSE 'False' END AS [b = '1'],
    CASE WHEN @b = '+1' THEN 'True' ELSE 'False' END AS [b = '+1']

 a           b    a = '1' a = '+1' b = '1' b = '+1'
----------- ---- ------- -------- ------- --------
1           1    True    True     True    False

Notice that, when compared as integers, 1 = +1 but, when compared as strings, ‘1' <> ‘+1'.

Returning to the three implicit conversion options above, SQL Server never implicitly converts both sides of an expression to a third type.  In some cases, it may be possible to explicitly convert two sides of an expression to a "super" type such that both conversions are lossless, but there are cases where no such "super" type exists (at least in the SQL Server type system).

At times these conversion rules and behaviors may be inconvenient or quirky.  In reality they evolved over many years (even decades) and many applications have been written based on these assumptions.  To make changes now would certainly cause many existing applications to fail.

In general, wherever possible, new code should avoid using expressions with mismatched types both to avoid the above quirks and error cases and to maximize performance.