Connect and Implicit Casts

Hi Folks,

Perhaps it is surprising, based on the content here, that I'm not a full-time spatial head here in SQL Server land.  (We leave that job to Ed.)  I have, of course, been spending a lot of time on spatial, but it turns out I have other responsibilities as well, particularly around the broader type system.  One of my jobs is to take a look at issues that come in through Connect that have to do with the type system and make sure they get handled appropriately.

(If you don't use Connect, please do!  Issues filed there funnel directly into our bug-tracking system, giving you a direct line to the engineering team.  It is the place to go with that pesky bug or feature request.)

These Connect issues are an interesting mix. some good, solid bugs, and some not bugs at all.  It's these latter ones I find most interesting, since the majority of them point out quirks in our behavior that, while correct (and often even documented) are still somewhat confusing.  I'm going to try to start writing a bit more about common issues here in the hopes of clearing at least a little of that confusion.

Which brings us to implicit casts.  An implicit cast is a cast that happens without any explicit direction from the user.  If you use CAST or CONVERT, you're using an explicit cast, and that's not what we're talking about here.  We're talking about code like this:

 SELECT '1'
UNION
SELECT 37

Since the UNION operator has to produce a consistent result, the types of each of its inputs must be the same.  Of course, that's not the case here: the first SELECT yields a varchar, the second an int.  Faced with this, the system inserts an implicit conversion, attempting to coerce the varchar to an int.

But why convert the varchar to an int and not the other way around?  After all, it could convert 37 to '37' instead.  First, we have to understand that while this case is simple---the values are literals---this need not be the case. and so it may be very hard for the system to determine up-front what the actual values are.  E.g.:

 SELECT T.a -- a varchar column
UNION
SELECT U.b -- an int column

Or worse:

 SELECT M.a FROM -- some horribly complex query returning a varchar
UNION
SELECT N.b FROM -- another equally horrible query returning an int

In order to base this decision on the actual data, SQL Server would have to accumulate each side of the UNION, inspect all of the values, and hope to find a type that each of the values could be coerced to.  Perhaps worse, if the type depended on the actual data retrieved, then it could be very hard to determine what the result type of the query would be; it could even change if the data were updated.

So SQL Server doesn't do this.  Instead of basing the conversion on the data itself, it bases it on the type of the data.  So, when the server looks at any of these examples, all it really sees is:

 SELECT <varchar>
UNION
SELECT <int>

The server then decides which one to convert, in this case it converts the varchar to an int.  How does it make this choice?  It simply has a precedence list of types, which we can see in the Data Type Precedence topic of Books Online.  The abbreviated version is:

1. user-defined data types (highest)
2. sql_variant
3. xml
...
16. int
...
27. varchar
...

Int lands higher than varchar, and so the varchar is converted to an int.  The actual data returned by each select is not used at all in making this decision, only the types.  The system can make this decision with relatively simple logic, and the type of the result is consistent as well.

This brings us to our pop-quiz.  What will this do?

 SELECT 'a'
UNION
SELECT 37

 

 

 

I'll give you a minute...

 

 

 

Got it?

 

 

 

Okay, here's the result:

 Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value 'a' to data type int.

Make sense?

Cheers,
-Isaac