Breaking Change in LINQ Queries Using Explicitly-Typed Range Variables

There's a change coming in .NET Framework 3.5 Service Pack 1 that will affect some programs containing queries that explicitly specify the type of the range variable. The affected queries are those whose range variable type differs from the element type of the sequence being queried, where the element type cannot be converted to the range variable type via a reference conversion or a boxing/unboxing conversion. Whew, that was a bunch of spec-speak. To help understand, consider the following query.

 var floats = new ArrayList { 2.5f, 3.5f, 4.5f };
 var ints = from int i in floats
            select i;

Iterating over this query yields some surprising results: {2, 4, 4}. Why not {2, 3, 4}, as one might expect? To see why this happens, let's start with the compiler's translation of this query into a series of method calls. The above query expression is rewritten into the following.

 var ints = floats.Cast<int>().Select<int,int>(i => i);

Follow the flow of type information through this query. The source sequence "floats" is an ArrayList, which implements IEnumerable. Cast<int>() takes this sequence as IEnumerable and returns a sequence implementing IEnumerable<int>. Select<int,int>() acts upon that sequence and returns another IEnumerable<int>. Now look at the signature of Cast<T>.

 public static IEnumerable<T> Cast<T>(this IEnumerable source)

This method's purpose is to convert a non-generic IEnumerable sequence of some type T (or boxed T, as the case may be) to IEnumerable<T> for use as an argument to the subsequent sequence operators, which must know the compile-time type of the sequence elements. It sounds simple enough, and it should be, but due to a late-game foul-up in development, it's not.

The body of Cast<T> should effectively have these semantics: roll through the sequence converting each element to the target type T, iterator style. Something like this.

 foreach (object obj in sourceSequence) yield return (T)obj;

Now, looking back at the original query, if Cast<T> were implemented with these semantics, a runtime exception (an InvalidCastException) would occur at (T)obj, the cast from boxed float to int. Can't do that. You have to unbox the boxed float to float first. Then you can convert to int.
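To make the intended behavior concrete, here is a minimal sketch. IntendedCast<T> is a hypothetical stand-in written for this post to mirror the semantics described above; it is not the framework's source. It shows that casting a boxed float straight to int throws, while unboxing to float first and then converting works.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class IntendedCastSketch
{
    // Hypothetical helper mirroring the intended semantics; not the framework's code.
    public static IEnumerable<T> IntendedCast<T>(IEnumerable source)
    {
        foreach (object obj in source)
            yield return (T)obj; // unboxing needs the boxed value's exact type
    }

    // Returns true if enumerating boxed floats as int throws InvalidCastException.
    public static bool ThrowsOnBoxedFloats()
    {
        var floats = new ArrayList { 2.5f, 3.5f, 4.5f };
        try
        {
            foreach (int i in IntendedCast<int>(floats)) { }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Unbox to float first, then convert to int (truncates toward zero).
    public static int UnboxThenConvert(object boxedFloat)
    {
        return (int)(float)boxedFloat;
    }

    static void Main()
    {
        Console.WriteLine(ThrowsOnBoxedFloats());   // True
        Console.WriteLine(UnboxThenConvert(3.5f));  // 3
    }
}
```

Because the helper is an iterator, the exception surfaces when the sequence is enumerated, not when IntendedCast<int> is called.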

But this isn't the shipping semantics of Cast<T>. Instead, "magically," you can convert the sequence of boxed floats to a sequence of ints; you just get, uh, banker's rounding as opposed to truncation. Banker's rounding (round half to even) is not what a C# user expects when converting float to int. I'm not sure it's anyone's expected semantics, but, sadly, it is what we shipped.
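If the difference between the two roundings isn't obvious, the following small sketch contrasts them. It uses Convert.ToInt32 only because that method is documented to round halfway values to the nearest even integer; this is an illustration of the two behaviors, not a claim about how the shipped Cast<T> is implemented internally. The class and method names are made up for the example.

```csharp
using System;

static class RoundingContrast
{
    // A C# cast from float to int truncates toward zero.
    public static int Truncate(float value)
    {
        return (int)value;
    }

    // Convert.ToInt32 rounds to nearest, with halfway cases going to the even integer.
    public static int RoundToEven(float value)
    {
        return Convert.ToInt32(value);
    }

    static void Main()
    {
        foreach (float f in new[] { 2.5f, 3.5f, 4.5f })
        {
            Console.WriteLine("{0}: cast = {1}, Convert.ToInt32 = {2}",
                              f, Truncate(f), RoundToEven(f));
        }
    }
}
```

For 2.5, 3.5 and 4.5 the casts give 2, 3, 4, while round-to-even gives 2, 4, 4, which matches the surprising output of the original query.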

The fix

In .NET Framework 3.5 Service Pack 1 (SP1) we are going to return Cast<T> to its intended semantics described above. Not only is the current behavior not intuitive, it's slow as Christmas. But fixing this is obviously a breaking change. Once you get SP1 you may find that queries which once worked now throw exceptions. That's not great, but it's something that developers can easily deal with: change the type of the range variable to match the collection element type and then, as necessary, add casts where the range variable is used.
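Applied to the query from the start of this post, that workaround might look like the sketch below. The range variable is typed as float to match the boxed elements, and the conversion to int moves into the select clause. The class and method names are invented for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

static class RangeVariableFix
{
    public static List<int> TruncatedInts(ArrayList floats)
    {
        // The range variable type matches the element type, so Cast<float>()
        // only unboxes. The float-to-int conversion is an ordinary C# cast.
        var ints = from float f in floats
                   select (int)f;
        return ints.ToList();
    }

    static void Main()
    {
        var floats = new ArrayList { 2.5f, 3.5f, 4.5f };
        foreach (int i in TruncatedInts(floats))
            Console.WriteLine(i); // prints 2, 3, 4 on separate lines
    }
}
```

Since Cast<float>() only has to unbox each element to its own type, this form should behave the same before and after SP1.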

But one important thing to understand about this change is that the breaking change is in the .NET Framework libraries, not the compiler. Cast<T> is a framework method. This means that if your application contains a problematic query and has been distributed to users, it will begin to throw when your user gets SP1.

Avoid the problem altogether by omitting the range variable type

The call to Cast<T> in the above query expression was introduced by the rewriter in response to the presence of an explicitly-typed range variable. That's how the syntactic rewrite rules are specified. But if you omit the "int" in "from int i" no call to Cast<T> is generated.

Specification of the type of the range variable is optional in the query syntax, but if you're using a collection that only implements IEnumerable, you've got to specify it. On the other hand, when using a collection that implements IEnumerable<T> you can, and should, omit the range variable type. Not only does it avoid this entire can of worms, but it has the performance benefit of omitting an unneeded iterator in the chain of iterators mentioned before.
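As a small sketch of that advice, assume the same data lives in a List<float> instead of an ArrayList (the names here are invented for the example). With the range variable type omitted, the query translates to a plain Select call with no Cast<T> in the chain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ImplicitRangeVariable
{
    public static List<int> TruncatedInts(List<float> floats)
    {
        // No type on the range variable: f is inferred as float and the query
        // becomes floats.Select(f => (int)f), with no call to Cast<T>.
        var ints = from f in floats
                   select (int)f;
        return ints.ToList();
    }

    static void Main()
    {
        var floats = new List<float> { 2.5f, 3.5f, 4.5f };
        foreach (int i in TruncatedInts(floats))
            Console.WriteLine(i); // prints 2, 3, 4 on separate lines
    }
}
```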