.NET Framework 3.5 SP1: LINQ perf improvements (LINQ to Objects and LINQ to SQL)

There are three perf improvements in the just-released SP1. As always, I will let you run your own microbenchmarks or more meaningful app-level benchmarks.

LINQ to Objects:

Specialized enumerable: The new implementation recognizes queries that apply Where and/or Select to arrays or List<T>s and folds pipelines of multiple enumerable objects into a single specialized enumerable. This produces a substantial improvement in the base overhead of common LINQ to Objects queries (at times 30+%).
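To make the query shape concrete, here is a minimal sketch of pipelines that fit the description above: Where followed by Select, applied directly to a List<T> and to an array. The class and method names (WhereSelectDemo, EvenSquares, LongWordLengths) are made up for illustration. The specialized enumerables themselves are internal to System.Core, so nothing in the code changes; the same source simply runs with less overhead on SP1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; only the Where/Select calls are real LINQ API.
static class WhereSelectDemo
{
    // Where followed by Select directly over a List<int>:
    // the pipeline shape the post says SP1 folds together.
    public static List<int> EvenSquares(List<int> source)
    {
        return source.Where(n => n % 2 == 0)
                     .Select(n => n * n)
                     .ToList();
    }

    // The same Where + Select shape over an array source.
    public static int[] LongWordLengths(string[] words)
    {
        return words.Where(w => w.Length > 3)
                    .Select(w => w.Length)
                    .ToArray();
    }
}
```

If you want to measure the difference yourself, time a loop over calls like WhereSelectDemo.EvenSquares(list) on 3.5 RTM and on SP1 with the same input.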

Cast<T> breaking change: This is a bug fix and a breaking change (see this post for background). The intended use of the .NET FX 3.5 Cast<T> extension method is querying over non-generic collection types, whose elements require either a reference conversion or an unboxing step to be used in a generic query context. A late change in the VS 2008 cycle allowed the cast to succeed in more situations than intended, such as converting float values to int, where it should instead have thrown an InvalidCastException. The breaking change reverts to the beta2 behavior and improves perf by simplifying the implementation of CastIterator<T>. Value conversions and explicitly-defined user conversions now cause an InvalidCastException instead of being allowed, as they were in RTM.

var stringList = new ArrayList { "foo", "bar" };
var intList = new ArrayList { 3, 4, 5 };

var strings = from string s in stringList
              select s;

var ints = from int i in intList
           select i;

The above queries compile to:

var strings = stringList.Cast<string>();
var ints = intList.Cast<int>();

You can imagine a simplified implementation:

static IEnumerable<T> CastIterator<T>(IEnumerable source)
{
   foreach (object obj in source) yield return (T)obj;
}
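The effect of the breaking change can be checked with a small sketch like the one below. CastDemo, CastThrows and TruncateFloats are hypothetical helper names, not part of the framework. The first helper reports whether enumerating Cast<T>() throws; the second shows one way to get the old float-to-int result explicitly, by unboxing to the real element type first and converting afterwards.

```csharp
using System;
using System.Collections;
using System.Linq;

// Hypothetical helpers for illustration only.
static class CastDemo
{
    // Returns true if enumerating source.Cast<T>() throws InvalidCastException.
    // Cast<T> is lazy, so the exception surfaces during enumeration.
    public static bool CastThrows<T>(IEnumerable source)
    {
        try
        {
            foreach (T item in source.Cast<T>()) { }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Unbox each element as float (its actual type), then convert to int.
    public static int[] TruncateFloats(IEnumerable source)
    {
        return source.Cast<float>()
                     .Select(f => (int)f)
                     .ToArray();
    }
}
```

With SP1, CastDemo.CastThrows<int>(new ArrayList { 3, 4, 5 }) should be false, while CastDemo.CastThrows<int>(new ArrayList { 1.5f }) should be true, because a boxed float cannot be unboxed as an int.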

LINQ to SQL:

This too is a bug fix. The original intent was to optimize id-based queries that are expected to return a single entity. If an entity with a matching key value is already in the DataContext identity cache, then translating the query to SQL and executing it against the database is a pure waste of time, since the retrieved row is promptly thrown away to avoid stomping on the user's existing object. A bug kept that optimization from taking effect; it has now been fixed, so an id-based query for an entity that is already cached will not cause a trip to the database. This results in a dramatic perf improvement (one hash table lookup instead of SQL translation + SQL query execution) in an admittedly narrow but common scenario.
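The idea can be illustrated without a database. The sketch below is a toy stand-in, not LINQ to SQL's actual implementation: IdentityCacheSketch, GetById, DatabaseTrips and the loadFromDatabase delegate are all hypothetical names. It only shows the pattern of checking a key-indexed cache before paying for a query.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an identity cache; these are not LINQ to SQL types.
class IdentityCacheSketch<TKey, TEntity> where TEntity : class
{
    private readonly Dictionary<TKey, TEntity> cache = new Dictionary<TKey, TEntity>();
    private readonly Func<TKey, TEntity> loadFromDatabase;

    // Counts how often the (simulated) database was actually queried.
    public int DatabaseTrips { get; private set; }

    public IdentityCacheSketch(Func<TKey, TEntity> loadFromDatabase)
    {
        this.loadFromDatabase = loadFromDatabase;
    }

    public TEntity GetById(TKey id)
    {
        TEntity entity;
        if (cache.TryGetValue(id, out entity))
        {
            return entity; // cache hit: one hash lookup, no query
        }

        // Cache miss: simulate SQL translation + execution.
        DatabaseTrips++;
        entity = loadFromDatabase(id);
        if (entity != null)
        {
            cache[id] = entity; // later lookups return this same instance
        }
        return entity;
    }
}
```

Calling GetById twice with the same key returns the same object instance both times and increments DatabaseTrips only once, which is the behavior the fix restores for cached entities.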

BTW, as mentioned in a previous post, I haven't worked on either component for SP1. But I have been deeply involved in them for 3.5 RTM so I can't resist tracking such sweet changes. Besides, I am working on a LINQ book that keeps me very involved with the components.

Dinesh