LINQ under the covers

I had known about the LINQ project for some time, but I hadn't really realized the true power that LINQ brings to .NET until I watched the PDC keynote where Anders, Don and the rest took a lap around LINQ and all the other cool, shiny new stuff. What's not immediately evident is just how many things you can do with LINQ. One of the samples that ships with the tech preview has a Prolog

Being the nosey little parker I am, I *had* to find out how this thing worked on the inside, take it apart and look at its innards.

To side-track for a bit here, I see a lot of blogs talking about LINQ as if it is a C#-only thing - it isn't. It is a '.NET' thing and frankly, I'm thinking of calling myself a VB programmer now after looking at all the cool new things in VB 9.0. Scott Swigart managed a scoop and has an awesome interview with Paul Vick and Amanda on Visual Basic and where it is headed.

 

Let's dig down and get dirty with LINQ. One disclaimer - I don't work on the C# team and this post comes from an evening spent looking for the IL generated for my newbie LINQ programs.

I'm going to take a modified version of the program that Don and Anders used in their keynote:

var query = from p in Process.GetProcesses()
            where p.Threads.Count > 5
            orderby p.Threads.Count descending
            select new { Name = p.ProcessName, Threads = p.Threads.Count };

foreach (var proc in query)
    Console.WriteLine(" {0} {1}", proc.Name, proc.Threads);

 

This spits out all processes which have more than 5 threads, sorted by thread count in descending order. [1]

I disassembled this using Reflector. The IL generated (or rather, the amount of magic the compiler is doing) is enough to leave you mentally scarred if you're not Don Box, so I'll spare you the trauma and show you the equivalent C# code.

if (Program.<>9__CachedAnonymousMethodDelegate7 == null)
{
    Program.<>9__CachedAnonymousMethodDelegate7 = new Func<Process, bool>(Program.<Main>b__0);
}

if (Program.<>9__CachedAnonymousMethodDelegate8 == null)
{
    Program.<>9__CachedAnonymousMethodDelegate8 = new Func<Process, int>(Program.<Main>b__2);
}

if (Program.<>9__CachedAnonymousMethodDelegate9 == null)
{
    Program.<>9__CachedAnonymousMethodDelegate9 = new Func<Process, Program.<Projection>f__4>(Program.<Main>b__3);
}

IEnumerable<Program.<Projection>f__4> enumerable1 =
    Sequence.Select<Process, Program.<Projection>f__4>(
        Sequence.OrderByDescending<Process, int>(
            Sequence.Where<Process>(Process.GetProcesses(), Program.<>9__CachedAnonymousMethodDelegate7),
            Program.<>9__CachedAnonymousMethodDelegate8),
        Program.<>9__CachedAnonymousMethodDelegate9);

using (IEnumerator<Program.<Projection>f__4> enumerator1 = ((IEnumerator<Program.<Projection>f__4>) enumerable1.GetEnumerator()))
{
    while (enumerator1.MoveNext())
    {
        Program.<Projection>f__4 f__1 = enumerator1.get_Current();
        Console.WriteLine(" {0} {1}", f__1.Name, f__1.Threads);
    }
}

 

Doesn't look very friendly, does it? But right off, you can notice several oddities. What are all those <Projection> and f__4s and b__2s that the compiler is sneaking in? Therein lies the true magic of LINQ.

The C# compiler generating types for language features is not something new - in fact, that is how anonymous delegates are implemented in C# 2.0 (the gory details of which you can find out here).

First off, a few anonymous delegates are created and cached. These correspond to the thread-count check (the where clause), the thread-count property (the orderby key) and the new object creation (the select projection) - and they're passed to the Where, OrderByDescending and Select calls later.
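The null-check-then-assign pattern in the decompiled code is just lazy caching of a delegate in a static field. Here is a hand-written sketch of the same idea. The names DelegateCacheSketch, HasManyThreads, cachedFilter and GetFilter are all made up for illustration, and a plain int stands in for the Process object so the snippet doesn't depend on whatever is running on your machine.

```csharp
using System;

public static class DelegateCacheSketch
{
    // Hypothetical stand-in for Program.<>9__CachedAnonymousMethodDelegate7.
    private static Func<int, bool> cachedFilter;

    // Hypothetical stand-in for the compiler-generated <Main>b__0:
    // the body of the where clause, taking a thread count instead of a Process.
    private static bool HasManyThreads(int threadCount)
    {
        return threadCount > 5;
    }

    // Creates the delegate on first use and hands back the same instance afterwards.
    public static Func<int, bool> GetFilter()
    {
        if (cachedFilter == null)
        {
            cachedFilter = new Func<int, bool>(HasManyThreads);
        }
        return cachedFilter;
    }
}
```

Calling DelegateCacheSketch.GetFilter() twice should hand back the same delegate object both times, which is the whole point of the cache: the delegate is allocated once, not on every pass through the query.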

We then run into the hero of the entire show - the Program.<Projection>f__4. This guy is nothing but an 'anonymous type'. When we said new { Name = ... }, we created a new type without specifying the name of the type. This type has 2 properties - Name and Threads.
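To make the anonymous type a little less mysterious, here is roughly what a hand-written equivalent could look like. This is only a sketch: ProcessInfo is a made-up name (the compiler picks its own, which you can't even type in C#), and the real generated class may carry extra members that aren't shown here.

```csharp
using System;

// Hand-written stand-in for the compiler-generated Program.<Projection>f__4.
// The name ProcessInfo is hypothetical.
public class ProcessInfo
{
    // Mirrors "Name = p.ProcessName" in the select clause.
    public string Name { get; set; }

    // Mirrors "Threads = p.Threads.Count" in the select clause.
    public int Threads { get; set; }
}
```

With this class in place, the query could have ended in select new ProcessInfo { Name = p.ProcessName, Threads = p.Threads.Count } instead. The anonymous type simply saves you from writing (and naming) the class yourself.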

The first clause that executes is the 'where p.Threads.Count > 5'. In IL, this is actually represented through a compiler-generated static method of the Program class:

static bool <Main>b__0(Process p) { return (p.Threads.Count > 5); }

Sequence.Where<Process>(Process.GetProcesses(), Program.<>9__CachedAnonymousMethodDelegate7) actually executes this function (the cached anonymous delegate just wraps the static method above). We then order by descending using another static function which tells us *what* to order by (in this case, the Threads.Count property).

We now have a list of processes which meet our criteria - we now need to spin up a new object of type Program.<Projection>f__4 for each Process object. This f__4 type has only 2 properties - Name and Threads. To package each Process object into an f__4 object, we use a compiler-generated static function:

[CompilerGenerated]
static Program.<Projection>f__4 <Main>b__3(Process p)
{
    Program.<Projection>f__4 f__1 = new Program.<Projection>f__4();
    f__1.Name = p.ProcessName;
    f__1.Threads = p.Threads.Count;
    return f__1;
}

Since Sequence.Select returns a handy-dandy IEnumerable, the foreach loop can ask it for an enumerator and use that to iterate over our f__4 objects and print them out to the console.
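If you want to play with the whole pipeline without the unpronounceable names, here is a hand-written sketch of it. A few loud assumptions: it uses System.Linq.Enumerable, the class that shipped in the released framework, as a stand-in for the tech preview's Sequence; FakeProcess and Row are made-up stand-ins for Process and the anonymous type, so the output is predictable; and Filter, SortKey and Project are hypothetical names for what the compiler calls b__0, b__2 and b__3.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for System.Diagnostics.Process.
public class FakeProcess
{
    public string ProcessName;
    public int ThreadCount;

    public FakeProcess(string processName, int threadCount)
    {
        ProcessName = processName;
        ThreadCount = threadCount;
    }
}

// Hypothetical stand-in for the anonymous type Program.<Projection>f__4.
public class Row
{
    public string Name;
    public int Threads;
}

public static class HandWrittenQuery
{
    // Plays the role of <Main>b__0: the where clause.
    private static bool Filter(FakeProcess p) { return p.ThreadCount > 5; }

    // Plays the role of <Main>b__2: the orderby key.
    private static int SortKey(FakeProcess p) { return p.ThreadCount; }

    // Plays the role of <Main>b__3: the select projection.
    private static Row Project(FakeProcess p)
    {
        Row row = new Row();
        row.Name = p.ProcessName;
        row.Threads = p.ThreadCount;
        return row;
    }

    public static List<string> Run(IEnumerable<FakeProcess> processes)
    {
        // Same nesting as the decompiled code: Select(OrderByDescending(Where(...))).
        IEnumerable<Row> rows =
            Enumerable.Select<FakeProcess, Row>(
                Enumerable.OrderByDescending<FakeProcess, int>(
                    Enumerable.Where<FakeProcess>(processes, new Func<FakeProcess, bool>(Filter)),
                    new Func<FakeProcess, int>(SortKey)),
                new Func<FakeProcess, Row>(Project));

        // Roughly what foreach expands to: an enumerator driven by MoveNext and Current.
        List<string> lines = new List<string>();
        using (IEnumerator<Row> enumerator = rows.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                Row row = enumerator.Current;
                lines.Add(string.Format(" {0} {1}", row.Name, row.Threads));
            }
        }
        return lines;
    }
}
```

Run collects the formatted lines instead of writing them to the console, purely so the result is easy to inspect. Feeding it three fake processes with 3, 10 and 7 threads should give back two lines, the 10-thread one first.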

This just scratches the surface of what LINQ can do and the contortions the C# compiler goes through to make this magic work. With lambda functions, anonymous types, type inference and a whole galaxy of mouth-watering features, querying is just one aspect of what's new in .NET land :-)

 

Notes:

1. In case you are curious, svchost had the most threads on my machine, at 78.