So what's the deal with this whole C# 3.0 / Linq thingy? (Part 2)

In the last post I discussed a little bit of background on why we wanted to introduce Linq, as well as a bit of info on what some basic C# Linq looked like.  In this post I'm going to dive a little deeper into some other interesting things we're introducing as well.

Here's the current example we've been using to drive the discussion along:

    Customer[] customers = GetCustomers();
    var custs = customers.Where(c => c.City == "Seattle").Select(c => c.Name);

Now, so far that's a very C#-centric way to do queries over data.  However, it's still a little bit heavyweight.  What about a more convenient, query-like syntax for doing the same thing?  Well, it turns out that we have that as well:

    var q = 
      from c in customers
      where c.City == "Seattle"
      select c.Name;

This new query syntax is in fact just syntactic sugar: the compiler uses patterns to transform it into the *exact* same C# query that I listed above.  This is the same way that we handle foreach (specifically, by transforming it into a loop with calls to GetEnumerator, MoveNext, Current, and Dispose).
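As a rough sketch of the two equivalences just described, the block below writes the same query both ways and then spells out by hand roughly what a foreach over the result turns into.  The "SugarDemo" class, its method names, and the sample customers are all made up for illustration, and the hand-written loop is a simplification of what the compiler really emits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Name { get; set; } = "";
    public string City { get; set; } = "";
}

public static class SugarDemo
{
    // Hypothetical sample data standing in for GetCustomers().
    public static Customer[] GetCustomers()
    {
        return new[]
        {
            new Customer { Name = "Ann", City = "Seattle" },
            new Customer { Name = "Bob", City = "Portland" },
            new Customer { Name = "Cy",  City = "Seattle" },
        };
    }

    // The query-expression form.
    public static List<string> WithQuerySyntax(Customer[] customers)
    {
        var q =
            from c in customers
            where c.City == "Seattle"
            select c.Name;
        return q.ToList();
    }

    // The method-call form that the query expression is translated into.
    public static List<string> WithMethodSyntax(Customer[] customers)
    {
        return customers.Where(c => c.City == "Seattle")
                        .Select(c => c.Name)
                        .ToList();
    }

    // Roughly what "foreach (string name in source) result.Add(name);"
    // expands to: GetEnumerator, then MoveNext/Current, then Dispose.
    public static List<string> ManualForeach(IEnumerable<string> source)
    {
        var result = new List<string>();
        IEnumerator<string> e = source.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                string name = e.Current;
                result.Add(name);
            }
        }
        finally
        {
            e.Dispose();
        }
        return result;
    }
}
```

With this sample data both query forms return the names "Ann" and "Cy", in that order.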

Now, when looking at this you'll almost certainly notice how it looks *almost*, but not quite, like SQL.  And you'll probably be asking: "Can't you just make it look like SQL if it's that close?  Is this just MS wanting to be a pain just for the heck of it??"  In this case, the answer is "No".  One of the problems with the straight SQL-like approach is that we'd have to put the "select" first.  "Ok... what's wrong with that?" you say.  Well, let's take a look:

    var q = 
      select c<dot>

Now, at this point, you're constructing the final shape for this query.  You know you want to write "c.Name" and you'd like to use handy features like IntelliSense to help speed up typing it.  But you can't!  Because you haven't even stated where your data is coming from, there's no way for the tools to know what "c" is this early in the expression.  This is because in SQL the scope of a variable actually flows backwards, i.e. you use variables before you've even declared them.  However, in C# you can only use something after it's been declared.  So in order to better fit within this model (which has some very nice benefits), we made it so that "from" has to come first.  Beyond statement completion, there's also the issue of being able to construct large hierarchical queries in an understandable way.  Having the scope flow from left to right, top to bottom, makes that much simpler and brings a lot of clarity to your expressions.

Now what about projections?  They're incredibly common operations in SQL.  You're always doing things like "select a, b, c" and in essence projecting out the information you care about into these columns.  So how would we go about doing this sort of thing in C# 3.0?  Well, you could do this:

    var q = 
      from c in customers
      where c.City == "Seattle"
      select new NameAndAge(c.Name, c.Age);

but that's a real pain.  Any time I want to project any information out, I need to create a new type and fill in all its gunk.  That means writing the class somewhere.  Creating a constructor for it.  Creating fields and properties.  Implementing .Equals and .GetHashCode.  Etc. etc.  Yech.  Far too much work, error-prone, and it causes API clutter.  So what can we do to alleviate that?  Well, in C# 3.0 a new feature called "Anonymous Types" comes to the rescue.  We can now write the following:

    var q = 
      from c in customers
      where c.City == "Seattle"
      select new { c.Name, c.Age };

What this is doing is projecting the customer out into a new structural type with two properties, "Name" and "Age", both of which are strongly typed and which have been assigned the values of their corresponding properties in "c".  What's the type of "q" at this point?  Well, it's an IEnumerable<???> where ??? is some anonymous type with those two properties on it.  BTW, it should now seem somewhat more obvious why the "var" keyword was added to the language.  In this case you cannot actually write down the type of "q", but you need some way to declare it.  "var" comes to the rescue here.

So I could now write:

    foreach (var c in q) {
      Console.WriteLine(c.Name);
    }

and that would compile and run just fine.

Now "wait a minute!" you're saying.  "Is this some sort of late-binding thang where we're using reflection to pull out this data?"  No sir-ee.  In fact, if you were to try and write:

    foreach (var c in q) {
      Console.WriteLine(c.Company);
    }

then you would get a compiler error immediately.  Why?  Well, the compiler knows that the anonymous type which you've instantiated only has two members on it (Name and Age), and it's able to flow that information into the type signature of 'q'.  Then when foreach'ing over 'q', it knows that the type of 'c' is the same structural anonymous type we created earlier in the 'select'.  So it will know that it has no "Company" property and appropriately inform you that your code is bogus.  All the strong, static typing of C# is there.  You are just allowed to eschew writing the type now and instead allow inference to take care of all of it for you.  Users of languages like OCaml will find this immediately familiar and comfortable.
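To make the static typing concrete, here's a small sketch of what the compiler hands you for an anonymous type.  The "AnonymousTypeDemo" class and its values are invented for illustration.  It shows that the members come out with their real types, and that the .Equals and .GetHashCode you'd otherwise write by hand are generated for you and compare by property value.

```csharp
using System;

public static class AnonymousTypeDemo
{
    // Two anonymous objects with the same property names, types, and order
    // share one compiler-generated type, and compare equal by value.
    public static bool SameValuesAreEqual()
    {
        var a = new { Name = "Ann", Age = 30 };
        var b = new { Name = "Ann", Age = 30 };
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }

    // The properties are ordinary, statically typed members.
    public static string Describe()
    {
        var p = new { Name = "Ann", Age = 30 };
        string name = p.Name; // inferred as string
        int age = p.Age;      // inferred as int
        // p.Company would be rejected at compile time: no such member.
        return name + " is " + age;
    }
}
```

Here SameValuesAreEqual() returns true and Describe() returns "Ann is 30".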

Now, one thing that's quite common in the object world is the usage and manipulation of hierarchical data, i.e. objects formed by collections of other objects formed by collections of... you get the idea.  Now, say you wanted to query your customers to get not only the customer name, but information about the orders they've been creating.  You could write the following very SQL-esque query:

    var q = 
      from c in customers
      where c.City == "Seattle"
      from o in c.Orders 
      where o.Cost > 1000
      select new { c.Name, o.Cost, o.Date };

We've now joined the customer with their own orders.  This would get the job done, but maybe it's not really returning the information in the structure you want.  For one thing, the data isn't grouped by customer.  So for every order made by the same customer you're going to get a new element.  So let's take it a little further:

    var q = 
      from c in customers
      where c.City == "Seattle"
      select new {
          c.Name,
          Orders = 
              from o in c.Orders
              where o.Cost > 1000
              select new { o.Cost, o.Date }
      };

Voila.  We've now created a hierarchical result.  Now, per customer you'll only get one item returned.  And that item will have information about all the different orders they've made that fit your criteria.  Now you can trivially create queries that get you the results you want in the exact shape you want.
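Here's a minimal, self-contained sketch that runs both shapes over in-memory data so you can see the difference.  The Customer and Order classes, the "ShapeDemo" name, and the sample data are assumptions made up for this example; only the two queries come from above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public decimal Cost { get; set; }
    public DateTime Date { get; set; }
}

public class Customer
{
    public string Name { get; set; } = "";
    public string City { get; set; } = "";
    public List<Order> Orders { get; set; } = new List<Order>();
}

public static class ShapeDemo
{
    // Hypothetical sample data.
    public static List<Customer> SampleCustomers()
    {
        var day = new DateTime(2005, 9, 1);
        return new List<Customer>
        {
            new Customer { Name = "Ann", City = "Seattle", Orders = new List<Order> {
                new Order { Cost = 1500m, Date = day },
                new Order { Cost = 2500m, Date = day },
                new Order { Cost = 200m,  Date = day } } },
            new Customer { Name = "Bob", City = "Portland", Orders = new List<Order> {
                new Order { Cost = 5000m, Date = day } } },
            new Customer { Name = "Cy", City = "Seattle", Orders = new List<Order> {
                new Order { Cost = 50m, Date = day } } },
        };
    }

    // Flat shape: one element per matching order.
    public static int FlatCount(IEnumerable<Customer> customers)
    {
        var q =
            from c in customers
            where c.City == "Seattle"
            from o in c.Orders
            where o.Cost > 1000
            select new { c.Name, o.Cost, o.Date };
        return q.Count();
    }

    // Hierarchical shape: one element per customer, each carrying its orders.
    public static List<string> Hierarchical(IEnumerable<Customer> customers)
    {
        var q =
            from c in customers
            where c.City == "Seattle"
            select new {
                c.Name,
                Orders =
                    from o in c.Orders
                    where o.Cost > 1000
                    select new { o.Cost, o.Date }
            };

        var lines = new List<string>();
        foreach (var c in q)
        {
            lines.Add(c.Name + ": " + c.Orders.Count() + " large order(s)");
        }
        return lines;
    }
}
```

With this data the flat query yields two elements (both Ann's), while the hierarchical one yields "Ann: 2 large order(s)" and "Cy: 0 large order(s)".  Note that the hierarchical form keeps Seattle customers who have no matching orders, which the flat form silently drops.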

Next up!  Drill downs into many of the specific new features that we're bringing to the table.

But first: a teaser!  Say you have the following code:

    var customers = GetCustomersFromDataBase();
    var q = 
      from c in customers
      where c.City == "Seattle"
      select new {
          c.Name,
          Orders = 
              from o in c.Orders
              where o.Cost > 1000
              select new { o.Cost, o.Date }
      };

    foreach (var c in q) {
        // Do something with c
    }

Did you know that you will be able to write that code in C# 3.0 and DLinq will make sure that that query executes on the DB using SQL?  It will then only suck down the results that matched the query, and only when you foreach over them.  That's right.  That entire "from ..." expression will execute server side.  And it didn't even need to be in "from" form.  If you'd written it as customers.Where(c => c.City == "Seattle").Select(c => c.Name) then the same would be true.  How's that for cool?  Stay tuned and a later post will tell you how that all works!
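The "only when you foreach over them" part can already be seen with plain in-memory collections.  The sketch below is not DLinq and involves no database or SQL; it only assumes the standard Where operator over a List<string>, and counts how often the filter actually runs.  The "DeferredDemo" name and the data are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns { predicate calls before enumeration,
    //           predicate calls after enumeration,
    //           number of matches }.
    public static int[] Run()
    {
        int predicateCalls = 0;
        var cities = new List<string> { "Seattle", "Portland", "Seattle" };

        // Building the query does not run the filter yet.
        var q = cities.Where(c => { predicateCalls++; return c == "Seattle"; });
        int beforeEnumeration = predicateCalls;

        // The filter only runs as foreach pulls items through it.
        int matches = 0;
        foreach (var c in q)
        {
            matches++;
        }

        return new[] { beforeEnumeration, predicateCalls, matches };
    }
}
```

Run() returns { 0, 3, 2 }: nothing is evaluated when the query is built, and the filter runs once per element only during the foreach.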