So what’s the deal with this whole C# 3.0 / Linq thingy?


I’ve been mulling over the best way to talk about the new C# 3.0 stuff we’ve been working on.  I presented the post on how you could use the new C# 3.0 features to go beyond the basic query functionality we’ve been targeting.  That was to help give an appreciation of how we’ve added strong query support through several new, smaller features that can be used for more than query (although that’s the foremost area that we’re trying to attack).  However, I then realized that it was somewhat odd to present the post on “what *else* you can do with C# 3.0” before anyone even had an idea of what you “could” do with C# 3.0 in the first place.


I could do a fairly detailed drill-down of the new C# features, but I thought a more holistic approach would be better in this case.  So I’m going to talk about the general problem space we’re confronting, and I’ll try to provide some running examples to help carry me through this.


So what is Linq?  Well, Linq is the culmination of a number of techniques we’re producing to help deal with the large disconnect between data programming and general purpose programming languages.  Linq stands for Language INtegrated Query, and simply put, it’s about taking query, set operations and transforms and making them first class concepts in the .Net world.  This means making them available in the CLR, in .Net programming languages, and in the APIs that you’re going to be using to program against data in the future.  Through all this you can get a completely unified query experience against objects, XML, and relational data, i.e. the most common forms of data that will appear in your application.  And, best of all, if you happen to have your own form of data that doesn’t fit into those models, then you can use our extensible system to target that model as well.  After all, our XML and relational data access models (called XLinq and DLinq respectively) are just APIs built on top of the core Linq infrastructure.  As such, I’m not going to dive too deeply into those specific models.  I’m going to let the individual teams who are responsible for them (and who know those APIs far more intimately) give you all the information at their disposal.


So, let’s first talk about data access today and how our new approach most likely differs from what you’ve been used to.  If you’re accessing a database somewhere in your application, then there’s a good chance that you’ve embedded some bit of SQL somewhere.  Maybe you’ve kept it fairly clean and abstracted away, or maybe you have SqlCommands left, right, and center, all with their own “select *” statements or other raw SQL commands stored hither and yon.  Of course, when writing this code you had no compile time checking that your SQL strings were well formed, no IntelliSense, etc.  That’s because, effectively, you are using two completely different languages in an environment that only understands one.  This is pretty bad, but it only begins to scratch the surface of the deep mismatch between the relational data domain and the object domain.


Through and through you have mismatches between objects and relational data and XML in your system.  Different types.  Different operations.  Different programming models.  Your code which works on XML won’t work on relational data.  Your code which works on relational data won’t work on objects, etc.  But there’s a better way.  Now we can allow you to work with all these different data systems right within C# (or VB).  This means using the same syntax, the same types, and the same programming models to query and manipulate all these different forms of data in a unified manner.  And, because support for these models has been built on top of an extensible system, you can, if necessary, do what we’ve done and bring this strong query support anywhere you need it to go where we don’t currently have an offering.


To ground this discussion a little, let’s start looking at a simple example of C# 3.0/Linq in action.  (Note: this example might look very familiar.  That’s because many demos and examples are made to run against the Northwind DB.  This allows us to all talk about the same thing and have consistent and clear names for entities).   You start with a simple list of Customers:

        Customer[] customers = GetCustomers();

Nothing magic going on here.  Nothing up my sleeves.  Just a regular .Net array initialized from some source.  Now, to make things a little simpler (especially for later examples) we can then write that as:

        Customer[] customers = GetCustomers();
        var custs = customers;

What’s going on in that second line?  Well, “var” is our way of introducing “local variable type inference”.  It’s a new C# 3.0 feature that allows you to save space by not writing the type of a local variable; the type is instead inferred from the expression that initializes the variable.  So, in the above code, “custs” is known at compile time to be a “Customer[]”.  If you were to write:

        var i = 10;
        var b = true;
        var s = "hello";

then it would be the *exact* same as writing:

        int    i = 10;
        bool   b = true;
        string s = "hello";
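To see the inference at work, here is a minimal sketch. The class and method names (VarDemo, InferredTypes) are made up for illustration and are not part of any library:

```csharp
using System;

public static class VarDemo
{
    // Returns the run-time types of three "var" locals; for these simple
    // cases they match the types the compiler inferred at compile time.
    public static Type[] InferredTypes()
    {
        var i = 10;        // inferred as int
        var b = true;      // inferred as bool
        var s = "hello";   // inferred as string

        // "var" is still statically typed: uncommenting the next line
        // is a compile-time error, because i is an int.
        // i = "text";

        return new Type[] { i.GetType(), b.GetType(), s.GetType() };
    }

    public static void Main()
    {
        foreach (Type t in InferredTypes())
        {
            Console.WriteLine(t);
        }
    }
}
```

Running it prints System.Int32, System.Boolean and System.String, one per line.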

We’ll see later on why this can be quite a handy thing.  Now, let’s extend our code a bit further to start querying that array of customers:

        Customer[] customers = GetCustomers();
        var custs = customers.Where(c => c.City == "Seattle");

Here we’re simply querying all our customers for the set of customers that are from Seattle, and “custs” will be an IEnumerable<Customer>.  We can even carry that a little further into the following query:

        Customer[] customers = GetCustomers();
        var custs = customers.Where(c => c.City == "Seattle").Select(c => c.Name);

Here we’re projecting out the names of all our customers from Seattle, so custs will be an IEnumerable<string>.  Now, what the heck is this code?  This isn’t your daddy’s C# anymore.  What are those funky arrows?  And where did the “Where” and “Select” methods come from??  They certainly don’t seem to be defined on the array type when I look at it in ILDasm!  Well, to answer the first question, the funky => arrow is the new C# 3.0 syntax that allows you to create a lambda expression.  You can think of a lambda expression as a natural evolution of the anonymous methods introduced in C# 2.0.  Lambda expressions benefit from simpler syntax and the ability to use inference.  So now you can write:

        c => c.City == "Seattle"  // instead of
        delegate (Customer c) { return c.City == "Seattle"; }

As you can see, the C# 2.0 anonymous method just drowns you in syntax, which makes it a rather poor choice to use in queries (heck, there’s a 2x increase in query size between the two).  The new C# lambda expression, however, succinctly encapsulates the test we want to perform, with only about 5 characters of overhead.
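To check that the two forms really are interchangeable, here is a small sketch that feeds both to Array.FindAll, which has taken a Predicate<T> since .Net 2.0.  The Customer class, the LambdaDemo class and the sample data are hypothetical stand-ins for whatever GetCustomers() would return:

```csharp
using System;

// Hypothetical stand-in for the Northwind Customer entity.
public class Customer
{
    public string Name;
    public string City;

    public Customer(string name, string city)
    {
        Name = name;
        City = city;
    }
}

public static class LambdaDemo
{
    // Made-up sample data, purely for illustration.
    public static Customer[] GetCustomers()
    {
        return new Customer[]
        {
            new Customer("Ana", "Seattle"),
            new Customer("Bo", "London"),
            new Customer("Cy", "Seattle")
        };
    }

    // Filters with a C# 2.0 anonymous method: the parameter type and the
    // return statement have to be spelled out.
    public static Customer[] SeattleOldStyle(Customer[] customers)
    {
        Predicate<Customer> test = delegate (Customer c) { return c.City == "Seattle"; };
        return Array.FindAll(customers, test);
    }

    // Filters with a C# 3.0 lambda: the type of c is inferred from
    // the Predicate<Customer> the lambda is assigned to.
    public static Customer[] SeattleNewStyle(Customer[] customers)
    {
        Predicate<Customer> test = c => c.City == "Seattle";
        return Array.FindAll(customers, test);
    }

    public static void Main()
    {
        Customer[] customers = GetCustomers();
        Console.WriteLine(SeattleOldStyle(customers).Length);
        Console.WriteLine(SeattleNewStyle(customers).Length);
    }
}
```

Both calls select the same two customers, so the program prints 2 twice.  The only difference is how much syntax you had to type to get there.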


That answers the first question, but what about the second?  Where, oh where did “Where” come from?  This is an example of another new C# 3.0 feature we call “extension methods”.  Extension methods are a way to allow you to add operations to existing types that aren’t under your control.  While that may give you the heebie-jeebies, rest assured, you’re not actually modifying the type.  Rather, you’re being allowed to use succinct syntax to, in effect, execute a method as if it existed on this type.  Specifically, extension methods are static methods that look like so:

        namespace System.Query {
            public static class Sequence {
                public static IEnumerable<T> Where<T>(this IEnumerable<T> e, Predicate<T> p) {
                    foreach (T t in e) {
                        if (p(t)) {
                            yield return t;
                        }
                    }
                }
            }
        }

This declares an “extension method” on the IEnumerable<T> type.  When you import the namespace by writing “using System.Query;”, you gain the ability to call the “Where” method on anything that implements IEnumerable<T> (like arrays).  With these extension methods we can now compose powerful query functions together to manipulate data easily.
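You can try the mechanism yourself without touching the preview libraries.  The sketch below declares its own extension method; the namespace Demo.Extensions, the class SequenceExtensions and the method name Filter are all invented for this example (Filter is used instead of Where only to avoid clashing with any library method of that name):

```csharp
using System;
using System.Collections.Generic;
using Demo.Extensions; // importing the namespace brings Filter into scope

namespace Demo.Extensions
{
    public static class SequenceExtensions
    {
        // The "this" modifier on the first parameter is what makes
        // Filter an extension method on IEnumerable<T>.
        public static IEnumerable<T> Filter<T>(this IEnumerable<T> source, Predicate<T> test)
        {
            foreach (T item in source)
            {
                if (test(item))
                {
                    yield return item;
                }
            }
        }
    }
}

public static class ExtensionDemo
{
    // Extension syntax: reads as if int[] itself had a Filter method.
    public static List<int> EvensViaExtension(int[] numbers)
    {
        return new List<int>(numbers.Filter(n => n % 2 == 0));
    }

    // The same call written as the plain static method it really is.
    public static List<int> EvensViaStaticCall(int[] numbers)
    {
        return new List<int>(SequenceExtensions.Filter(numbers, n => n % 2 == 0));
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };
        foreach (int n in EvensViaExtension(numbers))
        {
            Console.WriteLine(n);
        }
    }
}
```

The array type is never modified; the compiler simply rewrites numbers.Filter(...) into the static call shown in EvensViaStaticCall.  That static form is also the escape hatch if two imported namespaces ever define an extension method with the same signature.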


So at this point we’ve seen three new C# 3.0 features that can be used together to build a powerful base for querying objects.  In future posts I’ll include information about the rest of the new language features, and I’ll give a more comprehensive view of how sophisticated our query support is.


Comments (59)

  1. Joe says:

    I think the general idea is fantastic … I think as a general mechanism to replace OR systems, particularly when digesting the problem domain of business objects and logic, it is missing some key elements … and unfortunately a design change is required to fix (kinda why the XmlSerializer class is effectively useless).

    I’ll put it up on my blog and post back.

  2. damien morton says:

    Dammit – you’ve done it again.

    I haven’t even started coding in C# 2.0 and I’m getting all excited about C# 3.0.

    What kind of broad-strokes timeframe is C# 3.0 seen as being released in? Are we talking 1, 2 or 3 years here?

  3. Sam says:

    *sniff* I was so looking forward to c# 2.0, and now you just made it look stale in comparison.

    You big ol’ meanie, you!

  4. Dagy says:

    I’m not a Ruby specialist, but I have been working with it for about a week, and this whole thing seems very similar to how things work in the Ruby world.

    try this

    http://www.rubyonrails.com

    especially the ActiveRecord stuff

    bye

  5. I must confess I am very impressed by this. It’s about time something like this was introduced, it’ll make life a lot easier on those of us who do a lot of work with databases.

  6. kfarmer says:

    Hey, think you could add ForEach() to IEnumerable<T>? It’s currently limited to List.

    This would cover the case of:

    var catalog = GetBooks();
    var foo = from book in catalog select book;
    foo.ForEach(delegate(Book book) { Console.WriteLine(book); } );

    .. which I currently have written as an extension method.

    Of course, fixing the lambda-with-statement would help too:

    foo.ForEach(book => Console.WriteLine(book));

  7. Gabe says:

    namespace System.Query {
        public static class Sequence {
            // Action<T>, not Func<T>: the callback takes one argument and returns nothing
            public static void ForEach<T>(this IEnumerable<T> e, Action<T> f) {
                foreach (T t in e) {
                    f(t);
                }
            }
        }
    }

  8. chris says:

    So will Linq also work with SQL Server?

    Will GetCustomers() return my whole customer table or will it ‘magically’ select only those I specify in my where clause?

  9. Mabsterama says:

    Ok, all the developers (myself included) out there are totally going bananas over the revelations from…

  10. Ryan Heath says:

    class System.Query.Sequence is not mentioned in the code

    Customer[] customers = GetCustomers();

    var custs = customers.Where(c => c.City == "Seattle");

    What if there was another class with an extension method

    public static IEnumerable<T> Where<T>(this IEnumerable<T> e, Predicate<T> p);

    How would we specify which extension method is applicable?

    // Ryan

  11. Minh says:

    Will LINQ support querying something besides IEnumerable in the future? How would I tell LINQ to query a Binary Space Partition? (requiring LINQ to know about the data container)

    Also, in the Northwind example (Channel 9 Anders’ video @ http://channel9.msdn.com/Showpost.aspx?postid=114680), is LINQ doing a full table scan every time? Or does it take advantage of existing indexes? But you’re not duplicating the SQL Server engine inside the CLR … are you?

    Also, will LINQ warn me about malformed cross joins? Where I’m about to return 200 billion rows? Who among us hasn’t done that, right? Am I right? Hello?

  12. As you’ve probably already heard, at long last we’ve announced the new features that we’re planning…

  13. CyrusN says:

    kfarmer: "Hey, think you could add ForEach() to IEnumerable<T>? It’s currently limited to List. "

    Gabe hit it right on the head with:

    namespace System.Query {
        public static class Sequence {
            // Action<T>, not Func<T>: the callback takes one argument and returns nothing
            public static void ForEach<T>(this IEnumerable<T> e, Action<T> f) {
                foreach (T t in e) {
                    f(t);
                }
            }
        }
    }

  14. CyrusN says:

    chris: "So will Linq also work with SQL Server? "

    Absolutely. That’s what DLinq is all about.

    "Will GetCustomers() return my whole customer table or will it ‘magically’ select only those I specify in my where clause?"

    Read more about DLinq. There’s no magic involved… but rather a cool system where Expression<T> trees are remoted to your DB and executed there. So customers will only pull down the results of the query that’s executed server side. And it will only materialize them when you foreach. So you can build up your query to something huge, and have no cost client side for it. No need to pull down anything extra… you get the idea :)

  15. kfarmer says:

    Reread my original comment — I *already* created that same extension method. My feedback was to actually bake it in.

  16. CyrusN says:

    Ryan: "class System.Query.Sequence is not mentioned in the code "

    My bad. I’ll update the code accordingly.

    "Customer[] customers = GetCustomers();

    var custs = customers.Where(c => c.City == "Seattle");

    What if there was another class with an extension method

    public static IEnumerable<T> Where<T>(this IEnumerable<T> e, Predicate<T> p);

    How would we specify which extension method is applicable? "

    Then there would be a conflict if you called .Where (just like a regular overload conflict). However, you could always call the method you want by its full name, i.e.: Sequence.Where(customers, c => c.City == "Seattle")

  17. kfarmer says:

    Ryan:

    You must explicitly import the namespace the extension method is defined in. Even if it’s defined in the current namespace.

    Yeah, that last part is a bit silly, but these are just the preview bits.

    Cyrus:

    Is there a real feedback place?

  18. CyrusN says:

    Minh: "Will LINQ support querying something besides IEnumerable in the future? How would I tell LINQ to query a Binary Space Partition? (requiring LINQ to know about the data container) "

    You need to look into all the DLinq stuff that’s happening :)

    And send feedback if it’s not going to meet your needs. I’m less savvy about DLinq as it just builds on top of the basic query work that we’re doing.

    "Also, in the Northwind example (Channel 9 Anders’ video @ http://channel9.msdn.com/Showpost.aspx?postid=114680), is LINQ doing a full table scan every time?"

    No. DLinq is doing nothing but providing the infrastructure to remote queries over to the server. Then, only when iterated by the client are results materialized into the CLR world. Then when changes are made, they’re local changes that can be sent back to your DB with commits.

    " Or does it take advantage of existing indexes? But you’re not duplicating the SQL Server engine inside the CLR … are you? "

    Not at all (duplicating SQL Server, that is). All queries run server side and naturally take advantage of whatever indices you have.

    "Also, will LINQ warn me about malformed cross joins? Where I’m about to return 200 billion rows? Who among us hasn’t done that, right? Am I right? Hello? "

    Not sure about that. Hopefully you’ll test your stuff first and realize that you’ve done something really bad like that :)

  19. kfarmer says:

    Minh:

    I was worried about this as well, but figure they had a solution.

    So to add to what Cyrus points out, the DLinq spec (see msdn) says that, whenever possible, it will attempt to build the database command for you. That is, it’ll translate:

    from book in books
    where book.Title == "Foo"
    select book.Author.Name

    into

    select author.Name
    from books, author
    where books.AuthorId = author.AuthorId
    and books.Title = 'Foo'

    .. and send this to the server. Of course, there are limits to what it can do; check the spec.

  20. Sean M says:

    If I download the C# LINQ preview, will that give me Extension Methods in Visual Studio 2005?

    What time frame are we talking before Orcas is ready?

    Go!

  21. Sean Chase says:

    Hey Cyrus, a great post in the future would be some details on using lambdas in C#. I get your example, but not having much of a LISP background, I end up having to re-read the code a few times to get my arms around it. :)

    Cool stuff!

  22. CyrusN says:

    Sean M: Yup, the linq preview comes with a little installer for VS 2005 to give it *alpha* support for this stuff! Cheers!

    No idea on Orcas though :) (check the net though, I’m sure someone must have said something already).

  23. CyrusN says:

    Sean Chase: "Hey Cyrus, a great post in the future would be some details on using lambdas in C#. I get your example, but not having much of a LISP background, I end up having to re-read the code a few times to get my arms around it. :)

    Cool stuff! "

    Absolutely! I plan on blogging a lot about this. :)

  24. Lorenzo says:

    Wow! Great stuff! I was waiting for something similar, knowing Comega and Xen..

    I think your solution is very elegant, and I always liked very much the lambda expression, back at the university..

    I also like the extensibility of this solution.

    Now the only thing missing from C# is concurrency. Do you plan to add it in a Comega way? With another message based solution? Or with another shared memory solution? If you want, I have a little project based on Rotor to share.. =)

    Also, a last question: how will this cope with third party languages and compilers (especially dynamic languages)?

    Great work!

  26. Daniel Chait says:

    There’s a discussion of some C#3 features (http://blog.lab49.com/?p=93) over at Lab49 Blog (http://blog.lab49.com) talking (and arguing) about extension methods, functional programming, etc. Interesting if you like that sort of thing.

  27. After the declaration of C# 3.0 I went ahead and installed the PDC bits. After reading through the language…

  30. Thomas Eyde says:

    I think the => operator is not only funky, but also clunky. It’s so C++ish.

    I guess the combination is easy to type on an English keyboard, but on a Norwegian one those two characters are almost as far from each other as they can possibly get.

    I would like a colon better. The colon is already used as a separator in the switch statement:

    var custs = customers.Where(c : c.City == "Seattle");

  32. CyrusN says:

    Thomas: I’ll take your feedback to the Language design team. Thanks!

  33. In the last post I discussed a little bit of background on why we wanted to introduce Linq, as…

  36. damien morton says:

    I’m with Thomas about not liking the => operator so much. It’s not clear that a sense of directionality is useful when reading lambda expressions.

    I’d also rather see enclosing brackets be mandatory – it makes the functional nature of the resulting expression more clear.

    Maybe something like this:

    var sum = (x,y,z)(x+y+z)
    var sqr = (x)(x*x)

    c.f.

    var sum = x,y,z => x+y+z
    var sqr = x => x*x

    Note how the overuse of the equals sign tends to blur the distinction between = and =>. To my eye, I tend to see this associativity (if only because => is ‘bigger’ than = ):

    (var sqr = x) => x*x

  37. Daniel Moth says:

    Blog link of the week 37

  38. CyrusN says:

    Damien: "var sum = (x,y,z)(x+y+z)

    var sqr = (x)(x*x) "

    Bleagh :)

  39. Confused says:

    Anyone know how you would write something like this???

    select o.customerID, c.CompanyName, count(o.OrderID) NumOrders
    from Customers c,
         Orders o
    where c.CompanyName LIKE 'A%'
    and o.CustomerID = c.CustomerID
    group by o.CustomerID, c.CompanyName

  40. damien morton says:

    I’d also like to know how joins are meant to be handled.

  41. Ken Overton says:

    Damien: "var sum = (x,y,z)(x+y+z)"

    Cyrus: "Bleagh :) "

    Would curly-braces make it more comfy?

    var sum = (x,y,z){x+y+z}

    I personally liked Thomas’s ‘:’ the most. And Damien, you know why ‘=>’ implies associativity to you, you just don’t want to admit you’ve soiled yourself with Perl ;-P

  42. CyrusN says:

    Confused:

    "Anyone know how you would write something like this???

    select o.customerID, c.CompanyName, count(o.OrderID) NumOrders
    from Customers c,
         Orders o
    where c.CompanyName LIKE 'A%'
    and o.CustomerID = c.CustomerID
    group by o.CustomerID, c.CompanyName"

    Like this:

    var q = from c in Customers, o in Orders
            where c.CompanyName.StartsWith("A") &&
                  c.CustomerId == o.CustomerId
            group o by new { o.CustomerId,
                             c.CompanyName } into g
            select new { g.Key.CustomerId,
                         g.Key.CompanyName,
                         NumOrders = g.Group.Count() }

  43. damien morton says:

    Ok, so now we know how to write nested loop joins.

    Is it the case that anything more sophisticated than a nested-loops join requires more than is delivered in System.Query.Sequence?

  44. damien morton says:

    Damien: "var sum = (x,y,z)(x+y+z)"

    Cyrus: "Bleagh :) "

    Ken : "Would curly-braces make it more comfy?"

    curly braces would be reserved for lambda blocks

    var f = (x,y,z) {
        while (x<y) { z += x; x += 1;}
        return x;
    }

    Ken: "… you’ve soiled yourself with Perl"

    Actually no, I always managed to somehow avoid doing any Perl. I even got Python installed on a few systems way back when Python was in versions 1.x.

  45. dhchait says:

    I just posted on Lab49 blog ( http://blog.lab49.com/?p=132 ) asking about the overall rationale for this, as well as some specific shortcomings in the implementation.

    The main rationale for Linq seems to be "Anders thinks it’s a good idea", which frankly is enough for me :-) But yet I’m still not quite seeing it as the amazing new breakthrough that others seem to think it is.

    Perhaps you (or someone) could post more on the (real-world) use cases/scenarios/patterns/architectures that would incorporate Linq?

    Thanks – Daniel
