PDC and LINQ, two great tastes...

I had the great pleasure of helping to represent one of Microsoft's new technologies at the PDC this year: it's called LINQ, short for Language INtegrated Query.  I hope to provide a number of posts about the details of LINQ and XLinq over the next few weeks, but I thought I should start with a simple introduction for everyone who was not able to make it to the event.

LINQ is a collection of new language features (which are planned to be included in future versions of C# and VB.Net) and new framework APIs that allow you to query and transform "data" from directly within the language.  There are two interesting things to note about this sentence:

  • Notice the finger-quotes around the term "data" in the sentence above.  The core of LINQ is a collection of new language features that allow you to query almost any kind of data (as long as it is exposed in a certain way) without ever having to leave the comfort of your favorite programming language (we will get to the details of what a data provider needs to do in order to work with LINQ later). 
  • When Microsoft ships LINQ, it will also ship new APIs for some kinds of data sources that can be queried using LINQ (the plan now is to ship APIs for XML and SQL Server, and LINQ already works with the current .Net collection classes).  That's why I was there: to represent the LINQ-enabled XML API.

In other words, LINQ is a syntax for querying data directly in the language, but it does not specify WHAT KIND of data is being queried.  (This statement is slightly inaccurate, because LINQ is not technically a 'syntax' but rather a collection of language features which are implemented with different syntax in different languages, but we can gloss over that for the moment.)  Anyone can choose to implement their APIs in a way that allows the LINQ language features to query their data.  Now you may be thinking that this is already true with today's technology.  After all, if you implement an OLE DB provider for your data, then you can query it using SQL.  There are (at least) three major issues with this:

  • You effectively need a post-graduate degree in the OLE DB architecture before you would consider writing an OLE DB provider 'easy'.  In LINQ, you expose your data as IEnumerables, just like the existing collection classes.  This makes exposing your data in a LINQ-friendly way very easy.  In fact, if your data is already represented as an in-memory object model, there is a good chance it is already in the correct format.
  • SQL is designed around providing access to 'rectangular' data (rows and columns), whereas some data (like XML) is better expressed in other ways.  Since LINQ is designed to access many different types of data, it does not make assumptions about the shape or structure of your data.
  • Since your programming language's compiler knows nothing about SQL, you get no compile-time syntax checking.  Since LINQ is a part of the programming language, you get a much better compile-time experience.
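
To make the first and third points concrete, here is a minimal sketch.  It assumes the System.Linq API as it eventually shipped in .NET (the preview bits shown at the PDC may differ in detail), and LogLines and its contents are hypothetical stand-ins for your own data.  The entire "provider" is one method that returns an IEnumerable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The whole "provider": an iterator that hands out items one at a time.
// LogLines and its contents are hypothetical stand-ins for your own data.
static IEnumerable<string> LogLines()
{
    yield return "INFO  service started";
    yield return "ERROR disk full";
    yield return "INFO  request served";
    yield return "ERROR timeout";
}

// Because LogLines() returns IEnumerable<string>, it can be queried directly.
// A typo such as 'line.StartWith' would be a compile error, not a runtime surprise.
IEnumerable<string> errors =
    from line in LogLines()
    where line.StartsWith("ERROR", StringComparison.Ordinal)
    select line.Substring(6);   // drop the six-character "ERROR " prefix

foreach (string error in errors)
    Console.WriteLine(error);
```

Nothing here knows or cares where the lines come from; swapping the iterator body for a file reader or a network call would leave the query untouched.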

To put it another way:

  • If you've ever wanted to do a join between an XML file and data in your database, then LINQ is for you. 
  • If you get tired every time a new data format comes out because that's another query language you will burn 3 weekends learning, then LINQ is for you. 
  • If you wish that your programming language's compiler validated your queries for you, rather than having to run each query to make sure you got the syntax right, then LINQ is for you.
     

OK, so LINQ is all about enhancing developer productivity when working with data.  So what does a LINQ query look like?  The following simple example shows one way to query some XML data using LINQ and the new LINQ-enabled XML API:

      IEnumerable<XElement> transactions =
      from 
          transaction in myXmlElement.Element("AllTransactions").Elements("Transaction")
      where
          (double)transaction.Attribute("amount") > 1000.0
      select
          transaction;

                   
This query returns all the 'Transaction' elements that are children of an 'AllTransactions' element (which is itself a child of myXmlElement) and that have an 'amount' attribute whose value is greater than $1000.00.  This could also be expressed with the following XPath:

      AllTransactions/Transaction[@amount>1000.00]
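
If you want to try the two side by side, here is a minimal runnable sketch.  It assumes the System.Xml.Linq and System.Xml.XPath APIs as they eventually shipped in .NET (the PDC preview may differ), and the sample XML is made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Made-up sample data standing in for myXmlElement.
XElement myXmlElement = XElement.Parse(@"
<Report>
  <AllTransactions>
    <Transaction name='Alice' amount='250.00' />
    <Transaction name='Bob' amount='1500.50' />
    <Transaction name='Carol' amount='9000' />
  </AllTransactions>
</Report>");

// The LINQ query from the text.
IEnumerable<XElement> viaLinq =
    from transaction in myXmlElement.Element("AllTransactions").Elements("Transaction")
    where (double)transaction.Attribute("amount") > 1000.0
    select transaction;

// The same selection expressed as XPath, evaluated relative to myXmlElement.
IEnumerable<XElement> viaXPath =
    myXmlElement.XPathSelectElements("AllTransactions/Transaction[@amount>1000.00]");

foreach (XElement transaction in viaLinq)
    Console.WriteLine((string)transaction.Attribute("name"));
```

Both forms should select the same two elements from this sample (Bob and Carol); the difference is that the compiler checks the first one, while the second is just a string until it runs.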

Now if you already understand XPath, then you are probably thinking 'Why would I move to a verbose syntax like LINQ when the XPath is so much more compact?'  That is a fair question, but it misses the point.  The point is that I can use the same syntax to query my database:

     IEnumerable<Individual> individuals = 
     from
         individual in myDatabase.Individuals
     where
         individual.Role=="Supreme Commander"
     select
         individual;

And you can use the same syntax to query collections of in-memory objects:

     IEnumerable<char> firstInitials =
     from
         name in myNameList
     where
         name.Length > 0
     select
         name[0];

Or you can do a join between XML and your database (sorry for the complicated example here):

     IEnumerable<XElement> transactions =
     from 
         individual in myDatabase.Individuals,
         transaction in report.Element("AllTransactions").Elements("Transaction")
     where 
         individual.Name ==  (string)transaction.Attribute("name")
         &&
         individual.Role=="Supreme Commander"
         &&
         (double)transaction.Attribute("amount") > 1000.0
     select
         transaction;              
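
For anyone who wants to run something like the join above, here is a hedged sketch.  It assumes the LINQ syntax as it eventually shipped in C# (two 'from' clauses in place of the comma-separated form shown above), uses an in-memory list of tuples as a hypothetical stand-in for myDatabase.Individuals, and the sample data is made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for myDatabase.Individuals: an in-memory list of tuples.
var individuals = new List<(string Name, string Role)>
{
    ("Alice", "Supreme Commander"),
    ("Bob", "Intern"),
};

// Made-up report data.
XElement report = XElement.Parse(@"
<Report>
  <AllTransactions>
    <Transaction name='Alice' amount='5000' />
    <Transaction name='Alice' amount='20' />
    <Transaction name='Bob' amount='7500' />
  </AllTransactions>
</Report>");

// Pair every individual with every transaction, then keep the matching pairs.
IEnumerable<XElement> transactions =
    from individual in individuals
    from transaction in report.Element("AllTransactions").Elements("Transaction")
    where individual.Name == (string)transaction.Attribute("name")
          && individual.Role == "Supreme Commander"
          && (double)transaction.Attribute("amount") > 1000.0
    select transaction;

foreach (XElement transaction in transactions)
    Console.WriteLine(transaction);
```

With this sample data only Alice's 5000 transaction survives all three conditions.  A 'join ... on ... equals' clause would be another way to express the name match.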
                   
The point is, LINQ is a general-purpose language for querying and transforming data, so of course there will be cases where a domain-specific query language (like XSLT) will be more powerful, more compact, or more expressive.  Our hope, though, is that for those common cases where you don't need the extra power, this will be (as one PDC-goer put it) 'the last weird query language you'll ever have to learn.'