Erik Meijer on LINQ, XML, and the future at XTech 2007

Erik Meijer gave a talk at the XTech 2007 conference on LINQ, XML, and his vision for "LINQ 2.0". The presentation covers the formal underpinnings of LINQ to XML, the additional XML features in VB9, and some thoughts about the next steps.

To summarize, LINQ 1.0 provided the necessary (but not sufficient) technology for radically simplifying distributed data-intensive applications. With LINQ 2.0 we will take things to the next level by stretching the standard .NET programming model to cover the Cloud.

Jeni Tennison has an interested-but-skeptical review.  She definitely understands the most fundamental point of LINQ: "A big distinction between previous attempts to work across paradigms is that the data doesn’t get converted, but the queries do."  As Erik puts it in the paper:

One school of thought is to pick one data model as the universal one and map all other data models into it. The most popular option is to view all data as XML and then use XQuery as the glue language. Others try to come up with a new data model that encompasses all previously known data models. We do not believe that this universal data model approach is the right solution to the impedance mismatch problem. Instead of trying to unify at the data model level, a better approach is to unify at the level of algebraic operations that can be defined the same way over each data model. This allows us to define a single query language that can be used to query and transform any data model. All a data model needs to do is implement a small set of standard query operators, and each data model can do so in a way natural to itself.
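
To make that operator-level unification concrete, here is a minimal C# 3.0 sketch (the names and data are made up for illustration): the same query shape runs over an ordinary in-memory collection and over XML, because both sources implement the standard query operators.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class OperatorUnification
{
    static void Main()
    {
        // The same query shape over an ordinary in-memory collection...
        string[] names = { "Ada", "Erik", "Jeni" };
        var upper = from n in names
                    where n.Length > 3
                    select n.ToUpper();

        // ...and over XML via the LINQ to XML axes. Neither source is
        // converted into the other's data model; both simply implement
        // the standard query operators.
        XElement people = new XElement("people",
            new XElement("person", "Ada"),
            new XElement("person", "Erik"),
            new XElement("person", "Jeni"));
        var upperXml = from p in people.Elements("person")
                       where p.Value.Length > 3
                       select p.Value.ToUpper();

        foreach (string s in upper.Concat(upperXml))
            Console.WriteLine(s);
    }
}
```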

Addressing her request for clarification on a couple of points: "LINQ syntax and it gets mapped on to SQL to query your SQL database, or on to XQuery (I guess) to query your XML document".  There is no mapping to XQuery, in this release anyway; the compiler translates query expressions (formally, "monad comprehensions") into ordinary method calls over the LINQ to XML axes, which are analogous to XPath axes; there is a sketch of that translation after the quotation below.  As Erik explains in the paper:

Mathematicians recognized this “design pattern” many decades ago, and say that each data model defines a monoid, or more generally, a monad, and each query is an example of a monad comprehension.

Using monads and comprehensions to query arbitrary data sources was first introduced in the functional language Haskell, where they are primarily used to deal with imperative side-effecting computation. Many people wonder what the connection is between side-effects and collections, but if you think about it, it is not far-fetched to consider a whole side-effecting computation that returns, say, strings as some kind of collection of strings that yields a new string each time the computation is executed. The idea of using monads and comprehensions to bridge the impedance mismatch between data models is also the basis of the Language Integrated Query (LINQ) technology developed by Microsoft.
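
Here is a sketch of what that translation looks like in C# 3.0 (the document shape is hypothetical, and this is illustrative rather than the compiler's literal output): the comprehension form desugars into chained calls on the standard query operators, with Where and Select playing the monad's filtering and mapping roles over the LINQ to XML axes.

```csharp
using System.Linq;
using System.Xml.Linq;

class ComprehensionDesugaring
{
    static void Demo(XElement catalog)
    {
        // The comprehension (query-expression) form the programmer writes...
        var titles = from book in catalog.Descendants("book")
                     where (int?)book.Element("year") > 2000
                     select (string)book.Element("title");

        // ...is translated by the compiler into plain calls on the
        // standard query operators, walking the LINQ to XML axes.
        var sameTitles = catalog.Descendants("book")
                                .Where(book => (int?)book.Element("year") > 2000)
                                .Select(book => (string)book.Element("title"));
    }
}
```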

Also, "This all seemed to assume data-oriented information: I have no idea, yet, how or whether mixed content gets handled."  LINQ to XML works with both data-oriented and document-oriented XML; the XText class was created largely to support mixed content and you can query for mixed content via one of the Nodes axes. 

Finally, "XML is a “first class datatype” in LINQ, so to create static XML you just write XML in your program (a bit like in XQuery). "  Just to be clear, that is only true in Visual Basic 9; C# 3.0 does not build XML into the language, but both VB and C# work with a new LINQ-compatible XML API in the System.Xml.Linq namespace in the .NET framework.  Yes, that is confusing, sorry!