Project LINQ and XML – Some reflections


I’m now here at my first Microsoft Professional Developers
Conference. This is going to be especially interesting for me
because we can finally talk about the Language Integrated Query (LINQ)
technology that Jim Allchin outlined this morning. Soumitra Sengupta
explains why we think this changes the whole game for XML
processing. Dave Remy has done the program management heavy
lifting on getting XML and LINQ to work well together, so I’ll let him
handle the details. I just want to share a few personal reactions
here.

Needless to say, nobody said anything about what we now call LINQ [1]
when I interviewed at MS almost a year ago, but one big reason I took
the job was the excitement I could feel about something big going
on to define a next-generation processing approach for XML. I got
the basic introduction when I actually started: Microsoft is taking
on two of the nastiest problems that XML developers face. The first
is the impedance mismatch among XML processing, relational databases,
and object-oriented programming. The second is that the DOM-based APIs
are not very popular with developers and are based on programming ideas
of the 1990s, so it’s time to rethink how programmers can interact with
XML.

What really blew me away was to learn that the key language designers
here had decided it was time for the .NET languages to get friendly
with XML rather than simply keeping it at arm’s length in an API.
That’s why it’s called “language integrated”: querying over objects,
relations, and XML instances is built into the actual languages.
When all this stuff has matured into shipping products, mainstream
developers using C# and Visual Basic will have XML capabilities at
their fingertips that are now essentially available only to
geeks. Those who have mastered a language such as XSLT or XQuery,
or who have taken the trouble to master the rather complex and
confusing XML APIs and can grok how to map them onto OO concepts and
SQL, can do these things today, to be sure, but this is a hard slog for
people who encounter XML now and then while trying to get their work
done.

Also, in the long run we expect that the more declarative,
set-manipulation style of programming that LINQ inherits from its
relational ancestors will prove very powerful. In other words, LINQ
programs have a “what to do” as opposed to “how to do it” flavor, much
like nonprocedural languages such as SQL, XQuery, or XSLT. This approach
gives implementations, e.g. query optimizers, both useful information
about the intent of a query and the flexibility to perform it in the
most efficient way.
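
To make the “what” versus “how” distinction concrete, here is a minimal sketch in Python rather than C# or Visual Basic, since this post does not show LINQ syntax. The sample data and function names are hypothetical, and a generator expression merely stands in for a query expression; nothing here is LINQ itself.

```python
# Hypothetical sample data; none of these names come from the post.
ORDERS = [
    {"customer": "Ann", "total": 120.0},
    {"customer": "Bob", "total": 35.5},
    {"customer": "Cy", "total": 80.0},
]


def big_spenders_imperative(rows, minimum):
    """'How to do it': explicit loop, explicit accumulation, explicit sort."""
    result = []
    for row in rows:
        if row["total"] >= minimum:
            result.append(row["customer"])
    result.sort()
    return result


def big_spenders_declarative(rows, minimum):
    """'What to do': state the filter, the projection, and the ordering."""
    return sorted(row["customer"] for row in rows if row["total"] >= minimum)


if __name__ == "__main__":
    print(big_spenders_imperative(ORDERS, 50))
    print(big_spenders_declarative(ORDERS, 50))
```

Both functions return the same list, but the second leaves the iteration strategy to the runtime, which is the kind of freedom a query optimizer can exploit.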

As a participant in the W3C Working Group that developed the DOM API
Recommendation, I was particularly struck by a couple of features of
LINQ’s XML support:

– XLinq has no text nodes, which are my least favorite “feature” of DOM.
Text nodes, especially whitespace-only text nodes, are the source of
immense confusion. In LINQ, one extracts the value of an element
by casting it to the desired type, be it a string, a numeric, a date,
or whatever.

– XLinq doesn’t put the document node at the center of
everything. DOM does this partly because it was designed to
accommodate a single application using different implementations of DOM
[2], and the document provides the appropriate context. XLinq
puts XElement objects at the center of attention, and allows them to
be arranged or rearranged into trees.

– XLinq takes a conceptually cleaner approach to XML namespaces: a
namespace is the URI, and that URI is explicitly associated with each
name. Prefixes are merely syntactic sugar for serialization and are not
exposed in the API. We believe that this will make
namespace-aware code easier to write and less fragile to maintain, even
though it does require a bit of busy work to keep the namespaces
around.
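
The three points above can be seen in miniature with Python’s standard library, used here purely as an analogy and not as a picture of the XLinq API. xml.dom.minidom follows the W3C DOM model, while xml.etree.ElementTree happens to share the traits described: no separate text nodes among an element’s children, elements that stand on their own, and names that carry the namespace URI. The sample document and the http://example.com/ns URI are made up for illustration.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document; the namespace URI is invented for this sketch.
NS = "http://example.com/ns"
XML = """<order xmlns:p="http://example.com/ns">
  <p:qty>3</p:qty>
</order>"""

# DOM: the whitespace between tags appears as text nodes among the children.
dom_root = minidom.parseString(XML).documentElement
dom_children = [node.nodeName for node in dom_root.childNodes]
# dom_children == ['#text', 'p:qty', '#text']

# DOM: an element's value lives in a child text node and must be dug out.
qty_node = dom_root.getElementsByTagNameNS(NS, "qty")[0]
dom_qty = int(qty_node.firstChild.data)

# ElementTree: parsing yields an element directly, and its children are
# elements only. Each name carries the namespace URI; the prefix is gone.
et_root = ET.fromstring(XML)
et_children = [child.tag for child in et_root]
# et_children == ['{http://example.com/ns}qty']

# The value is read from the element itself and converted to the wanted type.
et_qty = int(et_root.find("{%s}qty" % NS).text)

# Elements can be placed into a new tree without any owning document.
basket = ET.Element("basket")
basket.append(et_root)
```

The DOM half needs to know about text nodes twice, once to skip the whitespace and once to reach the value, while the element-centric half never mentions them.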

This doesn’t mean that the DOM-based APIs in System.Xml or MSXML are
going away once LINQ is productized. They have served the industry
well and will continue to play a role when maintaining legacy
systems, when code must be ported from one platform to another, in the
browsers (especially now that the original vision of what DOM would be
is starting to become reality in the AJAX paradigm!), and
elsewhere.  We suspect, however, that the LINQ approach will prove
more popular for developers who only occasionally use XML and don’t
need to learn a whole stack of technologies.



There is much more to come on this, at PDC and beyond.  We really
want to hear your thoughts after you’ve seen and discussed the material
we’re presenting.  It’s the end of a very long day so I’m not
going to look up the various links right now, but stay tuned … And
keep in touch.



[1] Let’s get one unpleasant matter out of the way: the XML
features of LINQ are called “XLinq”, and yes, we’ve spread the word up
the hierarchy that this sounds like “XLink”, an unrelated
technology. Try not to be concerned about this; LINQ is a
project name, not a product label, and I’m confident that the eventual
“official” name will be less confusing.


Comments (11)

  1. tzagotta says:

    How do LINQ and XLinq impact the possibility of adding XQuery into .NET in the future?

  2. Michael Rys says:

    First, we have released the September SQL Server 2005 CTP for download. This is the last CTP before the…

  3. XmlTeam says:

    On the question of how XLinq affects the chances for XQuery in the .NET client …. First, XLinq is a statement of a vision, not a concrete product plan. The idea is to get feedback from outside the rather small group that has worked on it so far. For example, it would be interesting to hear from XQuery fans on what XQuery does that LINQ doesn’t, other than run on multiple platforms of course.

    Ultimately, decisions such as this are BUSINESS decisions; no matter how geeked up we might be about LINQ/XLinq, if the market wants XQuery they’re gonna get XQuery. The market spoke (well, actually it SCREAMED) that it wants XSLT 2.0 as soon as the spec is final and we can release a good implementation. So far, the market seems to be whispering that it might possibly be interested in an XQuery client-side implementation, someday.

    I know that *my* loss of interest in trying to convince Soumitra that XQuery should go back in .NET coincided with learning about what we now call LINQ once I started at MS. It will be very interesting to see if other people have a similar reaction or not.

  4. mikechampion’s weblog : Project LINQ and XML – Some reflections I have specifically held off from posting commentary regarding LINQ and XLINQ because I simply don’t have ANY time at the moment (if you only knew how serious ‘ANY time’ truly is… but I’m loving every second of it and hopefully you will too when you see the results…) to…

  5. XML was everywhere at PDC, even if you didn’t notice it. Bill Gates said in his keynote something like “In…

  8. We’re nearly code complete on the next version of Visual Studio, and will soon be releasing a Community