Project LINQ and XML - Some reflections

 I'm now here at my first Microsoft Professional Developers
Conference.  This is going to be especially interesting for me
because we can finally talk about the Language Integrated Query (LINQ)
technology that Jim Allchin outlined this morning. Soumitra Sengupta
explains why we think this changes the whole game for XML
processing.  Dave Remy has done the program management heavy
lifting on getting XML and LINQ to work well together, so I'll let him
handle the details.  I just want to share a few personal reactions
here.

Needless to say, nobody said anything about what we now call LINQ [1]
when I interviewed at MS almost a year ago, but one big reason I took
the job was the excitement I could feel about something big going
on to define a next-generation processing approach for XML.  I got
the basic introduction when I actually started -- Microsoft is taking
on two of the nastiest problems that XML developers face.  The first
is the impedance mismatch among XML processing, relational databases,
and object-oriented programming.  The second is the fact that the DOM-based APIs
are not very popular with developers and are based on programming ideas
of the 1990s, so it's time to rethink how programmers can interact with
XML.

What really blew me away was to learn that the key language designers
here had decided it was time for the .NET languages to get friendly
with XML rather than simply keeping it at arm's length in an API.
That's why it's called "language integrated": querying over objects,
relations, and XML instances is built into the actual languages.
When all this stuff has matured into shipping products, mainstream
developers using C# and Visual Basic will have XML capabilities at
their fingertips that are now essentially available only to
geeks.  Those who have mastered a language such as XSLT or XQuery,
or who have taken the trouble to master the rather complex and
confusing XML APIs and can grok how to map them onto OO concepts and
SQL, can do these things today, to be sure, but this is a hard slog for
people who encounter XML only now and then while trying to get their
work done.

Also, in the long run we expect that the more declarative, set-manipulation
style of programming that LINQ inherits from its relational ancestors
will prove very powerful. In other words, LINQ programs have a "what to
do" as opposed to "how to do it" flavor, much like nonprocedural
languages such as SQL, XQuery, or XSLT.  This approach gives
implementations, e.g. query optimizers, much useful information and
much flexibility to actually perform the query in the most efficient
way.
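
To make the "what" versus "how" contrast concrete, here is a minimal sketch.  It is written in Python as a stand-in, not in LINQ syntax, and the data and function names are made up for illustration.  Python has no query optimizer, so this shows only the difference in style, not the optimization benefit.

```python
# Hypothetical sample data; nothing here comes from LINQ itself.
orders = [
    {"customer": "Ann", "total": 120.0},
    {"customer": "Bob", "total": 35.5},
    {"customer": "Cy", "total": 250.0},
]


def big_spenders_procedural(rows, threshold):
    """'How to do it': spell out the loop, the test, and the accumulation."""
    result = []
    for row in rows:
        if row["total"] > threshold:
            result.append(row["customer"])
    result.sort()
    return result


def big_spenders_declarative(rows, threshold):
    """'What to do': describe the set of results wanted, in one expression."""
    return sorted(row["customer"] for row in rows if row["total"] > threshold)
```

Both functions return the same list; the second simply states the filter and the ordering and leaves the iteration mechanics to the language.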

As a participant in the W3C Working Group that developed the DOM API
Recommendation, I was particularly struck by a couple of features of
LINQ's XML support:

- XLinq has no text nodes, my least favorite "feature" of DOM. 
Text nodes, especially whitespace-only text nodes, are the source of
immense confusion.  In XLinq, one extracts the value of an element
by casting it to the desired type, be it a string, a number, a date,
or whatever.

- XLinq doesn't put the document node at the center of
everything.  DOM does this partly because it was designed to
accommodate a single application using different implementations of DOM
[2], and the document provides the appropriate context.  XLinq
puts XElement objects at the center of attention, and allows them to
be arranged or rearranged into trees.

- XLinq takes a conceptually cleaner approach to XML namespaces -- a
namespace is the URI, and that URI is explicitly associated with each
name.  Prefixes are merely serialization syntactic sugar and are not
exposed in the API.  We believe that this will make
namespace-aware code easier to write and less fragile to maintain, even
though it does require a bit of busy work to keep the namespaces
around.
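
The namespace and element-centric points can be sketched by analogy.  The example below uses Python's standard xml.etree.ElementTree module, which is a different library from XLinq but happens to identify names by namespace URI in a similar spirit; it does not show XLinq's own API.  The namespace URI, the sample documents, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/orders"  # hypothetical namespace URI

# Same namespace URI, different serialization: one uses a prefix,
# the other a default namespace declaration.
doc_a = '<o:order xmlns:o="http://example.com/orders"><o:qty>3</o:qty></o:order>'
doc_b = '<order xmlns="http://example.com/orders"><qty>3</qty></order>'


def quantity(xml_text):
    """Look up the qty element by (namespace URI, local name) and
    convert its content to an int, roughly what a cast would express."""
    root = ET.fromstring(xml_text)
    qty = root.find(f"{{{NS}}}qty")
    return int(qty.text)


# Elements can be created on their own, without a document to own them,
# and attached to a tree afterwards.
note = ET.Element(f"{{{NS}}}note")
note.text = "rush"
order = ET.fromstring(doc_b)
order.append(note)
```

Because the lookup is keyed on the URI, quantity() treats both documents identically even though only one of them uses the "o" prefix.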

This doesn't mean that the DOM-based APIs in System.Xml or MSXML are
going away once LINQ is productized. They have served the industry
well and will continue to play a role when maintaining legacy
systems, when code must be ported from one platform to another, in the
browsers (especially now that the original vision of what DOM would be
is starting to become reality in the AJAX paradigm!), and
elsewhere.  We suspect, however, that the LINQ approach will prove
more popular for developers who only occasionally use XML and don't
need to learn a whole stack of technologies.

There is much more to come on this, at PDC and beyond.  We really
want to hear your thoughts after you've seen and discussed the material
we're presenting.  It's the end of a very long day so I'm not
going to look up the various links right now, but stay tuned ... And
keep in touch.

[1] Let's get one unpleasant matter out of the way:  The XML
features of LINQ are called "XLinq", and yes, we've spread the word up
the hierarchy that this sounds like "XLink", an unrelated
technology.  Try not to be concerned about this; LINQ is a
project name, not a product label, and I'm confident that the eventual
"official" name will be less confusing.