More on "Status of XQuery in the .NET Framework 2.0"

As Soumitra Sengupta and Charlie Heinemann have officially announced on MSDN, and as several of us have blogged about previously, Microsoft will not ship an implementation of XQuery in the .NET Framework version 2.0.  This decision has generated a certain amount of discussion, and perhaps some confusion, which the MSDN article attempts to clear up.  This post lays out my own understanding of the underlying rationale in somewhat more detail and adds a personal perspective on the situation.

XQuery enables transformation and querying of large volumes of XML data using a declarative and typed language.  Since there are a large number of customer use cases for this capability, we will ship an XQuery implementation in SQL Server 2005.  Michael Rys has outlined the rationale for supporting XQuery in the database engine even before the W3C Recommendation is final:

1. SQL Server (unlike the .Net Framework) already has mechanisms to deal with future changes that break backwards-compatibility (the database compatibility levels). Thus, if we implement something that is not according to the final standard either on purpose or the standard changes after we ship, we can provide a way to users to chose the compatibility level, when we align the behavior with the standard in a future release. …
2. Since we are planning on using XQuery to query XML data, we did not want to provide a stop-gap language such as XPath 1.0. That would have cost us almost as much in implementation cost and would have created instant legacy….
3. By scoping the XQuery implementation to a subset, we minimize the risk of non-anticipated breaking changes while still providing values to our customers. It also allows us to grow our implementation into what users really need in XQuery instead of wasting resources in implementing/testing and documenting functionality that is rarely used.

These arguments, however, do not apply to XQuery support in the next version of the .NET Framework.  We provided an implementation of XQuery in Whidbey Beta 1 as a preview technology and got feedback from our customers.  Based on this feedback and on updated assessments of the XQuery timetable at the W3C, it was clear that we should remove this code from Whidbey Beta 2 and the final .NET Framework 2.0.

The most important consideration is timing: one can realistically expect that XQuery will become a finalized Recommendation in the early 2006 timeframe, whereas .NET Framework 2.0 and Visual Studio 2005 will ship in the summer of 2005.  Microsoft has learned the hard way that supporting draft W3C specifications in core technology components is simply a bad idea.  As many readers will recall, Microsoft supported what was expected to be a nearly-final version of XSLT 1.0 in IE 5, and that turned out to be a mistake when the really final version incorporated several incompatible changes.  That created a de facto standard flavor of XSLT that nobody wanted, and it caused considerable confusion and support costs that linger to this day.  “Never Again!” seems to be the watchword here whenever the subject of shipping implementations of draft W3C specifications comes up.  In short, supporting XQuery in .NET 2.0 would create an unacceptable risk of repeating the IE 5 XSLT fiasco.  Considering that releasing XQuery in the .NET Framework now means baking it into the OS for the Longhorn release in 2006, supporting a preliminary draft is clearly not the right thing to do for our customers.

Another consideration is that SQL Server 2005 will support only a subset of XQuery, and it is important to align support for a given specification across various products.  In other words, we don’t want to create a situation in which XQuery code developed for .NET won’t work with SQL Server.  Simply supporting the SQL Server subset in .NET is not the right choice either, because that is not the right subset to solve key client-side scenarios.

Furthermore, we have simply listened to our customers:  While they are asking for the XML datatype and XQuery support in SQL Server 2005 to enable storing and retrieving semi-structured and marked-up data, they are not pointing to any compelling new scenarios for XQuery in the client.  The whole point of the Whidbey Beta 1 preview release was to get feedback from potential customers, and as far as we could tell the feedback indicated a lack of enthusiasm for XQuery in the client / middle tier at this point.

We do support XSLT on the client side and we hear from customers that it solves a large number of important XML application scenarios.  In Whidbey, we are shipping a new XSLT .NET compiler in the client that will meet or beat our existing XSLT performance numbers on the native stack.  The one area where XSLT 1.0 does not subsume the functionality of a scoped-down XQuery is in strongly-typed query support.  While strong typing is necessary in the server for query optimization, our customers tell us that it is not as critical in the client, where most transformations and aggregations are done against untyped XML.

Thus, the WebData XML group weighed the value of shipping XQuery in the .NET platform against the risk of being out of alignment with the W3C standard and Microsoft's server implementation, and determined that the right thing to do for our customers is not to ship it in the client at this point.  We are committed to completing the XQuery / XSLT2 standards work in the W3C, and created the position I hold in order to support that commitment.  Working with my colleagues Paul Cotton and Michael Rys, a big part of my responsibility is to help ensure that XQuery becomes a W3C Recommendation as soon as humanly possible.

So, to summarize:
1.    We will ship a subset of XQuery in SQL Server 2005.  This will enable important customer scenarios for storing and retrieving data using the new XML datatype.  This implementation will be part of Yukon B3 as well. 
2.    We will ship our new compiled XSLT implementation in .NET Framework 2.0 and a brand-new XSLT debugger in Visual Studio 2005.  These will enable customer scenarios for filtering and transforming XML on the client side.
3.    We will continue to drive the XQuery standards in the W3C.  We will also actively monitor the progress of XSLT 2.0 in the W3C and its uptake by the XML developer community.  We will remain deeply engaged with our customers about improving our query and transformation story in the frameworks and tools, in order to determine the right strategy and product plans.

OK, that’s more or less the consensus around here.  Moving on to my personal perspective…

•   There is no doubt in my mind that XQuery is going to be successful as a query language for XML data stores.  While some first-generation XML database products got by with offering XPath 1.0-based query languages, XQuery offers several important advantages over XPath 1.0.  These include the ability to do joins across XML collections, the ability to query on data types rather than just text representations, and the ability to restructure output within the query environment.  XQuery actually has very little competition in this niche: theoretically XSLT would fit the bill as both a query and a transformation language, but very few people have taken the idea seriously.  Alternatively, SQL extended with XPath can do this, but in practice the mismatch between the relational and XML data models makes this very messy.  (As I understand it, the next version of the SQL standard will reference XQuery normatively rather than try to define an alternative.)

•    There is a LOT of doubt in my mind about XQuery’s future on the client side or middle tier as a data integration language and/or a replacement for XSLT as a transformation language.  The WebData XML group bet heavily on this idea a few years ago, and it didn't work out for the reasons noted in the blog posts referenced above.  That's not to say that the official use case for XQuery as a way of integrating across the relational and XML worlds is misguided, but simply to argue that this is not at all proven in the real world.  Right now the corner cases where SQL, programming language, and XQuery data types do not mesh cleanly (dates are a notorious example), and the common cases where tricky semantic alignments are needed to integrate real-world data, are best handled by procedural code that deals with them in a domain-specific manner.  A couple of companies have bet heavily on XQuery as a framework for a general solution in this area, and perhaps they will make it work.  Dana Florescu, who has contributed greatly to the development of XQuery over the years, offers an enthusiastic perspective in a recent interview.  It is quite possible that this vision will be realized in the next few years; we shall see.
Still, I’m afraid I have to agree with Dr. Florescu’s colleagues at the CIDR conference who (as she notes in the interview) gave her an award for the "idea the world is least ready for" :-)

•    I am growing increasingly skeptical that XQuery-based applications will be easily portable across implementations.  Part of this skepticism is theoretical, based on the sheer size of the XQuery spec and the reality that no commercial DBMS vendor has implemented the whole thing.  At the same time, since XQuery 1.0 will not define insert/delete/update operations, all DBMS implementers have to add proprietary extensions in order to meet obvious customer needs.  But another part of my skepticism is based on the reality that few real-world SQL applications are portable across products.  XQuery is at least as complex as SQL and forged in the same competitive environment, so it is unclear to me why we can expect it to be any more portable across implementations than SQL.

•     I've given up on the idea of XQuery as an XML-aware general purpose programming language for real-world developers. I very much like the vision of a development language that can integrate the typed object, RDBMS, and XML worlds, and at one time it looked as though XQuery could hit a sweet spot there. I suspect, however, that XQuery missed its window of opportunity; now that dynamic languages with built-in XML libraries have been accepted into the mainstream, the problems with the XSD type system on which XQuery tries to build become increasingly obvious, and the prospect of conventional languages extended to handle XML natively is becoming tangible, it's just not as exciting an idea as it once sounded.

•    I've also rethought my previous position that XQuery is easier for ordinary mortals to learn than XSLT.  Part of the reason is that a recent month-long debate on the xml-dev mailing list brought out a lot of people who passionately admire and know how to exploit XSLT, and only a few testimonials (from stakeholders!) for XQuery as anything beyond an XML DB query language.  Furthermore, I've been exposed to the XSLT debugger in the next version of Visual Studio -- I think that once people can watch a stylesheet execute, they will come to grok XSLT's oddly powerful paradigm and learn to apply it to their data manipulation problems.

So, several of us in the WebData XML group have explained why we collectively and individually have concluded that XQuery shouldn't be supported in the .NET Framework at this point.  What do you think, and what about in the future?  Are there any passionate admirers of XQuery as something other than an XML database query language who think that MS should be seriously considering client and middle-tier use cases for XQuery once XQuery is a Recommendation?  We’re waiting to hear from you!