What LINQ to XML will NOT do

06/30/2006

One of the worst pitfalls a design team can fall into is trying to do too much. The principle is captured by the well-known quote:

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. - Antoine de Saint-Exupéry

So, what has been taken away from LINQ to XML (aka XLinq) in the pursuit of simplicity (if not perfection)? I'm in the process of documenting the "non-goals" for XLinq, and thought it would be good to share them and get some feedback.

In a previous post I discussed two non-goals: replacing XSLT as a tool for processing unstructured documents, and replacing XQuery as a database query language. Some other non-goals include:

Guaranteeing that an XLinq tree in memory meets the XML well-formedness constraints is a non-goal; this job is delegated to the XmlReader and XmlWriter. We plan to add more in-memory well-formedness checking than is present in the May CTP release, but will not go so far as XOM does to "make no compromises on correctness". Basically, we can't meet the goal of making XLinq as fast as or faster than DOM in most cases if we perform the extensive character-by-character checking needed to guarantee well-formedness after every successful XLinq operation. If you need that guarantee, you can serialize an XLinq subtree to force the well-formedness check (and choose to bear the performance cost).
Support for XML 1.1 is a non-goal, but so is forbidding the use of XLinq with an XML 1.1 reader/writer. Microsoft doesn't support XML 1.1, for reasons Michael Rys noted in mourning the day it became a recommendation, but the XLinq team doesn't necessarily think it is an abomination. If you like XLinq and need XML 1.1 support, you can write (or support a vendor or open source project that writes) a .NET XmlReader and XmlWriter implementation; the flip side of XLinq not enforcing the XML 1.0 well-formedness rules is that it won't reject XML 1.1 content if some non-default reader/writer does not reject it.
Anything beyond the barest minimum support for DTDs is a non-goal. XLinq (actually the XmlReader underneath) will read the DTD internal subset in an XML instance, expand any references to entities declared in the DTD, and round-trip the DTD internal subset. There is no object model for the DTD information; it is just saved as a string property. You can change that string, but you are on your own as far as ensuring that the XML well-formedness constraints are preserved. Again, you can explicitly parse the value to check for well-formedness.
Thus, it is a non-goal to preserve the syntactic fidelity of XML documents loaded and saved by XLinq. For example, a reference to a character entity declared in the DTD internal subset will NOT be re-entitized on save because XLinq (following the Infoset) has no “memory” of the XML entity that defined a particular Unicode character.
XSLT allows non-well-formed results to be generated (e.g. HTML or text); XLinq does not offer this capability.
There is no guarantee that XLinq classes can be subclassed effectively, although there are currently no plans to seal them. The recommended way for applications to add functionality to XLinq is by using the annotation feature to add application-defined objects to XLinq tree objects. In other words, internal experience with building on top of XLinq has shown that the aggregation design pattern works better than inheritance to extend its functionality. This is not firm guidance, just advice that we have a real goal of supporting extensibility via annotations and a non-goal of supporting extensibility via inheritance. This is, however, an area that is very much in flux and we would be particularly interested in hearing your use cases and experiences, e.g. in writing XLinq extensions that support one or more of these non-goals.
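To make the first non-goal above concrete, here is a minimal sketch of forcing the well-formedness check by serializing a subtree. It assumes the class and method names of the System.Xml.Linq API as it later shipped (XElement, WriteTo), which may differ from the May CTP; the helper FindSerializationError is a hypothetical name chosen for illustration, and the exact exception raised by the writer is not something this post specifies.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class WellFormednessSketch
{
    // Hypothetical helper: returns null if the subtree serializes cleanly,
    // otherwise the message of the exception raised by the XmlWriter.
    static string FindSerializationError(XElement subtree)
    {
        // CheckCharacters is on by default; it is set here to make the intent explicit.
        var settings = new XmlWriterSettings { CheckCharacters = true };
        try
        {
            // The serialized text is thrown away; only the writer's checks matter.
            using (XmlWriter writer = XmlWriter.Create(new StringWriter(), settings))
            {
                subtree.WriteTo(writer);
            }
            return null;
        }
        catch (ArgumentException ex)
        {
            return ex.Message;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }

    static void Main()
    {
        var good = new XElement("note", "plain text");

        // U+0007 is not a legal XML 1.0 character, but nothing checks
        // text content while the tree only lives in memory.
        var bad = new XElement("note", "bell: \u0007");

        Console.WriteLine(FindSerializationError(good) ?? "ok");
        Console.WriteLine(FindSerializationError(bad) ?? "ok");
    }
}
```

The cost is a full serialization pass over the subtree, so a check like this would only be run at the points where the guarantee actually matters.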
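The annotation approach recommended above might look roughly like the following. This is a sketch assuming the AddAnnotation and Annotation<T> methods of the System.Xml.Linq API as it later shipped; ReviewInfo is a hypothetical application-defined class, not part of XLinq.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical application-defined data to hang on a tree node.
sealed class ReviewInfo
{
    public string Reviewer;
    public bool Approved;
}

static class AnnotationSketch
{
    static void Main()
    {
        var order = new XElement("order", new XElement("item", "widget"));

        // Aggregation instead of inheritance: attach the application object
        // to the existing XElement rather than subclassing XElement.
        order.AddAnnotation(new ReviewInfo { Reviewer = "pat", Approved = true });

        // Retrieve the annotation by type; null is returned if none is attached.
        ReviewInfo info = order.Annotation<ReviewInfo>();
        Console.WriteLine(info.Reviewer); // prints "pat"

        // Annotations live only in memory and are not part of the serialized XML.
        Console.WriteLine(order.ToString(SaveOptions.DisableFormatting));
        // prints "<order><item>widget</item></order>"
    }
}
```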

If these non-goals of XLinq don't meet your requirements, Microsoft offers alternatives that can be used separately or in conjunction with XLinq. For example, DOM has an object model that supports entities and entity references. The XmlReader and XmlWriter APIs are available to work with XML text in its raw form. In the Orcas release of LINQ to XML, there are plans to add "bridge" classes to allow XLinq to work in better harmony with the rest of System.Xml, e.g. to invoke XSLT over an XLinq tree to work around the non-goal of creating non-well-formed output.
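As an illustration of running XSLT over an XLinq tree to get non-XML output, here is a minimal sketch. It assumes the bridge takes the form it had in the API as later shipped, namely a CreateReader method that exposes the tree as an XmlReader; the stylesheet and element names are made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

static class XsltBridgeSketch
{
    // Illustrative stylesheet that emits plain text rather than XML.
    const string Stylesheet = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:template match='/items'>
    <xsl:for-each select='item'>
      <xsl:value-of select='.'/><xsl:text>;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    static void Main()
    {
        // The input is built as an in-memory tree, never as XML text.
        var data = new XDocument(
            new XElement("items",
                new XElement("item", "alpha"),
                new XElement("item", "beta")));

        var transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(styleReader);
        }

        var output = new StringWriter();
        // CreateReader is the assumed bridge from the tree to the rest of System.Xml.
        using (XmlReader dataReader = data.CreateReader())
        {
            transform.Transform(dataReader, null, output);
        }

        Console.WriteLine(output.ToString()); // prints "alpha;beta;"
    }
}
```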

I'll update this post as other non-goals are identified. We would, as usual, very much like to hear from prospective XLinq users as to whether these non-goals will clash with your goals or not. It's possible we have taken away things that shouldn't have been taken away ... or maybe there are still things we can remove in the pursuit of API perfection.

[Updated 7/1 with the XML 1.1 point and an elaboration on the extensibility point]
