XML Performance in System.Xml

There has been some discussion with the Webdata XML team about Sun's published performance report comparing a set of Java XML classes with System.Xml in the .NET Framework. Whilst I would not defend to the death the performance of the V1.0 and V1.1 releases of System.Xml, the report does not provide sufficient detail or any real world scenarios. It is typically possible to create a set of performance tests that favor a system you understand over one you do not, and often small changes in the settings can have a significant impact on performance. Performance is also a very arbitrary and subjective characteristic, since it depends on what your application is doing. I have talked to hundreds of customers who use the XmlTextReader class in their code, and only a very few have had performance issues that meant they had to alter their application. This means that for the vast majority of people using System.Xml today the performance is easily good enough.

Reports like this also miss the real issue of usability. The V1.0 release of System.Xml changed the programming model for working with XML in three ways. It introduced the easier to use pull model parser, XmlTextReader (subsequently copied by JSR 173 on the Java platform). It introduced XmlTextWriter as the most intuitive API for writing XML documents, using a top down creation approach rather than the more difficult bottom up creation with the DOM API. And it introduced the innovative XPathNavigator, which provides an XPath engine that can easily be implemented over any store of data and hence integrated with the XSLT processor. In essence, API usability is as important as performance, and the Java APIs on the whole have significantly lagged behind System.Xml in this respect.
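To make the usability point a little more concrete, here is a minimal C# sketch of the two V1.x programming models described above: writing a document top down with XmlTextWriter, then pulling nodes from it with XmlTextReader. The "order" document and the class name are made up for illustration; only the System.Xml calls themselves come from the framework.

```csharp
using System;
using System.IO;
using System.Xml;

class PullModelSketch   // hypothetical class name, for illustration only
{
    static void Main()
    {
        // Top-down writing: elements are emitted in document order,
        // with no intermediate DOM tree to assemble first.
        StringWriter buffer = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(buffer);
        writer.Formatting = Formatting.Indented;
        writer.WriteStartElement("order");          // made-up sample element
        writer.WriteAttributeString("id", "1");
        writer.WriteElementString("item", "book");
        writer.WriteElementString("quantity", "2");
        writer.WriteEndElement();                   // closes <order>
        writer.Close();

        // Pull-model reading: the application asks for the next node
        // when it is ready, instead of receiving push-style callbacks.
        XmlTextReader reader = new XmlTextReader(new StringReader(buffer.ToString()));
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
                Console.WriteLine("element: " + reader.Name);
            else if (reader.NodeType == XmlNodeType.Text)
                Console.WriteLine("text: " + reader.Value);
        }
        reader.Close();
    }
}
```

The loop around Read() is the whole parsing model: the calling code stays in control of when the next node is consumed, which is what makes the reader easy to compose with ordinary application logic.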

It is also interesting that this performance report avoided XSLT processing. MSXML has been the leader here for many years, and although the System.Xml XslTransform class does not yet reach the speeds of MSXML 4.0, it is again sufficient for the majority of users. We tend to have more issues reported on the performance of the XSLT processor in System.Xml, as it is often compared to MSXML 4.0, but other than a couple of serious bugs the majority of issues can be solved by rewriting parts of the stylesheet.

As Joshua points out, competition is a great incentive, and being first in performance as well as usability is always our goal. For the “Whidbey” V2.0 release of System.Xml, performance was a #1 goal established from the start of development. Aaron Skonnard details in his XML Report from the Microsoft PDC 2003 how the performance across XML parsing, writing and transformation has been dramatically improved. This has also been driven by aggressive users of System.Xml, such as the XML messaging needs of “Indigo”, which requires high throughput on small messages. This is a real world performance example that can be quantified.

Interestingly, our performance improvements go far beyond testing discrete components, and when you add XML schema into the mix further optimization can be applied. In V2 the XmlReader, XmlWriter and, importantly, the XPathDocument classes all have schema information stored. This means that when you load an XML document from a validating reader with an associated schema, we are able to store the XML types as CLR types. For example, if the XML schema indicates that the values are of type xs:int, these are stored as CLR int types in the XPathDocument rather than as untyped strings. Not only does this enable you to work with the types in your CLR language of choice, but it also reduces the storage and working set of the document loaded into memory, depending on your type of data of course.

Importantly, if you apply an XSLT or XQuery to the XPathDocument and use this to generate another XPathDocument, these CLR types are “flowed” between components, in that they are not first copied to string values and then reparsed through a text XML parser. This provides a significant performance improvement when chaining together XML components that utilize schema type information. Show me this real world scenario with a set of Java classes that is comparable in performance and then I will be impressed. And with a closing word on usability, there are a significant number of improvements in System.Xml V2 that further reduce development time and improve the ease of programming when working with XML on the .NET platform.