XML Performance in System.Xml


There has been some discussion with the Webdata XML team about Sun’s published performance report comparing a set of Java XML classes with System.Xml in the .NET Framework. Whilst I would not defend to the death the performance of the V1.0 and V1.1 releases of System.Xml, the report does not provide sufficient detail or real-world scenarios. It is typically possible to create a set of performance tests that favor a system you understand over one you do not, and small changes in settings can often have a significant impact on performance. Performance is also a very arbitrary and subjective characteristic, since it depends on what your application is doing. I have talked to hundreds of customers who use the XmlTextReader class in their code, and only a very few have had performance issues that meant they had to alter their application. This means that for the vast majority of people using System.Xml today, the performance is easily good enough.


Reports like this also miss the real issue of usability. The V1.0 release of System.Xml changed the programming model for working with XML in three ways. It introduced the easier-to-use pull-model XmlTextReader parser (subsequently copied by JSR 173 on the Java platform). It introduced the XmlTextWriter as the most intuitive API for writing XML documents, using top-down creation rather than the more difficult bottom-up creation of the DOM API. And it introduced the innovative XPathNavigator, which provides an XPath engine that can easily be implemented over any store of data and hence integrated with the XSLT processor. In essence, API usability is as important as performance, and the Java APIs on the whole have significantly lagged behind System.Xml in this respect.
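
For readers who have not used a pull parser, the following is a minimal sketch of two of the ideas above: writing a document top down, then reading it back by pulling events one at a time. It uses Java's StAX API (the JSR 173 API mentioned above) rather than XmlTextReader and XmlTextWriter themselves, so it only illustrates the style of programming, not the System.Xml classes. The class name, element names and id values are made up for the example.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of any library.
public class PullModelSketch {

    // Top-down writing: elements are emitted in document order,
    // with no in-memory tree to assemble first.
    static String writeOrders() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("orders");
        for (String id : new String[] {"17", "42"}) {
            w.writeStartElement("order");
            w.writeAttribute("id", id);
            w.writeEndElement();
        }
        w.writeEndElement();
        w.close();
        return out.toString();
    }

    // Pull parsing: the caller asks for the next event when it is ready,
    // instead of registering callbacks that the parser invokes.
    static List<String> readOrderIds(String xml) throws XMLStreamException {
        List<String> ids = new ArrayList<>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && "order".equals(r.getLocalName())) {
                ids.add(r.getAttributeValue(null, "id"));
            }
        }
        r.close();
        return ids;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(readOrderIds(writeOrders())); // prints [17, 42]
    }
}
```

The loop in readOrderIds is the whole point of the pull model: the application code drives the parser and can stop, skip or branch at any event.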


It is also interesting that this performance report avoided XSLT processing. MSXML has been the leader here for many years and although the System.Xml XslTransform class is not yet reaching the speeds of MSXML 4.0, it is again still sufficient for the majority of users. We tend to have more issues reported on the performance of the XSLT processor in System.Xml, as this is often compared to MSXML 4.0, but other than a couple of serious bugs the majority of issues can be solved by rewriting parts of the stylesheet.


As Joshua points out, competition is a great incentive, and being first in performance as well as usability is always our goal. For the “Whidbey” V2.0 release of System.Xml, performance was a #1 goal established from the start of development. Aaron Skonnard details in his XML Report from the Microsoft PDC 2003 how performance across XML parsing, writing and transformation has been dramatically improved. This has also been driven by aggressive users of System.Xml, such as the XML messaging needs of “Indigo”, which requires high throughput on small messages. This is a real-world performance example that can be quantified.


Interestingly, our performance improvements go far beyond testing discrete components: when you add XML schema into the mix, further optimization can be applied. In V2 the XmlReader, XmlWriter and, importantly, the XPathDocument classes all have schema information stored. This means that when you load an XML document from a validating reader with an associated schema, we are able to store the XML types as CLR types. For example, if the XML schema indicates that the values are of type xs:int, these are stored as CLR int types in the XPathDocument rather than as untyped strings. Not only does this enable you to work with the types in your CLR language of choice, but it also reduces the storage and working set of the document loaded into memory, depending on your type of data of course. Importantly, if you apply an XSLT or XQuery to the XPathDocument and use this to generate another XPathDocument, these CLR types are “flowed” between components, in that they are not first copied to string values and then reparsed through a text XML parser. This provides a significant performance improvement when chaining together XML components that utilize schema type information.


Show me this real-world scenario with a set of Java classes that is comparable in performance and then I will be impressed. And with a closing word on usability, there are a significant number of improvements in System.Xml V2 that further reduce development time and improve ease of programming when working with XML on the .NET platform.

Comments (9)

  1. Kent Tegels says:

    Thanks for the response, Mark!

  2. Take Outs: The Digital Doggy Bag of Blog Bits for 24 February 2004

  3. Matt says:

    I recall the performance charts we used to post when developing System.Xml 1.0. For some bits we lagged behind MSXML and in some cases we pushed past them. Yet, those ingenious devils working on MSXML fought back and pushed MSXML 4.0 to even higher levels of performance. Those were the days. Besides, we also had charts showing performance of all competing XML products, many of these were Java-based solutions. I recall we were an order of magnitude faster. So, I applaud the fact that those competing implementations may have finally been spruced up a bit. But, come on, do you really believe they can do better than us?

  4. Gia says:

    In my test .NET was garbage-collecting during the whole test (memory usage steady under 9M). The Java VM, on the other hand, was configured aggressively by the test authors (-XX:+AggressiveHeap -Xms1024M -Xmx1024M) and ran at 413MB.

    So the Java test did run 28% faster, but it consumed 4500% more memory.

    At this point I had the frustrating feeling of being cheated and didn’t even bother to see what Java would do within the same memory constraints as the CLR.

  5. Kevin Jones says:

    I have no interest in .NET Vs Java performance arguments but these two statements caught my attention.

    Mark said,

    "MSXML has been the leader here for many years and although the System.Xml XslTransform class is not yet reaching the speeds of MSXML 4.0, it is again still sufficient for the majority of users."

    Most XSLT benchmarks have indeed shown MSXML to normally be a very good performer, but only among the processors *tested*, and often only by *slim* margins over the next best.

    Matt said,

    "Those were the days. Besides, we also had charts showing performance of all competing XML products, many of these were Java-based solutions. I recall we were an order of magnitude faster."

    An order of magnitude is a large difference. The only way this would be true is by comparing MSXML against one of the worst Java processors. Hardly a fair comparison.

    Ironically, it is this type of overzealous support for one platform that used to (and still does) annoy me about the Java community. Looks like we are in for more of the same.

    Kev.

    Disclaimer: I work for Sarvega on XSLT processor development and benchmarking along with other things.