XmlNameTable: The Shiftstick of System.Xml


I spent much of today in a customer lab on performance in .NET applications, covering best practices for System.Xml. As always, the majority of people used XML somewhere in their application and needed to understand the performance implications of using one technique over another. One approach that I covered, among many others, is the use of the XmlNameTable class. This seemingly insignificant yet crucial class surfaces on all the classes in System.Xml that do some form of processing (XmlTextReader, XPathNavigator and XmlDocument). Like the shift stick on a car (gear stick if you live in the UK), it is an implementation detail that allows you to play with the performance of your XML processing.


Here is an example of it in use. Take this portion of an example XML document called invoices.xml that lists a number of LineItems for a given named customer.


<Invoices xmlns="http://example.invoice/invoices">
    <Invoice>
        <CustomerName>Levi</CustomerName>
        <LineItems>
            <LineItem>
                <ID>18148</ID>
                <Price>564</Price>
                <Description>A description</Description>
            </LineItem>
        </LineItems>
    </Invoice>
</Invoices>


 


The following code uses the XmlTextReader with and without an XmlNameTable. The XmlNameTable enables object reference comparison rather than string value comparison, and is useful in documents with many repeating known elements, attributes or namespaces. The XmlReader automatically adds these names to its XmlNameTable, a process called atomization. This allows you to add your own names to the name table and perform efficient object comparisons rather than character-by-character string comparisons.
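As a minimal sketch of atomization (the class and method names are illustrative and not part of the benchmark that follows), adding the same name to a NameTable twice should hand back the same string object, which is what makes the later reference comparisons safe:

```csharp
using System;
using System.Xml;

static class AtomizationSketch
{
    // Returns true when two separately built copies of a name
    // atomize to one and the same string object.
    public static bool SameAtom(string name)
    {
        NameTable nt = new NameTable();
        string first = nt.Add(name);

        // Build a distinct string object holding the same characters.
        string copy = new string(name.ToCharArray());
        string second = nt.Add(copy);

        return object.ReferenceEquals(first, second);
    }

    // Get only looks a name up; it returns null when the name was never added.
    public static bool IsKnown(NameTable nt, string name)
    {
        return nt.Get(name) != null;
    }

    static void Main()
    {
        Console.WriteLine(SameAtom("Invoice"));                   // True
        Console.WriteLine(IsKnown(new NameTable(), "LineItem"));  // False
    }
}
```

The reader does the equivalent of Add for every name it parses, so a name you added up front and the same name reported by the reader refer to one object.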


 


static void RunPerfNameTable()
{
      Console.WriteLine("** XmlNameTable vs No XmlNameTable **");

      // Warm up run
      PerfNoNameTable("invoices.xml");

      for (int i = 0; i < 5; i++)
      {
            PerfNoNameTable("invoices.xml");
            PerfNameTable("invoices.xml");
      }
      Console.ReadLine();
}


 


static void PerfNoNameTable(string filename)
{
      int start = 0, stop = 0, invoicecount = 0, lineitemcount = 0;
      Console.WriteLine("Reading XML without NameTable comparison");

      start = Environment.TickCount;
      for (int i = 0; i < 80; i++)
      {
            // Create the reader.
            XmlTextReader reader = new XmlTextReader(filename);

            while (reader.Read())
            {
                  // String value comparison, character by character
                  if ("Invoice" == reader.LocalName)
                  {
                        invoicecount++;
                  }

                  if ("LineItem" == reader.LocalName)
                  {
                        lineitemcount++;
                  }
            }
      }
      stop = Environment.TickCount;
      Console.WriteLine("XmlTextReader document parsing time in ms WITHOUT NameTable: " + (stop - start).ToString());
}


 


static void PerfNameTable(string filename)
{
      int start = 0, stop = 0, invoicecount = 0, lineitemcount = 0;

      // Add the names we care about up front; Add returns the atomized string.
      NameTable nt = new NameTable();
      object invoice = nt.Add("Invoice");
      object lineitem = nt.Add("LineItem");

      Console.WriteLine("Reading XML WITH NameTable comparison");
      start = Environment.TickCount;
      for (int i = 0; i < 80; i++)
      {
            // Create the reader over the shared name table.
            XmlTextReader reader = new XmlTextReader(filename, nt);

            while (reader.Read())
            {
                  // Cache the value of the reader.LocalName property
                  object localname = reader.LocalName;

                  // Comparison between object references. This just compares pointers.
                  if (invoice == localname)
                  {
                        invoicecount++;
                  }

                  // Comparison between object references. This just compares pointers.
                  if (lineitem == localname)
                  {
                        lineitemcount++;
                  }
            }
      }
      stop = Environment.TickCount;
      Console.WriteLine("XmlTextReader document parsing time in ms WITH NameTable: " + (stop - start).ToString());
}


The crucial piece of code shown above is this line, which does two things:


                  object localname = reader.LocalName;


 


1) It caches the value of the LocalName property, which in the V1 implementation of the XmlReader avoids two virtual method calls (one public and the other internal) each time this property is accessed.


2) It allows the local name to be compared multiple times by object reference rather than by value, as in this line of code:


 


                  if (invoice == localname)
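To make the difference concrete, here is a small sketch (the class and method names are hypothetical) of how the static type of the operands decides which comparison C# performs. With string operands `==` compares values; with object operands it compares references:

```csharp
using System;

static class ReferenceVsValueSketch
{
    // Both operands typed as string: == compares the characters.
    public static bool ValueEqual(string a, string b)
    {
        return a == b;
    }

    // Both operands cast to object: == compares the references only.
    public static bool ReferenceEqual(string a, string b)
    {
        return (object)a == (object)b;
    }

    static void Main()
    {
        string a = "Invoice";
        // Same characters, but a separately allocated string object.
        string b = new string(a.ToCharArray());

        Console.WriteLine(ValueEqual(a, b));      // True
        Console.WriteLine(ReferenceEqual(a, b));  // False
    }
}
```

This is why the benchmark declares invoice, lineitem and localname as object: the reference comparison only gives the right answer because both sides come from the same name table.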


 


The end result is a performance increase of around 6-9% when parsing a 230 KB XML file on a machine with a PIII processor and 1 GB of memory. This is not enormous, but in scenarios where there is a high throughput of XML documents, or the documents are large (>200 KB), using the XmlNameTable gives you enough of a performance benefit to make it worthwhile, especially if your processing spans multiple XML components in a pipelining scenario and the XmlNameTable is shared across them, i.e. XmlTextReader->XmlDocument->XslTransform.
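As a rough sketch of the sharing idea (the class name and the inline XML are assumptions made for illustration, and only the first two stages of the pipeline are shown), one NameTable can be handed to both the reader and the XmlDocument it loads into:

```csharp
using System;
using System.IO;
using System.Xml;

static class SharedNameTableSketch
{
    // Loads the XML through a reader and a document that use one name table,
    // and reports whether both components really hold the same table object.
    public static bool SharesNameTable(string xml)
    {
        NameTable nt = new NameTable();

        XmlTextReader reader = new XmlTextReader(new StringReader(xml), nt);
        XmlDocument doc = new XmlDocument(nt);
        doc.Load(reader);

        bool shared = object.ReferenceEquals(reader.NameTable, doc.NameTable);
        reader.Close();
        return shared;
    }

    static void Main()
    {
        string xml = "<Invoices><Invoice><CustomerName>Levi</CustomerName></Invoice></Invoices>";
        Console.WriteLine(SharesNameTable(xml)); // True
    }
}
```

With a shared table, names atomized while reading do not have to be atomized again into a second table when the document is built.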

Comments (13)

  1. Jason Reis says:

    Great post! Thanks!

  2. Dmitriy Zaslavskiy says:

    Mark,

    NameTable’s implementation is not very efficient (even Hashtable implements a better algorithm) and there is no way to tell XmlReader(s) not to use any XmlNameTable at all.

    Are any improvements planned in this area?

    Also, is there a place we can express our wishes for the next version?

    For example, I would like to have a ValueBytes property on the reader in addition to Value.

    Consider the case where binary data is encoded in a CDATA section and the XML encoding is ASCII: it's huge overkill to convert the whole thing to a string.

    Thanks.

    Dmitriy

  3. Jiho Han says:

    Mark,

    Oleg had posted this very topic on his blog here: http://www.tkachenko.com/blog/archives/000181.html

    Basically, my issue with this is that I can’t use a switch statement for the node comparison, as switch expects a constant expression for each case. Imagine you have 100 specific nodes you need to test for and perform special processing for (I know it may be a sign of a bad design, but bear with me). Then in the worst-case scenario the code will execute 100 comparisons before matching on the node name (or reference).

    And finally this may be a stupid question but what is the implication for declaring localname variable as a string type rather than an object? XmlReader.LocalName returns a string type anyway right?

  4. Mark Fussell says:

    Dmitriy – Send your ideas and requests to me for what you would like to see in future versions or post them on this blog.

    The XmlNameTable has been rewritten in the V2 release of .NET and is significantly faster. So ‘yes’ to the improvements here.

    With regard to reading data sequentially (stream-like), there is a ReadChars() method today on the XmlTextReader that allows you to read characters into an array, which works on element text nodes. In the V2 release we have added typed read methods, including one where you can specify the type that this should be read into. For example

    ReadValueAs(typeof(Stream)) or

    ReadValueAs(typeof(TextReader))

    This enables you to read the contents of an attribute or element into a stream and to read byte values as opposed to characters. You are still bound by the overall encoding of the file (i.e. you have to have your file in ASCII if you want to read ASCII). This enables you to read binary data that has been encoded as Base64 or BinHex into your text file encoding. What you cannot do is call these methods on CDATA section node types. What was your scenario where you wanted this rather than an element text node that has been escaped?

  5. Mark Fussell says:

    Jiho –

    >>I can’t use switch statement for the node comparison

    Yes, this is a limitation of switch. You have to use a conditional statement.

    >>what is the implication for declaring localname variable as a string type rather than an object? XmlReader.LocalName returns a string type anyway right?

    It will attempt to do a string comparison. You need to enforce object reference comparison by casting to object.

    ‘Yes’ I see that Oleg posted this before me. It never harms to have a best practice repeated! The additional aspect of this post is that caching the local name value from the reader also provides a perf improvement, which in V1.1 is nearly as much as the XmlNameTable lookup.

    Thanks. Mark

  6. Dmitriy Zaslavskiy says:

    Mark,

    The scenario I had was that I was receiving an XML file where most of it is base64-encoded binary data. Trying to be as general as possible, we were passing around XmlReader (as opposed to XmlTextReader, even though XmlTextReader was the underlying implementation). So the only way to retrieve the data was to call Value (consider an ASCII file with a 100k attachment, which now turns into a 200k string I never cared about).

    Would you consider adding some of those methods to XmlReader (ReadBinHex/ReadBase64…)?

    And as you said, this would not work nicely with CDATA sections, which is sad, because this is just an encoding detail. ReadBase64 already does some parsing to check if it has reached a new element, so why can’t it handle CDATA?

    On the subject of XmlNameTable: consider the situation where one knows in advance all the element/attribute names one will ever care about and the rest will be ignored (i.e. XmlSerialization), or when one reuses an XmlNameTable between documents. In such a case it is possible to optimize the XmlNameTable by producing a better hash function, or maybe even by keeping the list of names sorted. Would msft consider providing such an option?

    I just installed M04, so I’ll play with it first

    Thanks

  7. Dmitriy Zaslavskiy says:

    Mark,

    What about providing a concrete class derived from XPathExpression, so that XPathExpression(s) could be cached?

    Currently the only portable way to cache XPathExpression(s) is to create a dummy class that derives from XPathNavigator, create a dummy instance and call Compile on that. The resulting XPathExpression(s) can now be reused (call SetContext), but some of them leak, for example XPathDocuments. So my solution was to clone them.

    Anyway it would be nice to have a clean way of doing it.

    Thanks again

  8. Jiho Han says:

    Mark,

    Thanks for clarifying the object reference question.

    Also, I never meant to say that since your post appeared later than Oleg’s it is any less helpful, and if it did appear that way I apologize. I merely meant it as an informational link so that you or other visitors may gain more exposure. It is actually rather helpful to see multiple postings on the same topic/issue since it confirms that the pattern discussed is indeed valid.

    Thanks and keep up the good work!

  9. Mark Fussell says:

    Dmitriy – I raised a design request to look at an easier way to create XPathExpression, probably with a static method on the XPathExpression class rather than an instance.

    Equally there is a very good chance that ReadBinHex/ReadBase64 will be added to the XmlReader since we have had this request from others. I am looking into this for .NET Beta 2 planning in the next few weeks.

    I have raised another design request for the ReadBinHex and ReadBase64 methods to work on CDATA sections. This is a new request (no one has asked for this before) so I will have to see how this goes.

    Thanks for the feedback, and look forward to V2.