Combining the XmlReader and XmlWriter classes for simple streaming transformations


The XmlReader and XmlWriter classes can often be combined to perform simple streaming transformations instead of resorting to XSLT, which requires the document to be loaded into memory. This combination is often faster and uses less memory, although it requires more code and is less flexible in the kinds of transformation it can express. For many scenarios, however, it is ideal. Say, for example, you want to add a new element at several repeating places in an existing document. The XmlWriter.WriteNode method is useful here because it pulls from an XmlReader and pushes to an XmlWriter, but it has a limitation: it writes the current node and all of its children to the XmlWriter, with no finer-grained control.
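To make that limitation concrete, here is a minimal sketch of WriteNode copying a whole subtree in one call. The class name, the method name and the sample XML in the usage note are illustrative assumptions rather than part of the original sample.

```csharp
using System;
using System.IO;
using System.Xml;

static class WriteNodeDemo
{
    // Hypothetical helper: copies the first <dvd> element, with all of its
    // attributes and descendants, from the input string to a new string.
    public static string CopyFirstDvd(string xml)
    {
        StringWriter output = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true; // keep the result to the element itself

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            if (reader.ReadToFollowing("dvd"))
            {
                // A single call writes the start tag, the attributes, every
                // child node and the end tag. There is no point at which the
                // caller can inject or alter a node inside the subtree.
                writer.WriteNode(reader, true);
            }
        }
        return output.ToString();
    }
}
```

Calling CopyFirstDvd with `<dvdstore><dvd genre="action"><title>T2</title></dvd></dvdstore>` should return the dvd element and its title child unchanged, which is exactly the all-or-nothing behaviour described above.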

The code below shows a method called WriteShallowNode, which copies a single node, without its children, from the XmlReader to the XmlWriter. This lets you change individual nodes during the transformation.

 

// This method is useful for streaming transformations with the
// XmlReader and the XmlWriter. It pushes through single nodes in the stream.
static void WriteShallowNode( XmlReader reader, XmlWriter writer )
{
      if ( reader == null )
      {
            throw new ArgumentNullException("reader");
      }
      if ( writer == null )
      {
            throw new ArgumentNullException("writer");
      }

      switch ( reader.NodeType )
      {
            case XmlNodeType.Element:
                  writer.WriteStartElement( reader.Prefix, reader.LocalName, reader.NamespaceURI );
                  writer.WriteAttributes( reader, true );
                  if ( reader.IsEmptyElement )
                  {
                        writer.WriteEndElement();
                  }
                  break;
            case XmlNodeType.Text:
                  writer.WriteString( reader.Value );
                  break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                  writer.WriteWhitespace( reader.Value );
                  break;
            case XmlNodeType.CDATA:
                  writer.WriteCData( reader.Value );
                  break;
            case XmlNodeType.EntityReference:
                  writer.WriteEntityRef( reader.Name );
                  break;
            case XmlNodeType.XmlDeclaration:
            case XmlNodeType.ProcessingInstruction:
                  writer.WriteProcessingInstruction( reader.Name, reader.Value );
                  break;
            case XmlNodeType.DocumentType:
                  writer.WriteDocType( reader.Name, reader.GetAttribute( "PUBLIC" ), reader.GetAttribute( "SYSTEM" ), reader.Value );
                  break;
            case XmlNodeType.Comment:
                  writer.WriteComment( reader.Value );
                  break;
            case XmlNodeType.EndElement:
                  writer.WriteFullEndElement();
                  break;
      }
}
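As a sketch of what changing an individual node can look like, the example below streams a document through a cut-down version of the same switch and renames every b element to strong. To stay short and self-contained it handles only elements, attributes, text and end elements; the class name, the method name and the choice of strong are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class RenameDemo
{
    // Hypothetical example: copies the input node by node, renaming <b> to <strong>.
    public static string RenameBold(string xml)
    {
        StringWriter output = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // The only change to the shallow copy: pick a new local name.
                        string name = reader.LocalName == "b" ? "strong" : reader.LocalName;
                        writer.WriteStartElement(reader.Prefix, name, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true);
                        if (reader.IsEmptyElement)
                        {
                            writer.WriteEndElement();
                        }
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        // The writer closes whichever element it has open,
                        // so the renamed element gets a matching end tag.
                        writer.WriteFullEndElement();
                        break;
                }
            }
        }
        return output.ToString();
    }
}
```

Because the writer tracks its own open elements, only the start tag needs special handling; the end tag follows automatically.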

 

Performing simple transformations with this method is straightforward. The code below (written for .NET v2.0) reads the movies.xml document and, for each dvd element whose genre attribute is "action", writes a new publisher element to the output.

 

Input document:

<?xml version="1.0"?>
<dvdstore>
  <dvd genre="action" publicationdate="1990" >
    <title>T2</title>
    <review>This film was <b>impressive</b> in its creativity.</review>
    <stats>
       <price>8.99</price>
       <id>123-456</id>
    </stats>
  </dvd>
….
</dvdstore >

 

static void AddElementWithWriteShallowNode()
{
   XmlWriterSettings settings = new XmlWriterSettings();
   settings.Indent = true;

   using (XmlReader reader = XmlReader.Create("movies.xml"))
   {
      using (XmlWriter writer = XmlWriter.Create(Console.Out, settings))
      {
            while (reader.Read())
            {
                  if (reader.IsStartElement("dvd") &&
                        reader.GetAttribute("genre") == "action")
                  {
                        // Write the dvd start tag and its attributes
                        WriteShallowNode(reader, writer);
                        // Now add a new publisher element to the output.
                        // The four-argument overload takes prefix, local name,
                        // namespace URI and value, which yields the p: prefix.
                        writer.WriteElementString("p", "publisher",
                              "metal.sword.com", "Samurai Films");
                  }
                  else
                  {
                        WriteShallowNode(reader, writer);
                  }
            }
      }
   }
}

 

Output document:

<?xml version="1.0"?>
<dvdstore>
  <dvd genre="action" publicationdate="1990">
    <p:publisher xmlns:p="metal.sword.com">Samurai Films</p:publisher>
    <title>T2</title>
    <review>This film was <b>impressive</b> in its creativity.</review>
    <stats>
       <price>8.99</price>
       <id>123-456</id>
    </stats>
  </dvd>
….
</dvdstore>

 

Without this method, you would have to rewrite the entire XML document to the output by calling the individual XmlWriter WriteXXX methods yourself, even if all you wanted to do was inject a single publisher element into the tree.
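One detail of the loop above is worth knowing: XmlReader.IsStartElement calls MoveToContent before it tests the node, so when the reader is sitting on a comment, processing instruction or whitespace node, the call advances past it and that node never reaches WriteShallowNode. If those nodes need to survive the transformation, an explicit NodeType test leaves the reader where it is. The sketch below illustrates the difference; the class and method names are made up for this example.

```csharp
using System;
using System.IO;
using System.Xml;

static class IsStartElementDemo
{
    // Explicit test that does not move the reader.
    public static bool IsDvdStart(XmlReader reader)
    {
        return reader.NodeType == XmlNodeType.Element && reader.LocalName == "dvd";
    }

    // Reads the first node, calls IsStartElement, and reports where the reader ends up.
    public static XmlNodeType AfterIsStartElement(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.Read();
            reader.IsStartElement("dvd"); // may skip ahead to the next content node
            return reader.NodeType;
        }
    }

    // Reads the first node, applies the explicit test, and reports where the reader ends up.
    public static XmlNodeType AfterExplicitTest(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.Read();
            IsDvdStart(reader); // the reader stays on the node it just read
            return reader.NodeType;
        }
    }
}
```

For an input such as `<!--catalog--><dvdstore />`, the first method reports Element because the comment has been skipped, while the second reports Comment. Swapping IsDvdStart into the loop would also pass the source whitespace through to the writer, so the Indent setting may need revisiting in that case.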

For an excellent and complete application of this approach, see Dan Wahlin’s article "Generate Dynamic Maps and Flight Routes with XML and SVG".

Comments (5)

  1. Guy S. says:

    Still, XSL has its own benefits, and it's worth raising them here again. In situations where you have to filter loaded XML data on the client, XSL can be the answer. The XSL logic is usable on the client as well as on the server, so it lets you maintain the logic once even though you use it on both.

    It's also a good choice when you want to move processing resources/time to your client instead of using your server resources.

    It would be interesting to see a comparison table showing memory use, speed and lines of code when processing different XML documents with these two methods, XSL and XmlReader/XmlWriter.

  2. Andy Neilson says:

    It seems to me that there are many cases where relatively simple transformations need to be done on an XML stream, and where XSLT is overkill. There seem to be a number of projects that have attempted to tackle this as a subset of XSLT (e.g., STX).

    Streaming transformations seem like an obvious and common enough thing to want to do efficiently. I find it strange that there isn’t more interest in this in the .NET framework, or indeed at the standards level.

    Your example shows that it is simple enough to do the transformation at a low level, but if you had a significant amount of such code, you’d probably want a more manageable declarative representation like some XSLT subset.

  3. A fellow MVP asked if there is a way to dump XML content while reading it from a stream without buffering the whole XML document. Here is a scenario – an XML document being read from a HttpWebResponse stream and needs to be passed as an XmlReader to an XmlSerializer…
