Traversing in the Open XML DOM

For the past few posts, I have been concentrating on showing you guys solutions to real-world scenarios. Today, I am going to change pace a bit and continue Ali's discussion of the basics of the Open XML SDK. In this post I am going to cover the basic techniques for traversing the Open XML DOM tree using the Open XML SDK.

Again, it's just XML...

We designed the Open XML Low Level DOM to be an XML wrapper of the Open XML schemas. In other words, this component of the SDK allows you to work directly with strongly typed objects and classes that represent the underlying XML nodes. Essentially, by using the SDK, you are still working with just XML.

Traditionally, there are several technologies that allow you to traverse XML files. When designing the Open XML SDK, we wanted to include functionality from both DOM and LINQ to give you a rich set of traversal options. Our Low Level DOM component tries to marry concepts from .NET's DOM API and LINQ to XML. Let's talk about some of these concepts.

Traversing Down the XML Tree

One of the most common tasks when reading XML files is to traverse below a particular XML node, like the root. You would usually want to traverse downward if you are looking for something specific. For example, suppose you wanted to change content for all table rows within a document. One way to accomplish this scenario is to traverse downward starting at the document root element finding table and table row elements until you reach the end of the document. The Open XML SDK makes this task very easy.

As mentioned in Ali's post, all Open XML elements are based on the abstract OpenXmlElement class. The OpenXmlElement class provides the following methods and properties to traverse downwards within the DOM:

  1. OpenXmlElement FirstChild { get; }
  2. OpenXmlElement LastChild { get; }
  3. IEnumerable<OpenXmlElement> Elements ()
  4. OpenXmlElementList ChildElements { get; }
  5. IEnumerator<OpenXmlElement> GetEnumerator ()
  6. IEnumerable<OpenXmlElement> Descendants ()

There are a few core differences between each of these approaches. To demonstrate these differences let's say you want to read elements under a Table object, which contains table properties, table grid information, and row content:

<w:tbl>
  <w:tblPr>
    ...
  </w:tblPr>
  <w:tblGrid>
    ...
  </w:tblGrid>
  <w:tr>
    ...
  </w:tr>
  ...
  <w:tr>
    ...
  </w:tr>
</w:tbl>

The FirstChild and LastChild properties are pretty straightforward: they return the first and last child, respectively.

The Elements() method allows you to read all the children elements of Table by using this code snippet:

foreach (OpenXmlElement el in tbl.Elements())
{
    // DO SOMETHING
}

The ChildElements property is a bit special. Calling tbl.ChildElements directly returns a list of the Open XML elements that are children of the table. In this regard it is roughly equivalent to tbl.Elements(). What differentiates ChildElements from Elements() is that you can access a child by its zero-based index. For example, if you want the fourth child of the Table object, you can call tbl.ChildElements[3].

Like the other two approaches, the GetEnumerator() method supports iteration over all the child elements. It is what lets you use an element directly in a foreach loop.

The Descendants() method, on the other hand, allows you to iterate over all children and descendants under a particular node. Taking this Table object as an example, calling Descendants() will allow you to not only see the table rows, but the cells within the rows as well.
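To make the differences concrete, here is a minimal sketch that builds a small table in memory and tries each approach. It assumes the DocumentFormat.OpenXml NuGet package is referenced; the class name DownwardTraversalSketch and the helper BuildSampleTable are made up for this illustration and are not part of the SDK.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DownwardTraversalSketch
{
    // Hypothetical helper: builds an in-memory table shaped like the XML above
    // (tblPr, tblGrid, then two rows of two cells each).
    public static Table BuildSampleTable()
    {
        return new Table(
            new TableProperties(),
            new TableGrid(new GridColumn(), new GridColumn()),
            new TableRow(
                new TableCell(new Paragraph()),
                new TableCell(new Paragraph())),
            new TableRow(
                new TableCell(new Paragraph()),
                new TableCell(new Paragraph())));
    }

    public static void Main()
    {
        Table tbl = BuildSampleTable();

        // FirstChild and LastChild are properties, not methods.
        Console.WriteLine(tbl.FirstChild?.LocalName);
        Console.WriteLine(tbl.LastChild?.LocalName);

        // Elements() and ChildElements both cover direct children only.
        Console.WriteLine(tbl.Elements().Count());
        Console.WriteLine(tbl.ChildElements.Count);

        // ChildElements adds zero-based indexing: this is the fourth child.
        Console.WriteLine(tbl.ChildElements[3].LocalName);

        // The element itself is enumerable, which is what GetEnumerator() enables.
        foreach (OpenXmlElement child in tbl)
        {
            Console.WriteLine("child: " + child.LocalName);
        }

        // Descendants() walks the whole subtree, so it also reaches
        // grid columns, cells, and paragraphs.
        Console.WriteLine(tbl.Descendants().Count());
    }
}
```

With this sample table, the child counts stay at four while the descendant count is larger, because Descendants() keeps going below the rows.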

Rather than iterating through all child elements, a more common scenario is finding child or descendant elements of a certain class type. For example, let's say you only want to find table row elements underneath the Table object. Instead of iterating through all the child elements and checking whether each element is a table row, you can simply use any of the following methods:

  1. IEnumerable<T> Elements<T> () where T : OpenXmlElement
  2. IEnumerable<T> Descendants<T> () where T : OpenXmlElement
  3. T GetFirstChild<T> () where T : OpenXmlElement

The first two methods will only return elements whose type is T or derived from T. Going back to the example above, you would use this code snippet to find all rows within a table:

foreach (TableRow tr in tbl.Elements<TableRow>())
{
    // DO SOMETHING
}

The last method allows you to get the first child that matches a particular type. In the case of the table example, if I wanted to get the first row I could have used tbl.GetFirstChild<TableRow>().
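Here is a small sketch of the three typed methods side by side, again assuming the DocumentFormat.OpenXml NuGet package is referenced. The class name TypedTraversalSketch and the helper BuildSampleTable are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TypedTraversalSketch
{
    // Hypothetical helper: table properties, a grid, and two rows of two cells.
    public static Table BuildSampleTable()
    {
        return new Table(
            new TableProperties(),
            new TableGrid(new GridColumn(), new GridColumn()),
            new TableRow(
                new TableCell(new Paragraph()),
                new TableCell(new Paragraph())),
            new TableRow(
                new TableCell(new Paragraph()),
                new TableCell(new Paragraph())));
    }

    public static void Main()
    {
        Table tbl = BuildSampleTable();

        // Elements<T>() filters the direct children by type.
        Console.WriteLine(tbl.Elements<TableRow>().Count());     // 2

        // Cells are grandchildren of the table, so Elements<T>() finds none...
        Console.WriteLine(tbl.Elements<TableCell>().Count());    // 0

        // ...while Descendants<T>() searches the whole subtree.
        Console.WriteLine(tbl.Descendants<TableCell>().Count()); // 4

        // GetFirstChild<T>() returns the first direct child of that type,
        // or null when there is no such child.
        var firstRow = tbl.GetFirstChild<TableRow>();
        Console.WriteLine(ReferenceEquals(firstRow, tbl.ChildElements[2]));
    }
}
```

The zero in the middle is the thing to remember: Elements<T>() never looks past the direct children, so reach for Descendants<T>() when the type you want sits deeper in the tree.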

Traversing Up the XML Tree

Just as you can traverse down the XML tree, you can traverse upwards. This functionality provides flexibility when traversing the DOM tree. For example, let's say you are searching for some text and you want to understand more of its context, such as whether the text is contained within a table. To accomplish this scenario you would need to traverse upwards. The Open XML SDK provides the following methods and properties to traverse upwards within the DOM:

  1. OpenXmlElement Parent { get; }
  2. IEnumerable<OpenXmlElement> Ancestors ()
  3. IEnumerable<T> Ancestors<T> () where T : OpenXmlElement

The Parent property will return the immediate parent element of a particular node. Calling Parent on a table row object will return the Table object.

The Ancestors() methods are very similar to the Descendants() methods, except that they traverse upwards instead of downwards.
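The "is this text inside a table?" scenario could be sketched as follows. This assumes the DocumentFormat.OpenXml NuGet package is referenced; UpwardTraversalSketch and IsInsideTable are hypothetical names invented for this example, not SDK members.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class UpwardTraversalSketch
{
    // Hypothetical helper: true when the element sits somewhere inside a table.
    public static bool IsInsideTable(OpenXmlElement element)
    {
        return element.Ancestors<Table>().Any();
    }

    public static void Main()
    {
        Text inTable = new Text("inside");
        Text loose = new Text("outside");

        // A body with one plain paragraph and one single-cell table.
        Body body = new Body(
            new Paragraph(new Run(loose)),
            new Table(
                new TableRow(
                    new TableCell(new Paragraph(new Run(inTable))))));

        Console.WriteLine(IsInsideTable(inTable)); // True
        Console.WriteLine(IsInsideTable(loose));   // False

        // Parent returns only the immediate parent: a row's parent is the table.
        TableRow row = inTable.Ancestors<TableRow>().First();
        Console.WriteLine(row.Parent is Table);    // True

        // Ancestors() lists every element above the text node.
        foreach (OpenXmlElement ancestor in inTable.Ancestors())
        {
            Console.WriteLine(ancestor.LocalName);
        }
    }
}
```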

Traversing Siblings within the XML Tree

What if you want to traverse the XML tree by exploring siblings? Well, the Open XML SDK can take care of this scenario as well. The Open XML SDK provides the following methods for traversing by siblings:

  1. OpenXmlElement PreviousSibling ()
  2. T PreviousSibling<T> () where T : OpenXmlElement
  3. OpenXmlElement NextSibling ()
  4. T NextSibling<T> () where T : OpenXmlElement
  5. IEnumerable<OpenXmlElement> ElementsBefore ()
  6. IEnumerable<OpenXmlElement> ElementsAfter ()

The first four methods return the closest sibling before or after the current element (the generic versions return the closest sibling of type T), while the last two methods enumerate all the sibling elements before or after the element under the same parent.
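As a rough illustration, the sketch below starts at the first row of a sample table and looks sideways in both directions. It assumes the DocumentFormat.OpenXml NuGet package is referenced; SiblingTraversalSketch and BuildSampleTable are hypothetical names for this example only.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SiblingTraversalSketch
{
    // Hypothetical helper: table properties, a grid, and two rows.
    public static Table BuildSampleTable()
    {
        return new Table(
            new TableProperties(),
            new TableGrid(new GridColumn()),
            new TableRow(new TableCell(new Paragraph())),
            new TableRow(new TableCell(new Paragraph())));
    }

    public static void Main()
    {
        Table tbl = BuildSampleTable();
        TableRow firstRow = tbl.Elements<TableRow>().First();

        // The untyped version returns the closest sibling of any type.
        Console.WriteLine(firstRow.PreviousSibling()?.LocalName);       // tblGrid

        // The typed versions skip siblings of other types and return null
        // when no sibling of type T exists in that direction.
        Console.WriteLine(firstRow.PreviousSibling<TableRow>() == null); // True
        Console.WriteLine(firstRow.NextSibling<TableRow>() != null);     // True

        // ElementsBefore() and ElementsAfter() enumerate every sibling on
        // one side of the element, all under the same parent.
        Console.WriteLine(firstRow.ElementsBefore().Count());            // 2
        Console.WriteLine(firstRow.ElementsAfter().Count());             // 1
    }
}
```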

Summary

Hopefully this post shows some of the common ways of traversing the Open XML DOM tree. With this functionality you should be able to find what you are looking for in just a few lines of code.

Zeyad Rajabi