Using LINQ to XML with Open XML Documents

(November 14, 2008 - I've updated my approach for querying Open XML documents using LINQ to XML. You can see my new approach here: Open XML SDK and LINQ to XML.)

Given my focus on LINQ to XML over the last couple of years, I have to say that LINQ to XML and Open XML are a marriage made in heaven. The expressiveness of LINQ to XML lets you write powerful queries and transformations with a minimum of code. Your code shows your intent without burying it in plumbing and housekeeping. Of course, we’ll continue to demonstrate Open XML programming with other XML programming APIs, but as someone who has used a whole pile of XML programming technologies, I can say that LINQ to XML beats all the others hands down. I think you will see what I mean from some of the examples that I’ll be presenting here over the next few months (starting today).

I first started using LINQ to XML with Open XML while writing the LINQ to XML docs. I needed a non-trivial source of XML for a tutorial on pure functional construction and transformation. It was a no-brainer to use Open XML as the source of XML for the tutorial. (If you are not familiar with pure functional transformations, you can go through this tutorial.)

I’ve been spending bus rides, nights, and weekends getting up to speed on Open XML: reading the specs, writing a number of examples, and so on. Following is a list of blog pages that contain code to help you work with Open XML documents.

Packages and Parts

The first issue that you encounter when dealing with Open XML is that of packages and parts. Open XML documents are stored in packages (essentially zip files) that contain parts (the files inside the zip file). Most of the parts are XML documents, with exceptions such as images, video, or music. There is plenty of info online on how to program with packages and parts, but I've summarized the basics of Packages and Parts on this page. I include a small C# app that dumps the relationships, relationship types, content types, and more from the package.
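
To make the "essentially zip files" point concrete, here is a minimal sketch that builds a tiny package-like zip in memory and lists its entries. It deliberately uses ZipArchive from System.IO.Compression rather than the System.IO.Packaging API, so it does not interpret relationships or content types. The names PackageSketch, BuildSamplePackage, and ListEntryNames, and the placeholder contents, are made up for illustration and are not taken from the linked page.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class PackageSketch
{
    // Builds a small zip in memory whose entry names mimic the usual layout
    // of a word-processing package. The contents are placeholders only.
    public static byte[] BuildSamplePackage()
    {
        var buffer = new MemoryStream();
        using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            AddEntry(zip, "[Content_Types].xml", "<Types />");
            AddEntry(zip, "_rels/.rels", "<Relationships />");
            AddEntry(zip, "word/document.xml", "<document />");
            AddEntry(zip, "word/media/image1.png", "placeholder, not a real image");
        }
        return buffer.ToArray();
    }

    static void AddEntry(ZipArchive zip, string name, string content)
    {
        // Only one entry stream may be open at a time in Create mode,
        // so the writer is disposed before the next entry is added.
        using (var writer = new StreamWriter(zip.CreateEntry(name).Open()))
        {
            writer.Write(content);
        }
    }

    // Returns the full name of every entry in the zip.
    public static List<string> ListEntryNames(byte[] packageBytes)
    {
        using (var zip = new ZipArchive(new MemoryStream(packageBytes), ZipArchiveMode.Read))
        {
            return zip.Entries.Select(e => e.FullName).ToList();
        }
    }

    public static void Main()
    {
        foreach (string name in ListEntryNames(BuildSamplePackage()))
        {
            Console.WriteLine(name);
        }
    }
}
```

For real documents you would more likely reach for System.IO.Packaging, which exposes parts, relationships, and content types directly instead of raw zip entries.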

OpenXmlDocument Class

The OpenXmlDocument class creates a hierarchical object graph that is easy to query with LINQ to XML. Every part that contains XML is loaded from the package, and an XDocument XML tree is created for each one. This makes it easy to write LINQ to XML queries that span multiple parts.
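
The idea of one XDocument per XML part can be sketched roughly as follows. This is not the OpenXmlDocument class itself, just an illustration under simplifying assumptions: it reads the package as a plain zip with ZipArchive, guesses which entries are XML from their file extension, and keys the result by entry name. XmlPartsSketch, LoadXmlParts, and BuildSamplePackage are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class XmlPartsSketch
{
    // Hypothetical helper: loads every zip entry that looks like XML
    // into its own XDocument, keyed by the entry's full name.
    public static Dictionary<string, XDocument> LoadXmlParts(Stream packageStream)
    {
        var parts = new Dictionary<string, XDocument>();
        using (var zip = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // Assumption: .xml and .rels entries hold XML. A real package
                // declares content types in [Content_Types].xml instead.
                bool looksLikeXml =
                    entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase) ||
                    entry.FullName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase);
                if (!looksLikeXml) continue;

                using (Stream stream = entry.Open())
                {
                    parts[entry.FullName] = XDocument.Load(stream);
                }
            }
        }
        return parts;
    }

    // Builds a tiny stand-in package with two XML entries and one binary entry.
    public static MemoryStream BuildSamplePackage()
    {
        var buffer = new MemoryStream();
        using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            AddEntry(zip, "word/document.xml", "<document><body><p>Hello</p></body></document>");
            AddEntry(zip, "word/styles.xml", "<styles />");
            AddEntry(zip, "word/media/image1.png", "placeholder, not XML");
        }
        buffer.Position = 0; // rewind so the caller can read from the start
        return buffer;
    }

    static void AddEntry(ZipArchive zip, string name, string content)
    {
        using (var writer = new StreamWriter(zip.CreateEntry(name).Open()))
        {
            writer.Write(content);
        }
    }

    public static void Main()
    {
        using (MemoryStream package = BuildSamplePackage())
        {
            foreach (var pair in LoadXmlParts(package))
            {
                Console.WriteLine(pair.Key + " -> root element <" + pair.Value.Root.Name.LocalName + ">");
            }
        }
    }
}
```

Once every part is an XDocument in one collection, a single query can pull from several parts at once, which is the convenience the class is meant to provide.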

Using the OpenXmlDocument Class

This topic shows the simplest use of the OpenXmlDocument class to dump the relationships and part information. In addition, it counts the number of descendant nodes in the populated XDocument object.
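
Counting descendant nodes needs nothing beyond LINQ to XML itself. The sketch below builds a one-paragraph WordprocessingML-style tree with functional construction (a stand-in for a part loaded from a real package) and counts its nodes. NodeCountSketch and BuildSampleDocument are illustrative names, not part of the OpenXmlDocument class.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NodeCountSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A stand-in for the main document part: one paragraph, one run, one text element.
    public static XDocument BuildSampleDocument()
    {
        return new XDocument(
            new XElement(W + "document",
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                new XElement(W + "body",
                    new XElement(W + "p",
                        new XElement(W + "r",
                            new XElement(W + "t", "Hello"))))));
    }

    public static void Main()
    {
        XDocument doc = BuildSampleDocument();

        // DescendantNodes() includes elements and text nodes (but not attributes);
        // Descendants() includes elements only.
        Console.WriteLine("Descendant nodes:    " + doc.DescendantNodes().Count());
        Console.WriteLine("Descendant elements: " + doc.Descendants().Count());
    }
}
```

For this sample the two counts are 6 and 5: five elements plus the single text node holding "Hello".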

Writing LINQ to XML Queries using the OpenXmlDocument Class

Finally, this topic shows some non-trivial (but relatively short) queries that use multiple parts of the Open XML document. The queries retrieve the paragraphs of the document, the style of each paragraph, and the text of each paragraph.
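
As a rough idea of what such a multi-part query can look like, the sketch below reads paragraphs from a main document tree and resolves each paragraph's style name from a separate styles tree. It is a simplified illustration, not the query from the linked topic: the two XDocument objects are parsed from small inline samples rather than loaded from a package, nested paragraphs (for example in text boxes) are not treated specially, and ParagraphQuerySketch, GetParagraphs, and ResolveName are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphQuerySketch
{
    const string Ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace W = Ns;

    public static List<(string StyleName, string Text)> GetParagraphs(
        XDocument mainPart, XDocument stylesPart)
    {
        // Paragraphs without a w:pStyle fall back to the default paragraph style.
        string defaultStyleId = (string)stylesPart.Root
            .Elements(W + "style")
            .Where(s => (string)s.Attribute(W + "type") == "paragraph"
                     && (string)s.Attribute(W + "default") == "1")
            .Attributes(W + "styleId")
            .FirstOrDefault();

        // Map each style ID to its display name (w:name/@w:val) from the styles part.
        Dictionary<string, string> styleNames = stylesPart.Root
            .Elements(W + "style")
            .ToDictionary(
                s => (string)s.Attribute(W + "styleId"),
                s => (string)s.Elements(W + "name").Attributes(W + "val").FirstOrDefault());

        var query =
            from p in mainPart.Descendants(W + "p")
            let styleId = (string)p.Elements(W + "pPr").Elements(W + "pStyle")
                              .Attributes(W + "val").FirstOrDefault() ?? defaultStyleId
            select (
                StyleName: ResolveName(styleNames, styleId),
                // A paragraph's text is spread over the w:t elements of its runs.
                Text: string.Concat(p.Descendants(W + "t").Select(t => (string)t)));

        return query.ToList();
    }

    // Falls back to the raw style ID when no display name is available.
    static string ResolveName(Dictionary<string, string> names, string styleId)
    {
        string name;
        return styleId != null && names.TryGetValue(styleId, out name) && name != null
            ? name
            : styleId;
    }

    public static XDocument SampleMainPart()
    {
        return XDocument.Parse(
            $"<w:document xmlns:w='{Ns}'><w:body>" +
            "<w:p><w:pPr><w:pStyle w:val='Heading1'/></w:pPr><w:r><w:t>Title</w:t></w:r></w:p>" +
            "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>" +
            "</w:body></w:document>");
    }

    public static XDocument SampleStylesPart()
    {
        return XDocument.Parse(
            $"<w:styles xmlns:w='{Ns}'>" +
            "<w:style w:type='paragraph' w:default='1' w:styleId='Normal'><w:name w:val='Normal'/></w:style>" +
            "<w:style w:type='paragraph' w:styleId='Heading1'><w:name w:val='heading 1'/></w:style>" +
            "</w:styles>");
    }

    public static void Main()
    {
        foreach (var para in GetParagraphs(SampleMainPart(), SampleStylesPart()))
        {
            Console.WriteLine("[" + para.StyleName + "] " + para.Text);
        }
    }
}
```

The query itself stays short because the joins across parts are just lookups between in-memory XML trees.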