Using the Open XML SDK

Open XML Packages

To follow this tutorial, you don't need to delve into all of the details of working with packages.  This topic presents a small chunk of code that you can use as boilerplate: it opens a Word document and retrieves the main part, the style part, and the comments part.  It uses LINQ to XML to count the XML nodes in the three parts, and prints the counts to the console.

The boilerplate code uses the Open XML SDK, a set of managed classes for .NET that provides more convenient access to Open XML documents.  Using the SDK, you can get the main part of the document and navigate to related parts more easily, which cuts down your code considerably.  A separate blog post summarizes the differences between the classes in System.IO.Packaging and the classes in the Open XML SDK.  This example uses the Open XML SDK v1.0.  Another blog post gives more information about the Open XML SDK, including where to download it.

Before attempting to compile, don't forget to:

·         Add a reference to the WindowsBase assembly.

·         Download and install the Open XML SDK.

·         Add a reference to the DocumentFormat.OpenXml assembly.

For the interested:

Just a few points about packages.  Various parts in the package are related.  You never rely on absolute paths to retrieve a part, even if you know the path.  Instead, you start from the main part, and use relationships to navigate to the other parts.  As mentioned, many of these parts are XML documents, including files that specify the relationships between parts.  You can access the parts and the relationship files using any conformant XML parser and a library that can open and read from ZIP files.  However, the classes in the namespace System.IO.Packaging (in the WindowsBase assembly) allow you to work with packages in a more convenient way.  You can see a quick summary of how to use relationships to navigate from part to part here.
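
The relationship-based navigation described above can be sketched with the classes in System.IO.Packaging.  This is a minimal sketch, not the code from the linked summary: the class name RelationshipNavigation and the helper GetRelatedPart are hypothetical names chosen for illustration, and the relationship type URIs are the ones used by transitional Open XML documents (a strict-conformance document uses different URIs).  It assumes a file named SampleDoc.docx, or a path passed as the first command-line argument.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;

public class RelationshipNavigation
{
    // Relationship type URIs for transitional Open XML documents (assumption:
    // the document is not saved in the strict format).
    public const string DocumentRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
    public const string StylesRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles";

    // Hypothetical helper: follow the first relationship of the given type
    // from a source part.  Returns null if there is no such relationship.
    public static PackagePart GetRelatedPart(Package package, PackagePart source, string relType)
    {
        PackageRelationship rel = source.GetRelationshipsByType(relType).FirstOrDefault();
        if (rel == null)
            return null;
        // Target URIs are relative to the source part, so resolve them first.
        Uri targetUri = PackUriHelper.ResolvePartUri(source.Uri, rel.TargetUri);
        return package.GetPart(targetUri);
    }

    public static void Main(string[] args)
    {
        string filename = args.Length > 0 ? args[0] : "SampleDoc.docx";

        using (Package package = Package.Open(filename, FileMode.Open, FileAccess.Read))
        {
            // The package-level relationship identifies the main document part.
            PackageRelationship docRel =
                package.GetRelationshipsByType(DocumentRelType).FirstOrDefault();
            if (docRel == null)
            {
                Console.WriteLine("No main document part found.");
                return;
            }
            Uri mainUri = PackUriHelper.ResolvePartUri(
                new Uri("/", UriKind.Relative), docRel.TargetUri);
            PackagePart mainPart = package.GetPart(mainUri);
            Console.WriteLine("Main part: {0}", mainPart.Uri);

            // Navigate from the main part to the style part by relationship,
            // never by a hard-coded path.
            PackagePart stylePart = GetRelatedPart(package, mainPart, StylesRelType);
            Console.WriteLine("Style part: {0}",
                stylePart == null ? "(none)" : stylePart.Uri.ToString());
        }
    }
}
```

Note that nothing in the sketch mentions a path such as the location of the style part; the only fixed starting point is the package itself, and every other part is reached through a relationship type.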

The following code is attached to this page.  Here is the boilerplate code:

using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    // Load the XML content of a part into a LINQ to XML document.
    public static XDocument LoadXDocument(OpenXmlPart part)
    {
        using (StreamReader streamReader = new StreamReader(part.GetStream()))
        using (XmlReader xmlReader = XmlReader.Create(streamReader))
            return XDocument.Load(xmlReader);
    }

    // Print the node count for a part.  A part is null when the document
    // does not contain it (for example, a document without comments).
    static void PrintNodeCount(string description, OpenXmlPart part)
    {
        if (part == null)
        {
            Console.WriteLine("The document has no {0}.", description);
            return;
        }
        XDocument xDoc = LoadXDocument(part);
        Console.WriteLine("The {0} has {1} nodes.", description, xDoc.DescendantNodes().Count());
    }

    static void Main(string[] args)
    {
        const string filename = "SampleDoc.docx";

        // Open read-only; pass true instead of false to modify the document.
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filename, false))
        {
            // Start from the main part and navigate to the related parts.
            MainDocumentPart mainPart = wordDoc.MainDocumentPart;
            StyleDefinitionsPart styleDefinitionsPart = mainPart.StyleDefinitionsPart;
            WordprocessingCommentsPart commentsPart = mainPart.CommentsPart;

            PrintNodeCount("main document part", mainPart);
            PrintNodeCount("style part", styleDefinitionsPart);
            PrintNodeCount("comments part", commentsPart);
        }
    }
}


UsingTheOpenXmlSdk.cs