Simplifying Open XML WordprocessingML Queries by First Accepting Revisions

Revision tracking is one of the more involved areas of the Open XML standard.  More than 40 elements and attributes, some with very involved semantics, define tracked revisions.  I've written an MSDN article, Accepting Revisions in Open XML Word-Processing Documents, on the exact semantics of revision tracking markup.  If you accept all revisions first, your queries no longer need to process those elements and attributes in order to retrieve the contents of the document.  However, you may not want to modify the document on disk.  It is easy to write code that pulls the document into memory and accepts revisions there, without touching the original document on disk.  This post presents a couple of examples that show how to do this using the Open XML SDK.

This is one in a series of posts on transforming Open XML WordprocessingML to XHtml.  You can find the complete list of posts here.

Note: The RevisionAccepter class in PowerTools for Open XML provides a complete implementation of accepting revisions.  You can download the RevisionAccepter by going to PowerTools for Open XML, clicking on the Downloads tab, and downloading RevisionAccepter.zip.  PowerTools for Open XML is licensed under the Microsoft Public License (Ms-PL), which gives you wide latitude in how you use the code.

The gist of the technique is to read the document into a byte array, create a resizable memory stream, write the byte array into the memory stream, and then open the document from the memory stream.  By using this technique, we can write queries that don't need to take revision tracking into account, but we won't touch the original document on disk.  The following example shows how to do this:

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using OpenXmlPowerTools;

class Program
{
    static void Main(string[] args)
    {
        byte[] byteArray = File.ReadAllBytes("Test.docx");
        using (MemoryStream memoryStream = new MemoryStream())
        {
            memoryStream.Write(byteArray, 0, byteArray.Length);
            using (WordprocessingDocument wordDoc =
                WordprocessingDocument.Open(memoryStream, true))
            {
                RevisionAccepter.AcceptRevisions(wordDoc);

                // Print the markup for the first paragraph after
                // revisions have been accepted.
                XDocument xdoc = wordDoc.MainDocumentPart.GetXDocument();
                XElement para1 = xdoc.Root
                    .Element(W.body)
                    .Elements(W.p)
                    .FirstOrDefault();
                if (para1 != null)
                    Console.WriteLine(para1);
            }
        }
    }
}

Even though the RevisionAccepter class is written using LINQ to XML, this technique works equally well when used with the strongly-typed object model of the Open XML SDK 2.0:

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using OpenXmlPowerTools;

class Program
{
    static void Main(string[] args)
    {
        byte[] byteArray = File.ReadAllBytes("Test.docx");
        using (MemoryStream memoryStream = new MemoryStream())
        {
            memoryStream.Write(byteArray, 0, byteArray.Length);
            using (WordprocessingDocument wordDoc =
                WordprocessingDocument.Open(memoryStream, true))
            {
                RevisionAccepter.AcceptRevisions(wordDoc);

                // Print the markup for the first paragraph after
                // revisions have been accepted.
                Paragraph para1 = wordDoc.MainDocumentPart.Document.Body
                    .Elements<Paragraph>().FirstOrDefault();
                if (para1 != null)
                    Console.WriteLine(XElement.Parse(para1.OuterXml));
            }
        }
    }
}
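
One detail in both examples deserves a closer look: the byte array is written into a MemoryStream created with the default constructor, rather than being passed to the MemoryStream(byte[]) constructor.  A stream that wraps an existing array has a fixed capacity, so it cannot grow if accepting revisions changes the size of the package.  The following is a minimal sketch that demonstrates the difference using only System.IO.  The ResizableStreamDemo class and its CanGrow helper are hypothetical names that I made up for this illustration; they are not part of the Open XML SDK or of PowerTools for Open XML.

```csharp
using System;
using System.IO;

static class ResizableStreamDemo
{
    // Hypothetical helper: tries to append one byte past the current end
    // of the stream and reports whether the stream was able to grow.
    public static bool CanGrow(MemoryStream stream)
    {
        stream.Seek(0, SeekOrigin.End);
        try
        {
            stream.WriteByte(0);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown when the stream wraps a fixed-size array.
            return false;
        }
    }

    static void Main()
    {
        byte[] byteArray = { 1, 2, 3, 4 };

        // Wrapping the array directly fixes the capacity at byteArray.Length.
        using (MemoryStream fixedStream = new MemoryStream(byteArray))
            Console.WriteLine("wrapped array can grow: " + CanGrow(fixedStream));

        // The default constructor creates an expandable stream; this is the
        // pattern that the examples in this post use.
        using (MemoryStream resizable = new MemoryStream())
        {
            resizable.Write(byteArray, 0, byteArray.Length);
            Console.WriteLine("default stream can grow: " + CanGrow(resizable));
        }
    }
}
```

Running this prints "wrapped array can grow: False" followed by "default stream can grow: True".  The same reasoning applies whenever you open a document for editing from memory: the stream needs room to grow if the edited package turns out larger than the original.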