Using the Open XML SDK and LINQ to XML to Remove Comments from an Open XML Wordprocessing Document

This post presents a snippet of code to remove comments from an Open XML Wordprocessing document.


Note: This post may be of interest to LINQ to XML developers, because it contains information that helps you write queries that perform better. For very large documents, the approach described below performs much better than the more obvious alternatives.

The approach is simple: remove all w:commentRangeStart, w:commentRangeEnd, and w:commentReference elements from the main document part, and then remove the comments part.

The following code removes the above-mentioned elements.

// w is the WordprocessingML namespace
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// pre-atomize the XName objects so that they are not atomized for every item in the collection
XName commentRangeStart = w + "commentRangeStart";
XName commentRangeEnd = w + "commentRangeEnd";
XName commentReference = w + "commentReference";
mainDocumentXDoc.Descendants()
    .Where(x => x.Name == commentRangeStart ||
        x.Name == commentRangeEnd ||
        x.Name == commentReference)
    .Remove();
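
To try the query without a real document, here is a minimal, self-contained sketch that runs it against a small hand-written fragment of WordprocessingML. The sample XML, the CommentMarkupDemo class, and the RemoveCommentMarkup and BuildSample names are assumptions made for illustration; in a real program, mainDocumentXDoc would be loaded from the main document part instead.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CommentMarkupDemo
{
    // WordprocessingML namespace, conventionally bound to the w prefix.
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: removes the three comment-related elements in one pass.
    public static void RemoveCommentMarkup(XDocument mainDocumentXDoc)
    {
        // Build each XName once, outside the lambda.
        XName commentRangeStart = w + "commentRangeStart";
        XName commentRangeEnd = w + "commentRangeEnd";
        XName commentReference = w + "commentReference";

        mainDocumentXDoc.Descendants()
            .Where(x => x.Name == commentRangeStart ||
                x.Name == commentRangeEnd ||
                x.Name == commentReference)
            .Remove();
    }

    // Hand-written sample standing in for a real main document part.
    public static XDocument BuildSample()
    {
        return XDocument.Parse(
            "<w:document xmlns:w='" + w.NamespaceName + "'>" +
            "<w:body><w:p>" +
            "<w:commentRangeStart w:id='0'/>" +
            "<w:r><w:t>Hello</w:t></w:r>" +
            "<w:commentRangeEnd w:id='0'/>" +
            "<w:r><w:commentReference w:id='0'/></w:r>" +
            "</w:p></w:body></w:document>");
    }
}
```

Calling CommentMarkupDemo.RemoveCommentMarkup(CommentMarkupDemo.BuildSample()) leaves the paragraph text in place. Note that the query removes only the three named elements, so the run that held the w:commentReference stays behind as an empty w:r element.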

A more obvious way to write this is to call the Descendants axis once for each element name:

mainDocumentXDoc
    .Descendants(w + "commentRangeStart")
    .Remove();

mainDocumentXDoc
    .Descendants(w + "commentRangeEnd")
    .Remove();

mainDocumentXDoc
    .Descendants(w + "commentReference")
    .Remove();

Of course, this iterates over all of the descendants three times, which is undesirable for large documents.

So, keeping this in mind, you might write it like this:

mainDocumentXDoc.Descendants()
    .Where(x => x.Name == w + "commentRangeStart" ||
        x.Name == w + "commentRangeEnd" ||
        x.Name == w + "commentReference")
    .Remove();

This iterates the Descendants axis only once. However, there is a subtler performance issue here: the names (as expressed by w + "commentRangeStart", etc.) are atomized over and over again for every item in the Descendants axis. Each w + "..." expression looks the name up in the namespace's name cache, so the lambda performs up to three lookups per element. To make the code perform as well as possible, we pre-atomize the XName objects, then use them in the call to the Where extension method:

XName commentRangeStart = w + "commentRangeStart";
XName commentRangeEnd = w + "commentRangeEnd";
XName commentReference = w + "commentReference";

mainDocumentXDoc.Descendants()
    .Where(x =>
        x.Name == commentRangeStart ||
        x.Name == commentRangeEnd ||
        x.Name == commentReference)
    .Remove();
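
The atomization behavior behind this advice can be observed directly. LINQ to XML guarantees that XName objects are atomized: two names with the same namespace and local name are the same object instance. The sketch below checks that with object.ReferenceEquals. The AtomizationDemo class and its method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Xml.Linq;

public static class AtomizationDemo
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // True when two separately built names are the same object instance.
    public static bool SameInstance()
    {
        XName first = w + "commentRangeStart";
        XName second = w + "commentRangeStart";
        return object.ReferenceEquals(first, second);
    }

    // True when a parsed element's Name is the same instance as a pre-built XName.
    public static bool ElementNameIsAtomized()
    {
        XName commentReference = w + "commentReference";
        XElement element = XElement.Parse(
            "<w:commentReference xmlns:w='" + w.NamespaceName + "' w:id='0'/>");
        return object.ReferenceEquals(element.Name, commentReference);
    }
}
```

Both methods should return true. Because names are shared instances, comparing x.Name to a stored XName is cheap; the cost being avoided is the repeated w + "..." lookup that produces the name, not the comparison itself.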

For more detailed information about atomization and LINQ to XML performance, see Performance of LINQ to XML.

The attached code also includes a method that returns a bool indicating whether the document contains comments.
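
The attached file is not reproduced in this post, so the following is only a sketch of what such a check could look like, not the attached implementation. The CommentDetectionSketch class and the HasCommentMarkup name are hypothetical, and the sketch assumes it is enough to look for the same three elements in the main document XML; a fuller check might also test whether the comments part exists.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CommentDetectionSketch
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Pre-atomized names, built once for the lifetime of the class.
    static readonly XName commentRangeStart = w + "commentRangeStart";
    static readonly XName commentRangeEnd = w + "commentRangeEnd";
    static readonly XName commentReference = w + "commentReference";

    // Returns true as soon as any comment-related element is found.
    public static bool HasCommentMarkup(XDocument mainDocumentXDoc)
    {
        return mainDocumentXDoc.Descendants()
            .Any(x => x.Name == commentRangeStart ||
                x.Name == commentRangeEnd ||
                x.Name == commentReference);
    }
}
```

Any stops at the first match, so a document with comments near the start is detected without walking the whole tree.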

Code is attached.

RemoveComments.cs