Remove Rsid Attributes and Elements before Comparing Open XML Documents

A convenient way to explore Open XML markup is to create a small document, modify it slightly in the Word user interface, save it, and then compare it against the original with the Open XML Diff utility that comes with the Open XML SDK V2. However, Word adds extraneous rsid (revision save ID) elements and attributes that enable merging of two documents that have forked. These elements and attributes show up as changed, and obscure the differences that we’re looking for. An easy way to deal with this is to remove them before comparing documents. We can safely do so without changing the content of the document. This post presents a bit of code to do this.

For more information on rsid elements and attributes, see Brian Jones’s blog post on them.

This post also contains two of my most commonly used little extension methods: one gets an XDocument from an Open XML part, and the other saves that XDocument back into the word processing document. The XDocument is stored as an annotation on the Open XML part, so repeated calls for the same part return the same XDocument instance.

This little program takes any number of files as arguments, and strips these extraneous elements and attributes from each of the files, modifying each file in place. Usage:

C:\> RemoveRsid Test1.docx Test2.docx

Here is the listing of this program (code is attached to this post, as well):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

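public static class LocalExtensions
{
    // Returns the part's XML as an XDocument. The document is cached as an
    // annotation on the part, so later calls return the same instance.
    public static XDocument GetXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
            return xdoc;
        using (StreamReader streamReader = new StreamReader(part.GetStream()))
        using (XmlReader xmlReader = XmlReader.Create(streamReader))
            xdoc = XDocument.Load(xmlReader);
        part.AddAnnotation(xdoc);
        return xdoc;
    }

    // Writes the cached XDocument, if there is one, back into the part,
    // replacing the part's previous contents.
    public static void SaveXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
        {
            using (XmlWriter xw =
                XmlWriter.Create(part.GetStream(FileMode.Create, FileAccess.Write)))
                xdoc.WriteTo(xw);
        }
    }
}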

class Program
{
    // get rid of every rsid attribute/element in the doc.
    // they exist to enable merging of forked documents; not something
    // we're interested in here. if we don't delete these nodes, they
    // show up as changed.
    private static void CleanUp(XDocument doc)
    {
        // The WordprocessingML namespace URI uses the http scheme; the
        // string must match exactly or no attribute will be found.
        XNamespace w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        doc.Descendants().Attributes(w + "rsidTr").Remove();
        doc.Descendants().Attributes(w + "rsidSect").Remove();
        doc.Descendants().Attributes(w + "rsidRDefault").Remove();
        doc.Descendants().Attributes(w + "rsidR").Remove();
        doc.Descendants().Attributes(w + "rsidDel").Remove();
        doc.Descendants().Attributes(w + "rsidP").Remove();
        doc.Descendants(w + "rsid").Remove();
    }

    static void Main(string[] args)
    {
        foreach (var file in args)
        {
            using (WordprocessingDocument doc =
                WordprocessingDocument.Open(file, true))
            {
                XDocument xDoc = doc.MainDocumentPart.GetXDocument();
                CleanUp(xDoc);
                doc.MainDocumentPart.SaveXDocument();

                foreach (var h in doc.MainDocumentPart.HeaderParts)
                {
                    xDoc = h.GetXDocument();
                    CleanUp(xDoc);
                    h.SaveXDocument();
                }

                foreach (var f in doc.MainDocumentPart.FooterParts)
                {
                    xDoc = f.GetXDocument();
                    CleanUp(xDoc);
                    f.SaveXDocument();
                }
            }
        }
    }
}

RemoveRsid.cs
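
The CleanUp method in the listing names each rsid attribute explicitly. As a possible variation, here is a minimal sketch that removes every attribute in the WordprocessingML namespace whose local name starts with "rsid", so that attributes not named in the listing (w:rsidRPr, for example) are caught as well. The class name RsidPrefixCleaner, the method name RemoveRsidByPrefix, and the sample paragraph with its rsid values are made up for illustration; they are not part of the attached program. The sketch uses only LINQ to XML, so it runs without the Open XML SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RsidPrefixCleaner
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: removes every w:rsid* attribute and every
    // w:rsid element, and returns the number of nodes removed.
    public static int RemoveRsidByPrefix(XDocument doc)
    {
        // Materialize the queries first; removing nodes while a lazy
        // query is still enumerating the tree is not safe.
        List<XAttribute> attributes = doc.Descendants()
            .Attributes()
            .Where(a => a.Name.Namespace == W &&
                        a.Name.LocalName.StartsWith("rsid", StringComparison.Ordinal))
            .ToList();
        List<XElement> elements = doc.Descendants(W + "rsid").ToList();

        foreach (XAttribute a in attributes)
            a.Remove();
        foreach (XElement e in elements)
            e.Remove();

        return attributes.Count + elements.Count;
    }

    public static void Main()
    {
        // Made-up sample markup; the rsid values are arbitrary.
        XDocument doc = XDocument.Parse(
            "<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'" +
            " w:rsidR='00A12B3C' w:rsidRPr='00D45E6F' w:rsidRDefault='00A12B3C'>" +
            "<w:r w:rsidRPr='00D45E6F'><w:t>Hello</w:t></w:r></w:p>");

        int removed = RemoveRsidByPrefix(doc);
        Console.WriteLine(removed); // prints 4

        // The xmlns:w declaration is kept because it lives in the xmlns
        // namespace, not in the WordprocessingML namespace.
        Console.WriteLine(doc.ToString(SaveOptions.DisableFormatting));
    }
}
```

To try this in the program above, the body of CleanUp could call RemoveRsidByPrefix(doc) in place of the per-attribute lines. The trade-off is that a prefix match is broader than an explicit list, so it is worth checking the result with the diff utility before relying on it.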