Processing all Content Parts in an Open XML WordprocessingML Document


In Open XML WordprocessingML documents, there are five types of parts that can contain content such as paragraphs (with or without tracked revisions), tables, rows, cells, and any of a variety of content controls:

  • Main document part
  • Header parts (there can be more than one)
  • Footer parts (there can be more than one)
  • Endnotes part (there can be zero or one)
  • Footnotes part (there can be zero or one)

There are certain Open XML programming scenarios where you need to process all varieties of parts that contain content:

  • You need to search for specific words in a document, regardless of where those words occur.
  • You need to accept tracked changes anywhere they appear in the document.
  • You need to process content controls anywhere they occur in the document, perhaps to bind them to XML in a custom XML part.
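As a rough illustration of the first scenario, here is a minimal sketch of a word search over the XML of a single part. It uses only LINQ to XML, so it runs against a hand-written header fragment instead of a real document. The names PartSearch and FindParagraphsContaining are hypothetical, not part of any SDK. The sketch also makes simplifying assumptions: it ignores text marked as deleted (w:delText), and it does not treat paragraphs nested in text boxes specially.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PartSearch
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the text of every paragraph in one part's XML
    // whose concatenated run text contains the search term (case-insensitive).
    public static List<string> FindParagraphsContaining(XDocument partXml, string term)
    {
        return partXml.Descendants(W + "p")
            // Word can split a single word across several runs, so join the
            // w:t elements of the paragraph before searching.
            .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
            .Where(text => text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}

public static class SearchDemo
{
    public static void Main()
    {
        // A hand-written header fragment; "content" is split across two runs.
        XDocument header = XDocument.Parse(
            "<w:hdr xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:p><w:r><w:t>Open XML con</w:t></w:r><w:r><w:t>tent parts</w:t></w:r></w:p>" +
            "<w:p><w:r><w:t>Unrelated text</w:t></w:r></w:p>" +
            "</w:hdr>");

        foreach (string hit in PartSearch.FindParagraphsContaining(header, "content"))
            Console.WriteLine(hit);
    }
}
```

To cover a whole document, you would call the same method once for the XML of each part listed above.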

The following example shows how to find every content control in a document, whether it is in the main document part, a header or footer, or the endnotes or footnotes. The example uses LINQ to XML. If you use the strongly typed object model of the Open XML SDK instead, the code that enumerates the parts is identical; only the code that processes the content controls changes.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class Extensions
{
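    // Returns the part's XML as an XDocument. The result is cached as an
    // annotation on the part, so repeated calls do not re-parse the stream.
    public static XDocument GetXDocument(this OpenXmlPart part)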
    {
        XDocument partXDocument = part.Annotation<XDocument>();
        if (partXDocument != null)
            return partXDocument;
        using (Stream partStream = part.GetStream())
        using (XmlReader partXmlReader = XmlReader.Create(partStream))
            partXDocument = XDocument.Load(partXmlReader);
        part.AddAnnotation(partXDocument);
        return partXDocument;
    }
}

class Program
{
    private static void IterateContentControlsForPart(OpenXmlPart part)
    {
        XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XDocument doc = part.GetXDocument();
        // w:sdt is the element name for content controls at every level
        // (block, run, row, and cell).
        foreach (var sdt in doc.Descendants(w + "sdt"))
        {
            Console.WriteLine("Found content control");
            Console.WriteLine("=====================");
            Console.WriteLine(sdt.ToString());
            Console.WriteLine();
        }
    }

    public static void IterateContentControls(WordprocessingDocument doc)
    {
        IterateContentControlsForPart(doc.MainDocumentPart);
        foreach (var part in doc.MainDocumentPart.HeaderParts)
            IterateContentControlsForPart(part);
        foreach (var part in doc.MainDocumentPart.FooterParts)
            IterateContentControlsForPart(part);
        // The endnotes and footnotes parts are optional, so check for null.
        if (doc.MainDocumentPart.EndnotesPart != null)
            IterateContentControlsForPart(doc.MainDocumentPart.EndnotesPart);
        if (doc.MainDocumentPart.FootnotesPart != null)
            IterateContentControlsForPart(doc.MainDocumentPart.FootnotesPart);
    }

    static void Main(string[] args)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open("Test.docx", false))
            IterateContentControls(doc);
    }
}
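
For comparison, here is a minimal sketch of the strongly typed variant mentioned before the listing. It assumes a version of the Open XML SDK in which SdtElement is the common base class of the content control elements and OpenXmlPart exposes RootElement. The names ContentPartExtensions, ContentParts, and CountContentControls are hypothetical helpers, not SDK members. The part enumeration mirrors IterateContentControls above; only the processing step differs.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentPartExtensions
{
    // Hypothetical helper: yields the five kinds of parts that can hold content.
    public static IEnumerable<OpenXmlPart> ContentParts(this WordprocessingDocument doc)
    {
        MainDocumentPart main = doc.MainDocumentPart;
        if (main == null)
            yield break;
        yield return main;
        foreach (HeaderPart part in main.HeaderParts)
            yield return part;
        foreach (FooterPart part in main.FooterParts)
            yield return part;
        // The endnotes and footnotes parts are optional.
        if (main.EndnotesPart != null)
            yield return main.EndnotesPart;
        if (main.FootnotesPart != null)
            yield return main.FootnotesPart;
    }

    // Counts content controls of every kind (block, run, row, cell) in all
    // content parts, using the strongly typed object model.
    public static int CountContentControls(this WordprocessingDocument doc)
    {
        return doc.ContentParts()
            .Where(part => part.RootElement != null)
            .Sum(part => part.RootElement.Descendants<SdtElement>().Count());
    }
}
```

With a document opened as in Main above, doc.CountContentControls() would return the total across all five kinds of parts. Because the enumeration lives in one place, a search or a tracked-changes pass could reuse ContentParts without repeating the null checks.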

Comments (1)

  1. John Holliday says:

    Eric,

    I'm curious why you did not include the following items:

    doc.MainDocumentPart.CustomXmlParts

    doc.MainDocumentPart.WordprocessingCommentsPart

    -John