Using LINQ to XML to Retrieve Content Controls in Word 2007

Content controls are an effective way to add structure to word processing documents.  You can write a very small LINQ query to retrieve the contents of content controls.  This topic in Office Online provides more information on content controls.

In an upcoming post (now written), I’m going to show how you can write a small test harness to test code that is embedded in documents.  This is particularly useful to, say, a program manager who has written a specification containing a lot of code that shows how to use a programming interface.  As we all know, programming interface designs often change during development, and keeping the code in the specification current is a problem.  With this approach, after getting a new drop of the library or assembly being developed, the program manager can run the tool and validate that the code in the specification still works.  In that post, I’ll use content controls to delimit the code to be tested.

Content controls in Word 2007 are useful out-of-the-box – you don’t need to write any code to take advantage of them in a variety of ways.  However, once you add the programmability dimension using Open XML, it opens up a lot of possibilities.  For example, you could write some code so that when you check a document into a SharePoint document library, the document is automatically emailed to various interested parties that are enumerated in a content control, but only if another content control indicates to do so.  Content controls eliminate the need to do some kind of hack, like parsing a document based on paragraph styles.

To add a content control, you must turn on the developer tab in the ribbon.  To turn on the developer tab, open the Word Options dialog box, and click “Show developer tab in the ribbon”.  Then, to add the content control, select the text that you wish to be inside the content control, and click one of the buttons in the developer tab that adds a content control.  To try out the code in this post, add a rich text content control.

After you have added a content control, you can set its properties, including its title, by clicking the Properties button in the ribbon.

Then, when the insertion point is inside the content control, Word displays its title.

I’ll make use of the title of the content control in the upcoming code testing example.
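
For orientation, here is a minimal sketch of the markup behind a titled content control, built by hand with LINQ to XML. It assumes the title is stored in the w:val attribute of a w:alias element under w:sdtPr, which is what the query later in this post reads. The names ContentControlMarkupSketch, BuildSample, and GetTitle are made up for illustration, and the element tree is an approximation: Word typically writes further property elements that are left out here.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ContentControlMarkupSketch
{
    // WordprocessingML namespace (an identifier, so it keeps the http scheme).
    private static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: builds a simplified w:sdt element with a title
    // and a single paragraph of text inside w:sdtContent.
    public static XElement BuildSample(string title, string text)
    {
        return new XElement(w + "sdt",
            new XElement(w + "sdtPr",
                new XElement(w + "alias", new XAttribute(w + "val", title))),
            new XElement(w + "sdtContent",
                new XElement(w + "p",
                    new XElement(w + "r",
                        new XElement(w + "t", text)))));
    }

    // Hypothetical helper: returns the title of a content control,
    // or null when the control has no w:alias element.
    public static string GetTitle(XElement sdt)
    {
        return (string)sdt.Elements(w + "sdtPr")
                          .Elements(w + "alias")
                          .Attributes(w + "val")
                          .FirstOrDefault();
    }
}
```

Calling GetTitle on the element returned by BuildSample("Build", "some code") gives back "Build".  Casting a missing attribute to string yields null rather than throwing, which is why GetTitle is safe to call on content controls that have no title.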

Here is the LINQ to XML code to get the contents of every content control whose title is “build” (the comparison ignores case).  Note that this code uses the Open XML SDK.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class LocalExtensions
{
    // Projects each item to a string and concatenates the results.
    public static string StringConcatenate<T>(this IEnumerable<T> source,
        Func<T, string> func)
    {
        StringBuilder sb = new StringBuilder();
        foreach (T item in source)
            sb.Append(func(item));
        return sb.ToString();
    }

    // Concatenates a sequence of strings.
    public static string StringConcatenate(this IEnumerable<string> source)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string item in source)
            sb.Append(item);
        return sb.ToString();
    }

    // Loads the part's XML into an XDocument and caches it as an annotation,
    // so later calls for the same part return the same document.
    public static XDocument GetXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
            return xdoc;
        using (StreamReader sr = new StreamReader(part.GetStream()))
        using (XmlReader xr = XmlReader.Create(sr))
            xdoc = XDocument.Load(xr);
        part.AddAnnotation(xdoc);
        return xdoc;
    }
}

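class Program
{
    // WordprocessingML namespace.  The URI is an identifier, not a link, and
    // must use the http scheme or no elements will match.
    private static XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private static XName r = w + "r";
    private static XName ins = w + "ins";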

    // Returns the text of the content control, one line per paragraph.
    // Only runs (w:r) and tracked insertions (w:ins) contribute text.
    static string GetTextFromContentControl(XElement contentControlNode)
    {
        return contentControlNode.Descendants(w + "p")
            .Select
            (
                p => p.Elements()
                      .Where(z => z.Name == r || z.Name == ins)
                      .Descendants(w + "t")
                      .StringConcatenate(element => (string)element) + Environment.NewLine
            ).StringConcatenate();
    }

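    static void Main(string[] args)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open("Test.docx", false))
        {
            // Select content controls titled "build", ignoring case.
            // string.Equals tolerates a null title, so content controls
            // without a w:alias element are skipped instead of throwing.
            var build = doc.MainDocumentPart
                .GetXDocument()
                .Descendants(w + "sdt")
                .Where(e => string.Equals(
                    (string)e.Elements(w + "sdtPr")
                             .Elements(w + "alias")
                             .Attributes(w + "val")
                             .FirstOrDefault(),
                    "build",
                    StringComparison.OrdinalIgnoreCase))
                .Select(b => GetTextFromContentControl(b));

            foreach (var b in build)
                Console.WriteLine(b);
        }
    }
}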
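
A variation that may be handy while experimenting: the sketch below lists every titled content control together with its text, instead of filtering on one title.  It works on any XDocument, so it can be tried on hand-built XML without opening a package.  The names ContentControlIndexSketch and TitlesAndText are made up for illustration, and it swaps the StringConcatenate helpers for string.Concat and string.Join over sequences, which assumes .NET Framework 4 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ContentControlIndexSketch
{
    private static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns (title, text) pairs for every content
    // control that has a title.  Paragraphs are separated by a newline.
    public static List<KeyValuePair<string, string>> TitlesAndText(XDocument mainPart)
    {
        return mainPart.Descendants(w + "sdt")
            .Select(sdt => new
            {
                Title = (string)sdt.Elements(w + "sdtPr")
                                   .Elements(w + "alias")
                                   .Attributes(w + "val")
                                   .FirstOrDefault(),
                Text = string.Join(Environment.NewLine,
                    sdt.Descendants(w + "p")
                       .Select(p => string.Concat(
                           p.Elements()
                            .Where(e => e.Name == w + "r" || e.Name == w + "ins")
                            .Descendants(w + "t")
                            .Select(t => (string)t))))
            })
            .Where(x => x.Title != null)   // skip untitled content controls
            .Select(x => new KeyValuePair<string, string>(x.Title, x.Text))
            .ToList();
    }
}
```

With the code above, doc.MainDocumentPart.GetXDocument() can be passed straight to TitlesAndText.  Unlike GetTextFromContentControl, this version puts the newline between paragraphs rather than after each one, so the last paragraph has no trailing line break.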

Code and the test document are attached.

ContentControls.zip