Technical Improvements in the Open XML SDK


(Note: July 9, 2008 - I've written a new post that shows an even better way to implement functionality like this.) 

Sometimes I get to write a blog post that is really fun to write, and this is one of them.  This particular subject started brewing in my mind last November and December, before I started in my current job.  At the time, I was writing some code to find the most effective and approachable way to access Open XML documents using LINQ to XML.

One of the problems that I ran into was that after I had populated an XML tree from a part, there was no good place to keep that populated XDocument.  It would be possible to keep it in a dictionary and look it up by part every time you need it, but this didn't appeal to me.  However, if the Open XML SDK had annotations, in the style of LINQ to XML, then after populating an XDocument from a part we could attach the XDocument to the part, and before populating an XDocument we could first check whether the part already has one.  Well, annotations have been added to the April 2008 CTP of the Open XML SDK.

This makes it easier to deal with the XML contained in the parts.  All a developer needs to do is to load the WordprocessingDocument, and get the XDocument for specific parts as necessary.  If the XDocument has already been loaded, the work to load it will not be repeated.
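
The load-once behavior can be tried without the SDK, because LINQ to XML offers the same annotation methods on XObject.  The following is a minimal sketch, not the SDK's implementation: the `holder` element merely stands in for a part, and the names `AnnotationCacheSketch`, `GetCached`, and `LoadCount` are hypothetical, introduced only for illustration.

```csharp
using System;
using System.Xml.Linq;

public static class AnnotationCacheSketch
{
    // Counts how many times the parse (the "expensive" load) actually runs.
    public static int LoadCount;

    // Hypothetical helper: returns the XDocument cached on 'holder',
    // parsing 'xml' only if no XDocument annotation is attached yet.
    public static XDocument GetCached(XObject holder, string xml)
    {
        XDocument cached = holder.Annotation<XDocument>();
        if (cached != null)
            return cached;
        LoadCount++;
        cached = XDocument.Parse(xml);
        holder.AddAnnotation(cached);
        return cached;
    }

    public static void Main()
    {
        // Assumption: a plain XElement stands in for an OpenXmlPart here.
        XElement holder = new XElement("part");
        XDocument first = GetCached(holder, "<root><child/></root>");
        XDocument second = GetCached(holder, "<root><child/></root>");
        Console.WriteLine(ReferenceEquals(first, second));
        Console.WriteLine(LoadCount);
    }
}
```

Both calls return the same XDocument instance and the parse runs once, which is the property that makes the annotation approach a replacement for a separate dictionary keyed by part.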

There are more sophisticated uses of this new feature.  One possible enhancement: automatically reserialize the XDocument objects back to the package if the XDocument was changed.  I'll be blogging more on this.
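
One way the reserialization idea might look is sketched below, using only LINQ to XML.  This is an assumption about a possible design, not something the SDK provides: the document's Changed event sets a marker annotation, and a save routine writes the XML back only when the marker is present.  A MemoryStream stands in for the part's stream, and `DirtyTrackingSketch`, `DirtyMarker`, `TrackChanges`, `IsDirty`, and `SaveIfDirty` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class DirtyTrackingSketch
{
    // Hypothetical marker: its presence means "modified since last save".
    private sealed class DirtyMarker { }

    // The Changed event on a document is also raised for changes to
    // any of its descendants, so one subscription covers the whole tree.
    public static void TrackChanges(XDocument doc)
    {
        doc.Changed += (sender, e) =>
        {
            if (doc.Annotation<DirtyMarker>() == null)
                doc.AddAnnotation(new DirtyMarker());
        };
    }

    public static bool IsDirty(XDocument doc)
    {
        return doc.Annotation<DirtyMarker>() != null;
    }

    // Writes the document to 'target' only if it was modified.
    // Returns true when a write happened.
    public static bool SaveIfDirty(XDocument doc, Stream target)
    {
        if (!IsDirty(doc))
            return false;
        target.SetLength(0);          // discard the old content
        doc.Save(target);
        doc.RemoveAnnotations<DirtyMarker>();
        return true;
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse("<root><child/></root>");
        TrackChanges(doc);
        using (MemoryStream stream = new MemoryStream())
        {
            // Nothing has changed yet, so nothing is written.
            Console.WriteLine(SaveIfDirty(doc, stream));
            // Modifying the tree raises Changed and sets the marker.
            doc.Root.Add(new XElement("added"));
            Console.WriteLine(SaveIfDirty(doc, stream));
            Console.WriteLine(stream.Length > 0);
        }
    }
}
```

In a real package the target would be the stream of the part the XDocument came from, and the save would have to run before the package is closed.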

In the following example, I've written an extension method, GetXDocument, that you can call on any OpenXmlPart.  You can see how this method uses annotations.

public static XDocument GetXDocument(this OpenXmlPart part)
{
    XDocument xdoc = part.Annotation<XDocument>();
    if (xdoc != null)
        return xdoc;
    using (StreamReader streamReader = new StreamReader(part.GetStream()))
        xdoc = XDocument.Load(XmlReader.Create(streamReader));
    part.AddAnnotation(xdoc);
    return xdoc;
}

Here is the entire example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using Microsoft.Office.DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;
namespace OpenXmlSdkExample
{
    public class Comment
    {
        public int Id { get; set; }
        public string Text { get; set; }
        public string Author { get; set; }
        public Paragraph Parent { get; set; }
        public Comment(Paragraph parent) { Parent = parent; }
    }
    public class Paragraph
    {
        public XElement ParagraphElement { get; set; }
        public string StyleName { get; set; }
        public string Text { get; set; }
        public IEnumerable<Comment> Comments()
        {
            XNamespace w =
                "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XElement p = ParagraphElement;
            var commentIds = p
                .Elements(w + "commentRangeStart")
                .Attributes(w + "id")
                .Select(c => (int)c);
            return
                commentIds
                    .Select(i =>
                        new Comment(this)
                        {
                            Id = i,
                            Author =
                                Parent.MainDocumentPart.CommentsPart.GetXDocument()
                                    .Root
                                    .Elements(w + "comment")
                                    .Where(c => (int)c.Attribute(w + "id") == i)
                                    .First()
                                    .Attribute(w + "author")
                                    .Value,
                            Text =
                                Parent.MainDocumentPart.CommentsPart.GetXDocument()
                                    .Root
                                    .Elements(w + "comment")
                                    .Where(c => (int)c.Attribute(w + "id") == i)
                                    .First()
                                    .Descendants(w + "p")
                                    .Select(run => run
                                        .Descendants(w + "t")
                                        .StringConcatenate(e => (string)e)
                                        + "\n")
                                    .Aggregate(new StringBuilder(), (sb, v) => sb.Append(v), sb => sb.ToString())
                                    .Trim()
                        }
                    );
        }
        public WordprocessingDocument Parent { get; set; }
        public Paragraph(WordprocessingDocument parent) { Parent = parent; }
    }
    public static class LocalExtensions
    {
        public static string DefaultStyle(this WordprocessingDocument doc)
        {
            XNamespace w =
                "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XDocument styleXDocument = doc.MainDocumentPart.StyleDefinitionsPart.GetXDocument();
            return (string)(
                from style in styleXDocument.Root.Elements(w + "style")
                where (string)style.Attribute(w + "type") == "paragraph" &&
                      (string)style.Attribute(w + "default") == "1"
                select style
            ).First().Attribute(w + "styleId");
        }
        public static IEnumerable<Paragraph> Paragraphs(this WordprocessingDocument doc)
        {
            // a good convention to use is to name the XNamespace
            // variable with the same name as the namespace prefix,
            // and to name XName variables with the local name of the element
            XNamespace w =
                "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XName r = w + "r";
            XName ins = w + "ins";
            string defaultStyle = doc.DefaultStyle();
            // query for all paragraphs in the document.
            return
                from p in doc
                    .MainDocumentPart
                    .GetXDocument()
                    .Root
                    .Element(w + "body")
                    .Descendants(w + "p")
                let styleNode = p
                    .Elements(w + "pPr")
                    .Elements(w + "pStyle")
                    .FirstOrDefault()
                select new Paragraph(doc)
                {
                    ParagraphElement = p,
                    StyleName = styleNode != null ?
                        (string)styleNode.Attribute(w + "val") :
                        defaultStyle,
                    // in the following query, we need to select both
                    // the r and ins elements in order to assemble the text
                    // properly for paragraphs that have tracked changes.
                    Text = p
                        .Elements()
                        .Where(z => z.Name == r || z.Name == ins)
                        .Descendants(w + "t")
                        .StringConcatenate(element => (string)element)
                };
        }
        public static XDocument GetXDocument(this OpenXmlPart part)
        {
            XDocument xdoc = part.Annotation<XDocument>();
            if (xdoc != null)
                return xdoc;
            using (StreamReader streamReader = new StreamReader(part.GetStream()))
                xdoc = XDocument.Load(XmlReader.Create(streamReader));
            part.AddAnnotation(xdoc);
            return xdoc;
        }
        public static string StringConcatenate<T>(this IEnumerable<T> source,
            Func<T, string> func)
        {
            StringBuilder sb = new StringBuilder();
            foreach (T item in source) sb.Append(func(item));
            return sb.ToString();
        }
    }
    class Program
    {
        static void Main(string[] args)
        {
            using (WordprocessingDocument wordDoc = WordprocessingDocument.Open("Test.docx", true))
            {
                Console.WriteLine(wordDoc.DefaultStyle());
                foreach (var p in wordDoc.Paragraphs())
                    Console.WriteLine("{0}:{1}", p.StyleName.PadRight(20), p.Text);
            }
        }
    }
}
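
The comment in the Paragraphs query about selecting both the r and ins elements is easier to see on a small fragment.  The sketch below applies the same filter to a hand-written paragraph (not taken from a real document) that contains a normal run, a tracked insertion, and a tracked deletion.  `TrackedChangesSketch` and `ParagraphText` are hypothetical names, and string.Concat is used in place of the StringConcatenate helper so that the block stands alone.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TrackedChangesSketch
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Concatenates the w:t text found under the paragraph's direct
    // w:r and w:ins children, in document order.  Deleted text lives in
    // w:del/w:delText, so it is not picked up.
    public static string ParagraphText(XElement p)
    {
        XName r = w + "r";
        XName ins = w + "ins";
        return string.Concat(
            p.Elements()
             .Where(e => e.Name == r || e.Name == ins)
             .Descendants(w + "t")
             .Select(t => (string)t));
    }

    public static void Main()
    {
        // Hand-written fragment for illustration only.
        XElement p = XElement.Parse(
            @"<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
                <w:r><w:t>Hello</w:t></w:r>
                <w:ins w:id='1' w:author='someone'><w:r><w:t xml:space='preserve'> tracked</w:t></w:r></w:ins>
                <w:del w:id='2' w:author='someone'><w:r><w:delText>gone</w:delText></w:r></w:del>
                <w:r><w:t xml:space='preserve'> world</w:t></w:r>
              </w:p>");
        Console.WriteLine(ParagraphText(p));
    }
}
```

For this fragment the result should be "Hello tracked world": the inserted run contributes its text, while the deleted run does not.  Filtering on w:r alone would drop " tracked".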