Open XML SDK 2.0 CTP Available

The Open XML SDK 2.0 Community Technology Preview (CTP) is here!

You can find the documentation for it here and the SDK download here.

Zeyad Rajabi, a Program Manager for the Open XML SDK, has begun a series of posts covering the Open XML SDK design goals and architecture, sprinkling in some sample code as he goes along.

Also, Eric White, a Microsoft Technical Evangelist, has a great write-up on the new version of the SDK.

This new version of the SDK is an amazing leap forward. The Open XML SDK version 1 greatly simplified working with packages. Developers could manipulate Open XML file format compliant documents at the package and part levels using strongly typed .NET classes. To access the file formats at the element level, though, you still had to work directly with the underlying XML.
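To give a feel for what "working directly with the underlying XML" means, here is a minimal sketch that uses LINQ to XML from the .NET base class library, not the SDK itself. The class name RawXmlSketch, the method GetCellText, and the hand-written markup are made up for illustration; in a real document the markup would come from the main document part of the package.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RawXmlSketch
{
    // The WordprocessingML main namespace, conventionally bound to the "w" prefix.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the text of the cell at the given zero-based
    // row and column of the first <w:tbl>, or null if there is no such cell.
    public static string GetCellText(string documentXml, int row, int column)
    {
        XDocument doc = XDocument.Parse(documentXml);

        XElement table = doc.Descendants(W + "tbl").FirstOrDefault();
        if (table == null) return null;

        XElement theRow = table.Elements(W + "tr").ElementAtOrDefault(row);
        if (theRow == null) return null;

        XElement theCell = theRow.Elements(W + "tc").ElementAtOrDefault(column);
        if (theCell == null) return null;

        // Concatenate every <w:t> text run inside the cell.
        return string.Concat(theCell.Descendants(W + "t").Select(t => t.Value));
    }

    public static void Main()
    {
        // Hand-written fragment standing in for a real main document part.
        string xml =
            "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:body><w:tbl>" +
            "<w:tr><w:tc><w:p><w:r><w:t>Name</w:t></w:r></w:p></w:tc></w:tr>" +
            "<w:tr><w:tc><w:p><w:r><w:t>Contoso</w:t></w:r></w:p></w:tc></w:tr>" +
            "</w:tbl></w:body></w:document>";

        Console.WriteLine(GetCellText(xml, 1, 0)); // prints "Contoso"
    }
}
```

Every element name and the namespace URI have to be spelled out by hand here, which is exactly the bookkeeping that strongly typed element classes are meant to take off your plate.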

In the Open XML SDK version 2, the development team has taken most of the elements in the various schemas for the Open XML file formats (WordprocessingML, SpreadsheetML, PresentationML, and so on) and made first-class managed objects out of them. They've also "LINQ-ified" the entire API, so you have the power of LINQ as well.

This makes things much easier if, like me, working with XML directly isn’t your strong suit. With the Open XML SDK v2, I can work with objects that represent the XML elements instead of having to work in the underlying XML itself (although the SDK supports LINQ to XML as well). For instance, suppose I needed to locate the first table in a Word document. I could do that with code like this:

    1: using (WordprocessingDocument theDoc = WordprocessingDocument.Open(location, true))
    2: {
    3:     MainDocumentPart mainPart = theDoc.MainDocumentPart;
    4:  
    5:     Table theTable = mainPart.Document.Descendants<Table>().First();
    6:  
    7:     // ... work with theTable here, while the document is still open ...
    8: }

You can see the advantage of this new version of the SDK in line 5 of the code snippet. The Document object represents the <w:document> element in WordprocessingML, and calling Descendants<Table>() on it filters its descendants down to those of type Table (that is, all <w:tbl> elements), so I can get to the tables in my Word document immediately. The First() extension method then selects the first table from that sequence (note that First() throws an exception if the document contains no tables). All of this happens without the detailed work of traversing the underlying XML directly; the API handles the XML work for me.
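As a rough, self-contained variation on the same idea, the sketch below builds a small document in memory and then looks up its first table, using FirstOrDefault() so that a document with no tables yields null instead of an exception. It assumes a project that references the Open XML SDK assembly (DocumentFormat.OpenXml); the class name FirstTableSketch and the helper FindFirstTable are invented for this example and are not part of the SDK.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class FirstTableSketch
{
    // Hypothetical helper: returns the first <w:tbl> in the main document
    // part, or null when the document has no main part or no tables.
    public static Table FindFirstTable(WordprocessingDocument doc)
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;
        if (mainPart == null || mainPart.Document == null)
        {
            return null;
        }
        return mainPart.Document.Descendants<Table>().FirstOrDefault();
    }

    public static void Main()
    {
        using (MemoryStream stream = new MemoryStream())
        using (WordprocessingDocument doc = WordprocessingDocument.Create(
            stream, WordprocessingDocumentType.Document))
        {
            // Build a tiny document: one paragraph followed by a 2 x 1 table.
            MainDocumentPart mainPart = doc.AddMainDocumentPart();
            mainPart.Document = new Document(
                new Body(
                    new Paragraph(new Run(new Text("Intro"))),
                    new Table(
                        new TableRow(new TableCell(new Paragraph(new Run(new Text("Name"))))),
                        new TableRow(new TableCell(new Paragraph(new Run(new Text("Contoso"))))))));

            Table table = FindFirstTable(doc);
            Console.WriteLine(table == null
                ? "No table found"
                : "First table has " + table.Elements<TableRow>().Count() + " rows");
        }
    }
}
```

To run the same helper against a file on disk, you would open the document with WordprocessingDocument.Open(location, false) and pass it to FindFirstTable; the second argument can stay false when you only need to read.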

Or how about trying to get to the text in a specific cell in the Word 2007 table? I’m simplifying things a great deal, but let's assume I know which cell has the data I want (the cell in the 2nd row and 1st column):

    // Second row (index 1), then the first cell in that row (index 0).
    TableRow theRow = theTable.Elements<TableRow>().ElementAt(1);
    TableCell theCell = theRow.Elements<TableCell>().ElementAt(0);

    // InnerText concatenates the text of everything inside the cell.
    string cellText = theCell.InnerText;

    Console.WriteLine("The 2nd row, 1st cell text is {0}", cellText);

So here I just use the same technique that I used above to find elements of a given type (by using the provided generic methods) in order to find the 2nd row of the table (Elements<TableRow>().ElementAt(1)) and the first cell in that row (Elements<TableCell>().ElementAt(0)). Then I just pull the value of the TableCell object's InnerText property and it's done! Although there are other ways to use the SDK to get the same data, you can still see that with only a few lines of code, I'm able to do quite a bit without working directly in the underlying XML. And remember, we can do this type of manipulation without running the client application (in this case Microsoft Word 2007).
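One of those other ways, sketched loosely here, is to project the whole table into rows of strings with LINQ, or to use ElementAtOrDefault() so that an out-of-range row or column returns null rather than throwing. This assumes the Open XML SDK assembly is referenced; TableTextSketch, ReadCells, GetCellText, and the Cell helper are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TableTextSketch
{
    // Hypothetical helper: one string array per row, one string per cell.
    public static List<string[]> ReadCells(Table table)
    {
        return table.Elements<TableRow>()
            .Select(row => row.Elements<TableCell>()
                              .Select(cell => cell.InnerText)
                              .ToArray())
            .ToList();
    }

    // Hypothetical helper: bounds-tolerant lookup by zero-based row and column.
    public static string GetCellText(Table table, int row, int column)
    {
        TableRow theRow = table.Elements<TableRow>().ElementAtOrDefault(row);
        if (theRow == null) return null;

        TableCell theCell = theRow.Elements<TableCell>().ElementAtOrDefault(column);
        return theCell == null ? null : theCell.InnerText;
    }

    // Builds a cell containing a single paragraph with a single text run.
    static TableCell Cell(string text)
    {
        return new TableCell(new Paragraph(new Run(new Text(text))));
    }

    public static void Main()
    {
        // A 2 x 2 table built in memory; no document package is needed.
        Table table = new Table(
            new TableRow(Cell("Name"), Cell("City")),
            new TableRow(Cell("Contoso"), Cell("Redmond")));

        foreach (string[] row in ReadCells(table))
        {
            Console.WriteLine(string.Join(" | ", row));
        }

        Console.WriteLine("The 2nd row, 1st cell text is {0}", GetCellText(table, 1, 0));
    }
}
```

The projection is handy when you want the whole table at once, while the lookup is closer to the snippet above but safer for documents whose table shape you don't control.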

The Office Client Developer Content team (my team) has some great examples on the MSDN web site here. Check them out!