Announcing the Release of the August 2009 CTP for the Open XML SDK

I'm really happy to announce the release of the third CTP of the Open XML SDK 2.0 for Microsoft Office! So what did we do in this CTP? We made three main improvements to the SDK:

  1. Add semantic level validation support
  2. Add markup compatibility/extensibility support
  3. General improvements based on your feedback

Semantic Level Validation Support

Let's go back to the Open XML SDK architecture diagram I showed you when we first announced the Open XML SDK:

As mentioned in a previous post, the April 2009 CTP of the Open XML SDK added schema level validation support for Office 2007 Open XML files. In the August 2009 CTP, one of the big things we added is semantic level validation support for Office 2007 Open XML files:

Semantic level validation goes beyond the restrictions or rules defined by schemas. It allows developers to validate files against restrictions defined within the prose of the Open XML documentation, which are restrictions that cannot be expressed in XSD.

Let's look at an example of a semantic level restriction: the endnote element (Section 17.11.2 of Part 1 of the ISO/IEC 29500 specification). The standard states that the id attribute of endnote "specifies a unique ID which shall be used to match the contents of a footnote or endnote to the associated footnote/endnote reference mark … If more than one footnote shares the same ID, then this document shall be considered non-conformant. If more than one endnote shares the same ID, then this document shall be considered non-conformant." As you can see, having more than one endnote with the same id value results in a non-conformant document, which may not be interpreted properly by a consuming application such as Word.

The Open XML SDK can now help you find these types of problems and will report the error to you by giving you the following information:

  1. A user-friendly description of the error
    • In this case, imagine seeing the following error: "Attribute 'id' should have unique value. Its current value '1' duplicates with others."
  2. An XPath to the exact location of the error
    • In this case, imagine seeing the following path: "/w:endnotes[1]/w:endnote[4]", which indicates that the problem exists in the fourth endnote element
  3. The part where the error exists
    • In this case, imagine seeing the following part information: "DocumentFormat.OpenXml.Packaging.EndnotesPart"

We hope that you can use this type of information to more easily find and fix problems. I will devote at least one blog post in the future to go into details on the validation functionality.
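To make the endnote rule concrete, here is a minimal sketch of the duplicate id check in Python, using only the standard library. It is not the SDK's validator or its API. The sample markup, the function name find_duplicate_endnote_ids, and the shape of the returned records are all hypothetical; only the error text and the path format are borrowed from the example above.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical endnotes part: the fourth endnote reuses id "1".
ENDNOTES_XML = f"""
<w:endnotes xmlns:w="{W}">
  <w:endnote w:id="0"/>
  <w:endnote w:id="1"/>
  <w:endnote w:id="2"/>
  <w:endnote w:id="1"/>
</w:endnotes>
""".strip()


def find_duplicate_endnote_ids(xml_text):
    """Return one error record per endnote whose id was already used."""
    root = ET.fromstring(xml_text)
    seen = set()
    errors = []
    # XPath positions are 1-based, so start counting at 1.
    for position, endnote in enumerate(root.findall(f"{{{W}}}endnote"), start=1):
        note_id = endnote.get(f"{{{W}}}id")
        if note_id in seen:
            errors.append({
                "description": (
                    "Attribute 'id' should have unique value. "
                    f"Its current value '{note_id}' duplicates with others."
                ),
                "path": f"/w:endnotes[1]/w:endnote[{position}]",
            })
        seen.add(note_id)
    return errors


for error in find_duplicate_endnote_ids(ENDNOTES_XML):
    print(error["path"], "-", error["description"])
```

A schema alone cannot catch this, because each endnote element is individually valid; the rule only fails when the elements are compared with one another.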

Markup Compatibility/Extensibility Support

As defined by the ISO/IEC 29500 specification, there are several ways to extend markup within the Open XML formats. Some of the extension mechanisms, like ignorable content and alternate content blocks, may result in differences within the XML tree structure of a document. Here is an example of markup that contains an alternate content block:

<w:document mc:Ignorable="w14 wp14">
  <w:body>
    <w:p w:rsidR="00FA0A01" w:rsidRDefault="00AF5A8F">
      <w:r>
        <w:rPr/>
        <mc:AlternateContent>
          <mc:Choice Requires="wps">
            <w:drawing>…… </w:drawing>
          </mc:Choice>
          <mc:Fallback>
            <w:pict>
              <v:roundrect id="Rounded Rectangle 1" o:spid="_x0000_s1026" style="position:absolute… " arcsize="10923f" o:gfxdata="…" fillcolor="#4f81bd" strokecolor="#385d8a" strokeweight="2pt">
                <v:textbox style="mso-rotate-with-shape:t"/>
              </v:roundrect>
            </w:pict>
          </mc:Fallback>
        </mc:AlternateContent>
      </w:r>
    </w:p>
    <w:sectPr w:rsidR="00FA0A01">…… </w:sectPr>
  </w:body>
</w:document>

In the example above, the expected child of the run element differs depending on which alternate content branch is chosen. The fallback branch is what one would expect from a document created in Office 2007, while the choice requiring the wps namespace is from a document created in Office 2010. Imagine you are a solution developer working with Open XML who has deployed a solution that works perfectly on top of Office 2007 Open XML files. How would your solution work with files coming in from Office 2010? Specifically, would your solution work with documents that contain these types of extension mechanisms?

As part of the August 2009 CTP, we have added functionality that abstracts away some of the difficulty intrinsic to markup compatibility and extensibility. This feature lets you preprocess the content of Open XML files based on a specific Office version. Using the example above, if we use the August CTP to open the document targeting Office 2007, we will see only the following XML markup:

<w:document>
  <w:body>
    <w:p w:rsidR="00FA0A01" w:rsidRDefault="00AF5A8F">
      <w:r>
        <w:rPr/>
        <w:pict>
          <v:roundrect id="Rounded Rectangle 1" o:spid="_x0000_s1026" style="position:absolute… " arcsize="10923f" o:gfxdata="…" fillcolor="#4f81bd" strokecolor="#385d8a" strokeweight="2pt">
            <v:textbox style="mso-rotate-with-shape:t"/>
          </v:roundrect>
        </w:pict>
      </w:r>
    </w:p>
    <w:sectPr w:rsidR="00FA0A01">…… </w:sectPr>
  </w:body>
</w:document>

If your solution expected a pict element as a child of a run element, then your solution would work perfectly with this file. In other words, using this feature, solutions won't break when future versions of Office introduce new markup into the format.
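To show the idea behind this preprocessing, here is a rough sketch in Python using only the standard library. It is not how the SDK implements the feature, and it leaves out a lot: mc:Ignorable handling is omitted, and Requires values are compared as plain prefixes instead of being resolved to namespace URIs as real markup compatibility processing does. The sample document and the names select_branch, resolve, and run_children are hypothetical.

```python
import xml.etree.ElementTree as ET

MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ALTERNATE_CONTENT = f"{{{MC}}}AlternateContent"

# Hypothetical, cut-down version of the markup shown earlier.
DOC = f"""<w:document xmlns:w="{W}" xmlns:mc="{MC}">
  <w:body><w:p><w:r>
    <mc:AlternateContent>
      <mc:Choice Requires="wps"><w:drawing/></mc:Choice>
      <mc:Fallback><w:pict/></mc:Fallback>
    </mc:AlternateContent>
  </w:r></w:p></w:body>
</w:document>"""


def select_branch(alternate, understood):
    """Children of the first Choice we understand, else of the Fallback."""
    for choice in alternate.findall(f"{{{MC}}}Choice"):
        required = choice.get("Requires", "").split()
        # Simplification: prefixes are compared directly, not as namespaces.
        if required and all(prefix in understood for prefix in required):
            return list(choice)
    fallback = alternate.find(f"{{{MC}}}Fallback")
    return list(fallback) if fallback is not None else []


def resolve(element, understood):
    """Return the elements that should stand in for `element`."""
    if element.tag == ALTERNATE_CONTENT:
        replacement = []
        for child in select_branch(element, understood):
            replacement.extend(resolve(child, understood))
        return replacement
    resolved_children = []
    for child in list(element):
        resolved_children.extend(resolve(child, understood))
    element[:] = resolved_children  # splice the chosen branch in place
    return [element]


def run_children(xml_text, understood):
    """Local names of the run's children after preprocessing."""
    root = ET.fromstring(xml_text)
    resolve(root, understood)
    run = root.find(f".//{{{W}}}r")
    return [child.tag.split("}")[1] for child in run]


print(run_children(DOC, understood=set()))    # consumer that does not know wps
print(run_children(DOC, understood={"wps"}))  # consumer that knows wps
```

A consumer that does not understand wps ends up with the pict element as a direct child of the run, which is the shape an Office 2007 solution expects.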

General Improvements

First off we want to thank everyone for their feedback and suggestions! Based on your feedback we made the following big changes to the SDK:

  • AutoSave: By default, previous CTPs of the SDK forced you to perform a manual save for changes made to specific parts within the package. We have now introduced the concept of AutoSave, where changes are automatically saved into the package without the need to call Save() methods. For those not interested in this functionality, there is a way to turn it off.
  • Base Classes for Sdt objects: The SDK currently has multiple classes to represent Sdt objects, based on the different types of elements specified in the standard. The August 2009 CTP introduces a base class for these objects to make it easier for you to develop solutions. In other words, your solution can now work against a single abstract class, SdtElement, for Sdt objects.
  • Simple types for Boolean type attributes: The standard specifies the concept of a simple type called ST_OnOff, which allows for values like "On", "Off", "True", "False", "0", and "1". We have updated the SDK to allow you to directly get/set such attributes using standard C# Boolean values. For example, you can now set attribute values to false or true. Without this enhancement you were forced to compare values using the enum BooleanValues.
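As a small illustration of what the Boolean mapping in the last item saves you from, here is a sketch in Python of converting between ST_OnOff style strings and a plain Boolean. This is not the SDK's code. The function names are hypothetical, and accepting the values case-insensitively is an assumption made here so that the spellings listed above all parse.

```python
# Accepted spellings, compared case-insensitively (an assumption of this sketch).
TRUE_VALUES = {"on", "true", "1"}
FALSE_VALUES = {"off", "false", "0"}


def parse_on_off(value):
    """Convert an ST_OnOff style attribute value to a Python bool."""
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"not an on/off value: {value!r}")


def format_on_off(flag):
    """Write a bool back out using one fixed spelling."""
    return "true" if flag else "false"


for raw in ["On", "Off", "True", "False", "0", "1"]:
    print(raw, "->", parse_on_off(raw))
```

With a mapping like this, calling code only ever deals with true and false, regardless of which spelling a particular file happens to use.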

What's Next?

Our next task for the SDK is to add support for Office 2010 Open XML files. Expect to see another CTP with this functionality released in the next several months. Our goal is to finish the Open XML SDK 2.0 around the same time Office 2010 ships (date not public yet).

More Feedback Always Welcome

Please continue to send us your feedback, either on this blog or at our Microsoft Connect site for the Open XML SDK (https://connect.microsoft.com/site/sitehome.aspx?SiteID=589). We look forward to hearing from you.

Zeyad Rajabi