Open XML SDK 2.0 Architecture

In my first post on the Open XML SDK, I talked about the overall design of the SDK with respect to goals and scenarios. Today, I am going to talk about the architecture of the SDK in terms of its different components.

The SDK Architecture

The Open XML SDK is designed and implemented using a layered approach, starting from a base layer and moving toward higher-level functionality, such as validation. The following diagram illustrates an overview of the Open XML SDK components.

The System Support layer contains the fundamental components that the SDK is built upon. The Open XML File Format Base Level layer is the core foundation of the SDK. This layer provides the functionality to create Open XML packages, add or remove parts, and read, add, remove, and manipulate XML elements and attributes. The Open XML File Format Higher Level layer sits on top of the Base Level layer and provides functionality that makes it easier for you to code against the Open XML formats. For example, one idea is for this layer to include schema-level and semantic-level validation to help you generate correct and valid Open XML files. These layers and components are described in more detail below.

Note: Version 1.0 of the SDK only provides the Open XML Packaging API component, whereas version 2.0 of the SDK provides all of the Base Level layer components built on top of the System Support layer.

System Support Layer

The System Support layer consists of the following components:

  • .NET Framework 3.5 – The Open XML SDK builds on .NET Framework 3.5, especially LINQ to XML, which makes manipulating XML much easier and more intuitive.
  • System.IO.Packaging – The Open XML SDK needs to be able to add and remove parts contained within Open XML packages. .NET Framework 3.0 includes a set of generic packaging APIs capable of adding and removing parts of packages that conform to the Open Packaging Conventions (OPC). Because the Open XML formats are based on OPC, the SDK uses the System.IO.Packaging APIs to open, edit, create, and save Open XML packages.
  • Open XML Schemas – The Open XML SDK is based on the Open XML formats, which are represented and described as schemas. These schemas make up the foundation of the Open XML SDK. Currently, the Open XML SDK is based on Ecma-376. We will add support for ISO/IEC 29500 as soon as that standard is made public.
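To make the difference between generic and strongly typed access concrete, here is a minimal sketch of what working directly with System.IO.Packaging looks like. At this level a part is only a URI, a content type, and a stream; nothing knows what a "main document part" or a "styles part" is. The class name PackageInspector and the method ListParts are illustrative only and are not part of either API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll on .NET Framework 3.0 and later

// Illustrative helper, not part of System.IO.Packaging or the Open XML SDK.
static class PackageInspector
{
    // Opens any OPC package read-only, prints each part's URI and content
    // type plus the package-level relationships, and returns the part URIs.
    public static List<string> ListParts(string path)
    {
        var result = new List<string>();
        using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                result.Add(part.Uri.ToString());
                Console.WriteLine("{0}  ({1})", part.Uri, part.ContentType);
            }

            // Package-level relationships locate the top-level parts by type URI.
            foreach (PackageRelationship rel in package.GetRelationships())
            {
                Console.WriteLine("{0} -> {1}", rel.RelationshipType, rel.TargetUri);
            }
        }
        return result;
    }
}
```

Calling PackageInspector.ListParts("temp.docx") on a Word document lists parts such as /word/document.xml, but the code has to know relationship type URIs and part paths itself; the Open XML Packaging API described next hides that behind typed properties like MainDocumentPart.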

Open XML File Format Base Level layer

The Open XML File Format Base Level layer provides a platform for Open XML developers to create Open XML solutions and consists of the following components:

  • Open XML Packaging API – This component is built on top of the .NET Framework 3.0 System.IO.Packaging component. Instead of providing generic access to the parts contained in an Open XML package, this component allows developers to manipulate Open XML parts with strongly typed classes and objects. This component has already shipped as the Open XML SDK v1.0. Below is example code that uses this component to open and manipulate a WordprocessingML document.

using DocumentFormat.OpenXml.Packaging;

// Open and manipulate temp.docx (opened for editing)
using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("temp.docx", true))
{
    // Access the main part of the document
    MainDocumentPart mainPart = myDoc.MainDocumentPart;

    // Add a new comments part to the document, if it does not have one yet
    if (mainPart.WordprocessingCommentsPart == null)
    {
        mainPart.AddNewPart<WordprocessingCommentsPart>();
    }

    // Delete the Styles part within the document, if present
    if (mainPart.StyleDefinitionsPart != null)
    {
        mainPart.DeletePart(mainPart.StyleDefinitionsPart);
    }

    // Iterate through all custom XML parts within the document
    foreach (CustomXmlPart customXmlPart in mainPart.CustomXmlParts)
    {
        // Process each custom XML part here
    }
}

  • Open XML Low Level DOM – This component provides strongly typed wrapper classes for the XML defined by the Open XML schemas. You can use this component to manipulate the Open XML tree directly through strongly typed objects and classes, instead of traditional XML nodes that require you to know the namespaces as well as the element and attribute names. The major advantage of strongly typed classes and objects is that IntelliSense shows you which properties are defined on a given class. For example, you will know exactly which properties and children can exist on a Paragraph object. This component borrows many of its design ideas from LINQ. Below is example code that uses this component to create a WordprocessingML document with the text "Hello World!".

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Create a Wordprocessing document
string docName = "HelloWorld.docx"; // example output path
using (WordprocessingDocument myDoc =
    WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document))
{
    // Add a new main document part
    MainDocumentPart mainPart = myDoc.AddMainDocumentPart();

    // Create the DOM tree for a simple document
    mainPart.Document = new Document();
    Body body = new Body();
    Paragraph p = new Paragraph();
    Run r = new Run();
    Text t = new Text("Hello World!");

    // Nest the elements: Text inside Run inside Paragraph inside Body
    r.Append(t);
    p.Append(r);
    body.Append(p);
    mainPart.Document.Append(body);

    // Save changes to the main document part
    mainPart.Document.Save();
}

  • Stream Reading/Writing – This component includes stream reader and writer interfaces specifically targeting Open XML elements and attributes. The readers and writers behave similarly to XmlReader/XmlWriter, but are easier to use because they are Open XML aware.
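The reader interface in this CTP may differ from what eventually ships, so treat the following as a sketch. It assumes the OpenXmlReader class as it appears in the final SDK 2.0 release (OpenXmlReader.Create, Read, ElementType, IsStartElement, GetText). The point is that the reader reports strongly typed element types such as Text rather than raw names and namespaces, and it streams the part instead of loading the whole DOM tree. StreamingText and ReadAllText are hypothetical names.

```csharp
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the SDK.
static class StreamingText
{
    // Streams the main document part and concatenates the text of every
    // w:t element, without building the full Low Level DOM tree in memory.
    public static string ReadAllText(string docName)
    {
        var sb = new StringBuilder();
        using (WordprocessingDocument myDoc = WordprocessingDocument.Open(docName, false))
        using (OpenXmlReader reader = OpenXmlReader.Create(myDoc.MainDocumentPart))
        {
            while (reader.Read())
            {
                // ElementType is a strongly typed class, not an XML name.
                if (reader.ElementType == typeof(Text) && reader.IsStartElement)
                {
                    sb.Append(reader.GetText());
                }
            }
        }
        return sb.ToString();
    }
}
```

Run against the "Hello World!" document created in the previous example, ReadAllText should return that string. For small documents the DOM is simpler; the streaming reader matters mainly for very large parts.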

All the components mentioned above are available in the first CTP of version 2.0 of the SDK, which is available for download.

Open XML File Format Higher Level Layer

Note: This layer has not yet been implemented in version 2.0 of the SDK.

We are still thinking through the Open XML File Format Higher Level layer, but one idea is for this layer to provide functionality that helps you debug and validate Open XML files. With this in mind, we might provide the following components:

  • Schema Level Validation – Manipulating Open XML files through the Base Level layer makes them much easier to work with, but it does not guarantee that the files you produce are valid. This component would help you debug and validate Open XML documents against the Open XML schemas.

  • Additional Semantic Validation – This component would be similar to the Schema Level Validation component, except that it would also report problems based on the semantic and syntactic constraints defined by the Open XML standard. For example, for a comment to work as expected within a WordprocessingML document, the comment needs to be defined in the comments part and also be marked appropriately in the main document story; otherwise, the comment is ignored.

    Because constraints like this cannot be expressed in XSD files, you currently have to rely on the prose of the standard to find them. With this potential layer, the SDK could take over much of that manual work.
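The comment rule above cannot be expressed in XSD, but it can already be checked by hand with the Low Level DOM. The following is a minimal sketch of one such semantic check, assuming only the strongly typed Comment and CommentReference classes and their Id attributes. CommentChecker and FindDanglingCommentReferences are hypothetical names, and a real validator would also examine the comment range markers and many other rules.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical semantic check, not part of the Open XML SDK.
static class CommentChecker
{
    // Returns the ids of comment references in the main document story
    // that have no matching w:comment definition in the comments part.
    // Range markers (CommentRangeStart/CommentRangeEnd) are not examined.
    public static List<string> FindDanglingCommentReferences(WordprocessingDocument doc)
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;

        // Collect the ids defined in the comments part (none if the part is missing).
        var definedIds = new HashSet<string>();
        WordprocessingCommentsPart commentsPart = mainPart.WordprocessingCommentsPart;
        if (commentsPart != null && commentsPart.Comments != null)
        {
            foreach (Comment comment in commentsPart.Comments.Elements<Comment>())
            {
                if (comment.Id != null)
                {
                    definedIds.Add(comment.Id.Value);
                }
            }
        }

        // Every reference in the body must point at one of those ids.
        return mainPart.Document.Descendants<CommentReference>()
            .Where(r => r.Id != null && !definedIds.Contains(r.Id.Value))
            .Select(r => r.Id.Value)
            .Distinct()
            .ToList();
    }
}
```

A document whose body contains a comment reference with id "0" but no comments part would yield ["0"]; this is exactly the kind of file that is schema-valid yet silently drops the comment.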

Helper Function Layer

Note: This layer has not yet been implemented in version 2.0 of the SDK.

As with the Open XML File Format Higher Level layer, we are still thinking through the Helper Function layer. We envision this layer as a way to provide helper functions or code snippets that make it easier to create valid Open XML files. Certain operations within Open XML can be somewhat complex. For example, deleting a paragraph in a WordprocessingML document is not simply a matter of deleting the paragraph node; several extra steps are required to delete a paragraph and keep the document valid.

One idea is for the SDK to provide higher-level helper functions or code snippets that handle common, complex file format operations. These helpers would make the appropriate XML and part/relationship modifications when performing complex tasks. They would not abstract away the actual XML; rather, they would operate on the XML elements while taking advantage of validation awareness. For example, a potential helper function for deleting a WordprocessingML paragraph would perform the delete and then carry out the extra steps needed to clean up the resulting XML and keep it valid. Similar delete helpers could be applied to other elements that are hard to delete, such as tables and comments. In other words, these higher-level functions or snippets would operate directly on the XML elements and would be constrained, in terms of functionality, by the file format standard itself.
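To illustrate what such a helper might do, here is a rough sketch written against the Low Level DOM. It is entirely hypothetical (ParagraphHelpers.DeleteParagraph is not an SDK API) and covers only two clean-up steps: it keeps a table cell ending in a paragraph, which is our reading of the ECMA-376 rule for table cells, and it removes comment definitions and range markers orphaned by the delete. Bookmarks, section properties, revisions, and other cases are deliberately left out.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, NOT part of the Open XML SDK.
static class ParagraphHelpers
{
    // Deletes a paragraph (which must be attached to the tree) and performs
    // two of the extra steps needed to keep the document valid.
    public static void DeleteParagraph(MainDocumentPart mainPart, Paragraph p)
    {
        // Remember which comments are referenced from this paragraph.
        List<string> commentIds = p.Descendants<CommentReference>()
            .Where(r => r.Id != null)
            .Select(r => r.Id.Value)
            .ToList();

        OpenXmlElement parent = p.Parent;
        p.Remove();

        // Step 1: a table cell must end with a paragraph, so add an empty one
        // back if removing p left the cell without a trailing paragraph.
        TableCell cell = parent as TableCell;
        if (cell != null && !(cell.LastChild is Paragraph))
        {
            cell.Append(new Paragraph());
        }

        if (commentIds.Count == 0)
        {
            return;
        }

        // Step 2: remove orphaned range markers (they may sit in other
        // paragraphs) and the comment definitions themselves.
        foreach (CommentRangeStart start in mainPart.Document.Descendants<CommentRangeStart>()
                     .Where(s => s.Id != null && commentIds.Contains(s.Id.Value)).ToList())
        {
            start.Remove();
        }
        foreach (CommentRangeEnd end in mainPart.Document.Descendants<CommentRangeEnd>()
                     .Where(e => e.Id != null && commentIds.Contains(e.Id.Value)).ToList())
        {
            end.Remove();
        }

        WordprocessingCommentsPart commentsPart = mainPart.WordprocessingCommentsPart;
        if (commentsPart != null && commentsPart.Comments != null)
        {
            foreach (Comment comment in commentsPart.Comments.Elements<Comment>()
                         .Where(c => c.Id != null && commentIds.Contains(c.Id.Value)).ToList())
            {
                comment.Remove();
            }
        }
    }
}
```

The ToList() calls matter: they copy each match set before any element is removed, so the tree is never modified while it is being enumerated.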

Next Time

Now that we have gone over the basics of the SDK, we are ready to talk about solutions and end-to-end scenarios. In my next few posts, I am going to walk through solutions to some key scenarios, such as document assembly and manipulation.

Let me know if you have any specific questions or comments that you would like me to address here or in future posts. Feel free to send me requests for posts about specific scenarios you would like to learn more about.

Zeyad Rajabi