Open XML Format SDK 2.0


Hello, my name is Zeyad Rajabi and I am a Program Manager on Brian’s team. For the next several posts I will be talking about the Open XML SDK and will show you how to use the SDK to accomplish real-world scenarios such as document assembly and document manipulation. Expect to see lots of code samples and demos.


In today’s post, I am going to talk about the overall design of the Open XML SDK with respect to goals and scenarios. In subsequent posts, I will dive more deeply into the architecture of the SDK and show you lots of sample code. If you want to jump ahead and get started with the SDK, you can download the latest CTP here. I would also recommend joining the Connect site, found here, to get access to the latest articles, how-to topics, and forums.


What is the Open XML SDK?


The Open XML SDK provides a set of .NET APIs that allow developers to create and manipulate documents in the Open XML Formats, in both client and server environments, without requiring the Office client applications. The SDK should make it easier for you to build solutions on top of the Open XML Formats by letting you perform complex operations, such as creating Open XML packages or adding and deleting tables, with just a few lines of code. Check out the following “hello world” example for a WordprocessingML document:


// Namespaces this sample needs.
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public void HelloWorld(string docName)
{
  // Create a Wordprocessing document.
  using (WordprocessingDocument package = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document))
  {
    // Add a new main document part.
    package.AddMainDocumentPart();

    // Create the Document DOM.
    package.MainDocumentPart.Document =
      new Document(
        new Body(
          new Paragraph(
            new Run(
              new Text("Hello World!")))));

    // Save changes to the main document part.
    package.MainDocumentPart.Document.Save();
  }
}


The SDK takes care of both the structure of the Open XML Formats and the XML contained in each part of the package. In other words, with this SDK you will be able to add or remove parts within a package as well as manipulate XML constructs, such as paragraphs and tables.
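
To give a rough idea of what manipulating content looks like, here is a minimal sketch that opens an existing document and appends a paragraph to it. The class and method names (AppendSample, AppendParagraph) are made up for illustration, the code assumes the document already has a main document part with a body, and member names may differ slightly between CTP builds.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AppendSample
{
    // Hypothetical helper: appends one paragraph of text to an existing document.
    public static void AppendParagraph(string docName, string text)
    {
        // Open the package for editing (the second argument is isEditable).
        using (WordprocessingDocument package = WordprocessingDocument.Open(docName, true))
        {
            // Assumption: the main document part and its body already exist.
            Body body = package.MainDocumentPart.Document.Body;

            // Build the new paragraph from typed objects and add it at the end of the body.
            body.AppendChild(new Paragraph(new Run(new Text(text))));

            // Save changes to the main document part.
            package.MainDocumentPart.Document.Save();
        }
    }
}
```

Calling AppendSample.AppendParagraph("report.docx", "Added by code") would add a single paragraph at the end of the document; the file name is only an example.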


The SDK also supports programming in the style of LINQ to XML, which makes coding against XML content much easier than the traditional W3C XML DOM programming model.


Why Use the Open XML SDK?


Using the Open XML SDK to create solutions that manipulate documents directly has many advantages as compared with automating Microsoft Office applications using macros or VBA. For those of you not familiar with the pains of automating Office applications on the server, check out the following KB article: http://support.microsoft.com/kb/257757.


The major advantage is that the Open XML SDK is fully supported on the server, unlike automating Office applications. That means you can create managed code solutions that are scalable and stable on the server. Imagine being able to write multi-threaded solutions that build on top of the SDK.


In addition, there is a huge performance advantage when developing solutions with the Open XML SDK, which becomes very evident when dealing with large numbers of documents. You will be able to programmatically generate thousands of documents from data in a database in a matter of seconds rather than hours.


Lastly, the Open XML SDK is a dedicated file format API that specializes in the manipulation and creation of Open XML packages. The SDK is fully aware of the structure and schema of Open XML Formats.


The SDK should be the first thing you use when developing Open XML solutions.


What Can’t the Open XML SDK do?


Before we get into the design of the SDK, I want to point out a few key things the SDK will not be able to do:




  • The Open XML SDK is NOT a replacement for the Office Object Model, and it provides no abstraction on top of the file formats

    • You need to understand the structure of the file formats to leverage the SDK; it doesn’t hide that structure from you

  • The SDK does NOT provide functionality to convert Open XML Formats to and from other formats, like HTML or XPS

  • The SDK does NOT guarantee the validity of Open XML documents, whether developers use the SDK or choose to manipulate the underlying XML directly

    • We are working on providing validation functionality in subsequent CTP releases of version 2.0 of the SDK

  • The SDK does NOT provide application behaviors such as layout (e.g., pagination of WordprocessingML documents) or recalculation functionality

Open XML SDK Roadmap


We decided to release the Open XML SDK as two versions:



  1. Version 1.0 – allows for direct manipulation of the Open XML Package at the part level

  2. Version 2.0 – provides strongly typed class support for the underlying XML content contained in each part

In other words, version 1.0 of the SDK deals with the structure or skeleton of Open XML Formats, while version 2.0 deals with the XML contained within each of the XML parts. I will show you guys some code comparing version 1.0 and version 2.0 in a later post.


Version 1.0 of the SDK was fully released with a “go-live” license in June 2008. With this go-live license you can build and deploy solutions confidently.


A couple of weeks ago we released the first Community Technology Preview (CTP) of version 2.0 of the Open XML SDK. Keep in mind this version of the SDK is still a CTP, so we are expecting to get a lot of customer feedback to polish this API.



What Scenarios Does the Open XML SDK Target?


Let’s suppose you are an XML developer who understands the Open XML standard and is quite comfortable manipulating and creating Open XML files. The Open XML SDK targets the following core scenarios:


Strongly Typed Classes and Objects


Instead of relying on generic XML functionality to manipulate XML, where you need to be aware of element/attribute/value spelling as well as namespaces, you can use the Open XML SDK to build the same solutions by manipulating objects that represent elements/attributes/values. All schema types are represented as strongly typed Common Language Runtime (CLR) classes and all attribute values as enumerations. In other words, you do not always need to consult the standard and the Open XML schemas for hierarchy, spelling, and namespaces; instead, you can rely on IntelliSense to develop solutions faster and more reliably.
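
To make the contrast concrete, here is a small sketch that builds the same WordprocessingML paragraph twice: once with LINQ to XML, where the namespace and element names are spelled out by hand, and once with the SDK's typed classes. The class and method names are placeholders chosen for this example.

```csharp
using System.Xml.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TypedVersusGeneric
{
    // Generic XML: the namespace URI and the element names "p", "r", and "t"
    // must be typed correctly by hand; a misspelling only shows up at run time.
    public static XElement BuildWithLinqToXml(string text)
    {
        XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        return new XElement(w + "p",
            new XElement(w + "r",
                new XElement(w + "t", text)));
    }

    // Typed classes: the compiler and IntelliSense check the class names,
    // and the SDK supplies the namespace and element names.
    public static Paragraph BuildWithSdk(string text)
    {
        return new Paragraph(new Run(new Text(text)));
    }
}
```

Both methods describe a paragraph containing one run with one text element; only the second lets the compiler catch a misspelled element name.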


Content Construction, Search, and Manipulation


Using the Open XML SDK, you can continue to take advantage of your LINQ knowledge because the technology is built directly into the SDK. With the SDK you can use functional construction and lambda expression queries directly on objects representing Open XML elements. In addition, the SDK lets you easily traverse and manipulate content by exposing collections of objects, like tables and paragraphs.
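
As a rough illustration of a lambda expression query over typed elements, the sketch below collects the text of every paragraph that contains a given keyword. The names QuerySample and FindParagraphs are hypothetical, and the code assumes the document has a main document part with a body.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class QuerySample
{
    // Hypothetical helper: returns the text of each paragraph containing the keyword.
    public static List<string> FindParagraphs(string docName, string keyword)
    {
        // Open the package read-only (isEditable = false).
        using (WordprocessingDocument package = WordprocessingDocument.Open(docName, false))
        {
            Body body = package.MainDocumentPart.Document.Body;

            // Descendants<Paragraph>() walks the typed element tree; ToList()
            // forces the query to run before the package is disposed.
            return body.Descendants<Paragraph>()
                       .Where(p => p.InnerText.Contains(keyword))
                       .Select(p => p.InnerText)
                       .ToList();
        }
    }
}
```

The same pattern should work for other element types, for example Descendants<Table>() to enumerate tables.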


Validation


You have the ability to specify which version of the Open XML format you are targeting, and the API will take this into account for validation. Note: This scenario will be added in future releases of version 2.0 of the SDK.


Markup Language Specific Scenario


With the SDK you can perform a variety of tasks on all types of Open XML packages and Open XML markup languages. For example, you can construct tables with dynamic data in a WordprocessingML document, extract and analyze data in a SpreadsheetML workbook, search for and report noncompliant content in a PresentationML presentation, or change shape colors in DrawingML.


Next Time


In my next post I am going to talk a bit more about the overall architecture of the SDK as shown in the diagram below.



Let me know if you have any specific questions or comments that you would like me to address here or in future posts.


Zeyad Rajabi

Comments (17)

  1. franco says:

    hi, will newly created Office 14 documents be made to conform to the strict conformance IS 29500 specs ?

    thanks

  2. Marais van Zyl says:

    One thing that is interesting to me is that creating Excel spreadsheets is not as easy as creating Word documents, yet there aren’t any real examples for Excel.

    There is a package on CodePlex, ExcelPackage, but it has its limitations. Why can’t Microsoft release something for Open XML that is as easy to use as the COM interop libraries for the binary formats?

    Great work has been done with Open XML and I love it, but I find it hard to programmatically create Excel spreadsheets using the provided SDKs.

    Thanks,

    Marais

  3. Zeyad Rajabi says:

    @Marais – I will be writing some posts on developing solutions on top of SpreadSheetML in the future. Are there any specific scenarios that you are interested in me covering?

    Zeyad Rajabi (MS)

  4. Christian 2 says:

    Will this also work in MONO?

    Does it use native code?

  5. Gerald says:

    Hi.

    I’m really interested in this new SDK 2.0.

    It seems to be much easier to use than the ‘old’ way of coding the XML by hand …

    As I’m a newbie in Open XML as well as in this new SDK, I’m looking for samples using this SDK (adding a picture or a graph directly in the document, for example).

    I also wonder when a final release of this SDK should be available.

    Thanks

  6. Zeyad Rajabi says:

    @Christian – The SDK has not been ported to Mono yet. We are still investigating. As for your question about native code, the SDK is built entirely on .NET and as such is managed code.

    @Gerald – Glad to hear that you are interested in the SDK. I will be posting a lot of sample code in the coming posts. The final version 2.0 of the SDK will be released around the same time as O14.

    Zeyad Rajabi(MS)

  7. In my first post on the Open XML SDK , I talked about the overall design of the SDK with respect to goals

  8. The Open XML SDK 2.0 Community Technology Preview (CTP) is here! You can find the documentation for it

  9. In my first post on the Open XML SDK , I talked about the overall design of the SDK with respect to goals

  10. Bikash Kumar : India says:

    I have seen with this SDK (2.0) I can create .docx very easily which makes my life easy as we are having strongly typed classes. But please let me know how to Merge two .docx files in one file or by creating a third one. Let me know asap, I will be obliged.

    Thanks,

    Bikash

  11. jones206@hotmail.com says:

    Bikash,

    The easiest way to do that is to just use the altChunk functionality. You can put the two .docx files within the new .docx package, and then in the document.xml part just put two altChunk tags that reference the embedded .docx files.

    I think I have a blog that shows how to do this from a year or so ago… Zeyad also has a demo around this that he’s going to post at some point (after he’s done a couple posts on Excel).

    -Brian

  12. Bikash Kumar : India says:

    Thank you Brian for giving me this update but please provide me with some link/pointer/some code example, how may I use Chunk in Open XML SDK 2.0. Is it a strongly typed class like we have Body, Paragraph etc. Please update me on this.

    Thank You.

    Bikash

  13. Zeyad Rajabi says:

    Bikash,

    I will be writing up a post showing you how to merge documents using altChunk as soon as I am done with a few posts related to Spreadsheet scenarios. Stay tuned.

    Zeyad Rajabi

  14. Doug Mahugh says:

    I’ve fallen a few weeks behind on posting links to various articles and blog posts, so this post is a

  15. I’m really happy to announce the release of the second CTP for version 2 of the Open XML SDK ! Back in

  16. One of the big changes we made in the Open XML SDK v2 April 2009 CTP was improving the Low Level DOM