Microsoft SDK for Open XML


I was so busy with activities related to the announcement of the new SDK yesterday that I didn’t get a chance to blog about it. And now, the morning after, I don’t have much to add to all the great posts that have been written by my colleagues around the globe. Nice work, everyone. Here are some links with lots of information, starting with Brian’s post that broke the news online yesterday:


Brian Jones: http://blogs.msdn.com/brian_jones/archive/2007/06/04/open-xml-api-tech-preview.aspx

Kevin Boske: http://blogs.msdn.com/kevinboske/archive/2007/06/04/open-xml-api-ctp-released.aspx

Erika Ehrli: http://blogs.msdn.com/erikaehrli/archive/2007/06/04/announcing-the-new-open-xml-object-model.aspx
(Erika did a ton of work in the last few weeks to get this API delivered, setting up the documentation site with Frank Rice’s great samples and also handling all the details of setting up the support forum.)

Art Leonard: http://blogs.msdn.com/artleo/archive/2007/06/04/microsoft-document-api-for-openxml.aspx

Wouter Van Vugt: http://blogs.infosupport.com/wouterv/archive/2007/06/05/APIs-for-Office-Open-XML.aspx

Julien Chable: http://blogs.developpeur.org/neodante/archive/2007/06/04/open-xml-le-sdk-open-xml-disponible-en-technical-preview.aspx

Stephen McGibbon: http://notes2self.net/archive/2007/06/04/microsoft-sdk-for-openxml-formats-june-2007-technology-preview.aspx

Chris Bryant (Channel 9 interview): https://channel9.msdn.com/ShowPost.aspx?PostID=313246#313246


What It Is


This new API is something we’ve been wanting to do for a long time. I first heard talk of something like this in March of 2006, when Stephen Peront of Xinnovation met with me, Kevin Boske, and Art Leonard the week of the Office Devcon in Redmond. Stephen was drawing lots of diagrams on a whiteboard in Building 36 while Kevin and Art were talking about “strongly typed parts,” and frankly I was just along for the ride that day. Since then, all three of those guys have lobbied hard from inside and outside Microsoft to make something like this happen. (The emails Stephen sent to our management in support of this API are truly classics.)


The basic idea underlying the new API is pretty simple: we now have .NET types that correspond to the parts of an Open XML document. If you’re familiar with the structure of Open XML documents, then the type names themselves will tell you what you can do with the new API. Here are a few examples of the new types:


  • OpenXmlPackage

  • WordprocessingDocument, SpreadsheetDocument, PresentationDocument

  • MainDocumentPart, WorkbookPart, PresentationPart

  • ImagePart, CommentsPart, PivotTablePart, WorksheetPart, SlidePart, ThemePart, CustomXmlPart, and many others

This new API sits at a higher level of abstraction than the System.IO.Packaging API. The packaging API knows all about the Open Packaging Conventions, but doesn’t distinguish one part type from another: to the packaging API, “parts is parts.”


Now you can write code that creates a “ThemePart” or looks for the “CustomXmlPart” within a document. You still have to deal with the XML markup itself, but this API greatly reduces the amount of code you need for various Open XML programming chores. You no longer need to iterate through relationships to find a part type that occurs only once, or things like that. You can just go straight to the part you need and start working with its content.
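To make the difference concrete, here's a rough sketch of the relationship-walking chore and of what a "typed" wrapper buys you. It's written in Python with only the standard library, not in .NET, and it does not use the SDK at all. The names build_sample_package, find_target, and ToyWordDocument are hypothetical, invented for this illustration, and the sample package is a toy that Word would not open. Treat it as a picture of the idea, not of how the SDK is implemented.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace and relationship-type URIs used by Open XML packages.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/officeDocument")
THEME = ("http://schemas.openxmlformats.org/officeDocument/2006/"
         "relationships/theme")


def build_sample_package() -> bytes:
    """Build a toy package in memory (hypothetical helper, not a valid .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        # Package-level relationships: point at the main document part.
        z.writestr(
            "_rels/.rels",
            f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{OFFICE_DOC}" '
            'Target="word/document.xml"/></Relationships>')
        # Part-level relationships: the main document points at its theme.
        z.writestr(
            "word/_rels/document.xml.rels",
            f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{THEME}" '
            'Target="theme/theme1.xml"/></Relationships>')
        z.writestr("word/document.xml", "<document/>")
        z.writestr("word/theme/theme1.xml", "<theme/>")
    return buf.getvalue()


def find_target(z: zipfile.ZipFile, source_part: str, rel_type: str):
    """Generic style: scan a part's relationships for the first matching type.

    source_part is "" for the package root.
    """
    folder, name = posixpath.split(source_part)
    rels_path = posixpath.join(folder, "_rels", name + ".rels")
    root = ET.fromstring(z.read(rels_path))
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == rel_type:
            # Targets are resolved relative to the source part's folder.
            return posixpath.normpath(posixpath.join(folder, rel.get("Target")))
    return None


class ToyWordDocument:
    """Hypothetical typed wrapper: the relationship walk hides behind properties."""

    def __init__(self, data: bytes):
        self._zip = zipfile.ZipFile(io.BytesIO(data))

    @property
    def main_document_part(self):
        return find_target(self._zip, "", OFFICE_DOC)

    @property
    def theme_part(self):
        return find_target(self._zip, self.main_document_part, THEME)


if __name__ == "__main__":
    doc = ToyWordDocument(build_sample_package())
    print(doc.main_document_part)
    print(doc.theme_part)
```

With the generic approach, every caller repeats the find_target dance and has to know the relationship-type URIs. With the wrapper, a caller just asks for theme_part. That is the kind of saving the strongly typed parts are meant to provide.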


Note that this is just a typical .NET API, with no dependencies on anything other than the .NET Framework 3.0 itself. So you don’t need Office or VSTO installed, and it works great in a server environment.


Stephen Peront and Doug Mahugh at TechEd


Announcing the API with a hands-on example


I had the pleasure of working with Stephen Peront on our announcement at TechEd this week. Stephen got a copy of the API on Friday, spent two long days writing code (modifying an existing application that had been using the packaging API), and then he got on a plane from Boston to meet me in Orlando Sunday afternoon. We gave a preview of the API to the Regional Directors meeting on Sunday afternoon, and then Stephen wrote a bunch more code late Sunday for a demo he did yesterday morning at my “Open XML Fundamentals” session.


Stephen’s company, Xinnovation, is a leader in large-scale document assembly applications. They did a lot of work with the binary formats, automating the Office clients to get the job done, back when that was the only feasible approach. Then when the packaging API came out, Xinnovation immediately started using it to provide more robust server-side document assembly solutions to their clients.


Now, with Xinnovation’s new XiDocs “XD4” approach, they’re moving document assembly to a whole new level. Using a new technology called AssemblyML, they allow their customers to create templates that can have the rules for document creation within the document itself. These rules are at a very high level, saying in essence “this chart is dynamically generated from SQL data” or “this document should be generated as a presentation.” To demonstrate the power of the new SDK for Open XML, Stephen showed how he had hundreds of lines of code using the packaging API in the old version of their product, and now that code is fewer than 20 lines with the new API.



AssemblyML is a powerful set of abstractions that allow highly customized document-assembly applications to be built with minimum coding or even no coding in many cases. The document type itself is an AssemblyML attribute, so with just one or two changes to the assembly instructions you can have the same SQL data populate fields in a presentation instead of content controls in a word-processing document. If that sounds like something you’d be interested in, contact Stephen at speront@xinn.com. And thanks, Stephen, for the great demonstration!


Come visit us at TechEd


If you’re at TechEd in Orlando this week, drop in at the Open XML booth. It’s in the green area on the main exhibit floor, across from that band playing all the 70s tunes next to the lunch tables. My colleagues Erika Ehrli of MSDN and Stephanie Krieger (an Office MVP and author of a great book on Office 2007) are helping staff the booth, and we’d love to see you.


Say you saw this post and we’ll give you a free Open XML t-shirt, or — if you’re Erika’s size or smaller — a free Open XML dress. 🙂


See you there!

Comments (13)

  1. Stephane Rodriguez says:

    "They did a lot of work with the binary formats, automating the Office clients to get the job done, back when that was the only feasible approach."

    It’s amazing you keep insisting that there has never been third-parties out there. That we live in a world where there is Microsoft, customers, and nothing else. Since early 90s, there has been a ton of third-parties that provide direct access to file formats without running instances of Word/Excel/Powerpoint.

    One of those third-parties, SoftArtisans, was licensed by the SQL Server team last month for helping address a need with Reporting Services/Office integration. Go figure…

    "Note that this is just a typical .NET API, with no dependencies on anything other than the .NET Framework 3.0 itself. "

    Perhaps you’ve gotten so much used to that you don’t think anymore. What a huge opportunity lost in coming up with something else than a cross-platform SDK. Remember, with this stuff, all regular Office developers (VB, VBA) are still left in the cold. Perhaps that’s the kind of tax Microsoft think is a good thing. That’s just beyond stupidity in my opinion, especially when you are separately trying to demonstrate platform independence.

    Perhaps you might want to take a hint or two from the Windows HD photo team (who has published a cross-platform SDK)

    A few more remarks :

    – OPC is also used by XPS. Why this SDK does not create an XPS wrapper is beyond me. Makes zero sense until you realize the Office team never works with the Windows team.

    – this SDK contains a trivial validation layer. We’ve discussed this. That’s just about the only valuable thing it does right now. For the matter of reducing lines of code, you have probably heard of that weird concept called "a function call" ? May be not….

    – you confuse, hopefully this is just an oversight, "programmatic access" and "document instantiation". I’ll be glad to explain you (one more time) what the difference is, but unless you are willing to take a lot of flack for setting expectations so high for no reason, you might want to clear this up, and better explains what this SDK is, and what this SDK is not.

    -Stephane Rodriguez

    ARsT Design CEO

  2. Doug Mahugh says:

    I’ll try to clarify, Stephane: this API is strongly typed parts for Open XML, on the .NET platform.  It is not an XPS or OPC API, nor a cross-platform SDK.  Hope that helps.

  3. KevinBoske says:

    Stephane, how does this SDK prevent VBA developers from writing code?  VBA developers still have access to the files via the Office client object models.  Those libraries (the Office client applications) and solutions provide way more granular access to data within Office documents (binary and Open XML).  This SDK is designed for scenarios where the clients need not be or should not be present on Windows.  Doug has already blogged Julien Chable’s Java API.

    As for XPS, there is already an XPS wrapper produced by the XPS team.  Why would we (not the XPS team) produce another API for that team?

  4. speront says:

    Hi Stephane…

    "It’s amazing you keep insisting that there has never been third-parties out there."

    Doug is clearly referring to a third-party here (i.e. Xinnovation); where does this comment come from?

    Regarding the SQL team, I believe you are referring to this (http://msdn2.microsoft.com/en-us/library/aa964136.aspx) Does that let users assemble PowerPoint docs or any type of doc in general or is it specifically for reporting purposes?

    "Remember, with this stuff, all regular Office developers (VB, VBA) are still left in the cold."

    There are some great people at MS who are very passionate about meeting these needs; have you had a chance to look at what the Studio team is working on? Are "all regular Office developers" VB and VBA programmers?

    "That’s just about the only valuable thing it does right now."

    We are realizing a lot of value from the SDK (already); I personally am very happy to see this team working hard to implement the "Object Oriented Programming" approach that all the other Tools teams are doing. Great job, MS!

    We are actually really excited about the SDK, quite frankly some people have made a lot of personal sacrifice to get this to market so quickly; is it fair to so quickly judge their efforts without considering all the upcoming efforts of each of the Tools teams?

    Your attacks on Doug’s comments – who has a *very* level head on his shoulders and works hard to provide the best for everyone – are completely uncalled for… Advice and feedback are good, maybe a little respect too!

    Cheers,

    -Stephen

  5. Stephane Rodriguez says:

    Kevin said "Stephane, how does this SDK prevent VBA developers from writing code?  VBA developers still have access to the files via the Office client object models."

    You are confusing two totally different topics.

    .NET 3.0 is not accessible to VB, VBA and what regular office developers use. I don’t think it’s too hard to understand.

    As for accessing Office with the COM object model, well hasn’t Doug been hammering over and over again that it did not work for server scenarios?

    I am speaking devil’s advocate here. You know who I am, no need to play innocent child.

    This new SDK is welcome but the way you guys talk about it is incorrect. You are setting yourself up for a huge disappointment when those guys lurking over at openxmldeveloper.org try it only to realize it won’t solve any of their problems. N O N E.

    Head over at openxmldeveloper.org and see their needs. They are not looking for a wrapper that lets them create a WordDocument object instead of a ZipPackage object.

    As I said before, the only valuable bit in this SDK is the validation layer, which nobody has talked about so far. If you remember, Kevin, that’s what we were discussing on your blog back in last August or something like that.

  6. Stephane Rodriguez says:

    Stephen Peront,

    First off, I am discussing with Doug which I have been discussing with for almost a year now. You need not feel part of the discussion. Also, there are things that I say that he understands very well, that perhaps the lacking context does not let you understand the same. So again, perhaps it’s better to leave it as is.

    I’ll just comment on one thing : third-parties. If you read Doug’s comment, he and many of his colleagues have been hammering since August 2005 (when Jones introduced the new Office 2007 file format in a C9 video) that the only way to process Office documents in server-side scenarios was to use the COM object model. And that it did not work very well, if at all.

    They did not only say from Microsoft perspective, they took it on behalf of the entire office ecosystem.

    Most notably, they chose (for the marketing propaganda agenda) to ignore third-parties, so that their point would perhaps be easier to get across.

    Problem, it’s totally wrong. There are numerous third-parties that have reverse engineered those file formats (SoftArtisans is an old one) and that are to date very real alternatives. I know this information first-hand. So when Doug says above "They did a lot of work with the binary formats, automating the Office clients to get the job done, back when that was the only feasible approach.", I don’t think he’s talking about you here specifically, he’s talking about what their employer asked them specify as the "server-side marketing trick".

    Taking some further distance on this, you’ve got to understand that Microsoft is not a non-profit business and that they have no interest to provide a rich SDK that would make them sell less server licenses. That’s exactly what they are heading towards, that’s their next revenue growth, don’t think they are going to kill it just to make you happy. Hence the difference between "programmatic access" and "document instance".

    Stephane Rodriguez

    ARsT Design CEO

    PS : if you did not know, ARsT Design is responsible for http://diffopc.arstdesign.com and http://xlsgen.arstdesign.com. Among other things.

  7. Kevin Boske says:

    Stephane, I recognized your name from other blog posts by me and Brian, but I had no idea what you did or your products until I looked at your site linked in the last comments.  

    This build of the SDK is a tech preview, a CTP.  It has bugs.  It is missing functionality.  We know there are improvements to be made.  That’s why we released it now, to get feedback and improve.  

  8. Stephane Rodriguez says:

    "That’s why we released it now, to get feedback and improve. "

    So here is my feedback, in a nutshell :

    – wrong architecture. You leave the bulk of developers out in the cold. .NET 3.0 is a non-starter in many environments.

    – wrong "marketing". The MSDN Open XML forum is already filled with questions that this SDK will never be able to answer. Basically, you are willing to be slapped in the face. Here is a typical question an Office developer wants to be able to solve : merge Word document (aka the chunk fragments) without a running Word instance.

    – wrong targeting. By creating yet another website, the MSDN Open XML forum this time, you are fragmenting the "community" even more.

    All the above makes zero sense to me.

  9. Erika Ehrli says:

    If you couldn’t make it to TechEd – or even if you did – check out Virtual TechEd . I loved attending

  10. Doug Mahugh says:

    After nine months of developer feedback on the Open XML SDK , we have some good news today: a roadmap

  11. After nine months of developer feedback on the Open XML SDK , we have some good news today: a roadmap

  12. Pubblicata la roadmap di Open XML SDK. Vi segnalo alcuni link di approfondimento: Open XML SDK download

  13. I’m catching up with a bunch of Open XML blogging from ages ago, so apologies if some of these are old