Open XML Resources for Developers

Like many people, I thought we’d know the official outcome of the DIS 29500 process today. It now looks like we won’t hear the official results until ISO has had a chance to run them by the national bodies that participated in the review of the specification, which according to Reuters will be Wednesday.

While we wait, I’ve been thinking about how much attention this process has been getting, especially in recent months. Back when Ecma submitted the ECMA-376 standard to ISO at the beginning of 2007 (451 days ago, if my math is right), a relatively small number of people were following the discussion around document format standards. That group has expanded significantly, and there are now many people following the story of Open XML and DIS 29500.

Since some of those people may be developers who didn’t see all of the Open XML content that has been made available in the past, I decided to pull together a list of links to various resources for Open XML developers. The list is included below. I’m sure I’ve left out a few good resources, so please let me know in the comments if you know of a useful Open XML developer resource that I’ve not included here.

The Basics

A good place to start if you’re brand-new to Open XML is the collection of Open XML videos on YouTube. You can see implementations in action on a range of platforms (Windows, Linux, iPhone, Treo, etc.), as well as interviews with Open XML developers and other information.

When you’re ready to dig into the details, start with Frank Rice’s introductory paper available on MSDN, which covers Open XML document architecture and also describes many common scenarios for Open XML development projects. Another great article on MSDN is Erika Ehrli’s overview of the DOCX format, which goes into more detail on the most popular of the three document types.

If you’re familiar with document formats in general and want to read about how Open XML compares to other formats, or are curious about how Ecma TC45 sees the Open XML formats, be sure to read Tom Ngo’s whitepaper. Tom is the CTO of NextPage and a TC45 member, and he was a contributor to the conformance clause of the original Ecma spec and also participated in work on multi-part structure and conformance at the BRM last month.

The first book on Open XML development was Wouter Van Vugt’s “Open XML Explained,” which is available as a free download on MSDN. Toshiba’s Yoko Girier has a Japanese book out for Open XML developers as well. Another useful reference for developers is the Open XML Developer Map poster, which provides an overview of the schemas and document types.

For a non-technical high-level overview of Open XML’s role in the industry, Oliver Bell has written a paper entitled “Open for Business” that covers that perspective well.

Advanced Content

For more detailed coverage of the schemas, take a look at the videos of the Open XML Developer Workshop. These are videos from a 2-day class on Open XML, and you can also download the content of the workshop, including presentations, sample documents, and hands-on labs with code samples.

If you’ve seen me do an Open XML workshop, you know that there are a couple of concepts I really like to stress. One is the value of custom schema support for developers who want to create innovative solutions that merge the world of documents and the world of data. Custom schema support opens up a new world of possibilities for document-based business processes, and Open XML allows custom schemas to be used to tag document content, or for discrete “custom XML parts” within a document.

Another favorite topic of mine is how to work with OPC (the Open Packaging Conventions that form the structural basis of the Open XML formats). To ensure reliable interoperability, developers need to write code that properly navigates documents by their relationship structure rather than their physical structure, and this is an easy detail to overlook when you’re getting started with Open XML development.
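
To make the idea of navigating by relationships concrete, here is a minimal Python sketch that uses only the standard library (it is not the Packaging API or the SDK listed later in this post). It finds the main document part by following the package-level relationships, then finds any custom XML parts by following the main part's own relationships, without ever assuming a fixed path such as word/document.xml. The helper names, the sample part names (body/main.xml, data/invoice.xml), and the urn:example:invoice namespace are all made up for illustration, and the sample package leaves out [Content_Types].xml, so it is not a complete, valid document.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace of relationship parts, and the two relationship types followed below.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/"
                   "2006/relationships/officeDocument")
CUSTOM_XML = ("http://schemas.openxmlformats.org/officeDocument/"
              "2006/relationships/customXml")


def rels_name(part):
    """Zip entry name of the relationship part for `part` ("" means the package root)."""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def related_parts(package, source, rel_type):
    """Yield zip entry names of the parts `source` points to with `rel_type`."""
    try:
        root = ET.fromstring(package.read(rels_name(source)))
    except KeyError:  # the source part has no relationship part
        return
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Type") != rel_type or rel.get("TargetMode") == "External":
            continue
        # Targets are relative to the source part's folder unless they start with "/".
        base = posixpath.dirname(source)
        target = posixpath.normpath(posixpath.join(base, rel.get("Target")))
        yield target.lstrip("/")


def find_custom_xml(package):
    """Return (main part, custom XML parts), located through relationships only."""
    main_part = next(related_parts(package, "", OFFICE_DOCUMENT))
    return main_part, list(related_parts(package, main_part, CUSTOM_XML))


def build_sample_package():
    """Build a tiny hypothetical package in memory with deliberately unusual part names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        z.writestr(
            "_rels/.rels",
            f'<Relationships xmlns="{RELS_NS}">'
            f'<Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" Target="body/main.xml"/>'
            "</Relationships>")
        z.writestr(
            "body/_rels/main.xml.rels",
            f'<Relationships xmlns="{RELS_NS}">'
            f'<Relationship Id="rId1" Type="{CUSTOM_XML}" Target="../data/invoice.xml"/>'
            "</Relationships>")
        z.writestr("body/main.xml", "<document/>")
        z.writestr(
            "data/invoice.xml",
            '<invoice xmlns="urn:example:invoice"><total>42</total></invoice>')
    buffer.seek(0)
    return buffer


with zipfile.ZipFile(build_sample_package()) as package:
    main_part, custom_parts = find_custom_xml(package)
    print("main part:", main_part)
    print("custom XML parts:", custom_parts)
    # The custom XML part is ordinary XML in the solution's own schema.
    invoice = ET.fromstring(package.read(custom_parts[0]))
    print("total:", invoice.findtext("{urn:example:invoice}total"))
```

Code that opened word/document.xml directly would fail on this sample, even though the relationships describe it unambiguously. A production implementation would also need to handle details this sketch ignores, such as case-insensitive part names and percent-encoded targets.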

I’ve blogged about these two topics in the past, so here are links to those posts for more information:

Open XML Portals

The following sites offer a rich set of Open XML content for developers, implementers, policy makers, and others:

  • provides “how-to” information for developers working on many different platforms, and it has a Forums section where you can post questions about Open XML development topics.

  • The Office Open XML Formats Resource Center has links to many comprehensive how-to articles on Open XML development in the .NET environment, as well as whitepapers and other supporting information.

  • has information about Open XML implementations, case studies, IP information, and other non-technical content that may be useful or interesting to Open XML developers.

Developer Tools

Many of the articles I’ve linked to above and below cover developer tools, but here’s a concise list of download links for the most popular tools for various environments:

  • Packaging API. If you’re running Vista, you already have the System.IO.Packaging API. If you’re running XP, you’ll need to install the .NET Framework 3.0 to get it.

  • Open XML SDK. The SDK for Open XML formats is a higher-level API for working with Open XML in a .NET environment. All the latest information about the SDK can be found in my recent blog post about the SDK roadmap.

  • Java developers should take a look at the open-source OpenXML4J API.

  • Another great alternative for Java developers is docx4j, an open-source library that creates an in-memory representation of the contents of a DOCX. Jason Harrop and others are building a variety of open-source tools for Open XML developers — see the site for all the details.

  • For PHP developers, check out the PHPExcel API, which provides functionality for easily creating Open XML spreadsheets from PHP applications.

And here are two other tools that many Open XML developers find useful:

  • The Package Explorer is a handy tool for viewing, editing, and validating the contents of Open XML documents.

  • Altova’s XMLSpy supports Open XML, and Altova CEO Alexander Falk’s blog is a good place to learn more about it.

Developer-Oriented Blogs

There are many blogs about Open XML now, including several that provide useful developer content on a regular basis. Here are a few of my favorite Open XML development blogs:

  • Brian Jones covers a variety of Open XML topics, and is the best source of information on the thinking behind Open XML and how Microsoft sees the future of XML-based documents. Brian is a member of Ecma TC45 as well.

  • Wouter Van Vugt is an experienced .NET consultant/trainer who has led numerous Open XML workshops, created Package Explorer, and wrote the “Open XML Explained” book mentioned above. He often posts code samples as well, and is a member of the technical committee that evaluated Open XML for the Netherlands.

  • Jesper Lund Stocholm covers Open XML and ODF development, and is a very active member of the Danish technical committee that evaluated the Open XML spec.

  • Erika Ehrli is the driving force behind most of the Open XML content on MSDN, and she’s also a regular blogger who posts code samples and links to other resources for developers.

  • Julien Chable, the creator of the OPENXML4J API, has a French-language blog with regular posts on Open XML for Java developers as well as .NET developers.

  • Eric White blogs about Open XML, LINQ to XML, and related topics. He focuses on the future of XML development for .NET developers, and posts code samples showing how the latest functional programming concepts can be applied to Open XML.

  • Maarten Balliauw is the creator of the PHPExcel API, and blogs on a variety of development topics including Open XML. He covers ASP.NET development as well as PHP development.

  • James Newton-King is another blogger who covers a wide range of development topics. His posts on Open XML and LINQ to XML are excellent.

  • Rick Jelliffe blogs about markup languages and related topics, and has been a major contributor to the debate around Open XML.

  • Dennis Hamilton has a lifetime of experience in technology standards, and often writes posts that look beyond the present into the world of possibilities that XML-based document formats present for developers.

  • Mauricio Ordonez blogs about Office development topics, including Open XML, and I’m looking forward to some interesting content he’ll have soon for Open XML developers.

  • Finally, this one isn’t really a blog, but the Open XML SDK forum on MSDN is another great place to read developers’ conversations about Open XML development or get your questions answered.

Comments (17)

  1. Doug has a great post today that helps get us back to what really matters in this whole file format discussion

  2. orcmid says:

    This is really great.  I was going to start rounding up this sort of thing and ‘lo, here it is!

    Nice job.

  3. Kevin says:

    Wow, thanks for these links to such truly helpful tools!

    Looking around, I see a crucial tool that is missing, however. For developers, could you please provide a link to a resource providing the full mapping from the legacy formats? You see, I want to implement this format fully and provide those who use my software with the specific added value for which OOXML was created.

    So I need that mapping. Where is it?

  4. Open XML says:

    There is no need to dwell on the result of the ISO vote; the news has already been, or will be, widely

  5. Kevin says:

    Anon wrote, "Kevin, check"

    Anon, that’s for working with the binaries and for doing reverse engineering on the spec. That’s not what a developer should have to work with!

    Also note at the link you provided:

    "the binary formats have also been made available under the Open Specification Promise"

    Yes, the binary formats have been made available under those terms, but no full, official ECMA or ISO documentation and mapping for them has been made available. Not even an unofficial Microsoft version.

    Doug, I’d appreciate it if you could make some headway towards making sure that these resources are made openly available to all.

  6. Of the 87 National Body Members (countries eligible to vote), 87% support the ISO/IEC standardization,

  7. Ben Lincoln says:

    What do you want Kevin?  It’s not really reverse engineering because you have access to the source code and so can see the exact mappings.  I actually prefer this to what would otherwise be a long and dry piece of documentation.  Moreover, in this case most of the work I would need/want to do (translation) is done for me.  

    So I guess the question is, what exactly are you looking for?  I think there is a considerable difference between being open, providing appropriate resources, and then having to actually do all the work for people.  

  8. Ben Lincoln says:

    Also Kevin, the Library of Congress is hosting the binaries, so there is no reason that someone from the open source community couldn’t produce the mapping (if that’s all you want) without fear using the translator.  

    It’s interesting that in Open Source projects not hosted by Microsoft the community is expected to do some of the work.  In contrast, when they do make donations and attempt to be more open, it is expected that they should do everything.  At what point does it stop being open source and start becoming free labor?

    FYI, I am pro open source but also VERY pro about the community dedicating their time and efforts and not just asking for things.  

  9. I already told you about this in past months. Microsoft Office 2007 introduced a new file format

  10. James Plamondon says:

    Our mission is to establish Microsoft’s platforms as the de facto standards throughout the computer industry…. Working behind the scenes to orchestrate "independent" praise of our technology, and damnation of the enemy’s, is a key evangelism function during the Slog. "Independent" analyst’s report should be issued, praising your technology and damning the competitors (or ignoring them). "Independent" consultants should write columns and articles, give conference presentations and moderate stacked panels, all on our behalf (and setting them up as experts in the new technology, available for just $200/hour). "Independent" academic sources should be cultivated and quoted (and research money granted). "Independent" courseware providers should start profiting from their early involvement in our technology. Every possible source of leverage should be sought and turned to our advantage.

    I have mentioned before the "stacked panel". Panel discussions naturally favor alliances of relatively weak partners – our usual opposition. For example, an "unbiased" panel on OLE vs. OpenDoc would contain representatives of the backers of OLE (Microsoft) and the backers of OpenDoc (Apple, IBM, Novell, WordPerfect, OMG, etc.). Thus we find ourselves outnumbered in almost every "naturally occurring" panel debate.

    A stacked panel, on the other hand, is like a stacked deck: it is packed with people who, on the face of things, should be neutral, but who are in fact strong supporters of our technology. The key to stacking a panel is being able to choose the moderator. Most conference organizers allow the moderator to select the panel, so if you can pick the moderator, you win. Since you can’t expect representatives of our competitors to speak on your behalf, you have to get the moderator to agree to having only "independent ISVs" on the panel. No one from Microsoft or any other formal backer of the competing technologies would be allowed – just ISVs who have to use this stuff in the "real world." Sounds marvelously independent doesn’t it? In fact, it allows us to stack the panel with ISVs that back our cause. Thus, the "independent" panel ends up telling the audience that our technology beats the others hands down. Get the press to cover this panel, and you’ve got a major win on your hands.

    Finding a moderator is key to setting up a stacked panel. The best sources of pliable moderators are:

       — Analysts: Analysts sell out – that’s their business model. But they are very concerned that they never look like they are selling out, so that makes them very prickly to work with.

       — Consultants: These guys are your best bets as moderators. Get a well-known consultant on your side early, but don’t let him publish anything blatantly pro-Microsoft. Then, get him to propose himself to the conference organizers as a moderator, whenever a panel opportunity comes up. Since he’s well-known, but apparently independent, he’ll be accepted – one less thing for the constantly-overworked conference organizer to worry about, right?

  11. Yesterday’s news is that DIS 29500, that is, the Open XML standard, which at the latest vote of the ISO/IEC standards

  12. I’ve been talking more and more with ISVs and developers who are interested in using Office as a UI platform.

  13. Erika Ehrli says:

    Many of you may have already heard that Office Open XML was approved as an ISO standard! This is great

  14. You've probably heard the exciting news already – both ECMA and Microsoft have announced it. For

  15. Doug Mahugh, a Program Manager at Microsoft in Redmond, has an extensive list of resources on Open

  16. Some of my old readers would have noticed that I’ve stopped blogging for quite a while now. Thing in