Open XML Resources for Developers

Like many people, I thought we'd learn the official outcome of the DIS 29500 process today. It now looks like we won't hear the official results until ISO has had a chance to run them by the national bodies that participated in the review of the specification, which according to Reuters will be Wednesday.

While we wait, I've been thinking about how much attention this process has been getting, especially in recent months. Back when Ecma submitted the ECMA-376 standard to ISO at the beginning of 2007 (451 days ago, if my math is right), a relatively small number of people were following the discussion around document format standards. That group has expanded significantly, and there are now many people following the story of Open XML and DIS 29500.

Since some of those people may be developers who didn't see all of the Open XML content that has been made available in the past, I decided to pull together a list of links to various resources for Open XML developers. The list is included below. I'm sure I've left out a few good resources, so please let me know in the comments if you know of a useful Open XML developer resource that I've not included here.

The Basics

A good place to start if you're brand-new to Open XML is the collection of Open XML videos on YouTube. You can see various implementations in action on various platforms (Windows, Linux, iPhone, Treo, etc.), as well as interviews with Open XML developers and other information.

When you're ready to dig into the details, start with Frank Rice's introductory paper available on MSDN, which covers Open XML document architecture and also describes many common scenarios for Open XML development projects. Another great article on MSDN is Erika Ehrli's overview of the DOCX format, which goes into more detail on the most popular of the three document types.

If you're familiar with document formats in general and want to read about how Open XML compares to other formats, or are curious about how Ecma TC45 sees the Open XML formats, be sure to read Tom Ngo's whitepaper. Tom is the CTO of NextPage and a TC45 member, and he was a contributor to the conformance clause of the original Ecma spec and also participated in work on multi-part structure and conformance at the BRM last month.

The first book on Open XML development was Wouter Van Vugt's "Open XML Explained," which is available as a free download on MSDN. Toshiba's Yoko Girier has a Japanese book out for Open XML developers as well. Another useful reference for developers is the Open XML Developer Map poster, which provides an overview of the schemas and document types.

For a non-technical high-level overview of Open XML's role in the industry, Oliver Bell has written a paper entitled "Open for Business" that covers that perspective well.

Advanced Content

For more detailed coverage of the schemas, take a look at the videos of the Open XML Developer Workshop. These are videos from a 2-day class on Open XML, and you can also download the content of the workshop, including presentations, sample documents, and hands-on labs with code samples.

If you've seen me do an Open XML workshop, you know that there are a couple of concepts I really like to stress. One is the value of custom schema support for developers who want to create innovative solutions that merge the world of documents and the world of data. Custom schema support opens up a new world of possibilities for document-based business processes, and Open XML allows custom schemas to be used to tag document content, or for discrete "custom XML parts" within a document.

Another favorite topic of mine is how to work with OPC (the Open Packaging Conventions that form the structural basis of the Open XML formats). To ensure reliable interoperability, developers need to write code that properly navigates documents by their relationship structure rather than their physical structure, and this is an easy detail to overlook when you're getting started with Open XML development.
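
To make the relationship point concrete, here is a minimal Python sketch (my own illustration, not taken from any of the resources in this post). It finds the main document part by reading the package-level relationships in _rels/.rels instead of assuming a fixed path such as word/document.xml. The helper names find_main_part and build_sample_package are hypothetical, and the sample package is deliberately stripped down (it has no [Content_Types].xml, for example), so treat it as a demonstration of the lookup rather than a valid document. The relationship type URI is the one used by the Ecma edition of the formats.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the relationship parts defined by the Open Packaging Conventions.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Relationship type that marks the main document part of a package.
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/officeDocument"
)


def find_main_part(package):
    """Return the main document's part name by following _rels/.rels.

    `package` is an open zipfile.ZipFile. (Hypothetical helper for illustration.)
    """
    rels = ET.fromstring(package.read("_rels/.rels"))
    for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT:
            # Targets in the package-level .rels are resolved from the package root.
            return posixpath.normpath(rel.get("Target").lstrip("/"))
    raise KeyError("no officeDocument relationship found in package")


def build_sample_package():
    """Build a tiny in-memory package whose main part is NOT at word/document.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(
            "_rels/.rels",
            '<?xml version="1.0" encoding="UTF-8"?>'
            f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" '
            'Target="content/main.xml"/>'
            "</Relationships>",
        )
        z.writestr("content/main.xml", "<document/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with zipfile.ZipFile(build_sample_package()) as pkg:
        print(find_main_part(pkg))
```

Code that hard-codes word/document.xml would fail on this sample package, while the relationship lookup still finds the main part. The same pattern applies one level down: each part's own .rels file says where its related parts live.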

I've blogged about these two topics in the past, so here are links to those posts for more information:

Open XML Portals

The following sites offer a rich set of Open XML content for developers, implementers, policy makers, and others:

  • OpenXmlDeveloper.org provides "how-to" information for developers working on many different platforms, and it has a Forums section where you can post questions about Open XML development topics.
  • The Office Open XML Formats Resource Center has links to many comprehensive how-to articles on Open XML development in the .NET environment, as well as whitepapers and other supporting information.
  • OpenXmlCommunity.org has information about Open XML implementations, case studies, IP information, and other non-technical content that may be useful or interesting to Open XML developers.

Developer Tools

Many of the articles I've linked to above and below cover developer tools, but here's a concise list of download links for the most popular tools for various environments:

  • Packaging API. If you're running Vista you already have the System.IO.Packaging API. If you're running XP, you'll need to install the .NET Framework 3.0 to get it.
  • Open XML SDK. The SDK for Open XML formats is a higher-level API for working with Open XML in a .NET environment. All the latest information about the SDK can be found in my recent blog post about the SDK roadmap.
  • Java developers should take a look at the open-source OpenXML4J API.
  • Another great alternative for Java developers is docx4j, an open-source library that creates an in-memory representation of the contents of a DOCX. Jason Harrop and others are building a variety of open-source tools for Open XML developers. See the dev.Plutext.org site for all the details.
  • For PHP developers, check out the PHPExcel API, which provides functionality for easily creating Open XML spreadsheets from PHP applications.

And here are two other tools that many Open XML developers find useful:

  • The Package Explorer is a handy tool for viewing, editing, and validating the contents of Open XML documents.
  • Altova's XMLSpy supports Open XML, and Altova CEO Alexander Falk's blog is a good place to learn more about it.

Developer-Oriented Blogs

There are many blogs about Open XML now, including several that provide useful developer content on a regular basis. Here are a few of my favorite Open XML development blogs:

  • Brian Jones covers a variety of Open XML topics, and is the best source of information on the thinking behind Open XML and how Microsoft sees the future of XML-based documents. Brian is a member of Ecma TC45 as well.
  • Wouter Van Vugt is an experienced .NET consultant/trainer who has led numerous Open XML workshops, created Package Explorer, and written the "Open XML Explained" book mentioned above. He often posts code samples as well, and is a member of the technical committee that evaluated Open XML for the Netherlands.
  • Jesper Lund Stocholm covers Open XML and ODF development, and is a very active member of the Danish technical committee that evaluated the Open XML spec.
  • Erika Ehrli is the driving force behind most of the Open XML content on MSDN, and she's also a regular blogger who posts code samples and links to other resources for developers.
  • Julien Chable, the creator of the OpenXML4J API, has a French-language blog with regular posts on Open XML for Java developers as well as .NET developers.
  • Eric White blogs about Open XML, LINQ to XML, and related topics. He focuses on the future of XML development for .NET developers, and posts code samples showing how the latest functional programming concepts can be applied to Open XML.
  • Maarten Balliauw is the creator of the PHPExcel API, and blogs on a variety of development topics including Open XML. He covers ASP.NET development as well as PHP development.
  • James Newton-King is another blogger who covers a wide range of development topics. His posts on Open XML and LINQ to XML are excellent.
  • Rick Jelliffe blogs about markup languages and related topics, and has been a major contributor to the debate around Open XML.
  • Dennis Hamilton has a lifetime of experience in technology standards, and often writes posts that look beyond the present into the world of possibilities that XML-based document formats present for developers.
  • Mauricio Ordonez blogs about Office development topics, including Open XML, and I'm looking forward to some interesting content he'll have soon for Open XML developers.
  • Finally, this one isn't really a blog, but the Open XML SDK forum on MSDN is another great place to read developers' conversations about Open XML development or get your questions answered.