Implementing document-format specifications

A few folks have pointed out that implementing every detail of the Office Open XML specification would be very difficult. And that's certainly true -- implementing 100% of a document-format specification is a daunting task.

A good example of the complexity of this task can be found in the Intel-sponsored ODF test suite developed by the University of Central Florida. In the Summary section, you'll find links to over 300 specific issues regarding partial or missing implementation of ODF in OpenOffice and KOffice, with screen shots and descriptions of the issues.

In most situations, of course, a developer isn't trying to implement 100% of a spec. For example, Mindjet's integration of MindManager and Word 2007 through the use of Office Open XML only uses a tiny portion of the Office Open XML spec and went from concept to completion in just a few weeks.

Last night I saw another great example: a simple Open XML spreadsheet editor, developed by a college student here in Delhi. It allows the user to open an Open XML spreadsheet, edit values in a grid control or add new rows, and save the result as a valid Open XML spreadsheet. And although it's written in C#, it doesn't use the .NET 3.0 System.IO.Packaging API, instead opening the document as a simple ZIP archive. (I'll write up that application in more detail later when I have a little time, and we'll be covering it on the OpenXmlDeveloper site as well.)
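
To give a rough feel for the "just a ZIP archive" approach, here is a minimal sketch in Python. It is not the student's C# code, only an illustration under some assumptions: the function names are made up for this example, the worksheet part is assumed to live at a known path such as xl/worksheets/sheet1.xml (a real application would look that up through the package relationships), and only existing cells are given a plain numeric value. ElementTree may also rewrite namespace prefixes on save, so treat this as a starting point rather than a robust editor.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def _edit_sheet(xml_bytes, cell_ref, value):
    """Set a numeric value on an existing cell (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    for cell in root.iter(f"{{{NS}}}c"):
        if cell.get("r") == cell_ref:
            # Dropping the type attribute makes the cell a plain number.
            cell.attrib.pop("t", None)
            v = cell.find(f"{{{NS}}}v")
            if v is None:
                v = ET.SubElement(cell, f"{{{NS}}}v")
            v.text = str(value)
            return ET.tostring(root, encoding="utf-8", xml_declaration=True)
    raise KeyError(f"cell {cell_ref} not found")


def set_cell_number(package_bytes, sheet_part, cell_ref, value):
    """Return a new package with one cell changed; other parts are copied."""
    # Keep the default namespace unprefixed when the sheet is written back.
    ET.register_namespace("", NS)
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == sheet_part:
                data = _edit_sheet(data, cell_ref, value)
            dst.writestr(item, data)
    return out_buf.getvalue()
```

Usage would look something like set_cell_number(open("book.xlsx", "rb").read(), "xl/worksheets/sheet1.xml", "B2", 42), with the returned bytes written out to a new file. Nothing here touches more than a sliver of the spec: one element, one attribute, and the ZIP container.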

The thoroughness of the Office Open XML specification gives developers all of the information they need to get the job done, and that's a good thing. The Open XML spec also covers functionality that no other document format provides, such as compatibility with billions of existing Office documents and a variety of ways to support custom-schema interoperability in documents. All of that functionality adds complexity, but most of the details are optional, so implementers who don't use a given feature don't need to read or understand it. As the creator of the spreadsheet editor mentioned above told me, "I haven't read 6000 pages in my entire life!" Kids these days. :-)

For those who criticize the size of the spec, an interesting rhetorical question -- which I've not seen addressed anywhere -- is "precisely which sections of the spec would you recommend be omitted?" That would probably lead to an interesting discussion of document-format priorities in general -- to state the obvious, a spec can't offer functionality that isn't specified.

4/28/2008: updated link to ODF test suite.