Mapping documents in the binary format (.doc; .xls; .ppt) to the Open XML format

I wanted to call everyone's attention to a few interesting developments in Ecma's proposed disposition document related to the Office binary formats. A few comments from national bodies asked about the documentation of the Office binary formats and its availability. We had already been discussing these issues in TC45, whose members include a number of experts in the binary formats (including Apple, Novell, and Microsoft). Based on the feedback from the national bodies, Microsoft decided last week to take some additional steps in this area.

The first issue national bodies were interested in was easier access to the documentation of the binary formats (.doc; .xls; .ppt). The main concern seemed to be the extra steps required to get the binary documentation. The current form of the documentation has been available since 2006: anyone can get it by sending an email to Microsoft, as described at https://support.microsoft.com/kb/840817/en-us. The documents have been available royalty-free under RAND-Z, and hundreds of companies, including IBM and Sun, as well as government institutions, already have them. The new proposal we (Microsoft) made to Ecma TC45 was to remove the need to send an email and to provide the documentation for direct download under the OSP. TC45 thought this was a good solution, and here is the TC45 response to the national body comments:

Documenting the Microsoft Office "binary" file formats (i.e., .doc, .xls, and .ppt) (the "Binary Formats") is not the intention or in the scope of DIS 29500.

However, Ecma International discussed this subject with Microsoft Corporation. Microsoft indicated that the documentation of the Binary Formats has been available royalty-free under RAND-Z to anyone who requests it by sending an email to officeff@microsoft.com, as described at https://support.microsoft.com/kb/840817/en-us. Microsoft indicated that many companies and public institutions have asked for and received the Binary Formats since Microsoft started providing access to this documentation.

Nevertheless, in response to requests for even easier access to the Binary Formats, Microsoft has agreed to remove any intermediate steps necessary to get the documentation, and will post it and make it directly available for a direct download on the Microsoft web site.  Microsoft will also make the Binary Formats subject to its Open Specification Promise (see www.microsoft.com/interop/osp) by February 15, 2008.

The second issue we received feedback on was interest in a mapping from the binary formats to the Open XML formats. We thought the most effective way to help people here was an open source translation project, so we proposed creating a new open source project that would translate documents written in the legacy binary formats (.doc; .xls; .ppt) into Open XML. TC45 liked this suggestion, and here is the TC45 response to the national body comments:

We believe that interoperability between applications conforming to DIS 29500 is established at the Office Open XML-to-Office Open XML file construct level only.

Prescriptive guidance on, or tools to enable, transformation from Microsoft Office "binary" file formats (i.e., .doc, .xls, and .ppt) (the "Binary Formats") to Office Open XML formatted files is not the intention or in scope of DIS 29500. As a result, this request is outside the bounds of this process.

It is important to note that substantial use is being made of both the Binary Formats and Office Open XML in the marketplace today. Many products (such as OpenOffice.org) support the Binary Formats. Microsoft has indicated that many companies and public institutions have received the documentation for the Binary Formats, and are working with it at this time, and can create mappings between the Binary Formats and Office Open XML. Translators from the Binary Formats to XML formats such as ODF have already been developed and are in wide use. For example, the Sun ODF Plug-in for Microsoft Office (https://sun.systemnews.com/articles/112/3/sw/18208) states that "The plug-in allows users the ability to seamlessly convert Microsoft Office documents to and from ODF. The ODF plug-in supports Microsoft Word, Excel and Powerpoint".

Likewise, there is widespread use of Office Open XML in the marketplace today across platforms and applications. A few examples include the implementations released by Apple (Mac OS X Leopard, iWork 08, iPhone), Adobe (InDesign), Microsoft (Office 2007, Office 2003, Office XP, Office 2000, Office 2008 Mac OS X), Novell (Suse Open Office), Google (Search / Preview), Mindjet (MindManager), Intergen, OpenXML/ODF Translator (Open Source project on Sourceforge), DataViz (DocumentsToGo on Palm OS, MacLinkPlus on Mac OS X Leopard), NeoOffice, Altova (XMLSpy), MarkLogic (XML Content Server), Datawatch (Monarch Pro), QuickOffice (QuickOffice Premier 5.0 on Symbian), Altsoft (XML2PDF Server 2007) and those under development by Corel (WordPerfect), AbiWord, GNOME (Gnumeric), Xandros, Linspire, Turbolinux and others. These implementations are now available on many platforms, including Linux, the Macintosh, Windows, and handheld devices (PalmOS, Symbian, iPhone, and Windows Mobile).

The widespread use of both the Binary Formats and Office Open XML formats indicates that, at this time, third parties can use both formats and build mappings between them.

Nonetheless, Ecma International discussed this subject with Microsoft Corporation, the author of the Binary Formats. To make third-party conversion from the Binary Formats to DIS 29500 even easier, Microsoft agreed to:

  • Initiate a Binary Format-to-ISO/IEC JTC 1 DIS 29500 Translator Project on the open source software development web site SourceForge (https://sourceforge.net/) in collaboration with independent software vendors. The Translator Project will create software tools, plus guidance, showing how a document written using the Binary Formats can be translated to DIS 29500. The Translator will be available under the open source Berkeley Software Distribution (BSD) license, and anyone can use the mapping, submit bugs and feedback, or contribute to the Project. The Translator Project will start on February 15, 2008.
  • Make it even easier to get access to the Binary Formats documentation by posting it and making it available for a direct download on the Microsoft web site no later than February 15, 2008. The Binary Formats have been under a covenant not to sue and Microsoft will also make them available under its Open Specification Promise (see www.microsoft.com/interop/osp) by the time they are posted.

We will modify DIS 29500 to include an informative reference to the SourceForge project.

I think both of these items are great news for folks interested in documents and document file formats. There will be a lot more information about both of these pieces of work over the coming weeks, but I wanted to make sure people realized that this was already in the works.

-Brian