Binary to Open XML (B2X) Translator: Interoperability for the Office binary file formats

[05/18- Update:
this translator is highlighted in today's Document Interoperability Inititice (DII) event that just happened in London ]

In support of Microsoft’s ongoing efforts to increase the interoperability of its various technologies, we have partnered with Dialogika to create a translator that converts the Microsoft Office binary file formats (.DOC, .XLS, and .PPT) into the Office Open XML standard format (.DOCX, .XLSX, .PPTX).

A majority of the world’s documents are available in the binary Office formats and, for developers working with these formats (including .DOC, .PPT, and .XLS.), Microsoft published the specifications under the Open Specification Promise (OSP) in June 2008.

1

A new version of the Binary to Open XML (B2X) Translator has just been released ; this version adds support for PowerPoint (.PPT) and Excel (.XLS) files:

Supported .XLS Features

Supported .PPT Features

  • Shared Formulas
  • String Formatting
  • Data Type Formatting (number, date, currency, etc.)
  • Cell Formatting

 

  • Textbox Formatting
  • Shapes
  • Animations
  • Notes (including Formatting)

(Detailed features http://b2xtranslator.sourceforge.net/architecture.html#mapping )

From an architectural point of view, the translator can be seen as a series of pipelines during which transformation steps are applied to translate from the binary to Open XML format:

2

(more details on http://b2xtranslator.sourceforge.net/architecture.html )

While it has been possible to manually convert documents between formats by opening the file in the relevant application and saving in the other format, before the release of the translator there was no software tool to automate this task as a stand-alone application, or in a batch mode.

So from the end-user point of view the translator offers two options:

3

While using Windows’ context menus to translate the files is self-explanatory (right-click, convert to…) doing so from the command line warrants a bit more study. The command line utility consists of three separate executables, one for each file type (ppt2x.exe for spreadsheet, doc2x.exe for document, and xls2x.exe for presentation). The executables use the same command line syntax, and support the usual basic command line options:
4
This includes the input filename, output filename, and the level of debug verbosity. The resulting command is easy to include in automation scripts, and batch processes.

The command-line architecture allows the translators to be integrated into existing systems such as document management systems running on a server.

Using the source of B2X translator (ppt2x.exe, doc2x.exe, xls2x.exe), you can rebuilt them using the .NET Framework on Windows or Mono on Linux, thus ensuring portability across operating systems and platforms.

As an open source project, the Translator is a solid foundation for engineering work around the Office binary format. Dialogika’s development team has put together a few “how to” guides, including the Freeform Shapes in the Office Drawing Format guide, that helps to explain the specification and give some valuable tips. For developers and ISVs the code of this translator can be reused in their own applications enabling a wide range of document interoperability solutions.

We’re excited by this latest release making the translators more functional and addressing practical document conversion scenarios. Of course, there’s still work ahead of us! We are currently in the planning stage for the next version. In addition to the goals outlined above, it is very important to us that the translator adequately addresses practical user scenarios. To this end, we would love to hear feedback on this release as well as your feature requests for the next version. Please provide your feedback on the Sourceforge site.