Binary to Open XML (B2X) Translator: Interoperability for the Office binary file formats

[05/18 Update: this translator was highlighted at today’s Document Interoperability Initiative (DII) event in London.]

In support of Microsoft’s ongoing efforts to increase the interoperability of its various technologies, we have partnered with Dialogika to create a translator that converts the Microsoft Office binary file formats (.DOC, .XLS, and .PPT) into the Office Open XML standard format (.DOCX, .XLSX, .PPTX).

A majority of the world’s documents are available in the binary Office formats. For developers working with these formats (including .DOC, .PPT, and .XLS), Microsoft published the specifications under the Open Specification Promise (OSP) in June 2008.


A new version of the Binary to Open XML (B2X) Translator has just been released; this version adds support for PowerPoint (.PPT) and Excel (.XLS) files:

Supported .XLS Features

  • Shared Formulas

  • String Formatting

  • Data Type Formatting (number, date, currency, etc.)

  • Cell Formatting

Supported .PPT Features

  • Textbox Formatting

  • Shapes

  • Animations

  • Notes (including Formatting)

(Detailed features )

From an architectural point of view, the translator can be seen as a series of pipelines in which successive transformation steps translate the binary format into Open XML.


(more details on )
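The pipeline idea can be sketched in a few lines of Python. This is only an illustration of chaining transformation steps, not the translator's actual code: the `Job` type, the `stage` helper, and the four stage names are all hypothetical, and each stage merely records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """Hypothetical state handed from one stage to the next."""
    source: str
    steps_done: List[str] = field(default_factory=list)


Stage = Callable[[Job], Job]


def stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def run(job: Job) -> Job:
        job.steps_done.append(name)
        return job
    return run


# Stage names are illustrative guesses, not the translator's real components.
PIPELINE: List[Stage] = [
    stage("read binary container"),
    stage("parse binary records"),
    stage("map to Open XML parts"),
    stage("write Open XML package"),
]


def translate(source: str, stages: List[Stage] = PIPELINE) -> Job:
    """Run every stage in order, feeding each one the previous result."""
    job = Job(source)
    for step in stages:
        job = step(job)
    return job


print(translate("report.doc").steps_done)
```

Keeping each step behind the same `Job -> Job` signature is what makes the steps composable: a stage can be swapped or added without touching the others.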

While it has been possible to manually convert documents between formats by opening the file in the relevant application and saving in the other format, before the release of the translator there was no software tool to automate this task as a stand-alone application, or in a batch mode.

So, from the end-user point of view, the translator offers two options: converting files from the Windows context menu, or running the command-line utility.


While using Windows’ context menus to translate files is self-explanatory (right-click, convert to…), doing so from the command line warrants a bit more study. The command-line utility consists of three separate executables, one for each file type (ppt2x.exe for presentations, doc2x.exe for documents, and xls2x.exe for spreadsheets). The executables share the same command-line syntax and support the usual basic options: the input filename, the output filename, and the level of debug verbosity. The resulting command is easy to include in automation scripts and batch processes.
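As a rough sketch of how such a command might be assembled from a script, the Python below picks the executable named after each binary extension and builds an argument list. The `-v` verbosity flag appears in the reader comments on this post; the `-o` output flag is an assumption, so check the tool's own usage message before relying on it. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# One executable per binary format, each named after its extension.
TRANSLATORS = {".doc": "doc2x.exe", ".xls": "xls2x.exe", ".ppt": "ppt2x.exe"}


def build_command(source: str, verbosity: Optional[int] = None) -> List[str]:
    """Return the argument list for converting one binary Office file."""
    src = Path(source)
    suffix = src.suffix.lower()
    exe = TRANSLATORS.get(suffix)
    if exe is None:
        raise ValueError(f"unsupported file type: {src.suffix!r}")
    target = src.with_suffix(suffix + "x")  # report.doc -> report.docx
    # "-o" is an ASSUMED flag name for the output file.
    command = [exe, str(src), "-o", str(target)]
    if verbosity is not None:
        command += ["-v", str(verbosity)]
    return command


def convert(source: str, verbosity: Optional[int] = None) -> int:
    """Run the translator and return its exit code."""
    return subprocess.run(build_command(source, verbosity)).returncode
```

Calling `convert("report.doc", verbosity=4)` would then launch doc2x.exe, provided the executable is on the PATH or the script runs from the installation directory.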

The command-line architecture allows the translators to be integrated into existing systems such as document management systems running on a server.
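For a server-side or batch scenario, one possible shape is a small driver that walks a folder tree, finds binary files that have no converted counterpart yet, and hands each one to a conversion callback. The sketch below is an assumption about how an integrator might structure this; `pending_conversions` and `convert_tree` are hypothetical names, and the callback is where the actual translator call would go.

```python
from pathlib import Path
from typing import Callable, Iterator, List, Tuple

BINARY_SUFFIXES = {".doc", ".xls", ".ppt"}


def pending_conversions(root: str) -> Iterator[Tuple[Path, Path]]:
    """Yield (source, target) pairs for binary files not yet converted."""
    # sorted() snapshots the tree first, so files created while
    # converting do not disturb the walk.
    for src in sorted(Path(root).rglob("*")):
        suffix = src.suffix.lower()
        if src.is_file() and suffix in BINARY_SUFFIXES:
            dst = src.with_suffix(suffix + "x")
            if not dst.exists():
                yield src, dst


def convert_tree(root: str, convert: Callable[[Path, Path], bool]) -> List[Path]:
    """Convert everything under root; return the sources that failed."""
    failed = []
    for src, dst in pending_conversions(root):
        if not convert(src, dst):
            failed.append(src)
    return failed
```

Passing the converter in as a callback keeps the directory walk testable without the translator installed, and lets a document management system plug in its own logging or retry policy.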

Using the source of the B2X translator (ppt2x.exe, doc2x.exe, xls2x.exe), you can rebuild the executables with the .NET Framework on Windows or Mono on Linux, which makes them portable across operating systems and platforms.
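A script that has to run on both platforms can hide the difference behind one small helper. The sketch below assumes the Mono build is still a .exe assembly launched through a `mono` command on the PATH; `wrap_for_platform` is a hypothetical name.

```python
import platform
from typing import List, Optional


def wrap_for_platform(command: List[str], system: Optional[str] = None) -> List[str]:
    """Prefix the command with 'mono' everywhere except Windows."""
    system = system or platform.system()
    if system == "Windows":
        return list(command)
    # ASSUMPTION: the 'mono' launcher is installed and on the PATH.
    return ["mono", *command]


print(wrap_for_platform(["doc2x.exe", "report.doc"], system="Linux"))
```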

As an open source project, the Translator is a solid foundation for engineering work around the Office binary formats. Dialogika’s development team has put together a few “how to” guides, including the Freeform Shapes in the Office Drawing Format guide, that help explain the specification and give some valuable tips. For developers and ISVs, the code of this translator can be reused in their own applications, enabling a wide range of document interoperability solutions.

We’re excited by this latest release, which makes the translators more functional and addresses practical document conversion scenarios. Of course, there’s still work ahead of us! We are currently in the planning stage for the next version. In addition to the goals outlined above, it is very important to us that the translator adequately addresses practical user scenarios. To this end, we would love to hear feedback on this release as well as your feature requests for the next version. Please provide your feedback on the SourceForge site.

Comments (14)

  1. Ian Easson says:

    I installed it on Windows 7 RC.  The doc2x program crashed as soon as it was started.

  2. suchawla says:

    Hi Ian,

    We are not able to reproduce the problem on 32-bit Windows 7 RC (build 7100). Can you please share the details of your environment – the platform (32 bit vs 64 bit), the build of Windows, the build of the B2X Translator that you are using, and the steps that lead to the error? Also, do you get an error message with the crash?



  3. Ian Easson says:

    I just tried it again.  It failed, but for a different reason.

    I am running 32-bit Windows 7 RC.

    The build of B2X translator is the one available just after your blog posting came out.

    Here is what I did:

    – Installed the software, with no errors.

    – Ran cmd as an administrator.

    – CD’d to a directory with some .doc files in it.

    – Tried the doc2x.exe command, but it wasn’t in my PATH, so it didn’t find it.

    – I didn’t feel like updating PATH variable, so instead I just copied it from the installation directory to the directory I was in with the .doc files

    – Issued the command “doc2x.exe”, with no parameters.

    The first time I tried it (yesterday), it just crashed with no messages.

    Today, I got the message “doc2x.exe is not recognized as an internal or external command, operable program or batch file.”

    As a result, I checked permissions on it, and everything seems OK.

  4. suchawla says:

    Ian, thanks for the additional information.

    A couple things to check –

    1) doc2x.exe has dependencies that won’t be loaded if you copy just doc2x.exe. Can you please try running doc2x.exe from the directory it is installed in? The default location is "C:\Program Files (x86)\DIaLOGIKa\b2xtranslator"

    2) Also, can you verify that you chose the b2xtranslator_setup_r478.msi package for installing the translator on x86 Win7 RC?



  5. Ian Easson says:

    It worked when run from the installation directory.

    However, I found a bug.  I wanted to mass convert the .doc files under a folder and its subfolders.  There was no command line parameter to do this, but I found that if I did a search for name:*.doc in Windows, the search pane brings up all the .doc files.  If you right-click on one, you can convert it.  I thought I had the mass conversion problem licked, so I selected a screenful of them, and the right-click menu showed, as expected, the Convert to .docx command.  I then selected more than a screenful, and found the bug: the Convert to .docx command does not appear if you select more than a screenful of files.

  6. suchawla says:

    Hi Ian, I am glad it worked for you! We’ll look into the right-click selection problem. You may find it easier to create a batch script to convert multiple docs.

    If you run into further issues, I request that you start a discussion on the project page on SourceForge. The URL is

    Thanks for using the B2X translator. We appreciate your feedback.


  7. Iain says:

    I just tried doc2x with mono on Ubuntu, and I get (with -v 4):

    5/29/2009 11:15:07 AM [E] Conversion of file /home/iain/path/to/foo.doc failed.

    5/29/2009 11:18:02 AM [D] System.IO.EndOfStreamException: Failed to read past end of stream.

     at System.IO.BinaryReader.FillBuffer (Int32 numBytes) [0x00000]

     at System.IO.BinaryReader.ReadInt32 () [0x00000]

     at DIaLOGIKa.b2xtranslator.DocFileFormat.AnnotationReferenceDescriptorExtra..ctor (DIaLOGIKa.b2xtranslator.StructuredStorage.Reader.VirtualStreamReader reader, Int32 length) [0x00000]

     at DIaLOGIKa.b2xtranslator.DocFileFormat.AnnotationReferenceExtraTable..ctor (DIaLOGIKa.b2xtranslator.DocFileFormat.FileInformationBlock fib, DIaLOGIKa.b2xtranslator.StructuredStorage.Reader.VirtualStream tableStream) [0x00000]

     at DIaLOGIKa.b2xtranslator.DocFileFormat.WordDocument.parse (DIaLOGIKa.b2xtranslator.StructuredStorage.Reader.StructuredStorageReader reader, Int32 fibFC) [0x00000]

     at DIaLOGIKa.b2xtranslator.DocFileFormat.WordDocument..ctor (DIaLOGIKa.b2xtranslator.StructuredStorage.Reader.StructuredStorageReader reader) [0x00000]

     at DIaLOGIKa.b2xtranslator.doc2x.Program.Main (System.String[] args) [0x00000]

    Using doc2x r478 on 32-bit Ubuntu.

  8. suchawla says:

    Hi Iain,

    How was the document created? Are you able to share the document with the project team for the sole purpose of reproducing the issue?

    Please submit your comments and any files you can share with the B2X Translator team on the project’s forum at

    We appreciate your feedback.



  9. Microsoft has gone to great lengths to provide interoperability for its technologies. This conversion of Microsoft Office binary file formats into the Office Open XML standard format is yet another step. I’m comfortable with the default .doc, .xls and .ppt formats and don’t need to take the additional step of adopting the open standard format.

  10. Cohen says:

    Maybe I missed something, but couldn’t we already do this with the Office compatibility pack/update for Office versions that don’t support the open standard?

    I thought there was a command line tool that supported conversion to docx (and xlsx, pptx, …)?

  11. Manoj says:

    Hi, I have a problem with doc to docx conversion on the amd64 architecture.

    My debug message:

    Welcome to doc2x (r649)

    Copyright (c) 2009, DIaLOGIKa. All rights reserved.

    02/10/2010 14:18:41 [W] Unexpected length of DOP (544 bytes) in input file.

    02/10/2010 14:18:41 [I] Converting file /tmp/Validation.doc into /tmp/Validation.docx

    02/10/2010 14:18:41 [E] Conversion of file /tmp/Validation.doc failed.

    02/10/2010 14:18:41 [D] System.DllNotFoundException: zlibwapi.dll

     at (wrapper managed-to-native) DIaLOGIKa.b2xtranslator.ZipUtils.ZipLib:zipOpen (string,int)

     at DIaLOGIKa.b2xtranslator.ZipUtils.ZlibZipWriter..ctor (System.String path) [0x00000]

     at (wrapper remoting-invoke-with-check) DIaLOGIKa.b2xtranslator.ZipUtils.ZlibZipWriter:.ctor (string)

     at DIaLOGIKa.b2xtranslator.ZipUtils.ZipFactory.CreateArchive (System.String path) [0x00000]

     at DIaLOGIKa.b2xtranslator.OpenXmlLib.OpenXmlWriter.Open (System.String fileName) [0x00000]

     at DIaLOGIKa.b2xtranslator.OpenXmlLib.OpenXmlPackage.Close () [0x00000]

     at DIaLOGIKa.b2xtranslator.OpenXmlLib.OpenXmlPackage.Dispose () [0x00000]

     at DIaLOGIKa.b2xtranslator.WordprocessingMLMapping.Converter.Convert (DIaLOGIKa.b2xtranslator.DocFileFormat.WordDocument doc, DIaLOGIKa.b2xtranslator.OpenXmlLib.WordprocessingML.WordprocessingDocument docx) [0x00000]

     at DIaLOGIKa.b2xtranslator.doc2x.Program.Main (System.String[] args) [0x00000]

    Please help me on this



  12. Dear team,

    I am currently converting my files with excelcnv.exe and ppcnvcom.exe & wordcnv.exe from the FileConversionPack. Now, your solution is easier to use (context-menu integration), but it seems that you have developed your own conversion engine and that, I quote, "there’s still work ahead of us!" Now, I suppose not all the file features are supported (cfr mapping)?

    My question is, which tool is better to use? Is the end result the same between the B2X translator and the converters from the FileConversionPack? If not, which one is better to use to have a 100% OpenXML file?

    Thank you in advance,

    Quentin Denis

  13. yogesh says:


    I am not able to find out the sample code to convert the DOC to DOCX file.

    Please share the sample code if available