OpenXML Document Viewer v1 released: viewing .DOCX files as HTML


[05/18 Update: this translator was highlighted at today’s Document Interoperability Initiative (DII) event in London.]


The idea for the OpenXML Document Viewer project came from discussions with participants in the Document Interoperability Initiative (DII) workshops, in particular last year’s Cambridge event. The goal was a simple way to view Open XML files as HTML. Following up, Microsoft provided funding to start the Open XML Viewer project, an open source project developed by MindTree Limited. The first beta version was unveiled at the last DII in Brussels, giving a first peek at the viewer (see a demo here).


Today I’m excited to announce version 1.0 of the Open XML Document Viewer. It translates Open XML documents (.DOCX) directly to HTML, enabling access to the information in the Open XML format from any platform with a Web browser. The project, which already includes plug-ins for Firefox, IE7, and IE8 and now also offers a plug-in for Opera, allows users to view Open XML documents (.DOCX) within the browser on Windows and Linux platforms without the need to install Microsoft Office or other productivity products.
Check out the demo my colleague Jean-Christophe Cimetiere has recorded to see the Open XML Document Viewer in action from the end user perspective:

For more detail on the supported features, visit the project site: http://www.openxmlviewer.com


In principle, the functionality of the viewer is simply to translate OpenXML files into HTML for direct consumption in a web browser.


Here’s a scenario (the sample document is attached):


· You have an Open XML document (.DOCX). Let’s view it in Office Word 2007 first:


openxmlviewer-wordpreview


· Then, let’s say you email this file to your friend who’s using OpenSUSE Linux. Your friend saves the document on the desktop and drags & drops it into the Opera browser:


openxmlviewer-linux1


· The Open XML Document Viewer kicks off and creates the HTML that’s displayed by the browser:


openxmlviewer-linux2


The experience is similar with Firefox on Linux and with Internet Explorer 7/8, Firefox 3.0.x, and Opera 9.x on Windows:


openxmlviewer-windows


Next, let’s examine the high-level architecture:


openxmlviewer-architecture


The core of the project is the Translation Engine, which does most of the work: opening the .DOCX document, reading it, mapping its content, and transforming it to HTML. The Translation Engine is exposed as a client-side browser plug-in with support for Firefox, Opera, and Internet Explorer, and as a cross-platform command-line translator for use in server-side applications.
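
To make the open, read, map, and transform steps concrete, here is a minimal Python sketch of the same idea. It is not the Translation Engine's actual code. The function names (`docx_to_html`, `run_to_html`, `_is_on`) are hypothetical, it assumes the main document part sits at the conventional `word/document.xml` path rather than resolving it through the package relationships, and it handles only paragraphs, bold, and italic.

```python
import html
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by ECMA-376 (Office 2007) documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def _is_on(run_props, tag):
    """Return True if a toggle property such as w:b or w:i is switched on."""
    if run_props is None:
        return False
    el = run_props.find(f"w:{tag}", NS)
    if el is None:
        return False
    # A missing w:val attribute means "on"; explicit off values disable it.
    return el.get(f"{{{W_NS}}}val", "true") not in ("0", "false", "off")


def run_to_html(run):
    """Map one w:r (run) to escaped text, wrapped in <strong>/<em> if needed."""
    text = "".join(t.text or "" for t in run.findall("w:t", NS))
    out = html.escape(text)
    props = run.find("w:rPr", NS)
    if _is_on(props, "i"):
        out = f"<em>{out}</em>"
    if _is_on(props, "b"):
        out = f"<strong>{out}</strong>"
    return out


def docx_to_html(source):
    """Translate the paragraphs of a .docx (path or file-like object) to HTML."""
    # Open: a .docx file is a ZIP package.
    with zipfile.ZipFile(source) as package:
        # Read: parse the main document part (assumed location, see above).
        root = ET.fromstring(package.read("word/document.xml"))
    # Map and transform: each w:p becomes a <p>, each w:r becomes inline HTML.
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        body = "".join(run_to_html(r) for r in p.findall("w:r", NS))
        paragraphs.append(f"<p>{body}</p>")
    return "<html><body>" + "".join(paragraphs) + "</body></html>"
```

Calling `docx_to_html("Sample OpenXML DOCX.docx")` on the attached sample would return an HTML string containing the document's paragraphs. Lists, tables, styles, and images are deliberately left out here, and a real translator has to map all of them.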


The result is a translator that makes Open XML documents (.DOCX) viewable within browser applications without any of the usual office productivity or word processing applications, across multiple platforms and environments, as either a server-side application or a client-side end-user solution. Developers, Independent Software Vendors (ISVs), Solutions Integrators, and Mobile Solution providers can use these tools to enable their customers to view Open XML documents on heterogeneous platforms and browser applications. Be sure to check out the Demo web site, which showcases server-side document processing scenarios that represent typical use cases.


We’re very excited about this new version and look forward to your feedback.


Join us at http://www.codeplex.com/OpenXMLViewer


Sumit Chawla, Technical PM/Architect, Microsoft Interoperability Team

Sample OpenXML DOCX.docx

Comments (13)

  1. Irakli Lomidze says:

    I have tried this product.

    But there are many mistakes in the conversion.

    MS Word produces a more correct conversion than this application.

    Example: a legacy list

    1.1

    1.2

    1.3

    Try it with MS Word, and then try it with this application.

    My position: the idea of creating this kind of app is great. It would be even better to include it in the Open XML SDK and have the MS Office team participate in the conversion project.

  2. Daï says:

    Very good… Did you make this work for PPTX files?

    TIA

  3. suchawla says:

    Hi Irakli, I was able to convert a Word document (containing a list like the one you described above) to HTML using the OpenXML Viewer 1.0 without any problem. Can you please share a document that exhibits the problem?

    I encourage you to submit a more detailed problem description on the Project’s codeplex discussion forum:

    http://openxmlviewer.codeplex.com/Thread/List.aspx

    With respect to your other question:

    Microsoft Word already has a "Save as HTML" option. The purpose of the OpenXML Viewer open source project is to demonstrate how a converter to HTML can be created using the OpenXML specification and publicly available documentation. From a practical standpoint, the translator provides the ability to view DOCX files on machines that may not have Microsoft Office.

  4. suchawla says:

    Hello Dai, at this time the OpenXML Viewer only supports DOCX files. If there is enough interest in a PPTX viewer, we will consider it for the future scope of the project.

    Thanks,

    Sumit

  5. Folks, the new version of Microsoft Open XML Document Viewer, 1.0, arrives on the market with direct translation

  6. iso_question says:

    Hi,

    Does the plugin work with Firefox 2.x?

    Is the plugin compliant with ISO-29500? Keep in mind that OOXML, as implemented by MS Office 2007, does not comply with ISO-29500, since OOXML was modified by that committee.

    Thanks.

  7. Sumit Chawla says:

    Regarding Firefox versions: We have only tested the Firefox plug-in with 3.x versions of the browser. It may work with 2.x versions, but we have not tested those.

    Regarding ISO-29500: Our focus for v1 is to offer a practical solution for users who do not have Microsoft Office 2007 on their computer to be able to view Word 2007 (DOCX) files. Since Office 2007 implements the ECMA-376 standard, this is the standard that the OpenXML Viewer supports today.

  8. qualitydirectory says:

    @iso_question

    You asked an interesting question. Firefox is a fully open source browser, and no license key is needed to upgrade to the latest version. If you set the browser to auto-update, it will update itself to the current version, and your worries about the plugin working in Firefox 2.x will be over. Why stick with an old version that has been superseded by versions that have received many security and bug fixes?

  9. Furniture says:

    I have been using this and have seen many errors happen in conversion, hope it gets better!

  10. Nanda says:

    Hi … I tried the Open XML Viewer with Visual C# in my project. The problem is that it converts the text in the document but not the images; the images are left empty. Does anybody know the solution?

  11. nanda says:

    Images are not converted when converting from DOCX to HTML. Can anyone tell me the solution?

  12. Marko says:

    @nanda: try saving the HTML files in the same folder as your images.