More information on the Open XML translator and some questions answered

There were a lot of great comments on last week's announcement about the creation of an open source project to transform between the Ecma Office Open XML formats and the OASIS OpenDocument format. Rather than respond to all the comments and questions directly, I thought it would be better to write up another post to address the general themes people have raised.

Here are the main questions:

  1. Will the translator only work with Office 2007?
  2. Aren't there licensing differences that make ODF and Open XML incompatible?
  3. Will the functionality be easy to find in the UI?
  4. Doesn't this move contradict what you've been saying about Office not supporting ODF?
  5. Will the Ecma Office Open XML formats still be the default in Office 2007?
  6. Why don't you join OASIS and help improve where they are lacking?

What versions of Office will this work with?

Well, first you should all remember that we are making the new Open XML formats backward compatible and providing free updates to Office 2000, XP, and 2003, which will allow all three of those versions to consume and generate files in the Open XML format. The new tool, now an open project up on SourceForge, converts between the Open XML format and ODF, which means that you can use this tool in combination with the free updates to read and write ODF in all those earlier versions of Office as well.

Aren't there licensing differences between ODF and Open XML?

Actually, this misunderstanding is the unfortunate result of a really strong push by folks who I don't believe quite understand the Open XML story. There are a handful of folks who blog a lot (primarily ODF supporters) who aren't up to speed on the latest policies around the Open XML formats.

Let's address this first misunderstanding. The formats are available without any licensing restrictions. Any IP (patents, etc.) that Microsoft may have behind the formats does not apply to folks who want to implement the formats, because Microsoft made a legal commitment to not enforce that IP. If you hear people complaining about licensing issues, they probably just aren't up to date.

Secondly, the formats are no longer owned by Microsoft; they are owned by Ecma International. They are fully documented and the spec is free to download. A large number of organizations (British Library; Apple; Novell; Microsoft; BP; Intel; etc.) have worked on ensuring that the documentation allows for cross-platform implementation.

Will the functionality be easy to find in the UI?

Look for yourself. Here's a screenshot of the current prototype:

It's directly exposed in the UI. We're even going to make it really easy to initially discover the download. We already need to do this for XPS and PDF, so we'll also do it for ODF. There will be a menu item directly on the file menu that takes you to a site where you can download different interoperability formats (like PDF, XPS, and now ODF).

Heck, if you wanted to be even more hardcore, the Office object model allows you to capture the save event. So if you wanted to, you could make every save use the ODF format, just by capturing the save event and overriding it. I'm not expecting folks to do that, but it does show just how extensible Office really is.
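To make the "capture the save event and override it" idea a bit more concrete, here is a minimal sketch of the pattern in Python. It is only a toy model: ToyApp, SaveEvent, and always_save_as_odf are hypothetical names invented for illustration and are not part of the Office object model, and the "odt" and "docx" extensions are just stand-ins for the two formats. A real add-in would hook whatever save event the host application exposes instead.

```python
# Toy model of "capture the save event and override it".
# Everything here is hypothetical: these classes stand in for a host
# application and are NOT the Office object model.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SaveEvent:
    path: str             # where the host intends to save
    format: str           # format the host intends to use
    cancel: bool = False  # a handler sets this to suppress the default save


@dataclass
class ToyApp:
    default_format: str = "docx"
    saved: List[str] = field(default_factory=list)
    before_save: List[Callable[["ToyApp", SaveEvent], None]] = field(default_factory=list)

    def write(self, path: str, fmt: str) -> None:
        # Stand-in for real serialization; just records what would be written.
        self.saved.append(f"{path}.{fmt}")

    def save(self, path: str) -> None:
        # Give every registered handler a chance to act before the default save.
        event = SaveEvent(path=path, format=self.default_format)
        for handler in self.before_save:
            handler(self, event)
        if not event.cancel:
            self.write(event.path, event.format)


def always_save_as_odf(app: ToyApp, event: SaveEvent) -> None:
    # Override: write ODF ourselves, then cancel the host's default save.
    app.write(event.path, "odt")
    event.cancel = True


app = ToyApp()
app.before_save.append(always_save_as_odf)
app.save("report")
print(app.saved)
```

The design point is that the handler both performs its own save and cancels the default one, so the user's normal save gesture ends up producing only the ODF file.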

Doesn't this move contradict what you've been saying about Office not supporting ODF?

I've been pretty clear that I thought third parties would come along and build ODF support into Office if there was interest. That was shown to be the case pretty early on, as there have been a couple of different projects announced over the past year. Ironically, one of the most high-profile projects was announced by the OpenDocument Foundation, but it has turned out to be pretty secretive and closed, which seems to go against all the goals of "openness". I've had folks ask me how they can get hold of it, but as far as I can tell only a select group of folks has been given access. I saw a quote saying they still hadn't decided whether they wanted to charge for it, so that may still be holding things up.

With all the mystery around projects like that, we had a number of governments ask us to get involved and actually choose a project to back, as they wanted to know that if any of their constituents used the ODF format, they would be able to view those files.

I think this project is a great example of the openness of both of these formats. We are now going to have an open source implementation that everyone can use. It will of course be freely available to anyone, and will really help show how to use the two architectures of ODF and Open XML.

Will the Ecma Open XML format still be the default for Office 2007?

Yes, this is definitely still the case. While this new translator will help people read and write the ODF format in Office, it will also help make it clear to all why the Open XML format was necessary. The Open XML formats were designed to be 100% backward compatible with the existing set of Office binary formats, and that was really a goal that we can't compromise on. If we went with an XML format that resulted in data loss or poor performance, then the only people that would use it would be folks who actually cared about that specific file format. Since most of our users don't really care about file formats, we needed to create an XML file format that we knew everyone could use, otherwise most people would have just gone back to using the old binary formats, and that doesn't help anyone.

While the ODF format is great in terms of being an open XML format, it's lacking in a number of functional areas, which makes it an unrealistic option for Office to use as a default format. For instance, the format for ODF spreadsheets is much less efficient than Open XML's spreadsheet format. I have a few posts talking about this (and I plan to cover it in greater detail as we move forward):

  1. Design Goals behind SpreadsheetML
  2. Spreadsheet performance - Shared Formulas
  3. Does tag size matter?

There are also a whole host of areas that are left unspecified in the spec (such as spreadsheet formulas), which would have meant we'd either need to extend the format, or wait for it to catch up (and it sounds like they are more than a year out for formulas in particular). There are a number of blog posts out there talking about the incompatibilities between the various applications that have implemented ODF, and a lot of that is due to the lack of clarity on some features in the spec. Look at this comment from the OpenDocument Foundation talking about KOffice's ODF support:

"Our tests show that OpenOffice and KOffice have some problems opening each other's OpenDocument files. Also, support for drawings is a bit incomplete."

The Ecma Open XML format is significantly further along in all of these areas; just look at the differences in the documentation of numbering formats, formulas, etc. The draft of the Ecma spec released back in the spring has over 160 pages on spreadsheet formulas; the ODF spec has only 1 page.

I don't want to be critical of ODF because I think it's great to see applications use open XML formats for their storage. I'm calling attention to these points because I think a lot of folks have mistakenly assumed that once there is a standardized office format, everything is set and you don't need another one. Unfortunately that's not the case, and I want everyone to understand why we couldn't use ODF as our default format. I have no problem with multiple XML standards for documents and I think this is definitely a case where an alternative is necessary. If a single XML file format were the way to go, then we would have just stopped with XHTML (or maybe DocBook).

Most of our customers actually do understand this, and contrary to the news being spread (primarily by people excited about the ODF format), most governments have not adopted policies around ODF exclusively but instead around open formats in general. Most of those governments have also expressed that once the Office Open XML format is approved by Ecma, it would also be viewed as an open format.

For example, the Belgian government is currently being described as "mandating ODF", but that's actually not the case. They even made a public statement clarifying this last week, after we made the translator announcement. Here's a small blurb from that:

"The government’s choice for ODF is clear, but not exclusive." ..."If the OpenXML file format (Microsoft’s own contribution in the domain of open standard file formats) receives ISO approval as a standard, then this format will also be eligible for use in the administration of the Belgian government."

Why don't we join the OASIS technical committee to help them along?

I had a few folks asking this question (and saw it on a few other blogs as well). The standardization of Ecma Office Open XML formats is really moving along well, but there is still a bit more work to do here to nail things down. If you've read through the latest draft (all 4000 pages), you've probably noticed how comprehensive a spec it really is. For example, there are over 160 pages on how spreadsheet formulas work, compared to 1 page in the ODF spec. The ODF spec still has a lot of catching up to do, and according to this post they are still more than a full year from just getting in line on some of the basics (like formulas) that have existed in office documents for decades.

The Ecma Office Open XML spec, on the other hand, serves as a great base in terms of fully standardizing an XML format that is capable of representing the billions of Office documents that exist today. Once that's done, we (as a community) can then move forward and start to enhance it with new innovations. It's maintained by Ecma, and anyone can join and participate in the standard.

I think that anyone interested in helping to drive the future of office file formats should join us in Ecma and take advantage of the powerful framework for document formats that is being delivered. As I already pointed out, spreadsheet formulas, for example, are already close to being fully documented. The same is the case for all the international features and functionality (like the various numbering styles I'd mentioned before). If you don't have the time to participate directly in the working group, you can instead send direct feedback here: mailto:ecmatc45feedback@ecma-international.org

-Brian