Lost in translation


Chris Capossela (head of the Information Worker business group) was over in the UK today and I got to join him in a couple of his briefings. Chris is a very impressive spokesperson and I learnt a lot listening to him handle questions. We got into some really good debates, and I thought one of the discussions about XML was insightful; it framed the issue in a way that had never occurred to me before.

We were discussing why Microsoft didn’t adopt ODF as the standard for Office 2007 but instead created our own standard, OpenXML. Brian Jones wrote a great post on this last week in which he responds to an articulate critical view posted on his blog; it is well worth a read. I had the pleasure of meeting Brian in New York a while back: a laser-sharp, very thoughtful, affable and humble guy.

In our discussion today, the arguments we debated went along these lines:

  • We could not use ODF because we have to guarantee backwards compatibility. There are billions of documents out there from older versions of Office, and we need to ensure that all of them can be opened in the new format with full fidelity. Our customers tell us that in years to come they need to be able to read Office documents without having to dig up some archaic version of Office. The content needs to become independent of the application that created it, and XML is the way to achieve this. Since ODF is based on the feature set of OpenOffice, which is perhaps on a par with Office 2000, it is not reasonable to expect it to fully describe documents created in Office XP or Office 2003. ODF does not have a way to fully describe pivot tables, for example. For this reason we could not simply adopt ODF as our XML standard.
  • ODF was not sufficiently described. Brian Jones’ blog talks more about the thousands of pages of documentation on OpenXML. When Office 2007 entered development, ODF had not yet been ratified as a standard, and its specification lacked the depth needed to implement it without unacceptable levels of creative interpretation.
  • Translation between applications is only ever as good as the extent to which those applications overlap. When Microsoft decided to create strong portability between Word and WordPerfect, it took thousands of person-years to get that fidelity near perfect, and even then it was never perfect. Chris’ view was that translation will never be perfect. When you translate from French to English, you lose something: there are words in French that do not exist in English, so something, however subtle, will be lost. We like to hope that XML creates a Nirvana of interoperability. It definitely helps a huge amount, but translation between schemas from different applications will never be absolutely perfect as long as those applications do slightly different things.

I’m no Brian Jones and will never understand the issues as deeply as he does, but this helped me understand the reasoning better.

I encourage you to take a look at his blog if you have never done so, because it is an excellent example of great blogging (IMHO). As I was saying to Chris today, great blogging is about dialogue, and Brian does that superbly. Even if you disagree with him (I find he is very persuasive), you have to applaud his true blogger spirit 🙂

Comments (3)

  1. Aparna Aswani says:

    Nice, I hear that Chris is an extremely good spokesperson. I enjoyed reading this entry and found it very useful. 🙂

  2. monkchips says:

    surprised to see such specious arguments actually.

    you can’t guarantee backwards compatibility with openxml either, hence bulkloaders.

    and number of pages is NOT a measure of the quality of documentation. sure you have thousands of pages of documentation. but they could be crap, for all we know.

  3. dstrange says:

    Unnecessarily polemic, James, but since I like you I’ll overlook that and try to respond to your comments. I must say I find it hard to see how your comments relate to the thrust of my post, but anyway...

    You can guarantee backwards compatibility with OpenXML. We knew that we wanted to move from the binary format used since Office 97 to XML. The question, though, is which schema? Since OpenOffice’s feature set is a subset of Office’s (or at least a different set), the binary format cannot be fully described by ODF: there is no way to describe some aspects of an Office document in ODF. If we were starting from scratch, we could have created a new Office application that worked within the limits of what ODF describes, but that would have meant cutting out the features that couldn’t be described. However, because we have to ensure that Office 2007 apps can open older documents, the schema needs to encompass the full range of features. OpenXML fully describes the document, and since we can open 97-2003 documents, we can save them as OpenXML, fully described. You can even edit the XML, then reopen it in the app and it will work (it is roundtrippable). Sure, there is work involved in converting a binary file into XML, which might be what you mean by a bulkloader, but this has nothing to do with the question of which schema to choose.

    I never said the number of pages was a measure of quality. My point is that there is a lot to it, and for it to be a standard you can code to, it must be complete, which ODF was not at the time development began.