Open XML Implementations Part 2


There is a great conversation happening in the comments section of my last post about independent implementations of Open XML. Andy Updegrove asked a few questions (and later made some observations) along with a group of folks who are actively building implementations. I’d encourage you to read that in conjunction with this.

Topics to tackle today (in as few words as possible) based on the earlier thread:

  1. What is the utility of the Open XML specification in terms of implementation? Is that utility to be measured solely by whether it results in an office suite that competes with Microsoft Office? Or are there other possible benefits?
  2. Are implementations of Open XML that were expressly done to interact better with MS Office somehow “less” than implementations of Open XML in a form completely unrelated to Microsoft Office?
  3. If there are indeed applications stemming from the availability of the Open XML specification that do not have anything to do with MS Office – is it important that they be backwards compatible with the old MS Office binary formats?

The Utility of Open XML

It is important to remember that the European Commission (yes, I am quite aware of the CFI decision, and very aware that we have some more work ahead of us to meet expectations on interoperability) expressly recommended that Microsoft standardize the MS Office document formats. This happened in conjunction with our preexisting work on XML formats, customers requesting more openness in formats, the Massachusetts ETRM policy, ISVs looking for more ways to work with XML and our formats, etc., etc. My point here is that many, many people felt there would be great benefit to the standardization of Open XML.

The utility of a fully documented Open XML specification (commented issues and all) is manifest in how the individuals and organizations are making use of it. To me, this is a critical part of the discussion. The desire to have greater openness (I have written about this before) has led directly to increased choice, and a wider range of available solutions. The specification has created new economic opportunity which will ultimately lead to customer benefit. 

Some implementations compete directly with Office, others use the specification to work more effectively with MS Office, and still others have absolutely nothing to do with MS Office. This is exactly what was being looked for in requesting the format be standardized in the first place.

Working With MS Office – Good or Bad?

There is no reason to disparage an implementation of Open XML for working with MS Office or not. The value of those implementations will play out in the quality of the overall solution of which they are part. This is a big part of what standardization is all about: does the specification result in market-viable implementations? OSI vs. TCP/IP ring any bells? Standards bodies are constantly on the lookout for work items that are relevant to the marketplace, and offer high value to implementers.

If working better with MS Office were the only way that implementations of Open XML worked, it would be hard to argue that there was no utility in that, nor that it did not create massive economic opportunity for the implementers. Better yet, this is not the case. The opportunity is considerable for non-Office-related work as well as for those with a direct relationship.

Backwards Compatibility

Backwards compatibility with old MS Office binary formats remains a core tenet of the Open XML work at Ecma. Vast amounts of data is stored in the old binary formats and TC45 is clear in the importance of maintaining the bridge back to that data.

But, is it absolutely critical for any app that uses Open XML as its data format to have high-fidelity backwards compatibility? Especially if it has zero relationship to the MS Office suite, or even to any recognizable office automation function? Probably not, but who am I to judge? It may be that they want to take streams of old Excel data into some new-fangled solution – ok. In my book, the option needs to be there, and certainly in the case of the Microsoft Office 2007 implementation it is something that our customers have told us is critical. Outside of MS, I can imagine a document management company building with backwards compatibility in mind, while someone building a vertical-market data analysis solution with a stand-alone reporting engine that pumps out Open XML dynamically might not care a whole bunch about it.

Implementation is often used as a measure of the viability of a standard. For those National Body representatives looking for clear evidence that Open XML is already a successful standard, is already providing the exact opportunity and choice it was supposed to – just observe the comments made to my last posting. The specification most certainly will be improved by the work done considering all of the comments. Maintenance will continue to improve it as well – and that will be done (hopefully) under the proposed joint maintenance agreement with SC34.

Just today I heard of two more completely independent implementations in France, plus an additional 8 or so doing improved integration with MS Office.

The fact that there are so many credible, and high quality implementations merits some deep consideration in the ballot resolution process this coming February.

Comments (22)

  1. Swashbuckler says:

    "It is important to remember that the European Commission … expressly recommended that Microsoft standardize the MS Office document formats."

    Don’t spin so much you’ll get dizzy! 🙂

    The recommendations say that document formats should be standardized.  That’s not the same thing.  It could just as easily be read as recommending that Microsoft implement ODF.

    Taking your reading of the recommendation, I could get ECMA to approve Open Swashbuckler Document Format (OSDF) and it would meet their recommendation.  I don’t think that’s what they had in mind — call it a hunch.

    "But, is it absolutely critical for any app that uses Open XML as its data format to have high-fidelity backwards compatibility?"

    Hmmm… It sounds like you’re saying that it’s not important for a product to implement the whole standard.  Is that what you’re saying?  And if so, Microsoft wouldn’t use that against another product, would it?

    "The fact that there are so many credible, and high quality implementations merits some deep consideration in the ballot resolution process this coming February."

    Oh boy!

    Having been a part of standards work in several organizations I can say that there’s no way I’d vote to approve a proposed standard with so many obvious problems in the spec.  Typically, you try to do some implementations (and in the case of the IETF it’s mandatory) to try to find problems in the proposed spec.  In OOXML’s case, problems in the proposed spec just leap out at you.

    In this post, it seems as though you’re saying "Pay no attention to this problematic stuff in the spec, you’re not going to implement it anyway."  That’s just not how standards are supposed to be done.

  2. jasonmatusow says:

    Swashbuckler –

    The recommendation was for us to standardize our format…and we did. For this standard, it was important that we work to bridge the gap between traditional standards work and free software in terms of IP…we did. It was important that the specification be complete enough for independent implementations…it is. It was important for the work to be done by a group that represented broad interests in the industry (partners, competitors, public sector, customers)…it was.

    I respect your passion, and point of view. I just don’t happen to agree with it. That is not spin, that is simply someone with a different opinion than you.

    Microsoft Office has enjoyed significant success in the marketplace. With that success come benefits and challenges, like anything else. Backwards compatibility as the innovation continues to push the product forward is a serious challenge. As for your question about my thoughts on people implementing the whole standard, I have a simple answer. It is their choice. The terms that govern the IP are structured on purpose to enable partial or whole implementation. I would assume you don’t want us making that decision for you. I also remind you that you don’t have to implement it at all – that is a choice for you as well.

    The individuals who have been commenting on the other thread have chosen to implement Open XML for a variety of reasons – and some even clearly state that they made this choice over ODF for technical or business reasons. Others will implement both. GREAT! I love it. That is the funny thing about supporting choice…it means endorsing the actual choices that people make.

    Finally, I have never endorsed the idea that the things in the Open XML specification that could use improvement should be ignored. In fact, I have been publicly endorsing the idea that ALL comments SHOULD be considered, and the Project Editor/TC45 team should do the heavy lifting to make the spec better based on the comments. That is common industry practice (ODF 1.1…ODF 1.2? Anyone? Bueller…Bueller?).

    Anyway – we are not likely to agree, but please keep me honest.

    Thx

    Jason

  3. Swashbuckler says:

    "The recommendation was for us to standardize our format…and we did."

    Can you cite the exact text in the document you linked to that says this?  What I see says "Preferably, these open document exchange and storage formats would be subject to formal standardisation via international standardisation procedures."  That’s NOT the same as recommending Microsoft standardize OOXML.

    "As for your question about my thoughts on people implementing the whole standard, I have a simple answer. It is their choice. The terms that govern the IP are structured on purpose to enable partial or whole implementation. I would assume you don’t want us making that decision for you."

    Ah, but you are making that choice for me because all of the issues regarding backwards compatibility are not documented in the standard.

    Fortunately, I doubt either I or my employer will have to worry about it, but I’m sure others do.

    "I have never endorsed the idea that the things in the Open XML specification that could use improvement should be ignored."

    Kudos!  Very carefully crafted response!

    Put another way, you are in effect recommending that serious bugs be deferred (serious being in the eye of the beholder) to a subsequent version of the spec.

    Companies always have to make that kind of determination when it comes to shipping software.  "Is this bug serious enough to stop shipping?"  Standards should be held to (if you’ll forgive the pun) a higher standard.

    "That is common industry practice (ODF 1.1…ODF 1.2?"

    It seems you are conflating enhancements with bug fixes.

    Let me ask you this: if ODF vX were submitted for approval with all the problems that OOXML has, would you honestly want to see it approved?

  4. jasonmatusow says:

    I think we are going to simply have to agree to disagree on this one Swashbuckler. I do hear you on your points, and I do not consider them without merit. I don’t agree with the conclusions.

    I did a longer posting on the status of ODF when it went through the JTC1 process. There were errors in the spec, and things that were completely missing (formulas, right-to-left text), and there were editorial problems (just read the comments). I did not oppose it then, and I don’t now. If I am to take the position of choice on document formats, then I should be consistent, shouldn’t I? The business strategy behind ODF was to standardize fast with an immature spec, get the ISO imprimatur to drive awareness, use the ISO status to try and drive procurement preferences, and start fixing all the things that were either shoddy in v1 or not in there at all. That strategy seems to have done quite well for its supporters up to this point – but let’s call it what it is. ODF was not about the purity of the spec, nor about the purity of ideals about doc formats. (That is not to say there was not good engineering, nor good ideals – it is just that they were not the real motivators.)

    At the BRM – it is not about a referendum on ODF. It is about Open XML. I think the fact that these implementations exist should be an influential part of the long-term view of Open XML.

  5. Andrew Sayers says:

    Jason,

    I’ll stick to discussing issues around your first point (the utility of Office Open XML), as it seems like the others can only be answered by evidence that I don’t have.  I’m not trying to make any wider point with this post, it’s just a collection of observations.

    ODF really raised the community’s expectations of quality in a document format, even if it sometimes fails to meet those expectations itself.  Because the bar has been raised so high, it often goes unsaid how much better an ECMA-approved XML-based file format is than the binary formats that went before.  For example, if one has N man-hours to complete a project, and it would take 0.5N man-hours to get a sufficient understanding of a binary format (including reverse-engineering time), you’ve only got half as much time to get your project done.  The time to learn the format might even be greater than the total number of hours available, killing the project before it starts.  The improvement in documentation means that even if all of the criticisms of Office Open XML are taken at face value, it’s very hard to argue that this generation of office tools won’t be way ahead of the previous generation.

    Reading the PEGSCO recommendations, it seems like there’s an implicit assumption that the ISO would only certify a standard if it was faultless.  Since everyone agrees that there are serious shortcomings in at least one of the document formats the ISO have considered, it seems like that was a serious error of judgement on their behalf.  It’s not surprising therefore that they didn’t see how saying "prefer standardised to proprietary formats" translates in business terms to "first through the ISO, wins!".

    While you mention several useful aspects of the specification, one thing I’ve started to realise lately is how much a specification shapes your view of a problem.  During a conversation on his blog, Rob Weir discussed the "structuralist school" of document authoring, compared to what he considers the "bad habit" of treating a word processor like a digital typewriter.  The discussion is available at https://www.blogger.com/comment.g?blogID=11236681&postID=8539500596888326148, and while the whole thread is worth reading, his comment is dated "Wed Sep 12, 01:19:00 PM EDT".  Because ODF is a highly structured (tree-based) format, you have to grok the structuralist school of document authoring to understand it; on the other hand, you have to grok the "digital typewriter" school in order to understand Office Open XML’s unstructured (stream-based) format.  For information scavengers like myself, the true value of the Office Open XML specification is that it casts a light on this sort of deeper assumption behind the formats.

    – Andrew

  6. hAl says:

    Actually the EU recommendation is listed here:

    http://ec.europa.eu/idabc/en/document/2592/5588

    "Microsoft should consider the merits of submitting XML formats to an international standards body of their choice"

    A rather amusing recommendation is this one:

    "Microsoft assesses the possibility of excluding non-XML formatted components from WordML documents"

    This recommendation seems to have actually led Microsoft away from the single XML file approach of the Office XML 2003 formats to the use of a ZIP package. However, OpenDocument still fully supports the use of a format that has non-XML formatted components in a single XML file.

    The IDA recommendations were made by a directorate that falls directly under the European Commission and were approved by a committee representing all EU member countries.

  7. hAl says:

    Also interesting:

    At the time of those TAC recommendations, called:

    "TAC approval on conclusions and recommendations on open document formats"

    OASIS still called their format Open Office formats and their TC documents were signed with Open Office TC.

    So the EU recommendations are not just the basis for standardization efforts but also for the name change of OASIS’ Open Office format and TC to OpenDocument.

  8. Jason,

    I’ve posted an entry back at my blog pointing back at this and your last entry, and agreeing that what really matters is what the marketplace chooses to do with OOXML and ODF, with some additional observations on how that relates back to the goal of assuring long-term access to documents.

    Andy Updegrove

  9. It seems to me that both ODF and OOXML are interim stages to a document format that works. You look at ODF, and there are bits missing in there, such as the ability to nest XML in different namespaces, a formal definition of formulae (with tests), etc. But OOXML? It doesn’t come with a public test suite either. It also has lots of stuff that looks, well, transitory. The next generation of applications will have less VML in, will have less binary stuff inline, so why freeze things now? Presumably because the EU and other governments forced Microsoft to follow ODF. Question is: should ODF have been standardised yet, given it can’t reliably represent the vast number of documents in existence?

    One thing I do have to criticise the OOXML story for is "backwards compatibility", as if a goal of OOXML is lossless replication of existing content. Whenever I open an MS Word document sent to me, it gets relaid out for my (A4) printer. And whenever I change printers, it gets laid out again. So there is no stable layout of the content. If the tools can’t reliably represent content across machines, why even pretend that it can be done across versions? Alternatively: if it doesn’t matter that things get laid out differently depending on your default printer, how important is backwards compatibility in the first place?

  10. Jalf says:

    One thing I’m curious about is this:

    How many of these OOXML implementations are actually interoperable?

    From what I’ve seen, Apple’s OOXML implementation generates errors and warnings when opening the .docx version of the OOXML spec, and also generated visual artifacts like missing/wrong page numbers and other problems.

    I think the most important question is not "Are there independent implementations, and what do they use it for", but "do the independent implementations actually understand each other’s documents?"

    Obviously, if they don’t, it’s not a very useful standard.

    Then the question becomes "did the implementers not follow the spec (their fault), or does the spec contain ambiguities or underspecified parts that make it impossible to write independent implementations that actually generate and read the same documents (the fault of the spec itself)?"

    So now that we have a fairly impressive list of implementations, how many of them can open, say, the OOXML spec in .docx format correctly? Or a spreadsheet?

    I think that has to be the ultimate test of any standard. Is it standardized enough for people to rely on it? TCP wouldn’t be much good if I couldn’t rely on it to behave the same way regardless of implementation. Can we rely on OOXML?

    And a separate question for Jason:

    Some of the ISO comments (notably the ones from France) suggested merging OOXML into ODF over the next several years.

    How do you (not Microsoft, as I assume you can’t speak for them) feel about such a solution?

    To me it definitely seems to have its merits.

    And finally, you mentioned choice above.

    To me, that seems like a nonsensical argument. The point in a *standard* is not, and should not be, to provide choice. On the contrary, it is to eliminate choice.

    Do you have a choice in which side of the road to drive?

    Would you be better off if you had? I’d say having such a *standard* makes life safer for a lot of us. And the point in the standard is exactly to eliminate choice.

    How about the standard for voltage in AC outlets?

    Would you be better off if you had a choice of 20/40/80/110 volt outlets?

    And similarly, is it a good thing that we can "choose" between two document formats? (Why not four then? Surely, having 80 different standardized documents would be even better.)

    That just means I have to be able to read both, because people sending me documents might use either one. And preferably I have to be able to write both too, because I never know which one the recipient of a document supports. Who benefits from this kind of "choice"? (which isn’t really choice, but just forcing people to cover both paths)

  11. Igor says:

    Matusow said:

    "Vast amounts of data is stored in the old binary formats and TC45 is clear in the importance of maintaining the bridge back to that data."

    Let’s suppose I want to implement the complete OpenXML specification on a non-Windows platform, including the ability to work with the old Word .doc binary formats.

    Is there a document describing the old binary format available?

    If there isn’t, then I would like to know how I can implement the complete ECMA OpenXML standard (and, perhaps, the complete ISO OpenXML standard in the future).

    I am assuming I am missing something here.

  12. Samuel N. Woods says:

    While I generally have no objection to your posting here, this bit struck me as a play on words (marketing speak, if you will):

    "If working better with MS Office were the only way that implementations of Open XML worked, it would be hard to argue that there was no utility in that, nor that it did not create massive economic opportunity for the implementers. Better yet, this is not the case. The opportunity is considerable for non-Office-related work as well as for those with a direct relationship."

    The idea of standards is to allow for interoperability between different products or offerings. This is a very tangible benefit to consumers and developers/manufacturers alike, unless of course they are already in a dominant position.

    If the implementations of OOXML are primarily just allowing for file compatibility with Microsoft Office, I would argue that it doesn’t bring true interoperability and as such fails to fulfill the role of a standard.

    In that case it’s somewhat comparable to an alternate history where the 802.11 standards were for the purpose of interoperating with, for instance, Linksys wireless routers.

    Can you give me a number or proportion that specifies how many of the implementations are for purposes aside from Office interoperability (because I know Gnumeric isn’t) and how many are full implementations?

    The reason being that surrounding this entire standardization process there have been a number of announcements, such as "strong support" and "implementations", that were less than complete.

    I have no objection to a standard if it allows complete interoperability, but if there’s not a single case of a full implementation aside from those using Microsoft API, I do not believe that ECMA-376 passes that test.

  13. hAl says:

    It is well worth noting, regarding the above comments, that ODF, with almost two years’ head start in standardisation, still has no full implementation and also no implementations that are fully interoperable with OpenOffice, which is the main implementation.

    It is very well possible that in another 1.5 to 2 years the amount of interoperability between the different Office Open XML applications will exceed the current interoperability of ODF implementations.

  14. Wu MingShi says:

    Woods,

    I agree with your comments.

    However, I think publishing something as a standard, even if in real life it is simply to interoperate with one program and one only, has merits. This is true even if one vendor is in charge of the specification and they take a madman-like dictatorial approach. I just don’t think this type of standard deserves ISO blessing.

  15. Swashbuckler ,

    > Ah, but you are making that choice for me because all of the
    > issues regarding backwards compatibility are not documented in
    > the standard.

    If you seriously think that backwards compatibility solely concerns the couple of handfuls of legacy attributes, you ought to do a better job at your homework. Backwards compatibility is much more than that and it cannot be removed from the spec. It is the ability to persist the possible settings in the previous formats in OOXML, and stating that it only concerns the legacy attributes really shows the depth of your technical knowledge on OOXML.

  16. Samuel N. Woods says:

    hAl,

    My question and general push in the prior posting is not so much has it been fully implemented (I would be surprised if it had due to the refusal of the largest player to accept it), the question is can it be?

    While I’ve not tested Lotus Symphony yet to see if it implements complete interoperability (and that would probably take some time), there’s none of the questionable patent assurances and undocumented or under-documented format specifications.

    If Microsoft would release these specs and make an assurance that an independent implementation of them by third-parties would not be acted upon legally then in one major respect the objections I have to the standard would disappear.

    Because of the reliance by OOXML on a number of proprietary Microsoft technologies the problem presently is the same we’ve had with the Microsoft binary formats. The format is not vendor neutral and there exists no assurance that files will remain accessible indefinitely.

  17. hAl says:

    [quote]If Microsoft would release these specs and make an assurance that an independent implementation of them by third-parties would not be acted upon legally then in one major respect the objections I have to the standard would disappear.[/quote]

    That is what Microsoft have already done.

  18. nksingh says:

    @Samuel Woods:

    Lotus Symphony is based on the OOo codebase, so it’s hard to say that it’s "independent."  

    Also, you should note that Microsoft has granted all the patents needed to implement the OOXML document set.  Furthermore, it should be mentioned that Microsoft does not regularly sue people over patents…  I’ve only been informed of one case of them suing a hardware manufacturer who was using unlicensed technology while its competitors had chosen to follow the law and bear the costs of licensing.  If you can find another lawsuit with Microsoft as the plaintiff, I’d love to know what it is.

  19. Dave S. says:

    nksingh – Microsoft rarely sues. The US military never does. They show up and the problem disappears. Same reason for both.

    Jesper Lund Stocholm – "It is the ability to persist the possible settings in the previous formats in OOXML, and stating that it only concerns the legacy attributes really shows the depth of your technical knowledge on OOXML."

    What is the substantial difference between "legacy attributes" and "settings in the previous formats?" Also of note – homework is for children and research is for adults.

  20. Dave S. says:

    @nksingh – Of course Microsoft rarely sues. A company with a legal department backed with 10s of billions of dollars doesn’t have to sue, does it?

    @Jesper Lund Stocholm – "It is the ability to persist the possible settings in the previous formats in OOXML, and stating that it only concerns the legacy attributes really shows the depth of your technical knowledge on OOXML."

    What is the substantial difference between "legacy attributes" and "settings in the previous formats?"

    @hAL – Have you gotten copies of any of the legacy specs? Also, why didn’t MS seek ECMA or ISO approval for the legacy specs under the same claim as now – that the specs represent Billions of very valuable documents and should be preserved as-is forever?

    Doesn’t implementation only contrast with partially implemented? To claim as "implemented" anything that does not handle the entire spec is misleading. One might enumerate those features that are implemented, but so far there are no other products that have been independently verified to implement MSO-XML, just fragmentary or partial implementations.

  21. hAl says:

    @Dave S.

    I have not requested the binary format specification, but I know of people who have requested it and have received it by post.

    Dave, the binary specs are likely not up to scratch for a standard, and with billions of documents out there MS would certainly not go through a standardization process that can change/alter the format; that would only lead to extra conversions, or to MS not supporting the official but altered standard version. Or do you know of a standards organization that would standardize the format as is (so with no real standardization process)?

    So standardizing the binary formats is not even a remote option. We only want one conversion and that is from MS binary to an open standard XML format. Nobody would be interested in some extra conversion layer from MS binary to open standard binary.

    So having the format specs available to everybody for free use in any implementation is likely the next best thing.

  22. Dave S. says:

    @hAL

    ECMA says their process "Offers a path which will minimise risk of changes to input specs" so not much risk of altering the format there.

    I’m sure ECMA will add a title page and some page numbers – but publish the file spec as it is implemented within Microsoft.

    Microsoft should know the binary specs are usable as-is – how else could they ensure the MSO-XML standard they submitted could accurately represent those billions of documents if they had nothing to compare it with?

    There is also no requirement to update the binary standard. There are a number of standards that are this way – some still-used MIL specs originated in WWII and have not been changed, so there is no reason to change the old ones.

    What do you know of standardization anyway?

    The Unified National screw threads were standardized by one guy – who went from place to place and sold buyers of assembled products on the idea that having one thread standard was a great advantage to them. They agreed and then it was a standard.

    The assembly makers preferred their own non-interchangeable screw threads, but had to capitulate and make threaded fasteners that met the standard.

    I’d prefer to have converters from binary to MSO-XML available from sources other than MS. I’d also like confirmation, based on independent verification, that the MS converter (MS-OFFICE) actually does what it claims.