Is it jetlag?


I know Rob Weir has been traveling a lot these days lobbying against Open XML across the world, so when I saw this post yesterday I assumed it must be jetlag. I think he completely misunderstood some of the responses from Ecma related to the issue of harmonization, and has missed some significant developments in this area over the past year. I already posted earlier about my thoughts around harmonization, and work that is already under way in the German Standards Body (DIN) to help guide the way. As I said previously, it appears OASIS is already discussing with DIN taking a more direct role in the Working Group, as indicated by this discussion between Florian from Novell and Rob Weir.


Here is what Rob had to say, though, which had me confused:


Ecma rejected every single one of these requests. Ironically, their response was that harmonization was not necessary because there exist tools that will translate between OOXML and ODF. However, since these conversion tools are restricted in their fidelity because of the lack of these very features, Microsoft’s argument is rather weak.


On the question of harmonization, we are either moving toward it, or we are moving away. The Ecma response does not move us toward harmonization, but starts down the road toward further divergence.


But if you actually read the Ecma response, you’ll see that TC45’s position is quite the opposite. Harmonization is not as simple as just adding a few tags here and there. It’s going to be a lot of hard work, and the German Standards Body (DIN) is already working on the first step, which is to identify the differences. This isn’t something to take lightly.


Here is Ecma’s full response to this issue (emphasis added):


There are currently several XML-based document formats in use, each designed to address a different set of goals or requirements. These include ISO/IEC IS 26300 (ODF), China’s UOF, and ECMA-376 (DIS 29500 – Open XML). All these formats have numerous implementations in multiple tools and multiple platforms (Linux, Windows, Mac OS, hand-held devices).


The Ecma Response Document from the Fast Track 30-Day contradiction phase for DIS29500 addressed the question of harmonization by explaining the differences between the ODF and Open XML formats as follows:


“… one must recognize that creating a single “merged” format to address the user requirements of both ODF and OpenXML is a much more difficult goal—one that is hindered by fundamental obstacles comparable to what one might encounter while merging HTML and ODF or HTML and PDF. This is because of sheer difference of scope, feature and architecture. Ecma believes that one format cannot simultaneously meet the requirements that would come from the merge of the two formats and the stringent requirements of backward compatibility that drive the design of OpenXML.


First, while both formats share the high-level goal, to represent documents, presentations, and spreadsheets in XML, their low-level goals differ fundamentally. OpenXML is designed to represent the existing corpus of documents faithfully, even if that means preserving idiosyncrasies that one might not choose given the luxury of starting from a clean slate. In the ODF design, compatibility with and preservation of existing Office documents were not goals. Each set of goals is valuable; sacrificing either at the expense of the other may not be in the best interest of users.


Second, the resulting differences are not merely variances in scope that could be resolved by adding capabilities to one or the other. They are structural and architectural in nature. Where functionality overlaps, the corresponding elements nonetheless differ in precise meaning, usage, capabilities, options, and interaction with other elements. Even more importantly, the corresponding elements do not exist in isolation, but are components of whole document models, with different rules and constraints for such things as page/slide layout, flow, style inheritance, event processing, relative positioning, calculation order, formula dependencies, chart construction, graphic templates, animations, and so on. The resulting variations are not merely cosmetic. They compound to create qualitative disparities that, although perfectly acceptable for much of the user base, can be significant for organizations that require high fidelity in layout, content, or editability. Differences between the implicit page style model of ODF and the explicit page style model of OpenXML, differences in the models for splitting table cells, differences in the style information associated with spreadsheet cells, and differences in the full formula specification used in spreadsheets are only small examples of the hundreds of explicit design decisions that ensure the information included in the existing formats is represented faithfully in the OpenXML format.”


There are many translation tools already in existence that enable interoperability between different formats by providing useful translation capabilities between ODF, Open XML and UOF.


We note that the German national standards body, DIN, has a committee, NIA-01-34 (see http://www.fokus.fraunhofer.de/fokus/fokus/presse/meldungen_fokus/2007/05/DIN-E.pdf), that is preparing a Technical Report on the translation of documents between the IS 26300 and DIS 29500 formats. The members of NIA-01-34 include format experts from a number of countries, working together to define the numerous differences between these formats.


Ecma strongly supports any harmonization effort that enables better sharing of information and allows better translation between the formats in the following way: Ecma believes that the work of the DIN (NIA-01-34) committee is essential to any harmonization effort. The work of DIN (NIA-01-34) will enable the industry at large to understand the detailed differences between the formats. Based on this detailed understanding, the ODF and Open XML formats could be extended in the future in order to enable better sharing of information and allow future translations tools to provide even better translation and interoperability between the formats.


Harmonization would require functional changes to two International Standards and would fall under the JTC 1 procedures for new work within SC 34 and could be done in the future. Such work should not be done in this Fast-Track process and should not impede the adoption of DIS 29500.


 


So, as I said, there are many approaches you could take towards harmonization. The key for any effort like this, though, is to first have a full understanding of the issues (in this case, identifying the differences); then you can start to design the solution. I hope that once Rob is done with his travels and anti-OpenXML lobbying (I hear the latest is a trip out to Asia to meet with some national bodies), he’s able to get up to speed on the DIN work and, as the head of the ODF technical committee, joins in the work towards a better understanding of harmonization.


-Brian

Comments (26)

  1. Very eloquent post Brian.

    For those of us that enjoy reading your blog, this once again clears up the fog and confusion that is artificially created around these core issues.

    Miguel

  2. Fiery Spirited says:

    Like many times before, you fail to address the real point of Rob Weir. I assume you are not permitted to actually discuss the real issues due to orders from your superiors.

    The crucial question is in what ways it would be simpler to harmonize two incompatible standards instead of building a joint standard that solves all of Microsoft’s problems.

    The answer is of course that there are no such reasons. The optimum solution is that Microsoft join Oasis and help make ODF 1.3 fully capable of capturing all legacy format behavior.

    Of course Microsoft might lack the capability to add ODF support to Office on short notice, and might not want to switch before ODF 1.3 is done. On the other hand, until you are done with this you could be using the current Office implementation that is an Ecma standard. You would get no cost to adjust Office 2007 to the OOXML specification and would on your preferred time schedule migrate to an XML format that is future proof and not.

    Drop the OOXML attempt and join ODF and all involved, including Microsoft, will profit in the long term.

  3. jones206@hotmail.com says:

    Miguel,

    Thank you. 🙂

    ———–

    Fiery Spirited,

    I don’t think you get it fully. You seem to be in a camp that believes ODF is all you need and you can just add a few things on in addition.

    Please look back at the history here. http://blogs.msdn.com/brian_jones/archive/tags/History/default.aspx

    The two formats were developed in parallel (this is a fact). They had different design goals, and that’s why they are fundamentally different.

    Harmonization is hard to even define at this point let alone to act on. Merely joining the OASIS ODF committee wouldn’t solve the issue. This is why a standards group has been hard at work the past year trying to understand the differences. Once that work is complete we can go on to the next step of defining what harmonization would mean.

    -Brian

  4. Allen says:

    "First, while both formats share the high-level goal, to represent documents, presentations, and spreadsheets in XML, their low-level goals differ fundamentally. OpenXML is designed to represent the existing corpus of documents faithfully, even if that means preserving idiosyncrasies that one might not choose given the luxury of starting from a clean slate. In the ODF design, compatibility with and preservation of existing Office documents were not goals. Each set of goals is valuable; sacrificing either at the expense of the other may not be in the best interest of users."

    This quote indicates to me that there is not a good reason for the OOXML proposed standard.  If the underlying goal of OOXML is to represent legacy documents faithfully, then a much better approach would be to provide an application that understands those formats, but can also use an accepted standard (ODF) for new documents, not propose a new standard that is heavily encumbered with idiosyncrasies of legacy formats without providing sufficient information to allow others to fully implement those idiosyncrasies.

    Of course, as has been mentioned repeatedly over the last few years, Microsoft has amply demonstrated that they have no interest in genuine interoperability because they refused to participate in the OASIS committee during the development of ODF.  The committee desired to provide as much compatibility with Microsoft Office and other applications as possible, but because Microsoft was not willing to participate, they were limited to including those features that third parties had successfully reverse engineered.

    No amount of rhetoric can cover up the fact that Microsoft’s primary goal in ramrodding OOXML through ISO is to confuse as many people as possible as to the true meaning of an open standard and to maintain the vendor lock-in with as many computer users for as long as possible.

  5. Francis says:

    Allen: a couple of points.

    1. Do you really think Microsoft could have influenced ODF such that it supported all of the features and functionality of Microsoft Office? There’s a lot of anti-Microsoft sentiment in the ODF camp. Any attempts to improve ODF would probably have been a) obstructed or b) regarded as a continuation of the "embrace, extend, extinguish" strategy. And don’t forget–ODF’s standard-bearer, StarOffice/OpenOffice, like the Java MS got into so much hot water with, is a Sun product.

    2. Who appointed OASIS alone the steward of The One True Format? No company, including Microsoft, has to solicit OASIS. It’s not an inescapable government agency like the IRS, nor is it, despite the rhetoric, a church of infallibles. It’s an NGO staffed by people many of whose employers have a direct financial stake in the success of ODF.

    3. If Microsoft’s true motive is to perpetuate vendor lock-in, why didn’t they stick with the old binary formats? XML formats, especially well-documented ones like OOXML, are a boon to developers. They are orders of magnitude easier to work with than binary formats like DOC, XLS, PPT. And this isn’t just spin. They’ve been a godsend for me personally. The transition to XML has meant I can rescue corrupted documents with a simple text editor. My data are no longer locked in a binary box.

  6. orcmid says:

    Nice one.  I just shook my head over Weir’s post and decided to just let it lie there.  At least he acknowledged that he is the chair of the ODF TC.  I would love it if he would talk about ODF issues and concerns and the roadmap.  They seem stalled.

    But this is why I am commenting here, as you probably could have guessed:

    "Ecma believes that the work of the DIN (NIA-01-34) committee is essential to any harmonization effort. The work … will enable the industry at large to understand the detailed differences between the formats. Based on this detailed understanding, the ODF and Open XML formats could be extended in the future in order to enable better sharing of information and allow future translations tools to provide even better translation and interoperability between the formats."

    Amen, brother.  Someone needs to wake up to the fact that ODF-OOXML translation is imperfect and if translation is imperfect it is probably because the overall models are incompatible.  

    Also, this is the best way we know to learn where the stumbling points are and what the prospects are for reconciling the document models.  

    This is a big step from the days when ODF-advocates were satisfied that they had enough of what Office provided that most users would successfully convert with no problem.  I still sense that there are those who think document formats (or at least ODF) are universal, and the DIN work is the most-responsible come-to-Jesus effort that I know of in achieving a little baptism of reality.  

    I also appreciate your use of "useful translation" in this part:

    "There are many translation tools already in existence that enable interoperability between different formats by providing useful translation capabilities between ODF, Open XML and UOF."

    Thanks. Well said.

  7. Fiery Spirited says:

    What does it matter that the formats were developed in parallel? The best possible interoperability would be if the formats are merged. Why invest time in getting an ISO label for a format that your current applications do not truly support if you really mean you want interoperability?

    Maybe you know something about the legacy Office format that Sun and the rest of the world do not know, but until we see hard evidence there is little reason to trust your hypothetical speculation about ODF perhaps being hard to extend to suit Office. If we look at hard facts, Open Office compatibility with Microsoft legacy formats is well documented and Open Office uses ODF…so you are up for some pretty serious work if you want to prove that ODF is not capable enough.

    As for it not being enough that you join the Oasis committee, you are most certainly correct. Microsoft would of course need to put an effort into their participation. Just like when you designed the OOXML standard draft, there is a choice about open and closed standards.

    The truth is that Open Office and Symphony have compatibility with Microsoft formats as one of the prime goals, so why would it be hard to gain support for making ODF more Microsoft friendly?

  8. orcmid says:

    @Fiery Spirited, please don’t confuse OpenOffice and Symphony the software products with ODF the format.  The usual way OO.o maintains Microsoft Office fidelity is by saving back in the same format that is read.  Going to ODF and then coming back to an Office Format is not done with "pure" ODF, to the extent that it preserves round-trip fidelity with the original Microsoft Office document.

    On the other hand, they provide some useful ideas for implementers who want to support multiple formats.

    Meanwhile, I just ran into this post about the problems of harmonizing metadata systems:

    http://digital-scholarship.org/digitalkoans/2008/01/31/harmonization-of-metadata-standards/

    The conclusion (about narrow fields of application) also describes the only places where "ontologies" have been harmonized, and that is still difficult.

    We already know how difficult it is to obtain round-trip fidelity (a good test) between natural languages, and we are finding that digital formats, sometimes even trivial ones, are problematic.

    I am happy that we are in the process of developing some important practical experience in this area.  It matters for the future.

  9. dmahugh says:

    The ODF TC hasn’t even been able to finish adding some of ODF’s original omissions (e.g., formulas) in the last two years.  So do you really believe that it would be possible for them to add all of the functionality of Open XML to ODF quickly and easily?  In other words, the people involved would move more quickly on that project than they’re currently moving on their own goals, which have only produced two incomplete and non-standardized variations in the last two years?  I’m having a real hard time imagining that.

  10. Ian Easson says:

    Fiery Spirited writes:

    "The best possible interoperability would be if the formats are merged."

    Ok, so what is the first step in accomplishing that?  Well, it is to understand and document the differences in detail.  OOOPS!! That’s the step that the DIN committee is doing, and that you are opposed to!

    Go back to square one.

  11. Jemm says:

    Why settle with only these two standards when we could merge all the file formats into one, huge entity!

    World would be much simpler without confusing jpg’s, mp3’s, odt’s, docx’s, wmv’s etc. Just one file format that all the applications support.

    Just add a few tags and that should do it. 😉

  12. Fiery Spirited says:

    Ian: You are mistaken…I am in no way opposing the work of the DIN committee to learn how to harmonize the formats. Their findings will allow Oasis to fill any gaps found in the ODF standard.

    The reason why I think ODF will finish first is that the problems encountered by the ODF – OOXML translation project suggest that OOXML is the one that is not feature rich enough to handle the other. OOXML is also a "flat" format, something that makes it harder to transform it into something different.

    Still…the important fact here is that DIN have been working for a year on mapping the differences even while DIS 29500 is not an ISO standard yet. This shows without doubt that harmonization is possible without confusing the common people by having two ISO standards for the same thing.

    Going back to square one is all about cutting losses. Microsoft are on the doorstep to spend insane money on adjusting for the changes from the ballot meeting…why not say enough is enough and put the money to better use?

  13. Christian says:

    I hate it when everybody and his dog comes along and says "Just merge it" as if these structural differences did not exist and as if this were a project that could be completed in less than 5 years!

    But still I dislike your position like this:

    "is why a standards group has been hard at work the past year trying to understand the differences. Once that work is complete we can go on to the next step of defining what harmonization would mean."

    So what does this mean? What happens if these differences are known? (And I don’t believe that they are not known at the moment!)

    Are there really ANY plans to invest billions of dollars into yet a third file format that combines ODF and OOXML?

    I think that it will be clear what the outcome of this group is: The file formats are too different to merge, etc.

    And of course MS will not merge, because then the performance drops down and/or Office’s internal data structures have to be completely rebuilt or Office has to be rebuilt completely which would be its doom.

    I think that additional features in OOXML would always mean that the Office team has to include new features in Office.

    e.g. if the page borders are now bitmaps, then Office has to add a dialogue to select the bitmap. But Microsoft is always on a very tight set of allowed features, just as if a feature would cost them enormous amounts of money 😉

    I really wonder if Office 2009 will contain many new features (and with features I don’t mean that stuff from this century like Sharepoint integration and new XML embedding functionality, but I mean the kind of new features like in 1997: Wordart, new layout possibilities, DTP-stuff like allow text to float around graphics, initials, new page borders, nested tables in Excel and PowerPoint, more than 63 columns of text in Word, …) that are created JUST for the purpose of changing the OOXML spec or for interop with ODF?

    I’m absolutely sure that in order to resolve these converter-problems, BOTH Open Office AND MS Office have to be extended to get new features.

    Does Microsoft have a budget to add features to Office that no user actually needs? Like custom page borders?

    (And it really stinks that Microsoft does not add stuff any more for fun. Office has not too many features, but too few 😉  )

    Sorry for rambling!

  14. Dave S. says:

    If Microsoft had offered only 42 billion dollars for Yahoo! then that would have left 2 billion dollars to work on interoperability.

    Too bad one can’t buy much for 2 billion dollars these days. Maybe Redmond costs are too high. I hear India is much less expensive. Has Steve done much traveling lately or been taking evening language classes?

  15. Dave S. says:

    @jemm – what you are referring to is MS Office, which reads most of these formats and creates new ones to represent them internally.

  16. oliver says:

    I love the way that proposals to "harmonize" these two formats usually begin with (and I paraphrase) –

    “Microsoft, we would like to invite you to abandon the output of seven years of engineering effort that absolutely meets the needs of your customers and invite you to come over here to get involved with this thing that clearly doesn’t…”

    The conversation then goes on to dismiss a whole range of formats that have much wider adoption than ODF.

    The work with DIN makes a ton of sense.

  17. Aidan Thornton says:

    "The ODF TC hasn’t even been able to finish adding some of ODF’s original omissions (e.g., formulas) in the last two years.  So do you really believe that it would be possible for them to add all of the functionality of Open XML to ODF quickly and easily?  In other words, the people involved would move more quickly on that project than they’re currently moving on their own goals, which have only produced two incomplete and non-standardized variations in the last two years?  I’m having a real hard time imagining that."

    Formulas are hairy – really hairy. There’s all sorts of nasty legacy quirks, like functions that are plain wrong from a mathematical point of view – they don’t compute what they say they do. While backwards compatibility is needed, just (say) documenting what Excel does now into a standard – which is what Microsoft is doing – isn’t really enough. (I’m not sure Microsoft even did a thorough enough job of that, either.)

  18. Ian Easson says:

    Christian wrote:

    "I’m absolutely sure that in order to resolve these converter-problems, BOTH Open Office AND MS Office have to be extended to get new features."

    Yes.  I saw the list of "missing" features for MS Office somewhere on the web.  It was about a dozen quite small features (e.g., make the number of lines in orphans and widows user-specifiable.)  MS could easily slip these features into the next release of Office, but why should they?  Their customers aren’t asking for them!

    On the other hand, OpenOffice has to add features in two steps:

    – Add or change features to be compliant with ODF 1.0.  (Note: that is not a typo.)  The list is on the web.  I recall that it numbered about 300+ features.

    – Then add features to make it OOXML compliant.  Since it is about 10 years behind OOXML, I couldn’t even guess the number of features required, but think in terms of $1B+ of effort.

    There’s no way in the world that OpenOffice would even attempt it.

    Even if they had the money, they wouldn’t do it.  Why?  Despite what OpenOffice people say, it’s aimed at a *different* market segment than MS Office.  It competes with MS Works, not MS Office.

  19. A says:

    Christian:

    Just one little disagreement.  Office wouldn’t actually need to make the features available through the UI.  For example, I have just discovered it is possible to make a frame float on top of text in Word.  But you can’t do that through the UI, which only allows "around" and "none" wrapping.  You have to modify the file itself.  Of course, that doesn’t mean that they don’t still have to implement the underlying functionality!

    But in agreement with you, if it was so easy for both applications to show each other’s formats, OpenOffice would already handle the binary format perfectly (contrary to popular belief, it was documented up until Office97)!  But it doesn’t work like that.  There are a lot of things that are "default" behaviours of a feature, that are not stored in the file at all.  Like when splitting a table across two pages.  If a single cell in the row being split across the page overflows all its text, then all cells in that row will go to the next page too, even though some of the text may fit on the first page.  That effect is not stored in the file, because it isn’t something that gets turned on and off.  Now maybe OpenOffice already does this the same way as Word, maybe not.  There is no reason for it to do so though.

    Or say we compare with another text displaying format, like PDF.  Now, you could say that PDF is just a bunch of images and text, just like OOXML and ODF.  But PDF stores the position of each character, so you know exactly where it goes.  This makes it easier to line and page break of course.  However, this ability would be really hard to add to an application that was based on the concept of free-flowing text.  It’s just a completely different underlying concept.  Of course ODF and OOXML are much more similar to each other than they are to PDF, so this is an extreme example, but you’d be surprised how small a nuance is required to make harmonizing two formats very hard.

    So though I think there is a possibility of harmonizing the two, there is no way it would have happened in time for MS to release their product.  I agree completely with those who point out that if OASIS can’t even put enough effort into documenting their own ODF formulas in 2 years, why do you think they’d jump eagerly to add the new OOXML required features in a timely manner?  Especially when it is in their best interest to delay and obstruct?

  20. Miguel de Icaza says:

    I noticed that Rob has yet to address or correct the issues pointed out in this post.

    Miguel

  21. Talking about harmonisation and interoperability: I just did some tests on my own to find out how interoperable Lotus Symphony and OpenOffice.org are. They both use ODF as the standard file format.

    First I created a presentation in Lotus Symphony (Beta 4). It had one title slide with text, one slide with a table, one slide with an animated drawing object and one with an animated text box. I only used standard animations and I also applied one of the standard slide transitions.

    Then I opened the presentation in OO.org 2.3. All text and colours etc. were preserved 100%. The table from Lotus Symphony was converted to an invisible OLE-object. When I double clicked the object it was opened as a spreadsheet inside the slide. I had to manually resize the object and re-format the text and cells to make look like the original table.

    The drawing object was preserved but the animation was gone. No sign of it at all. The same happened to the text box: it was there but no animation. The slide transition was preserved but it was interpreted differently in the two applications. No big deal but a bit annoying.

    Then I created a similar presentation in OO.org 2.3. OO.org does not support tables in presentations so I had to insert a spreadsheet object to illustrate the table.

    When I opened the presentation in Lotus Symphony, all colours and formatting seemed to be preserved correctly. The spreadsheet object behaved the same way. The animation on the drawing object was preserved, and the slide transitions looked the same. Good!

    But what about the animated text box? I could see and edit the text box in slide view but when I ran the presentation there was no sign of it. It never showed up. Not so good.

    IBM (Lotus Symphony) and Sun (OpenOffice.org) are the main architects behind ODF. They both give away office suites that use ODF as their default format. Still, they are not able to make the applications interoperable with things as simple as tables and animations. This just shows that interoperability is not an easy issue.

    Btw, I also posted this on Rob Weir’s blog but as expected it was blocked.

    The files I created are available on my blog.

  22. Ian Easson says:

    Frederik,

    It is always appreciated when someone brings a sense of reality into these discussions.  Too often, some commentators say things that show they have little experience in document formats, programming, or standards development.

    Thanks for the example.  Where is your blog?

  23. Thank you very much, Ian. I’m just so tired of reading the same things repeated over and over again, without anyone checking the reality of all the different claims (both for and against OOXML and ODF).

    My blog is at http://fenilsen.wordpress.com but it’s only in Norwegian for the moment. I’m considering writing in English too. You can find the presentations here:

    http://fenilsen.files.wordpress.com/2008/02/presentasjon-oo.odt

    http://fenilsen.files.wordpress.com/2008/02/presentasjon-ls.odt

    The first link is the one made in OpenOffice.org and the second is Lotus Symphony. You can rename them to .odp but it’s not necessary.

  24. S says:

    "there is no way it would have happened in time for MS to release their product. "

    You’ve got to be kidding me, they rushed Office 2007 out the door just to do damage control.

    What they did was to INTENTIONALLY AVOID supporting ODF natively. It would, however, have been a perfect test case, and a very valid demonstration of open engineering.

    What else did you expect from Microsoft? I hope you realize the Office cash cow brings more money than the Windows cash cow and that any serious dent in it could severely damage Microsoft’s health.

    Don’t forget this anytime you suggest "Microsoft good intentions".

  25. I’m heading home from Norway in the morning, but wanted to give a quick update on the progress made over

  26. Alex Brown’s post "ISO committee takes full control of OOXML" is the first report I’ve seen from the