They’re bringing out the big guns


Anyone else been following the latest blog posts from IBM and Sun discussing the Office Open XML formats? It looks like they’re stepping up their push to try to make ODF the only choice in file formats. I read Tim Bray’s post yesterday, but there have actually been a number of other posts folks have pointed out to me as well. Everyone knows that Sun and IBM have a lot riding on ODF financially (they’re large corporations, not philanthropies <g/>). It’s clear that their plan is to somehow convince governments to mandate just ODF and remove any choice between the two formats.


Thankfully, what you’re actually seeing in most places is that governments are asking for ‘open formats’ in general, not just ODF (contrary to what is usually written in the headlines). Most of those governments understand that Office Open XML is on the verge of becoming an international standard as well and it serves a very important purpose that ODF doesn’t. This has raised the alarm bells for IBM and Sun though, and that’s why we see the latest smear campaign kicking into gear. It could be that this is more innocent and that instead there is just a lack of technical knowledge. Based on the strong reputations of the folks involved in this campaign though it seems more malicious. I’m saying this after reading their claims that the spec is too complex and therefore not interoperable, which is just ridiculous. Too much information? Every developer I’ve talked to (even those working for companies that compete directly with Microsoft) is extremely grateful for the amount of information the spec has provided. Look at the 600 developers up on the openxmldeveloper.org site building all kinds of powerful solutions across a number of different platforms (Linux; Mac; Windows).


I think it’s pretty ignorant for folks to call this effort a rubber stamp. Talk to the people from Apple, Novell, the British Library, the Library of Congress, Intel, BP, StatOil, Toshiba, Essilor, NextPage, and Microsoft who spent over 200 hours in group discussions around the formats. Look at the results of all the hours that went on in the smaller groups tasked with solving particular problems or those working on the actual documentation that had to go on between the weekly group meetings. The schemas themselves changed significantly and the spec went from 2000 to 6000 pages. Rubber stamp? You must be joking. <g/>


Another thing I’ve seen from an IBM employee is that he’s trying to get more technical by examining the Office Open XML standard looking for minor nits and then attempting to turn them into big issues. That’s fine and everyone is entitled to their own opinion. It’s kind of funny though that many of the issues he raises are even worse in the ODF spec.


Why would IBM and Sun push for a more limited format?


There is this false claim from some high profile IBM and Sun employees that the Office Open XML spec is not interoperable because it’s too big. These statements really help to paint a picture of their strategic interest in ODF. What’s the easiest way to compete with another product that has a richer set of features? Get governments to mandate a file format that doesn’t support that richer set of features. This way, if the other product (Microsoft Office in this case) has to use the format that was designed for your product, you’ve just brought them down to your level. It’s a brilliant approach, and that shows why there are IBM vice-presidents flying around talking to governments about the need to mandate ODF. It also shows why they want to discredit the Office Open XML format… IBM and Sun feel they have a lot to lose if Office Open XML is standardized, and that’s why they’ve been fighting so strongly in opposition.


Now, contrast that with the Microsoft position, where we’ve never opposed ODF. We didn’t plan on supporting it, but we had no problem with other people using it. The only opposition we’ve ever had is to policies mandating ODF and blocking Office Open XML. We want choice; IBM and Sun on the other hand absolutely want to block choice. The spin they try to put on this is that by blocking choice in formats they are providing freedom to choose your application… what they don’t say, though, is that we’re doing that to an even greater degree. We’re sponsoring a free open source project for translating between the two formats, which gives everyone the freedom to choose both the application and the format. Microsoft’s view has been that open formats are really important and there is nothing wrong with both ODF and Open XML. IBM and Sun on the other hand want one specific open format (ODF), and that’s it.


Now, if you look at it technically, there is no reason to complain about the size of the spec unless you are trying to limit the features supported by the spec. There are plenty of large specifications out there (look at the Java spec) that are completely interoperable. As an implementer of the Office Open XML specification, you are free to decide what pieces you want to implement.


Let’s think about this complaint though that the specification is too large. What are the ways in which you could fix that:    



  1. Less documentation and explanation??? – I can’t imagine anyone wanting this. Remember, the standard isn’t a novel you’re supposed to read end to end. It’s a detailed description of every piece of the Office Open XML file formats and how it all works. More documentation is an important thing in this case.

  2. Less features??? – Who gains from this? Any implementer has the freedom to pick which part of the spec they want to support. Only applications who want to compete by bringing everyone down to their level would actually want features removed.

There are a lot of features between the three main schemas (WordprocessingML, PresentationML, and SpreadsheetML), and as a result the file format is very large. The ODF spec most likely would have been bigger if they had done a more thorough job documenting it, but even then it still doesn’t compare in terms of functionality. One of the other justifications I’ve heard for the ODF spec being so much smaller is that it reuses other standards. That may account for some, but it still doesn’t get you all the way (not even close).


We also looked at reusing other standards where it makes sense (Dublin Core, ZIP, XML), but there are plenty of places where that didn’t make sense (MathML). Take the example of MathML. It wasn’t specifically designed for representing math in a wordprocessing document, but instead math in general. It’s a good spec, and it does do a decent job in a wordprocessing document, but it’s not able to handle everything that our customers would expect. It doesn’t allow for the rich type of formatting and edit history that most customers of a wordprocessing application would want (see Murray’s post for more details). Even more interesting though, to date there aren’t any ODF wordprocessing applications out there that even support all of MathML. I think that Office 2007 actually has better MathML support with our import/export functionality. Another example given is the use of XSL-FO. It’s a nice spec to reuse, but it doesn’t fully define how international numbering should be done, so as a result OpenOffice has already extended the format in their own proprietary way.


XML itself has only been a standard for about 8 years. For one to assume that all the great thinking and tough problems in the Office document space have already been handled since then is ridiculous.


-Brian

Comments (45)

  1. It is a bit depressing to see you join the ranks of those who want to simply slam the other guy without listening to their point of view.  Technical criticisms of a standard are not something to be railed against, and it should be beneath you to make broad statements about how other people want "less documentation" and "less features" (should be fewer, by the way) to try to change the subject.  While there have certainly been attacks on Open XML that are arguably biased, such as the criticism for not using MathML, there are others that are completely technically justified.  You should welcome those critiques.  Those are what make a standard stronger.

    For example, putting into the standard the bug that happens to exist in Excel about leap years was just plain stupid.  It both gives a clear case for those criticising the standard, which is bad PR, and forces a standard to conform to a product, when it would have been very easy for the product to adjust slightly to handle the bug itself.  That sort of thing should be weeded out.

    The other technical concern raised is about bitwise operations in XML.  Again, this is a mistake in XML, where representation and manipulation are supposed to be easy with all the available languages and tools.  It would not have been hard to make the standard spit out separate values for each bit, but a lazy decision was made.  Is this terrible?  No, lazy decisions happen all the time, and there are plenty of lazy decisions in ODF.

    But how about stepping back from the instant desire to fight, and welcome those sorts of criticisms.  If you really want a strong house, you don’t yell at the people who point out a bad beam, you thank them.  Part of the value of open standards, any open standards, is that a lot of eyes look at them.  You are a bit too used to proprietary systems, where  the Q.A. process is private and you try to put a positive face on in public.  Open standards are not like that.  You want all those eyes looking for weaknesses.  Maybe it is uncomfortable right now, but ten years down the road, your standard and your software will be better because of those eyes.  It doesn’t even matter if they have a negative reason for raising the objections – you can deflate their ulterior motives by inviting the objections and handling them honestly and openly.  It will take a bit of getting used to.  I know, I write proprietary systems myself, and I wouldn’t welcome the scrutiny, but for open standards to get stronger, the scrutiny is necessary and beneficial, no matter what the goal of those making the objections.  Remember, you don’t have to embrace open source software to embrace open standards.  It will mean a bit more work to keep up with, but Microsoft has a huge, huge lead with regards to Open XML anyway.  Do you think eliminating the bugs and making the bitwise values and such changes are going to open the doors for competitors?  Hardly.  What they will do is make your standard ready for the next ten years and not just this year.  Microsoft Office will change over time, and a more solid foundation for that change will benefit you more than anybody.

    Besides, do you think it makes people trust you MORE when you are defensive?

  2. AC says:

    Ok, raise of hands, who else finds these posts tremendously hypocritical?

  3. jones206@hotmail.com says:

    Ben,

    I have no problem at all with people questioning design decisions in the spec. How did you get that impression?

    This post was all about the more general comments coming from Sun and IBM. I was addressing those general comments around “rubber stamping” and the claim that somehow the size of the spec made it less interoperable. Both are ridiculous claims and I was commenting on that. You seem to have skipped over that though and gone directly to the specific technical issues that folks have raised. That’s a different topic, and I think it’s definitely worth drilling into separately. As you know though, any spec will have areas where people disagree on design decisions. That’s unavoidable.

    I’ll actually pull together a post on the date issue because it’s been raised a number of times. You may disagree with the decision, but it’s definitely not a “stupid” decision. There are a number of other applications out there that also maintain similar behaviors for compatibility reasons. Sure, you could say that it should be persisted in the format one way, and then when parsed into the application it’s then mapped back to the old way. What does this do for spreadsheet functions though? We want the functions to be 100% interoperable (that’s why we have the hundreds of pages on functions as well as settings on numerical precision, etc.). You would need to also modify those functions every time on open and save, and that could be extremely error prone. That’s just not worth it. It’s better to maintain the compatibility and just document how it works. Again though let’s treat that as a different topic.

    AC, how many hands so far? 🙂

    What posts are you referencing? Did you mean just this post, or are there others? Or were you talking about the posts I was commenting on? If you primarily meant my post, do you mean that as a much more generic “prior Microsoft behaviors” type of statement (as in Microsoft is one entity that has behaved one way or another in the past and therefore it’s hypocritical for someone from Microsoft to point out poor behavior in other companies)? Or do you mean I’m actually being hypocritical based on other comments I’ve made or positions I’ve taken?

    -Brian

  4. Wouter Schut says:

    Which 600 developers? Do you mean the 583 forum members of OpenXmlDeveloper.org? Because then you are even counting me! And I am certainly not doing any openxml development. Ooh, and it seems they deleted all my comments there… strange.

    And why exactly don’t you link to the blog posts of IBM and Sun? It’s always handy if readers can validate your claims or come to their own conclusions.

    You just think that you are the good guy, but in fact you are contributing to more ms-evil. Microsoft is only open when it absolutely has to, just look at all the other formats and protocols which are still closed.

    Don’t you think the developers on Messenger, Kerberos, Posix thought they were working on an open system?

  5. RequiredName says:

    You cannot compare ODF and MS OpenXML. It is like comparing apples and oranges. Both are a fruit (XML based format) but that is about it.

    ODF is a vendor and product independent specification. MS OpenXML is a XML version of the legacy Microsoft Office file format. Governments do not standardize on products or vendors. And they shouldn’t.

    The ODF specification covers the domain of office documents and Microsoft is supporting the development of an ODF plugin. So what is the real problem with standardising on ODF?

    The real problem is that MS Office 2007 users cannot select ODF as their default file format and are forced to a cumbersome workflow involving MS OpenXML.

  6. Brian, we have to distinguish two different sets of assumptions you implied in your post:

    1. Who should make the choices?

    2. What happens once the choice is made?

    You work for Microsoft. Do you have the choice to use the Email-client you want? The browser you want? The PIM application you want? The operating system you want? The hardware you want? I’m sure they have IT policies at Microsoft. An IT policy does one thing and one thing only: limiting choice of the individual user. That is regarded as a good thing, since most of the time it is not their choice in the first place. It limits support headaches and provides a well-known infrastructure which can be used as a system requirement for custom-built applications.

    Governments want that kind of stuff too, you know. They have to interchange documents and data between a lot of different agencies and administrations. They want uniformity, less support headaches and no interoperability problems (try out the MS Office ODF plugin and you’ll know what I mean). Governments HAVE choice. Open XML was one of the choices. All you are doing now is bitching that they made a choice. If it’s not their decision, then who should choose? The CIO of an agency? Why should he limit choice? Should the individual worker choose the document format? Your post just doesn’t make sense.

    They have choice. Choice between the two formats, and some actually chose. That’s the point of having choices: making decisions. If you choose MS Exchange, you better be prepared to use Outlook too if you want to have access to all the features. Your decisions limit choice, that’s the whole point of them. The important part is having choice BEFORE making the decision, not after it. You should know, and that’s why your post seems malicious.

    – Stephan

    P.S.: Every developer knows a needlessly (!) complex specification hampers adoption. How many feature-complete, production-ready Java implementations do you see flying around? Caveat: no licensed SUN code should be used, so the likes of IBM etc. are excluded! THERE is your problem with big specifications. You have a lot of Java code in the wild, but just one main provider of the crucial Runtime Environment. Come on Brian, there are developers reading your blog. You think we are stupid?

  7. Brian, you cite compatibility reasons for the design choices you made (Excel bug in the specification, bit fields in XML (WTF??)).

    Of course Microsoft does not have a lot of experience in designing universal standards that promote interoperability, so let me give you some free advice:

    If you design a specification after a product, it should never even have been submitted to a standards body. Now I understand the problems with backwards compatibility. But frankly, if you are trying to create an international standard for document exchange, you have to balance between a single product and what makes sense for, you know, the world of software developers. Especially bit fields in XML are so (sorry) stupid and senseless that it makes this whole Ecma standardization look like a farce. You talk about all those working groups and nobody caught the fact that NONE of the XML tools out there can do anything with that data, that it will be just a string or number to them? Like a mini-blob? Or maybe they couldn’t do anything about it?

    The ODF people at least admit they have work to do, and they are prepared to change significant parts of their specification if necessary. And you come here, attack them for reasonable criticism and feedback and defend the most ridiculous design choices ever made in the history of XML file formats? Honestly, how could you? I could never imagine myself defending that kind of crap in ODF (although admittedly, I’m just a user and not a developer of ODF).

    – Stephan
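
To make the bit field objection in this thread concrete, here is a minimal sketch in Python. The `look` element, its attribute names, and the flag layout are all invented for illustration and are not taken from either spec. It shows that a packed value reaches a generic XML parser as an opaque string, so the consumer has to know the bit layout out of band, while separate attributes describe themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical flag layout, made up for this example (not from any spec).
FLAGS = {"firstRow": 0x01, "lastRow": 0x02, "firstColumn": 0x04}

# The same settings encoded two ways.
packed = ET.fromstring('<look val="05"/>')
explicit = ET.fromstring('<look firstRow="1" lastRow="0" firstColumn="1"/>')


def decode_packed(elem):
    """Decode a hex bit field; the parser only hands us the string '05'."""
    bits = int(elem.get("val"), 16)
    return {name: bool(bits & mask) for name, mask in FLAGS.items()}


def decode_explicit(elem):
    """Read one boolean attribute per setting; no bit arithmetic needed."""
    return {name: elem.get(name) == "1" for name in FLAGS}


if __name__ == "__main__":
    print(decode_packed(packed))
    print(decode_explicit(explicit))
```

Both functions recover the same settings, but only the second form can be queried with plain attribute tests in tools such as XPath or XSLT 1.0, which have no bitwise operators.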

  8. Yawar says:

    Here is Tim Bray’s blog post talking about Office Open XML: http://www.tbray.org/ongoing/When/200x/2006/10/16/OOXML-Hoo-Hah. It contains links to some other posts which I think Brian mentions.

  9. n4cer says:

    Why is the leap year issue being characterized as an Excel bug when it has been pointed out previously that the behavior originated with Lotus 1-2-3, and Excel (and subsequently other products) copied it for compatibility?
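
For readers who have not run into the leap year behavior, here is a rough sketch of what it means in practice. In Excel's 1900 date system, serial 1 is January 1, 1900, and 1900 is treated as a leap year, so serial 60 stands for a February 29, 1900 that never existed. The function name `serial_to_date` is mine, chosen for illustration, and the code only covers whole-day serials.

```python
from datetime import date, timedelta
from typing import Optional


def serial_to_date(serial: int) -> Optional[date]:
    """Map a 1900-date-system serial number to a real calendar date.

    Returns None for serial 60, the fictitious February 29, 1900.
    """
    if serial < 1:
        raise ValueError("serial numbers start at 1")
    if serial == 60:
        return None  # the day that never existed
    if serial < 60:
        # Before the phantom day, serials are a plain day count from 1900-01-01.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom day, every serial is one higher than a true day count.
    return date(1899, 12, 30) + timedelta(days=serial)


if __name__ == "__main__":
    for s in (1, 59, 60, 61):
        print(s, serial_to_date(s))
```

Every serial from 61 onward carries the extra day, so remapping stored values on open and save would also have to account for any formula that does arithmetic on those serials.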

  10. Brian, I know this is hard to understand, but again:

    No one trusts Microsoft.

    Say that over and over again. Because it’s true. Microsoft spent over a decade pushing people, IT managers esp. into that stance. Hell, you can do quite well just by saying "We’re not Microsoft".

    And as of yet, Office 2007 is still a future product. No one can buy it, install it, and run it. And as we’ve been taught, when it comes to Microsoft, if you can’t buy it, it’s not real, and has a good chance of never BEING real.

    So if it bothers you that people treat Microsoft the way they do, well, Microsoft taught us. You cannot, simply can-NOT expect that to change overnight.

    It took Microsoft a decade to earn the crap they’re getting now. It’s gonna take more than "No really, we’ve changed, trust us" to actually change that.

  11. Francis says:

    I think you have done great work. OpenXML is a huge improvement!

    That said, I understand why IBM and Sun are behaving the way they are. It’s because they have to. They have next to zero market share, not to mention being years behind in terms of Office-like technology. ODF will not go anywhere unless they fight tooth and nail.

    I also understand Microsoft’s largesse in going through all the standardization rigamarole. It’s because it can. OpenXML will become the worldwide standard, even without ECMA, because Office is such a fixture.

    Re the 1904 dates in Excel, maple muffins in Word, VML, etc.: perhaps these functions should have been implemented in a legacy annex to the standard. That way they would be on a track to be phased out in future revisions, and developers could plan and program accordingly.

  12. hAl says:

    The issues currently raised by IBM and Sun on the OOXML format are relatively minor issues.

    It already surprised me that they used these issues to attack OOXML.

    It seems like they might indeed be looking toward a lobbying offensive amongst the ISO voting members, as it is unlikely that the Ecma industry representatives will say no to the standard.

    However, ISO standards are decided upon by politicians, and those are much easier to influence by turning minor issues into big issues, especially with ISO having a strong European base where MS skepticism is already relatively high.

    I assume therefore that the next half a year or so will see quite a lot more slamming of OOXML by IBM and Sun.

    These minor issues can only be a start. They hardly have relevance in the big picture of file formats.

    This must also be the reason for moving up the timetable for the OpenFormula standard that is to be added to the ODF standard by almost a year. I guess it is suddenly necessary that this new addition to ODF be added to the ISO standard before ISO decides on OOXML; otherwise OOXML will certainly feature spreadsheet stuff that ODF ISO currently can’t match at all.

  13. Brian,

    I get the impression I get from your own words.  For example, you say in the post above:

    "I’m saying this after reading their claims that the spec is too complex and therefore not interoperable, which is just ridiculous. Too much information?  Every developer I’ve talked to (even those working for companies that compete directly with Microsoft) is extremely grateful for the amount of information the spec has provided."

    The complaint is not that there is too much information, which you neatly turn it into.  That would be laughable, and you go on to laugh at it.  But it isn’t the complaint.  The complaint is that it is too complex, meaning in some cases that it is too specific to a particular implementation.  For example, the specified page borders are made part of the standard rather than left as a general page border format with the implementation left as an instance that Microsoft Word happens to implement.  In some cases it is too complex because it assumes functionality not readily available to XML parsing languages, such as the bitwise operations.  That is not a case of "too much information", which I am certainly not faulting Open XML for, but a case of "too much complexity", which I would fault Open XML for.

    In another place, you say:

    "Now, if you look at it technically, there is no reason to complain about the size of the spec unless you are trying to limit the features supported by the spec."

    This is silly, and completely incorrect.  How about if we introduce a spec for integers that deals with a set of rules for integers between 0 and 100, and another set for those between 101 and 200 and so on?  The spec would be infinitely large, but would provide no more features.  It is quite possible, and frequently done in standards organizations, to reduce the size of a spec while increasing the features, by generalizing better and extracting instance data from format standards (see page borders argument above).

    Finally, you ask the question:

    "Let’s think about this complaint though that the specification is too large. What are the ways in which you could fix that:"

    and the only two ways you can think of are "1. Less documentation and explanation???" and "2. Less features???"   Might I suggest "3. Better generalization" as a rather obvious choice?

    I am curious about one statement you make.  You say "It’s kind of funny though that many of the issues he raises are even worse in the ODF spec." but fail to point out any examples.  As these would obviously be welcomed by both the ODF committee and those who choose to attack ODF, how about you share some specifics?

    Ben

  14. jones206@hotmail.com says:

    Wouter,

    I was just generally looking at the folks currently signed up on OpenXMLDeveloper (which would include you), but to be honest, that is just a small fraction of the actual group of developers. Did you know that there are in total about 1 million developers around the world who use Office in some way for their solutions? Of those, 30% use the XML functionality. So really, you could say there are 300,000 developers using Office XML (remember, we’ve been using XML in Office since Office 2000). Of course, many of them are still waiting on Office 2007 to ship before they really get going with the new file formats, but now that the Ecma draft is complete, you’ll see more and more folks starting up their projects.

    I can’t comment on Messenger, Kerberos, etc. but I don’t think you should focus on that. It doesn’t affect this file format in any way. The format is already owned by Ecma, so there isn’t anything Microsoft could do, even if we did decide to change our mind.

    ———

    RequiredName,

    I didn’t want to argue whether or not ODF is vendor specific but I will now say a couple things. It was started by Sun, and there was a fully developed implementation before being brought to OASIS. The initial group of people working on it in OASIS were primarily Sun employees (both Chair and Secretary were from Sun). More folks joined over time, and in the end there were a number of different things added (xforms, etc.), but the fundamental design didn’t change. No one challenged the spreadsheet architecture. The huge hole of not defining tables in presentations was a complete map from the lack of table functionality in the OpenOffice presentation application.

    We’ve never claimed that the Office Open XML formats were designed from the ground up as the ultimate office productivity application file format. They were designed for the main goal of opening the legacy Office binary formats to the world. Compatibility with the features from the legacy binary formats has always been a key goal and we’ve never said differently.

    We’d always planned to fully document it, and make it free for all to use. Many governments wanted to see us also take it to a standards body so that the specification would be owned by a third party rather than by Microsoft. It wasn’t something we’d initially planned to do, but I’m really grateful that we did it. The amount of great feedback and hard work we got from the other members of the technical committee is what got us to where we are today. If it weren’t for them, I don’t think we could have built such a great specification.

    ———

    Stephan,

    Yes, compatibility was a key requirement. 99.9% of our customers don’t care about their file format, they just want everything to work. A small percentage actually do want to build solutions with the formats, and have the ability to read and write Office files outside of Office. For those reasons, we moved to an XML format. But there is absolutely no way we could have interfered with the experience of all the other customers. So that was a key design goal, but at the same time, our whole reason for moving to an XML format was to allow solution builders to both read and write the formats.

    I understand there are other potential goals around which a standard could be built. But you shouldn’t assume that the word "standard" means it was an attempt to make the most generic solution. The reason Office Open XML was taken to a standards body is that a number of the governments we talked to didn’t want the format and its documentation to be owned by Microsoft. They asked us to instead submit it to a standards body so that regardless of what happened to Microsoft in the future, the standard would always be available.

    ———

    n4cer,

    Yes, it is not an Excel bug but rather a Lotus 1-2-3 bug. I’m sure they still maintain the same compatibility that we do. It’s not as easy as just remapping all the date values, as you also have functions that will leverage those dates, and you also want the functions to be interoperable.

    ———

    John C. Welch,

    That’s not hard to understand, I live with it every day. It’s something I often refer to as the Microsoft tax. No matter what good you may do, there will always be hardcore doubters or, even worse, folks that just refuse to believe no matter how obvious you make it.

    All I can do is just keep providing my time to this blog so I can help people understand the design of the file formats, and where we’re going next.

    ———

    Francis,

    Yes, it’s definitely clear why IBM and Sun are taking the approach they have. As I said, it’s really a brilliant way for them to quickly catch up.

    I like your suggestion around some of the legacy stuff. VML actually has been called out as a legacy approach in the spec. It says that implementers looking for a drawing format should instead try to make use of DrawingML. There are no requirements on implementers to implement the entire spec either. You could not support "maple muffins" for example and still be totally conformant.

    ———

    hAl,

    The specific examples being raised so far are definitely minor issues. They are made to sound bad, but if you look at the reasons for the decisions, or in other cases the obscurity of the feature itself, the issues are minor. I will attempt to dive into each one of these individually though, since it’s clear folks are interested.

    IBM and Sun have definitely hung their hat on the ISO certification and that’s where it looks like they will attempt to fight against the Office Open XML format. You would think people would be happy to have ISO own the stewardship of this format, as it’s one more step in the long term availability of the spec.

    ———

    Ben,

    Those are great points. I was referring to Bob Sutor’s post about the weight of the spec. There are other posters out there who are actually bringing up specific details, and I want to talk to those separately. We should drill into the "better generalization" idea, as there are examples where that might have been better but in those cases it wouldn’t have significantly affected the specification. In most of the important cases, the "better generalization" approach wouldn’t have worked. That’s a good topic for discussion though.

    You ask for an example of IBM criticisms of OpenXML when there is the same or worse in ODF. One such case was at the ODF conference in Lyon, where one of the knocks on Office Open XML raised by an IBM employee in his presentation was about the documentation on how passwords were hashed. Well, if you look at the final draft of the standard, in section 3.2.29 of Part 4 there is an attribute called “revisionsPassword” that fully describes how these passwords are hashed. If someone wants to, they can use a different approach, but there is a recommended way that will help with interoperability.

    Even more interesting though is that while that behavior is completely defined in Office Open XML, it’s not defined at all in ODF. The equivalent attribute in ODF is called “table:protection-key” as mentioned in Draft8 of ODF 1.1 under the “Protected Sections” section (pg. 73); “Protected” section (pg. 177); and in section 8.5.1 “Document Protection” (pg. 203);

    From the ODF spec:

    "A user can use the user interface to reset the protection flag, unless the section is further protected by a password. In this case, the user must know the password in order to reset the protection flag. The text:protection-key attribute specifies the password that protects the section. To avoid saving the password directly into the XML file, only a hash value of the password is stored."

    That’s it. Absolutely no mention at all about how the hash is created in ODF.
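
    To make the interoperability concern concrete, here is a minimal sketch. It assumes two hypothetical applications that each pick their own hash algorithm because the spec does not name one; the function names and the choice of SHA-1 and SHA-256 are illustrative only and are not taken from either specification.

```python
import hashlib


def protection_key(password, algorithm):
    """Hash a password the way a hypothetical application might store it."""
    return hashlib.new(algorithm, password.encode("utf-8")).hexdigest()


def can_unprotect(stored_key, password, algorithm):
    """Check a typed password against the stored key using this app's algorithm."""
    return protection_key(password, algorithm) == stored_key


# Hypothetical application A writes the document using SHA-1.
stored = protection_key("secret", "sha1")

# Application A reads it back: the correct password is accepted.
same_algorithm = can_unprotect(stored, "secret", "sha1")

# Hypothetical application B assumed SHA-256: the correct password is rejected.
different_algorithm = can_unprotect(stored, "secret", "sha256")

print(same_algorithm, different_algorithm)
```

    The password is correct in both cases; only the unstated algorithm differs, which is why a recommended hashing method in the spec helps.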

    -Brian

  15. Ian Easson says:

    RequiredName says that "ODF is a vendor and product independent specification."  

    This is disingenuous.  It is actually a standard for rebranded versions of OpenOffice from Sun, heavily promoted by them and IBM (hence the latest disinformation campaign from them).

    ODF was designed primarily to support the features in this product, not to provide interoperability, despite what its supporters claim.  If it wanted to really support interoperability, it would support full interoperability with the full features of the most common (99% market share) office software — which happens to be from Microsoft.  But you cannot "round trip" from MS Office to ODF file formats and back, not because of any MS limitations, but because of ODF limitations that may take years to fix, by their own admission.

    RequiredName doesn’t understand this, because he/she proposes dropping OpenXML in favor of ODF.  They are not interchangeable.

    ODF does allow limited interoperability with other software because it is a published standard that anyone can use to interoperate with OpenOffice clones or subsets of it.  But, even that degree of interoperability is limited, as well documented in this blog (look, e.g., at lack of spreadsheet formula interoperability).

    RequiredName also says that "MS OpenXML is a XML version of the legacy Microsoft Office file format. "  

    This is actually totally 100% incorrect.  It is not a reworking of an old file format at all; it bears no resemblance to it.  The old file format was a binary dump of internal data structures.  The ECMA (note: not MS) OpenXML standard-to-be is a completely new XML file format intended to provide an external representation of the data used by the MS Office application suite.  

    Until such time as OpenOffice has all the features of MS office and a future version of ODF supports all those features (and is thus equivalent to OpenXML), all that can be physically done is to provide a limited "save-as-ODF" capability for MS Office, that strips out information needed by those features of MS Office that are not supported by OpenOffice.  But this is precisely what MS is promoting.

  16. Stefan Wenig says:

    Ben,

    your comment reads a lot like a discussion we’ve had here some weeks ago. Generalization is a good thing, especially if you create a standard without having a large volume of existing data in mind. ODF did just that, and if it were that easy, there would really not be any reason for Open XML to exist.

    But Open XML was created with millions of existing binary documents in mind (see–i’m caught in an endless loop too). Generalizing the format would lead to the same information loss as converting to ODF. The old formats have non-generalized semantics and parameters, the old apps have a non-generalized user interface etc. Changing this without travelling in time could prove quite difficult. It all goes to prove that for the goals MS is claiming to have for Open XML, what they did is just the way to go.

    You want a generalized format, clean and easy to implement, go for ODF. You want something that does not break what you have, you’re gonna have to accept the burden that previous versions bring to the format.

    Stefan

  17. Stefan,

    There is some truth to this, but there are certainly areas where generalization could help.  The Art Page Borders section is about sixty pages of specific borders.  Now, granted, existing documents use those by name, but couldn’t you generalize to a "named border" and have those specific borders be part of Microsoft’s implementation?  That would reduce the spec there by about 59 pages, and it wouldn’t stop Microsoft from supporting the existing documents.  I am not suggesting that the standard be completely rewritten, it is what it is, but it could use a little generalization to separate the specific implementation from the general implementation.  Right now, it just feels like every place where choices are iterated through, the decision was made to leave in the set of choices Microsoft has already used, rather than say "Here a choice must be made and it should be named and kept as a separate resource file in the document."

    Ben

  18. Brian,

    Good example.  Things like hash formulas that are not identified should not be in a spec.  That is a case where more information would be good, and Microsoft has done well by exposing such information, no matter the size of the spec.

    On the other hand, the bug in Excel which was put there for compatibility with Lotus 1-2-3 absolutely illustrates my point about why it should be changed in the standard.  Lotus 1-2-3 isn’t much of a competitor anymore, is it?  Yet you still have this  bug in your code which complicates matters for the sake of compatibility with that legacy software.  Do you really want to continue that forward?  Why not focus on the billions of documents to come rather than just on the billions of documents in the past.  Should they all deal with this ancient bug just because once upon a time Lotus made a mistake?  Of course not.  Don’t make that part of the standard and perpetuate the madness.
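
    For readers who have not run into it, the bug in question is that the 1900-based serial date system treats 1900 as a leap year, so serial number 60 stands for a 29 February 1900 that never existed. A minimal sketch of what a consumer has to do to read such serial dates correctly follows; the function name is hypothetical and the code is an illustration, not text from the spec.

```python
from datetime import date, timedelta


def serial_to_date(serial):
    """Convert a 1900-based spreadsheet serial day number to a calendar date.

    Returns None for serial 60, the phantom 29 February 1900 kept for
    Lotus 1-2-3 compatibility.
    """
    if serial < 1:
        raise ValueError("serial dates start at 1 (1 January 1900)")
    if serial == 60:
        return None  # the day that never existed
    # Before the phantom day, serial 1 is 1 January 1900.
    # After it, every serial is one too high, so the epoch shifts back a day.
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return epoch + timedelta(days=serial)


for n in (1, 59, 60, 61, 36526):
    print(n, serial_to_date(n))
```

    The special case is small, but every independent implementation has to carry it, which is the cost being argued about here.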

    Ben

  19. John says:

    Brian,

    Hang in there.  I read Tim Bray’s post a few days ago and was glad to see you respond.   I’ve always found your posts (and responses to comments) to be thorough, level-headed, and (I’m sure this will shock some) genuine.   Keep up the good work.

    John

  20. orcmid says:

    I wasn’t sure what kind of lightning this post would attract when I saw it in my feed reader.  It’s gratifying to see the level-headedness here.

    Brian, that’s a good catch on the terms of the debate about "choice."  I think you are on solid ground.  On the other hand, the purpose of interchange/interface is to move the debate about choice to implementations (think GDI and display adapters) and not formats.  

    Having said that, I think some of the debate now is about different technical sensibilities, and we are close to a new version of language wars.  

    It is the Universal Document Elixir (http://orcmid.com/BlunderDome/clueless/2005/10/magical-thinking-and-universal.asp), which many believe in, that has it be OK to think that the one-format choice is sufficient and has already been made (i.e., ODF).  The only thing that can test that is reality. No amount of talk will help.  And it will still be a chancy thing.  What will teach us whether magical thinking, perception and ideology can overpower reality will be all of those conversion and translator face-offs that are going to be happening in 2007.  

    Whatever happens, in the end we will all be the wiser for it.  I also think the TC45 effort is commendable and admirable work.  It is a difficult road, and however it turns out I think Brian and the Microsoft team that committed to this course are to be admired for extending the tremendous effort involved, all with the support of the responsible management.

  21. Brian, thanks for your reply. Since my other post didn’t seem to make it (might well be my own fault, I’m on the road right now) I’m reposting it:

    We have to distinguish two different sets of assumptions you implied in your post:

    1. Who should be responsible for making choices?

    2. What happens once the choice is made?

    You work for Microsoft. Do you have the choice to use the email-client you want? The browser you want? The PIM application you want? The operating system you want? The hardware you want? I’m sure they have IT policies at Microsoft. An IT policy does one thing and one thing only: limiting choice of the individual user. That is regarded as a good thing, since most of the time it is not their choice in the first place. It limits support headaches and provides a well-known infrastructure on top of which custom applications can be built.

    Governments want that kind of stuff too, you know. They have to interchange documents and data between a lot of different agencies and administrations. They want uniformity, less support headaches and no interoperability problems (try out the MS Office ODF plugin and you’ll know what I mean). Governments HAVE choice. Open XML was one of the choices. All you do now is bitching that they made a choice. If it’s not their decision, then who should choose? The CIO of an agency? Why should he limit choice? Should the individual worker choose the document format? Your post just doesn’t make sense.

    They have choice. Choice between the two formats, and some governments actually chose. They considered Open XML and rejected it. That’s the point of having choices: making decisions. If you choose MS Exchange, you better be prepared to use Outlook too if you want to have access to all the features. Your decisions limit choice, that’s the whole point of them. The important part is having choice BEFORE making the decision, not after it.

    – Stephan

  22. marc says:

    >The specific examples being raised so far

    >are definitely minor issues.

    I don’t think so

    The XML defined in section 2.8.2.16 (page 759) of Volume 4 of the OOXML spec is a dump of the Windows SDK memory structure

    ( http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_5ppu.asp )  

    Regarding interoperability and same-level-playfield for implementors, this is definitely _not_ a minor issue

  23. jones206@hotmail.com says:

    Marc, first off I’m surprised you find that to be a major issue. Those attributes are used by a font to describe which code pages and Unicode subranges the font actually provides glyphs for. It’s primarily used in font substitution (ie you don’t have the font on your machine and so the consuming application looks for other information that can help it determine another font to use). If you have the font, then you can get all this information just by querying the font.

    It’s cool that you care so much about the fonts themselves though. Most times, you mainly just care what the name of the font is that is used, but this additional information can greatly help with cross platform collaboration where the fonts on your machine may not be on the person’s machine that you send the file to. That’s where this element as well as the other ones like panose information, etc. come into play. I personally wouldn’t call the code page declarations to be anything major though (which is why I said it was a minor issue).

    The list of properties for a font that are defined by separate elements are:

    1. altName

    2. charset

    3. embedBold

    4. embedBoldItalic

    5. embedItalic

    6. embedRegular

    7. family

    8. notTrueType

    9. panose1

    10. pitch

    11. sig (this is the one in question)

    While I wouldn’t call the storage of the sig attributes a major issue, it’s definitely something we can drill into more if you’re interested. Even though it’s a hex dump, the values are all completely defined, so while it would be somewhat difficult using XSLT and XPath to parse it, it would be pretty trivial using any other number of programming languages (and there are actually online examples of how to do it with XPath/XSLT as well).

    I guess in certain ways there may have been a better way to do this, but I’m not sure it would really have been worth it. It’s a bummer this wasn’t brought up sooner though (via public comments or by joining the Ecma TC), because it’s certainly something we could have worked together on to see if it should be done differently. Is this something you came across, or was this based on Rob Weir’s post about bitmasks?

    Again, if it’s really a big blocking problem I wish it was brought forward earlier. I don’t think that’s the case though. While one could argue that it could have been more xml-friendly, it certainly is fully documented and so I don’t think it’s a barrier to interoperability. The subranges for csb are all defined in the spec, and the subranges for usb are all done according to the ISO 10646 standard.
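
    As a rough illustration of how little code the hex values need in a general-purpose language, here is a minimal sketch. The sample element, its namespace URI, and its hex values are made up for illustration; only the idea of 32-bit hex bitmask attributes named usb and csb comes from the discussion above, and mapping each bit index to a named range would still require the tables in the spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the namespace URI and the hex values are invented.
SAMPLE = (
    '<w:sig xmlns:w="urn:example:wordprocessingml" '
    'w:usb0="00000003" w:usb1="00000000" w:usb2="00000000" '
    'w:usb3="00000000" w:csb0="00000001" w:csb1="00000000"/>'
)


def set_bits(hex_value):
    """Return the indexes of the bits set in a 32-bit hex string (bit 0 = lowest)."""
    value = int(hex_value, 16)
    return [bit for bit in range(32) if value & (1 << bit)]


def decode_sig(xml_text):
    """Map each attribute's local name to the list of bit indexes that are set."""
    element = ET.fromstring(xml_text)
    decoded = {}
    for qualified_name, hex_value in element.attrib.items():
        # ElementTree reports names as "{namespace}local"; keep the local part.
        local_name = qualified_name.rsplit("}", 1)[-1]
        decoded[local_name] = set_bits(hex_value)
    return decoded


decoded = decode_sig(SAMPLE)
print(decoded)
```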

    -Brian

  24. hAl says:

    @marc

    [quote]Regarding interoperability and same-level-playfield for implementors, this is definitely _not_ a minor issue[/quote]

    Why ??? You give no support for your claims whatsoever.

    The most I should say about it is that it isn’t a very regular solution but it seems easy enough to implement. Not much harder than a solution that was worked out more in full XML.

    You are suggesting that it isn’t interoperable. Can you explain?

    The issue that Brian mentioned about the hash key in ODF is an example of a problem for interoperability. Also a minor issue btw, as it is unlikely that people will come up with their own hashing method other than those in OOo.

    It is easy to see that the specs of both formats contain imperfections. ODF is already working on a new version and once the current OOXML draft is approved I assume that MS and Ecma will be looking at some improvements for the future as well. It isn’t a shock that formats aren’t perfect in one go, especially not those as big as these.

  25. Stefan Wenig says:

    Ben,

    i see your point, but after following the discussion here for some time (and the awkward border thing was mentioned here before), i really think you picked out one of just a few places where an obvious improvement would have been possible. (and this one’s really hard to miss, being that long!)

    so yes, they could have reduced the specs there by 60 pages. probably some more here or there. but taking away a significant portion of those 6000 pages is a completely different story.

    Stefan

  26. marc says:

    brian, yes, my post was based on Rob Weir’s post about bitmasks

    hAl, i believe in "implementable" standards ( a.la HTML, PDF )

    i don’t "like" a ~6000 page specification with OS specific "features" ( bit-masks for code-pages ) and "hacks" ( leap-year issue ) inside.  This is my personal opinion

  27. Stefan,

    You may well be correct, and yes that is one example that fairly leaps out from the specs, even aside from the discussions here.

    I guess my one issue with both sides of this argument is similar to my issue in most highly polarized debates… both sides seem to be willing to take almost any extreme position to defend "their" side, and be willing to nitpick almost any minor issue on the other side, or worse, to make broad generalizations that don’t hold water.  Brian does it here, mostly with the defense of the almost indefensible (e.g., the leap year bug) and with broad generalizations about the inability of ODF to support multitudes of existing documents, which is almost certainly untrue given the huge preponderance of documents that use only a tiny portion of the features.  Those on the ODF side equally like to act as if there is nothing substantial that ODF can’t handle, which is silly, as it is a much less proven format which does not handle formulas at the complexity level necessary, among other things.  They also want to make broad generalizations about Microsoft’s intent and vague warnings about monopolies, etc. etc., and act as if open source, open standards, open anything is better than proprietary anything.  Also hogwash.

    What bothers me is that both sides tend to weaken their own strong arguments by refusing to budge on the weaker arguments.  For example, Brian Jones defends the MathML decision but, just as ardently, the leap year bug, thus weakening a very defensible position by supporting an indefensible one.  Acting as if every decision made by the Ecma TC committee was reasonable and right is not going to gain you much credibility.  On the other hand, Bob Sutor of IBM seems to want to take any chance to put down Open XML and the work being done on it, and act like any reasonable government, company, etc. would choose any open source standard over any proprietary standard because… well, just because.  This also loses credibility because there are plenty of different issues people have to deal with, and whether a standard is "open enough" is not the single litmus test.  Reasonable people will choose to focus on Open XML, and reasonable people will choose to focus on ODF, and both sides should be wary of irritating and alienating those reasonable people. ODF is not the savior of the world, and Open XML is not the devil incarnate, but neither is it the other way around.  These are two formats with different strengths and weaknesses.  Each could learn something from the other, but not if everybody chooses to make it all out war.

    Ben

  28. Juan R. says:

    I wonder if IBM and Sun people are aware of this month’s plans by Mozilla and WHATWG people to try to introduce MathML into the next HTML5. Of course, that MathML-into-HTML5 serialization will not be compatible with MathML in ODF.

    An HTML5-aware tool will generate <none> where OpenOffice expects <none/>, and vice versa.

    I wonder if that is what they understand by "interoperability of standards".

    Nice reading Brian

    P.S: Murray Sargent writes in his blog (http://blogs.msdn.com/murrays/archive/2006/10/07/MathML-and-Ecma-Math-_2800_OMML_2900_-.aspx):

    "OMML tags are always written with a math namespace prefix like "m:" and I recommend this convention for MathML as well. The reason is that these XMLs are useful in many contexts, not just in HTML(5) and using namespace prefixes allows the XML parser to delegate to the appropriate tag-set owner."

    The (Oct 14) reply from Roger B. Sidje to our pleas is that he will avoid the m: prefix promoting <math>…</math> in HTML5.
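
    The delegation that the quote describes can be sketched in a few lines. This is a minimal illustration only: the host document and its "urn:example:host" namespace are hypothetical, while the MathML namespace URI is the one published by the W3C. The point is that a namespace-aware parser resolves the "m:" prefix to a namespace, so a consumer can route each element to whichever component owns that tag set.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical host document mixing its own tags with prefixed MathML.
DOC = (
    '<doc xmlns="urn:example:host" '
    'xmlns:m="http://www.w3.org/1998/Math/MathML">'
    "<p>text</p>"
    "<m:math><m:mi>x</m:mi></m:math>"
    "</doc>"
)


def owner_of(element):
    """Decide which tag-set owner should handle an element, by namespace."""
    tag = element.tag
    namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    return "mathml" if namespace == MATHML_NS else "host"


root = ET.fromstring(DOC)
# Walk the tree in document order, pairing each local name with its owner.
owners = [(el.tag.rsplit("}", 1)[-1], owner_of(el)) for el in root.iter()]
print(owners)
```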

  29. hAl says:

    @marc

    So your big issue is that you do not like the standard.

    And the interoperability that you mentioned ?

    That wasn’t an issue at all I guess ? It just sounds interesting.

    Hmmm, and about implementable standards. Interesting examples you bring forth. HTML, PDF. Those you call implementable standards??? Have you ever tried implementing PDF?  Lucky you did not mention CSS as we are still waiting for any application that manages to implement CSS 2.1 fully, let alone 3.0

    I also do not see how the standard is more difficult to implement because of the bitmask thing. If you change the bitmask thing to specs in XML does it get less complicated to implement?

    It seems you read the Rob Weir blog but when reading his highly amusing blog you should also read between the lines.

    I am still waiting for Rob to point out some more serious issues with OOXML as he seems to have an army of people digging through the spec for him or he must spend half his life in bed with the OOXML spec. 😉

  30. marc says:

    hAl, i consider CSS a poor standard, the layout of objects in a page is not rocket science !! why did W3C complicate the thing so much ?? ( digression )

    ghostscript ( and other PDF viewers with distinct code bases ) gives me near 100% fidelity in PDF

  31. Stefan Wenig says:

    Ben,

    while you’re doing a good job being the voice of reason here, I don’t agree 100%.

    First, I understand Brian’s position completely. Wherever there are disputable things in their standard, they probably have thought about alternatives too (like converting datetime values for the 1-2-3 bug), but decided on what finally made the draft–probably for a reason. So when confronted, they’ll naturally defend their decision. I’d do the same. He promised to provide a more detailed explanation later, and while I don’t quite believe it, I do know that obvious easy solutions don’t always work. Let’s hear him.

    Reasonable people will use ODF, others OOXML. As it happens, that’s just what Brian and others at MS say, while IBM and Sun discredit every reason OOXML might have for existing. I think, on the bottom line you agree pretty well with MS’s perspective, even if you don’t share every thought they provide for justifying it.

    On the other hand, IBM and Sun’s strategy is:

    – first attack MS for even providing OOXML, rant about the quality (e.g., non-mixed-mode content), its proprietary nature, and the evilness of MS in general.

    – they demand regulation for ODF being an exclusive standard

    – when MS states their reasons for not just taking ODF, they demand details and examples

    – when MS does just that and advocates co-existence, they attack MS for being aggressive

    this seems so outright dishonest to me, that I really don’t understand how you can put this propaganda and MS’s unwillingness to acknowledge minor mistakes on the same level. border styles, historic datetime-bugs – that’s just details with little or no effect on the big picture.

    Stefan

  32. Stefan

    I have seen at least as many attacks from MS on ODF as from ODF backers on MS, possibly more.  Brian is certainly not the only voice in this regard, and he has not quite been the voice of reason that you suggest.  The MS approach has been to complain heartily about any minor feature of MS Office that ODF would have any trouble supporting, or to dismiss the more general approach ODF has to some features, while the ODF approach has been to complain heartily about every feature of MS Office that MS overly specifies to lock in its own implementation.  I don’t see one approach as more noble or generous than the other.

    Also, while I don’t agree with IBM’s criticism of Microsoft for providing Open XML, it should be clear that from the public’s point of view, if the two standards are seen as equally beneficial, Microsoft wins.  For that matter, if Open XML is even seen as nearly as good as ODF, Microsoft wins.  That is why the strategy seems to be to advocate for complete adoption of ODF, as the only possibly winning strategy.  Again, I don’t agree, but I understand.  What I don’t understand is why Microsoft doesn’t seem to get that if they don’t lose, they win.  Instead of stooping to petty attacks, simply act as though Open XML and ODF are equivalent, since Microsoft will still win with that strategy.

    Lastly, I don’t think the "minor" mistakes are so minor, if the Open XML is supposed to be a standard.  Microsoft has simply had strategic reasons for taking a really good move (exposing its formats and documenting them fully) and projecting it as something else (a truly open standard).  If MS simply claimed that it was doing the first, I’d be the first to applaud.  The latter cheapens the concept of open standards, because this is so clearly a single implementation standard.  Lots of people will write to Open XML, but the vast, vast majority will only be providing integration with MS Office.  An open standard is not just an exposed interface or API, and MS is only claiming Open XML is an open standard to stop the ODF success at getting governments to push for open standards.  I just don’t like that approach, although again, I do understand the strategy.

    – Ben

  33. Fernando says:

    So much for Ben being a "voice of reason".

  34. Stefan Wenig says:

    Ben,

    if you read carefully, the voice of reason was awarded to you 😉

    I have yet to see attacks from MS matching those from the ODF camp.

    OOXML has an advantage, true enough. I agree totally that ODF must have more than just mixed content or anything to win here. But, outright lying in order to force administrations for regulation is still unacceptable. And that’s how I perceive a lot of the ODF propaganda, period.

    About open standard vs. product spec: Yes and no. Yes, it’s clearly made to support the functionality of Office products and existing documents. But that would just be what a lot of customers want, whether they decide to move towards OpenOffice or MS Office. Except that OO can’t handle the details right now – a clear advantage for MS. Not too fair either, because it’s not really a great achievement to build on one’s own code base. But that’s really the OO guys’ problem, why should customers care?

    OO could answer this by putting priority tags on OOXML features they do not support yet and start implementing them. Based on how important they might be for customers. The rest of OOXML could be preserved until OO one day supports it. Sure, this is a lot more additional effort for OO, but it would serve customers well. And it would make OOXML an open standard. It’s in their hands.

    They choose to ignore minor or less frequent problems for existing MS customers and MS Office features, that’s a fairly reasonable decision too. Simplicity comes with a price, but it’s also an advantage. But they should stop complaining that MS won’t support their agenda right now.

    There just is no way MS would completely switch to a format that does not completely support their features. They wouldn’t want to, their customers wouldn’t let them, and complaints would be even worse if they extended ODF to carry all the additional info, like they once did with HTML.

    So, what do you really expect MS to do?

    Stefan

  35. Stefan –

    I don’t mind being the voice of reason, but not the voice of Microsoft, and being reasonable in this case means seeing some of both sides.  Open XML is not designed to be an open standard, it is designed to expose Microsoft’s existing document formats better.  While that is a good thing, it does not make Open XML anything like a good standard.  I don’t think it even serves Microsoft’s long term interests.  The complaints about Open XML may seem minor to you or Brian, but they seem symptomatic of the way this "standard" is implemented.  Again, that doesn’t mean Open XML is bad, or even that Microsoft should not be applauded for creating it.  It is a big move forward for Microsoft to open up a bit, and I am very glad they have, but Open XML does not seem like a very good standard.

    Of course, Open Document Format is not that great either. It is coming along, but has some glaring problems.  The biggest difference is its openness, which means it can change without a single company’s veto.  It can evolve, and is likely to do so, and the ways it will evolve are likely to make it stronger, because they are open to argument and debate and influence by many sources.  Does that mean everybody should adopt it now?  No, probably not, but in the long term, ODF is likely to be the better bet for general use.  I fully expect that Microsoft will one day support ODF natively, but not until they are convinced that that is what it will take to compete, much the way IE7 supports web standards better than IE6 because Microsoft saw the writing on the wall and wanted to stay in the game.

    I guess the question I have for you is, if Microsoft’s applications are better, why wouldn’t they compete on them alone?  Open XML is a good internal format, but Microsoft could certainly create excellent translators to ODF, and could contribute heavily to making ODF a standard that met their needs.  Then, if everybody wanted to use MS Office, they could.  So, why doesn’t Microsoft do that?  It seems every bit as fair as your question.

    – Ben

  36. marc says:

    from the MS Office XML "standard":

    …..

    2.15.3.63 useWord2002TableStyleRules (Emulate Word 2002 Table Style Rules)

    This element specifies that applications shall emulate the behavior of a previously existing word processing

    application (Microsoft Word 2002) when determining the formatting resulting from table styles applied to tables

    within a WordprocessingML document.

    [Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which

    involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML

    Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those

    applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated

    due to issues with its output, and is maintained only for compatibility with existing documents from that

    application. end guidance]

    Typically, applications shall not perform this compatibility. This element, when present with a val attribute value

    of true (or equivalent), specifies that applications shall attempt to mimic that existing word processing

    application in this regard.

    ….

    i’m not sure if this kind of elements should exist if it "cannot be faithfully placed into narrative for this Office Open XML Standard"

  37. tecosystems says:

    robilad: The JSR 277 early draft catches the worm "Why didn’t OSGi/Maven/Ivy/NetBeans Modules/JAR Manifests/whatever-else-is-there become the one true way to deal with modules in Java yet?" – and Dalibor’s got questions on the subject as well (tags: JSR

  38. Stefan Wenig says:

    Ben,

    what you dislike about OOXML is the price for compatibility. Being 100% compatible with existing documents is an issue for both MS and its users. The only reason it is not so much of an issue for other applications is that they do not match MS office feature-wise 100%, so a perfectly compatible format would not help there. A product that could compete with MS Office in terms of features would face the same compatibility issues MS does, only there is none, at least not yet.

    Still, OOXML bears the possibility of perfect compatibility, and any vendor is free to implement as much of it as they want or at least preserve document content they cannot process. Whether other vendors adopt OOXML (and to which degree) will be influenced by business decisions, so it’s no measure for the quality of the standard.

    OOXML is not as good as ODF as a lean and mean, easily implementable standard.

    But ODF is not as good feature-wise and in terms of compatibility.

    Saying either one is better than the other means to completely miss the point. They have different goals and different tradeoffs.

    Microsoft supporting ODF natively would mean that for an ODF document, MS would have to disable or change MS-specific features in the UI, everything else would be frustrating for the user. I don’t believe any other company in MS’s situation would make this move. It would actually mean that MS office apps would need to have two modes, one for compatibility with old binary formats, and one for compatibility with ODF. The user would then have to make a conscious decision about when to convert (and possibly lose information) and continue to work in the new format/feature set/UI. They would have to master two different modes without getting confused. Sounds like science fiction to me. Users don’t care.

    MS could have tried to improve the ODF standard to include MS-specific features, true. I’m not sure if the other members would have accepted their proposals. (They have good reasons not to, it’s just a matter of justifying it publicly. Personally, I see the current propaganda machine as a sign that they probably wouldn’t have hesitated to prevent or slow this influence using any arguments they could say without laughing out loud.)

    Still, let’s say it could have been done. For a minute, ignore the facts that

    – MS had already made investments in its own XML format by then

    – the OASIS group was starting from the OpenOffice format (MS joining this would be the tail wagging the dog, not?)

    – that it was unclear how much and how fast MS could have influenced the progressing standard.

    You would still end up with problems. If everything works out fine, the format would include every major feature that differentiates MS Office (assuming that Sun and IBM would let that happen to ODF, again).

    You would still end up with either one of two problems:

    a) ODF would not accommodate every office quirk from pre-2000 versions (so conversion would be lossy), or

    b) ODF would become OOXML, including all the complexity and backwards compatibility stuff you don’t like.

    Who would win, then?

    Stefan

  39. Simon Phipps says:

    I think Stefan’s last post here sums up some of the arguments nicely, although I disagree about what Microsoft would have to do to directly support ODF as a peer format to RTF, HTML and all the other formats Office 12 supports. No-one is asking them to /only/ support ODF in the current context, just to /support/ it. ODF was indeed designed with MS Office portability in mind and, as Stephen Walli points out, Microsoft would have had a profound influence on the format had they joined the OASIS group in 2002.

    I have switched formats before (Displaywrite -> Wordstar -> Wordperfect -> MS Word -> Smartsuite -> OpenOffice.org) and the nemesis scenarios don’t hold water; I survived each time. What I want is for my next format switch to be the last one.

    I continue to support ODF because I believe a product monoculture to be harmful in the long term, and because Office 12 XML format does little to address that. I can completely understand Microsoft’s desire to protect its monopoly and some of its MVP-types agreeing that the monopoly is their preferred choice over an open market. And I can understand how documenting a data dump of the internals of Office 12 makes it easier to handle Office 12 documents. And I am sure there are valid issues in the ODF specs, as in all specs.

    What I don’t hear much from the Microsoft folks is an equal understanding of the inverse of each of those issues or any attempt to address them. That’s why I don’t drop by here much any more. I only found this entry because of a Stephen O’Grady link since Brian didn’t link to any of his critics (presumably to avoid sending us any traffic) so it didn’t appear on any logs. That’s characteristic of Microsoft’s attitude here, unfortunately. ODF is still needed because that siege mentality still exists.

  40. Stefan Wenig says:

    Simon,

    thanks for the praise, even if you seem to disagree with my post although it sums up everything so nicely. Still, I’m having a hard time relating your answer to the arguments I presented, so I’ll try and make it easier for you:

    – comparing to HTML/RTF

    MS was free to make RTF accommodate everything they needed, so it was much closer to OOXML than to ODF. What do you compare here? What Word did to HTML was just horrible. You want this to happen to ODF?

    – "No-one is asking them to /only/ support ODF"

    Yes, they are. Read some of the comments here, for starters. Many ODF supporters claim that it was an evil move of MS to create OOXML in the first place.

    – "ODF was indeed designed with MS Office portability in mind"

    Yes, so that a reasonable conversion could be done. But perfect portability was not a goal, and that’s what MS customers would certainly expect.

    – "… would have had a profound influence on the format …"

    I would not expect anybody to say anything else. Still, there would be good reasons for MS’s competitors to be not so forthcoming when discussing the details. Standards are politics. They would damage their business, and there are good technical arguments too: for perfect compatibility with MS Office, ODF would have become OOXML. I think having two standards is the better way for both parties.

    – "I have switched formats before"

    So have I. You lose stuff. You cannot print important documents after conversion without checking every detail. There are conversion errors, and there are things that cannot be converted directly. Did your documents contain macros? You need more than the language. It’s a little like JavaScript and DHTML: if the DHTML object models are not the same, it’s no help that both browsers use the same JavaScript language.

    – "I believe a product monoculture to be harmful in the long term, and because Office 12 XML format does little to address that"

    This is only true because the only serious competitor already had very good support for the binary formats. OOXML gives you much easier interoperability, thus making it accessible for everyone. The binary stuff was only an option for huge project teams like Sun’s; OOXML is within reach for small projects.

    OpenOffice is making good progress ending the monoculture problem by providing a free, platform-independent solution for those who don’t need everything that MS office provides. They are still far behind MS in number of users, but do you really think that’s because of the format, which they do support after all? If not, ODF won’t break the monoculture. It’s just a good technical ground for a world after that monoculture.

    – "MVP-types agreeing …"

    Your guess is correct, I’m not developing software for Linux. But does that mean I spend my evenings making up ridiculous arguments for OOXML to support MS’s monopoly? Hardly. As someone who develops custom solutions for office platforms, not office products, I like some of OOXML, especially the good support for custom schemas and the separation of content and references. Besides this, I would like ODF just as well, anything based on XML in fact. I’m not going to touch the formatting details of any format anyway, so that’s all fine with me. Still, I find it hard to believe how much misinformation is intentionally spread around OOXML, and how hard it seems to be for some people to believe what people like Brian say, because it makes perfect sense to me. I just don’t like going to customers, trying to design a solution that includes office documents, and being confronted with stuff that seems to be taken straight from Slashdot.

    – "documenting a data dump of the internals of Office 12 …"

    I think everyone who repeats this old allegation should know better. In order to reload what you have in memory, you need a format that contains all the information. Backwards compatibility sometimes is a mess, too. Take the picture format options of Word, for instance. Or the headline numbering options. How could a 100% compatible format NOT contain every single value, no matter how MS-specific this would be? Pejoratively calling this a data dump is not helpful in this discussion.

    – As for your last paragraph, Brian links to more OOXML critics and ODF supporters than I care to read. Post the links you miss here. BTW, your link is broken.

    Stefan

  41. Stefan Wenig says:

    All these discussions leave me wondering: what kind of interoperability IS possible between different office products?

    I think there are three things that can be done:

    a) Agree on a set of features and their parameters. Every word processor can agree on paragraphs, page breaks, bold and italic, etc. The representations of those are interchangeable. But there are also things that can be handled in many different ways. Think about complex issues like positioning, anchors, or numbering. The internal representation of the program determines the way the user can interact with it, and vice versa.

    b) Make good converters. This can hardly be a perfect solution, though. Even if both programs have a feature for everything the user wanted to do with his document: programs don’t capture the intention of the user when he designs the document, only what he does to the internal data representation. Without knowing the intention, a converter can only guess what the user would have done to the data of another program. Sometimes it’s obvious, sometimes a heuristic approach has to be taken. Sometimes this is acceptable for users, sometimes it’s not. For updates, it’s usually not.

    c) A combination of a) and b) plus preserving the original information that cannot be losslessly converted (until it is touched by the user in the target program).

    So, the perfect way would be to standardize everything in office applications that has reached commodity status, and then allow products to innovate and compete based on conversion and preservation, right? Not quite. Unfortunately, you cannot standardize the behaviour of legacy versions. But then, there’s only one family of legacy document formats you really have to consider (given the numbers of existing documents), and that’s the binary MS Office formats. So, wouldn’t it be perfect to standardize on their features first?

    Again, no, because they carry so many idiosyncrasies from the past. Supporting those is hard work, and a fresh start has many advantages. So, many vendors will choose not to compete with MS on its home turf (compatibility) and rather target new customers and those who can accept limited compatibility.

    So, is there a perfect way? I think not. But having ODF _and_ OOXML (and acceptable converters between those) seems pretty good to me. Which one to use is a decision only the customer can make. Everything else seems to be just people trying to sell their products.

    What can be done to improve the situation? I think the only thing that can be done is adding good standardized format preservation support to both formats. Which is probably not half as easy as it sounds.

    One of the reasons this is so interesting is that this seems like an analogy for so many other types of data. UML models are one area where I’ve experienced similar problems, ranging from details in the representation to round-tripping issues. But office formats get so much more attention.

  42. Stefan,

    I think you misunderstand me.  You say "what you dislike about OOXML is the price for compatibility", but I don’t dislike OOXML.  Actually, there are a few things I dislike, but nothing serious.  What I dislike is that Microsoft claims that it is an open standard, whereas OOXML seems very much like a fully disclosed vendor specification.  Mind, I think that is GOOD!  I don’t even mind the oddities much, as a vendor specification.  But the idea that Microsoft wants it both ways, as a very implementation-specific format AND an open standard, is not going to work.  But let me repeat… I don’t object to OOXML, just the farce that making it a pseudo-standard represents.  Yes, MS needs to support every one of its umpteen gazillion legacy documents.  They should be praised for doing so, and for making the specification publicly available.  They should even be praised for the covenant not to sue, etc., but those things still don’t make this an open standard in the way ODF is.

    My argument is not about whether ODF or OOXML is better or worse (there are arguments for each side), but that ODF is an open standard that is designed to be improved for the good of the general populace, while OOXML is a basically closed standard that is designed to support Microsoft’s many, many customers and gazillions of legacy documents.  Telling me how much better one is than the other at supporting legacy MS documents hardly addresses my argument.

    – Ben

  43. hAl says:

    @Ben

    I think OOXML is as much a vendor specification as ODF is. Frankly, I think the current ODF specs are unusable without a reference implementation.

    There is no way people will be interoperable on ODF without a reference implementation, just from reading the ODF specs and building on that (unless they create only the simplest of Office files). That means currently the ODF specs are almost useless without the main implementation that OpenOffice provides (maybe supplemented by KOffice).

    Even though the OOXML specs are much more detailed, I would dare anyone to create fairly complex documents that are compatible with MS Office without looking at documents created using the reference MS Office implementation.

  44. Stefan Wenig says:

    Ben,

    maybe I did misunderstand you a bit, or at least exaggerate the reservations you have. From what you said, there are definitely things that make ODF a better standard in the sense that it is cleaner and less complicated than OOXML for the subset of functionality that is covered by both. More generalized etc. And I’d agree.

    But besides this, I still disagree with you.

    MS Office documents are basically everything that’s out there now. MS Office customers are the target of every other office product. Therefore, perfect compatibility could be a design goal for any other product too, and this would make OOXML a perfect candidate for an open standard.

    I understand that MS’s competitors choose a different approach (as I explained in my response to Simon, for both business and technical reasons). But that does not change the fact that MS is actually offering something that _could_ be a widely adopted open standard. It’s not for MS to make the decision whether other products will adopt it as a primary format or not, though.

    We’ll have to wait and see. If Sun and IBM really fail to establish ODF as a widely used standard (in number of users, not products), they might choose to join ECMA later. I think both the spec and the process are open enough to make it happen.

    @hal

    Many open standards require a reference implementation for real-world interop. I don’t think this makes either ODF or OOXML less of an open standard.

    Stefan

  45. hAl says:

    @Stefan

    I agree that a reference implementation is often needed / useful.

    However, that makes claims of OOXML being a Microsoft format just as valid as claims of ODF being an OOo format, as those are the main implementations that will be used as a reference.

    The fact that OOXML is now an open format gives it the opportunity to move on and develop, and even deprecate MS-specific legacy features in future versions.