4000 pages of documentation

There has been a great overall reaction to the news last week of Ecma's first public draft for the Office Open XML formats. One thing that is now absolutely clear to everyone is that we are talking about an extremely rich and powerful set of file formats.

I think many folks didn't realize the amount of work we've had to take on, which explains why some had the false assumption that we could just use ODF. We were pretty clear in our response that it just wouldn't work for our customers, because at the end of the day an open format is useless if the majority of our customers won't use it. That's why we had to make our formats fully support all the existing Microsoft Office files out there. If the formats didn't support all those features, then the only people who would use them would be those who fundamentally want an open format, and everyone else would just stick with the old binaries. We absolutely did not want that to happen; we wanted everyone using an open format. We've invested a ton of resources into XML file formats because we believe it's a good thing, and we need to make sure that our customers will be willing to use them.

Let me be clear on a couple of key points:

  1. Rich format - Yes, the format is extremely rich and deep, and that's because it represents a very powerful set of applications that have evolved over many years and many documents. It would have been completely unacceptable for us to create a format that didn't fully represent the existing base of Microsoft Office functionality. If we had created some kind of subset format, many in the industry would have complained for very legitimate reasons. People would have complained that we were destroying fidelity with the key features they used, that we were hiding functionality, not enabling everyone to exploit the rich features, not encouraging the move to XML, etc. Bottom line – millions of organizations would have had a legitimate problem.

  2. Extremely detailed documentation - It's funny, but I've actually seen people complaining that there is too much documentation. The documentation is essential, even if there are parts that are not used by everyone. I personally think we have to provide documentation on every aspect of our format; otherwise, how do you know what something means? This is a lot of work, and I believe it's absolutely necessary. I can't imagine there being a benefit to anyone from not documenting something.

  3. Full implementation - I don't think it should come as a surprise that with the rich set of features in Office, it's going to be a lot of work to build an application that can support all of that functionality. In the past, people had said that the reason nobody could build an application that matched Office was that the formats were locked up. Well, the format information was available, but not for all the many purposes that we are enabling now. Now all those people should be happy because the format information is complete enough to enable a full understanding by everyone. It's up to those other applications though to decide what level of support they want to build. While I think interoperability is possible, the struggles that the applications supporting ODF are having show that it's really a lot of work even for a format that isn't as deep. This is often to be expected though because the different applications have different sets of features as well as different implementations of the same features. That is how things work.

  4. Partial implementation - Now, if you don't care about fully matching Office features, then anyone can choose to just support a subset of the format. You can implement as much or as little of the format as you want. You can build an application that does nothing more than add comments to a WordprocessingML document, or that automatically generates a rich SpreadsheetML file based on business data. It's up to you. The information is all there to use in the way that best benefits your application.

  5. Room for innovation - Now that all the features we've stored in our formats are fully open and documented, people are free to build with them. In addition to the fact that you can implement as much or as little of the format as you want, you are also free to add more to the format. The formats are completely extensible. You can add your own extensions to the format, or you can even join Ecma and propose that those extensions get added to the official Ecma standard. Through the integration of your own parts, Office's strong support for custom-defined schemas gives you a lot more power than a document format on its own would give you.

  6. Microsoft does not own the standard - We no longer own these formats, Ecma does. I know there is still concern out there that these formats could change out from under you, but that's not something that Microsoft can do. Ecma fully controls it, and once it goes through ISO, it will be even more solid and locked down.
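Point 4 mentions building an application that does nothing more than generate a SpreadsheetML file from business data. To give a rough sense of how small such a partial implementation can be, here is a minimal sketch in Python using only the standard library. It assumes the namespace and content-type URIs published in the final ECMA-376 first edition (the Beta 2-era draft discussed in this post may use different ones). It writes text as inline strings so that no shared-strings part is needed, and it leaves out styles and document properties entirely. `build_xlsx`, `cell_xml`, and `column_letter` are illustrative names, not part of any Microsoft API.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Assumption: URIs as published in the final ECMA-376 (1st edition); earlier drafts may differ.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
CT = "application/vnd.openxmlformats-"
XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'


def column_letter(index: int) -> str:
    """Convert a 0-based column index to a column name (0 -> "A", 26 -> "AA")."""
    name = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def cell_xml(ref: str, value) -> str:
    """Numbers become plain <v> values; anything else becomes an inline string."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return f'<c r="{ref}"><v>{value}</v></c>'
    return f'<c r="{ref}" t="inlineStr"><is><t>{escape(str(value))}</t></is></c>'


def build_xlsx(rows) -> bytes:
    """Return the bytes of a one-sheet SpreadsheetML package built from `rows`."""
    sheet_data = "".join(
        f'<row r="{r}">'
        + "".join(cell_xml(f"{column_letter(c)}{r}", v) for c, v in enumerate(row))
        + "</row>"
        for r, row in enumerate(rows, start=1)
    )
    parts = {
        # Content types: defaults by extension plus overrides for the two main parts.
        "[Content_Types].xml": (
            f'<Types xmlns="{CT_NS}">'
            f'<Default Extension="rels" ContentType="{CT}package.relationships+xml"/>'
            '<Default Extension="xml" ContentType="application/xml"/>'
            f'<Override PartName="/xl/workbook.xml" ContentType="{CT}officedocument.spreadsheetml.sheet.main+xml"/>'
            f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{CT}officedocument.spreadsheetml.worksheet+xml"/>'
            '</Types>'
        ),
        # Package relationship: the root points at the workbook as the main part.
        "_rels/.rels": (
            f'<Relationships xmlns="{PKG_REL_NS}">'
            f'<Relationship Id="rId1" Type="{DOC_REL}/officeDocument" Target="xl/workbook.xml"/>'
            '</Relationships>'
        ),
        "xl/workbook.xml": (
            f'<workbook xmlns="{MAIN_NS}" xmlns:r="{DOC_REL}">'
            '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>'
            '</workbook>'
        ),
        # Part relationship: the workbook's r:id="rId1" resolves to the worksheet.
        "xl/_rels/workbook.xml.rels": (
            f'<Relationships xmlns="{PKG_REL_NS}">'
            f'<Relationship Id="rId1" Type="{DOC_REL}/worksheet" Target="worksheets/sheet1.xml"/>'
            '</Relationships>'
        ),
        "xl/worksheets/sheet1.xml": (
            f'<worksheet xmlns="{MAIN_NS}"><sheetData>{sheet_data}</sheetData></worksheet>'
        ),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, xml in parts.items():
            package.writestr(name, XML_DECL + xml)
    return buffer.getvalue()
```

Writing the bytes returned by `build_xlsx([["Region", "Sales"], ["North", 1200]])` to a file with an `.xlsx` extension is intended to produce a one-sheet workbook. Everything beyond that (styles, formulas, shared strings) is exactly the "as much or as little" choice point 4 describes.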

I'd also like to reuse some information that I left as a comment in my post last week. Some people were a bit confused about how you could create a standard that was so rich and had full backward compatibility with the existing base of Microsoft Office documents. It was even suggested that, almost as a way of leveling the playing field, we should choose only a subset of features that we think everyone can build applications for. This would be a great move for our competitors but a horrible move for our customers. Adam provided a lot of feedback, and I really appreciated that he took the time to write all that up. Patrick and Biff had some really great replies that tried to explain why backward compatibility was so important. Here is the reply I left for Adam, which I hope helps clear up his questions about why we went the standardization route in the first place:

Hey Adam, thanks for taking the time to get all your thoughts down. It definitely has helped me understand where you are coming from.

It sounds like you understand that from our point of view, in order to use an XML format as the *default* format for Office, it needs to be 100% compatible, right? I think your point is more that we should also have an optional format that is more basic and doesn't necessarily cover 100% of the features. That smaller, more basic format would then be the one that should be standardized. I think that's what you are saying.

Based on your description, the format you want sounds a lot like HTML. HTML is a great format for basic interchange. It doesn't support everything that is present in an Office document, but as you said, that isn't always desirable. We've supported HTML for quite a while, although we took the approach of trying to have our cake and eat it too when we attempted to make our HTML output support 100% of our features. The result was an HTML format that had a ton of extra stuff in it that many of the people who just wanted HTML didn't really care about (and it just got in the way).

Our primary goal this release with the formats was not to try and re-implement HTML, but instead to move everyone over to using XML for all of their documents. Let's talk about the motivations for what we are doing with Open XML since that was the main point of your question:

  1. The reason we've spent the past 8 or so years moving our formats toward a default XML format is that we wanted to improve the value and significance of Office documents. We wanted Office documents to play an important role in business processes where they couldn't before. We wanted to make it easier for developers to build solutions that produce and consume Office documents. There are other advantages too, but the main thing is that Office documents are much more valuable in just about every way when they are open and accessible.

  2. The reason we fully document them is exactly the same. We need developers to understand how to program against them. Without full documentation, we don't achieve any of the goals I stated above. The only benefit would be that other Microsoft products could potentially interact with the documents better (like SQL or SharePoint), but that doesn't give us the broad exposure we want. That would be selling ourselves short. We want as many solutions/platforms/developers/products as possible to be able to work with our files.

  3. The reason we moved to the "Covenant not to sue" was that a number of people out there were concerned that our royalty-free license approach wasn't compatible with open source licenses. Again, since the whole reason for opening the files was to broaden the scenarios and solutions where Office documents could play a role, we moved to the CNS so that we could integrate with that many more systems. Initially we'd thought the royalty-free license just about covered it, but there was enough public concern out there that we decided we needed to make it even more basic and straightforward. We committed to not enforce any of our IP in the formats against anyone, as long as they didn't try to enforce IP against us in the same area. No license needed, no attribution; we just made a legal commitment.

  4. The reason we've taken the formats to Ecma for standardization is that a number of potential solution builders appeared to be concerned that if we owned the formats and had full control, we could change them on a whim and break their solutions. We also had significant requests from governments that wanted to make sure that the formats were standardized and no longer owned by Microsoft. Long-term archivability was really important, and they wanted to know that even if Microsoft went away, there would still be access to the formats. We were already planning on fully documenting them, but the Ecma standardization process gave us the advantage of going through a well-established formal process for ensuring that the formats are fully interoperable and fully documented. It's drawn a lot more attention to the documentation as well, so I'm sure we'll get much better input, even from folks who aren't participating directly in the process.
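Points 1 and 2 above are about making it easy for developers to consume Office documents and program against them. As a hedged illustration, here is a minimal Python sketch (standard library only) that finds the main part of a WordprocessingML package by following its package-level relationship, rather than hard-coding a path, and then extracts the text of each paragraph. It assumes the final ECMA-376 first-edition namespace URIs and deliberately ignores fields, tabs, breaks, and text boxes. `main_part_name` and `paragraph_texts` are illustrative names only.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# Assumption: URIs as published in the final ECMA-376 (1st edition).
PKG_REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_DOC_REL = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                  "relationships/officeDocument")
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def main_part_name(package: zipfile.ZipFile) -> str:
    """Follow the package-level officeDocument relationship to the main part."""
    rels = ET.fromstring(package.read("_rels/.rels"))
    for rel in rels.iter(f"{PKG_REL_NS}Relationship"):
        if rel.get("Type") == OFFICE_DOC_REL:
            # Package-level targets are relative to the package root;
            # drop a leading "/" so the name matches the ZIP entry name.
            return posixpath.normpath(rel.get("Target").lstrip("/"))
    raise ValueError("package has no officeDocument relationship")


def paragraph_texts(docx_bytes: bytes) -> List[str]:
    """Return the plain text of each w:p paragraph in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        document = ET.fromstring(package.read(main_part_name(package)))
    # Join the w:t runs of each paragraph. Paragraphs nested inside others
    # (e.g. in text boxes) are not treated specially in this sketch.
    return ["".join(t.text or "" for t in p.iter(f"{W_NS}t"))
            for p in document.iter(f"{W_NS}p")]
```

Following the relationship instead of assuming `word/document.xml` is the more robust choice, because the part name is not fixed by the format; only the relationship type is.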

I hope that helps to clear it up a bit. It really is just as simple as that. Any application is free to implement as little or as much of the format as they wish. If you really want every application operating on a more limited set of features, that isn't as much of a format thing as an application thing. You would need to get every application to agree that it will not add any new features or functionality, and will disable any existing functionality that the other applications don't have. That wasn't our goal. Our goal was to open up all the existing documents out there, and then anyone who wants to build solutions around those formats is free to do so. In addition, anyone is free to innovate on top of the formats, as I believe there is still a lot of innovation to come. The formats are completely extensible, so if someone wants to use the formats (or parts of the formats) as a base and build on top of that, they can do so as well. They can even join Ecma if they want and propose to add those new extensions to the next version of the standard.
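The closing paragraph above notes that the formats are extensible and that others can build on top of them. At the packaging level, one form this can take is adding your own part to an existing package. The Python sketch below (standard library only, Python 3.8 or later for `xml_declaration`) copies a package and adds one XML part. It declares the part's content type with an `Override` entry in `[Content_Types].xml` and points to it from the package-level relationships. The relationship type used in the usage note is a hypothetical placeholder. As I understand ECMA-376, Office's own custom XML support relates its data parts from the main document part instead, so treat this as an illustration of the packaging mechanics only. `add_custom_part` is an illustrative name, and the sketch does not check whether the part already exists.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CONTENT_TYPES = "[Content_Types].xml"
PACKAGE_RELS = "_rels/.rels"


def serialize(root: ET.Element, default_ns: str) -> bytes:
    # ElementTree prefix registrations are global; registering "" right before
    # serializing writes this namespace as the default (no "ns0:" prefixes).
    ET.register_namespace("", default_ns)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def add_custom_part(package_bytes: bytes, part_name: str, xml: bytes,
                    rel_type: str) -> bytes:
    """Return a copy of the package with one extra XML part related from the root.

    `part_name` is an absolute part name such as "/customXml/item1.xml".
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src:
        # Declare the new part's content type.
        types = ET.fromstring(src.read(CONTENT_TYPES))
        ET.SubElement(types, f"{{{CT_NS}}}Override",
                      PartName=part_name, ContentType="application/xml")

        # Add a package-level relationship using the first unused rIdN.
        rels = ET.fromstring(src.read(PACKAGE_RELS))
        used = {rel.get("Id") for rel in rels}
        n = 1
        while f"rId{n}" in used:
            n += 1
        ET.SubElement(rels, f"{{{REL_NS}}}Relationship",
                      Id=f"rId{n}", Type=rel_type, Target=part_name.lstrip("/"))

        # Copy every entry, substituting the two rewritten parts, then add ours.
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                if item.filename == CONTENT_TYPES:
                    dst.writestr(CONTENT_TYPES, serialize(types, CT_NS))
                elif item.filename == PACKAGE_RELS:
                    dst.writestr(PACKAGE_RELS, serialize(rels, REL_NS))
                else:
                    dst.writestr(item, src.read(item.filename))
            dst.writestr(part_name.lstrip("/"), xml)
    return out.getvalue()
```

For example, `add_custom_part(data, "/customXml/item1.xml", b"<order id='42'/>", "http://example.com/relationships/business-data")` (the last argument is made up) leaves every existing part untouched and adds the new part alongside them.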


Comments (24)
  1. Rob says:

    I definitely concur. I have been waiting for beta 2 since I did not get in on beta 1.

    I am most interested in this next release of Office, primarily because of the open xml standard.

    I have been reading through the Ecma document and playing around with System.IO.Packaging, and I am extremely impressed with the openness.


    Oh, the Ecma doc is very, very detailed. If you don’t get it after reading it, you’re blind.

    Extremely cool!!!

  2. Mike says:

    I’d like to add some fact checking to your prose. You have said countless times that the new formats are all xml. It is simply not true. Example: it was brought up a couple of weeks ago on the openxmldeveloper.org website that if you password-protect your documents, then Word/Excel/PowerPoint will actually create an OLE document and encrypt it there.

    I’ll repeat it: files will have the same extension as if they were zip-based Office Open XML packages, except that they are not.

    So much for xml.

    And, of course, since this behavior has nothing to do with System.IO.Packaging, it means that it is impossible to programmatically work with such documents.

    Any comment? Replying that "a password-protected document should not be programmable" does not count, since document security should have nothing to do with the programmability of the document. Consider a scenario where you’d like to make a bulk update to those files before you share them across the enterprise.

  3. Adam says:

    Brian> "It’s funny but I’ve actually seen people complaining that there is too much documentation. […] I can’t imagine there being a benefit to anyone from not documenting something."

    I think you have the wrong end of the stick there.

    I don’t think people are complaining that there is too much documentation because they’d like things to be less well documented, or even not documented.

    Instead, I think people are complaining that there is too much documentation because they’d like the format to be less complex, such that less documentation would be required to describe all of it.

  4. BrianJones says:

    Adam, see my first point about why it has to be such a rich format.

  5. Alex says:

    Interesting that people complained that the format was too rich or too documented; where did you see those complaints?

    I think you’re wrong to say ODF isn’t as rich as OXML. ODF builds on a number of industry standards, so for example where OXML has to specify a load of packaging conventions, ODF just says "We use XLink". ODF’s richness comes from building on existing industry standards.

    On your comment about Microsoft not owning the standard – uh, surely until it successfully goes through ISO it does? The mandate of Ecma is to create a format compatible with Microsoft Office.

    Or are you saying that the mandate of TC45 has changed?

  6. Adam says:

    Brian: I’m aware of why MS, and some of its customers, need the format to be that complex. (Mostly thanks to your patient explanations.)

    I was just trying to point out that a number of people (even, or maybe even especially, some technical ones) will be intimidated by that level of complexity, and that will be why they’re asking for less documentation. Not because they want it under-documented, but because they’ll want it less complex.

    But, hey, you’ve got enough customers that some of them will always want the exact opposite of what some of the others want, so you’ll never be able to please all of them anyway. 🙂

    I really wasn’t trying to start something there, I was just offering a possible explanation for people asking for less documentation, in the hope of reducing your bemusement levels.

  7. BrianJones says:


    I really appreciate you taking the time here to work through this. I really understand your point of view, and I actually agree with a lot of your latest comments. I think that the key isn’t to simplify the spec or the format, but instead to provide a lot of great tools and resources to help people get started.

    That’s why I have this blog, and it’s why we started the openxmldeveloper.org community. Unlike a lot of the other blogs and foundations/alliances/fellowships that focus more on policies and marketing, we’re trying to focus on the actual technologies and how you can work with them. We’re going to keep this going stronger and stronger, especially now that Beta 2 is out.

    Kevin Boske is planning to post a collection of code examples that show how to do a number of common things with the file formats. I think we’ll continue to see more and more things like that pop up that will help to get people started. The Ecma spec contains the full blown reference materials for how everything works. It’s all the other stuff that will help simplify and break things down to help you get going.

    I really hope that we’ll be able to have the right level of information for everyone, so that anyone who wants to work with the formats can do so. That’s why I really like having this discussion with you, because it’s helped me see things from your point of view, and I’d love to hear more from you on what type of information you’d like to see.


    A few people had raised the richness issue in my original post about the Ecma draft being released, which is why I thought it was important to talk about.

    I think you might want to dig a bit deeper into the formats. ODF does build on existing industry standards, but at times they are partial implementations, and it still leaves out a lot. For instance, Open XML actually uses more of the Dublin Core metadata schema than ODF does.

    Another easy example would be to look at the different types of numbering for a wordprocessing file. In Microsoft Office you can say that the numbered list should be "first", "second" and "third" instead of "1.", "2." and "3.". ODF doesn’t support that.

    That’s just the beginning, though. If you are from another country like Japan or China, there is absolutely *zero* mention of how your numbering types are defined. The spec only specifies:

     – Numeric: 1, 2, 3, …

     – Alphabetic: a, b, c, … or A, B, C, …

     – Roman: i, ii, iii, iv, … or I, II, III, IV,…

    No mention at all about what you do for any other language. If you use OpenOffice, they actually do support other languages, and they even save out those other numbering formats into the ODF style:num-format attribute. The problem, though, is that this behavior isn’t defined in the spec, so how does anyone else who wants to read that document figure out what OpenOffice’s extension means? Maybe I’m just missing something, as the ODF spec is really vague in a lot of areas, but I looked around for a while and couldn’t find anything.

    Even if you don’t pay attention to the things that are just flat-out missing from the format, the documentation for the things it does support is pretty minimal. In the latest Ecma draft, we have about 200 pages discussing the syntax of formulas for spreadsheets; ODF has a few lines. That gives me the impression that no one who does accounting or works on Wall Street was involved in the standard, because I can’t really imagine them allowing it to go through without specifying how formulas should be represented. It’s no wonder the few applications referenced as being "full implementations" of ODF aren’t even capable of full interoperability (http://permalink.gmane.org/gmane.comp.openoffice.devel.xml/2236).

    As far as the ownership issue goes, the Ecma charter says that the formats are to be compatible with the existing Microsoft Office *documents*, not with Office itself. I think we’ve already established pretty clearly why it’s important to maintain that compatibility. And yes, Ecma owns the formats.


  8. Brutus says:

    Adam> "I don’t think people are complaining that there is too much documentation because they’d like things to be less well documented, or even not documented.

    Instead, I think people are complaining that there is too much documentation because they’d like the format to be less complex, such that less documentation would be required to describe all of it."

    In other words, Microsoft should eliminate (or at least deprecate) features so as to make it easier for less functional office suite vendors to compete.  LOL

  9. Biff says:

    Mike: how do you propose encrypting a document and keeping XML structure at the same time, with ROT-13?

    Brutus: bingo!

    Brian: in my opinion, making a list of Open XML parts that are not defined in the ODF standard is a great idea. I’m sick and tired of people saying that you can do everything in ODF. No you cannot! And of what you can, at what price?

  10. Mike says:

    "how do you propose encrypting a document and keeping XML structure at the same time, with ROT-13? "

    Am I the one supposed to answer this question? I thought we were supposed to buy a product which produces 100% XML-based file formats. Ask Brian, thanks.

    If Brian cannot come up with a positive answer, then all the claims "100% xml", "full xml", "went entirely to xml", … made since last year are lies. I know, these are the kinds of little details you’d rather let slip through. But it’s those little details that are important to users, many of whom are as savvy as you can imagine. Remember, users may only use 10% of the features, but they don’t use the same 10%.

    If I were to be consistent and avoid the XML sand castle, I would definitely encrypt the content in an XML CDATA section. Done that way, it would still not be programmable without a supplementary API able to let me appropriately consume the content, i.e. the API would require passing a password in order to read/write the content in clear. That would be a first step.

  11. Alex says:


    OpenDocument is well known to support a variety of languages, and the Japanese ISO member pointed out a couple of problems with the spec (mostly to do with international URIs). I think they would have noticed if numbering was a problem. The guys in the Middle East were looking at it too.

    You’re absolutely right about formulas; OpenDocument does not specify a syntax, and that is something the TC is working on. There is a wider problem here, though: formula syntax is something users know directly. Should OpenDocument do something new, or just what Lotus 1-2-3/Excel did/do? OXML has the luxury of only caring about compatibility with Office file formats; OpenDocument is designed to be widely compatible with all.

    We can both reel off things which are missing in the other spec. What is sad is that Microsoft didn’t participate in the OpenDocument process, where you could have added whatever features you thought were missing! Had you gone that route, you’d have been working with an ISO standard by now 😉

    I know you like to keep branding OpenDocument as "vague", "hobbyist", etc. I really think you ought to give it the respect it deserves: it’s been through a thorough standardisation process twice, and was created in an open industry process which involved a lot of people with huge expertise with office documents.

    Granted, Microsoft don’t want to use it, fine. But, I’m not sure the comments you make are fair. The comments about it on the Microsoft website certainly aren’t, they’re not even factually correct.

    Biff: OpenDocument manages to encrypt documents using the strong Blowfish algorithm, yet doesn’t resort to embedding OLE. It’s definitely possible.

  12. Don Giovanni says:

    Alex, you said, "What is sad is that Microsoft didn’t participate in the OpenDocument process, where you could have added whatever features you thought were missing!"

    I think ODF supporters as well as the members of the OASIS ODF committee who intimate the above sentiment are being disingenuous.  They know darn well that there’s no way that Microsoft could (or even should) adopt ODF as their default format even if they wanted to.

    Here’s what http://xml.openoffice.org/ says regarding ODF and OpenOffice.org:


    OpenOffice.org XML file format: "The OpenOffice.org XML file format is the native file format of OpenOffice.org 1.0. It has been replaced by the OASIS OpenDocument file format in OpenOffice.org 2.0."

    OASIS OpenDocument file format: "The OASIS OpenDocument file format is the native file format of OpenOffice.org 2.0. It is developed by a Technical Committee (TC) at OASIS. The OpenDocument format is based on the OpenOffice.org XML file format."


    So, ODF is *based* on OpenOffice.org’s previous XML format. ODF is not "neutral" any more than OpenXML is. ODF is simply the opened version of OO.o’s previous XML format and OpenXML is the opened version of Microsoft’s previous XML format. ODF is not standing on any higher moral ground, contrary to the rhetoric of the ODF peanut gallery. Indeed, it looks like OpenXML’s ECMA standardization process has been much more rigorous than the OASIS ODF standardization process, which was little more than tweaking OO.o’s previous format and calling it good. The ODF supporters even subtly acknowledge this with their claim that "Microsoft should’ve participated in ODF and added whatever they thought was missing", as that suggests that only minor tweaking would’ve been required/permitted. ISO ratified the OASIS spec, but the fact that ISO did this without there even being a standard syntax for spreadsheet formulas shows that ISO’s standardization process was not rigorous in any way, shape, or form. Make no mistake, ISO’s rubberstamping of ODF as a standard doesn’t change the fact that ODF is OO.o’s format. Not WordPerfect’s, not Lotus’s, not KOffice’s, not AbiWord’s, not Gnumeric’s, and certainly not neutral.

    To ask that Microsoft participate in ODF discussions to add whatever they thought was needed means that Microsoft, who already had an XML format, should force-feed their features not into a neutral format, but into a competitor’s format. The problems with this should be apparent, but here are some:

    1. It’s illogical to ask Vendor A to adopt Vendor B’s file format when Vendor A’s suite has produced orders of magnitude more documents in the world than Vendor B’s has.

    2. Microsoft has the burden of supporting billions of documents that have already been created using its formats, and cannot afford to risk ODF members vetoing Microsoft features that aren’t present in OO.o.  Indeed, certain ODF members have an incentive to dupe Microsoft into supporting a format that breaks old Microsoft Office documents.

    3. Looking at it from OO.o’s perspective, would they really want up to 4000 pages worth of features that they may or may not support to be placed into their format?  There’s no way that OO.o would’ve allowed Microsoft to come in and overhaul OO.o’s own format to suit the purposes of Microsoft, nor would I expect them to.

    Oh, and regarding your "Should OpenDocument do something new, or just what Lotus 1-2-3/Excel did/do? OXML has the luxury of only caring about compatibility with Office file formats; OpenDocument is designed to be widely compatible with all" statement about how formulas should be saved: since ODF is *based* on OO.o’s previous XML format, they should’ve just done what OO.o already did. The problem with that is that OO.o’s spreadsheet is too woeful to use as a basis for a formula format. You say that "the TC is working on" a syntax for formulas. I thought this was already an ISO standard. Looks more like a standard in progress. That ISO simply let that go through without blinking an eye shows that ISO had no rigorous standardization process whatsoever.

  13. Adam says:

    Don> "ODF is simply the opened version of OO.o’s previous XML format and OpenXML is the opened version of Microsoft’s previous XML format.  ODF is not standing on any higher moral ground, contrary to the rhetoric of the ODF peanut gallery."

    Are you saying that OO.o’s format wasn’t designed as an implementation-neutral format to begin with, and no-one other than Sun and OO.o developers had suggestions accepted into the format?


    Or are you saying that MOOX has not been specifically designed to support MS Word and all its accumulated features, no matter how complex that makes it?

    Don> "Make no mistake, ISO’s rubberstamping of ODF as a standard doesn’t change the fact that ODF is OO.o’s format.  Not WordPerfect’s, not Lotus’s, not KOffice’s, not AbiWord’s, not Gnumeric’s, and certainly not neutral."


    http://www.koffice.org/announcements/announce-1.4.php (KOffice has supported ODF since June 2005)

    http://www.koffice.org/announcements/announce-1.5.php (KOffice now uses ODF as its native format)

    I do generally agree with you though.

    MS probably couldn’t have got every change they would need to get an iron-clad 100% upgrade guarantee for their customers through the ODF process.

    MS also can’t accept anything less than this because of the number of customers they have that insist on 100% document upgrades.

    On top of that, the ODF format wouldn’t have been any better, and those looking to build fully interoperable independent office applications wouldn’t have been helped in the slightest if MS had got all the changes through that they’d have needed to.

    Based on that, I’m coming to the conclusion that MS attending OASIS to try to get _all_ their changes into ODF (as half measures wouldn’t be good enough), would have been a waste of everyone’s time, and it’s probably for the best that they didn’t.

    As someone who works in a heterogeneous environment, though, I know which one I’ll be looking to use in the future. MOOX just isn’t right for me. I’ll have to hope that the ODF plugins that 3rd parties are working on for Word will be good enough for anyone who needs to share documents with me.

  14. BrianJones says:

    Actually Adam, it is true that the ODF format is largely based on the XML format from OpenOffice, you can’t really argue with that. There are differences, but for the most part the structures look to be almost identical.

    I know that there are multiple applications that "support" it, but from what I’ve seen there isn’t any true implementation of ODF out there. Apparently even OpenOffice doesn’t properly support it yet: http://permalink.gmane.org/gmane.comp.openoffice.devel.xml/2238

    In addition to that (as I mentioned above), there are a number of places where OpenOffice has already extended ODF (formulas and international numbering, for example). If those extensions aren’t in the ODF spec, then that would imply the spec is not yet complete, as Don pointed out. It also leads you to wonder what happens when/if the spec is updated to support those extensions. Will Sun make sure that the spec matches the OpenOffice extensions? Or will OpenOffice have to change its format again to match the spec? Or will they use extensibility mechanisms to always output both representations?

    Adam, I’d really like it if you gave OpenXML more of a look. As I said before, we are investing a lot into providing good tools to help developers work with the formats.


  15. BrianJones says:

    I’ve also heard others make the same statement that Alex did about the OpenDocument specification being designed to be widely compatible with all. But when I look in the appendix of the spec entitled "E.1. Changes from 'Open Office Specification 1.0 Committee Draft 1'", there are a handful of changes, mainly corrections to documentation, namespace changes, and a couple of extra sections added (9.5; 9.8; 13; and 11.2). That gives me the impression that this was pretty strongly focused on matching the original Open Office format. Even then, though, there seem to be things like formulas that exist in Open Office but weren’t even included in the draft. Maybe I’m missing something?

    Alex said that OpenDocument didn’t do formula syntax because it was hard to choose between the different available options. Well, that’s what creating a standard is all about, isn’t it? You need to specify how everything should work so that people can use it. You also need to account for future innovation and make sure it’s extensible, but when you already have a feature that exists, it should be represented. Especially something as crucial as formulas.


  16. Don Giovanni says:


    Don> "Make no mistake, ISO’s rubberstamping of ODF as a standard doesn’t change the fact that ODF is OO.o’s format.  Not WordPerfect’s, not Lotus’s, not KOffice’s, not AbiWord’s, not Gnumeric’s, and certainly not neutral."


    http://www.koffice.org/announcements/announce-1.4.php (KOffice has supported ODF since June 2005)

    http://www.koffice.org/announcements/announce-1.5.php (KOffice now uses ODF as its native format)

    You’ll have to forgive me for indulging in rhetorical flair. 🙂 Yes, I know that KOffice made ODF its default format and others will do the same, but my point is that anyone who adopts ODF is essentially adopting OO.o’s format rather than some "ideal" format created from scratch.

    And yes, I know that OpenXML is a descendant of previous Microsoft XML formats and that part of its raison d’être is to be an open XML format for Microsoft’s documents rather than an "ideal" format created from scratch. It’s right there in the charter. Microsoft never claimed otherwise.

  17. Alex says:

    I can’t comment on the section you’re talking about, but the OASIS FAQ on the subject lists over 100 non-editorial changes, and even the "couple of extra sections" you mention are actually huge features. XForms alone is a massive spec and a seriously powerful feature set that no suite, not OpenOffice.org and not even MS Office, really harnesses right now without resorting to macro programming.

    The formulas thing wasn’t about avoiding setting a standard. For one thing, it’s not even clear that formula syntax should be part of this standard: the route OXML has taken, specifying the syntax in the standard, means you need two separate validation tools if you want to check a document, because formula syntax is not (and shouldn’t be) an XML syntax. It’s a valid spec choice, but not necessarily the only choice.
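    To illustrate the point about needing two validation tools, here is a minimal Python sketch. It assumes a simplified, namespace-free cell element modeled loosely on SpreadsheetML’s `<c>` and `<f>` elements, and it uses a hypothetical `formula_parens_balanced` check as a toy stand-in for a real formula grammar. The XML parser accepts a formula with a missing parenthesis, because to XML the formula is just text; catching the error takes a second, formula-specific check.

```python
import xml.etree.ElementTree as ET


def formula_parens_balanced(formula: str) -> bool:
    """Hypothetical toy stand-in for a formula grammar: checks only parenthesis nesting."""
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def check_cell(xml_text: str) -> tuple[bool, bool]:
    """Return (xml_well_formed, formula_ok) for one simplified, namespace-free cell element."""
    try:
        cell = ET.fromstring(xml_text)  # layer 1: XML well-formedness
    except ET.ParseError:
        return (False, False)
    f = cell.find("f")
    formula = f.text if f is not None and f.text else ""
    return (True, formula_parens_balanced(formula))  # layer 2: formula syntax


good = '<c r="A4"><f>SUM(A1:A3)</f></c>'
bad = '<c r="A4"><f>SUM(A1:A3</f></c>'  # well-formed XML, broken formula
print(check_cell(good))  # (True, True)
print(check_cell(bad))   # (True, False)
```

    A real validator would need a full formula parser (string literals, sheet references, and so on); counting parentheses is only enough to show that the XML layer and the formula layer are checked separately.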

    But, you’re right, it should be specified somewhere – the question is how. There’s a good argument to say that OpenDocument should import the formula syntax from OXML, at least as an option, and I think you’d find many ODF users would support that. That’s more or less the way it works right now in any event.

  18. Adam says:

    Don: Heh. If most of the people here couldn’t forgive rhetorical flair (which I can’t honestly claim never to have employed myself on occasion), then this would probably turn nasty and unproductive pretty damn quick. 🙂

    Brian: I never meant to argue that ODF wasn’t largely based on the XML format from OpenOffice. My intention was to point out that the original OO.o XML format was designed with implementation-neutrality and _re-implementation_ from scratch in mind, and that as a place to start from it was better than most of the alternatives.

    Brian and Don: Can we at least agree that MOOX was not designed with ease of complete re-implementation as a goal? Even though that will unavoidably be technically possible as a result of the standardisation process.

    And can we agree that programmatically manipulating existing documents (e.g. to produce summaries, or to make a particular change to a set of elements matching a given pattern) and automatically creating basic documents (e.g. reports from data sets and templates) are fundamentally different from creating a piece of office software that has to be able to work with the full range of features (reading, writing and exchanging) that the format, and other implementations, can provide?

    Brian: I think there’s an essential difference between what the majority of MS’s customers want from a standardised spec, and what I want from a standardised spec. MS is, as they should be, serving as many of their customers as possible in taking MOOX to ISO.

    But because their reasons for doing so are not the same as mine, and because the problems they are trying to solve are not the problems I happen to want solving (and who am *I* to demand that MS solve *my* problems?), I don’t think that MOOX will end up being right for me.

    OTOH, OASIS and ODF do happen to be trying to solve the problems that I want solved (ref. long post in previous article). Therefore, even though conformance may not be 100% now (but, let’s face it, there are currently no *shipping* products that support MOOX either, so let’s see where both camps are in 6 months’ time before arguing that one 🙂), I think that ODF will end up being the better choice for me and for the diverse groups I think I’ll be working with 10 years from now.

    AFAICT, MOOX and ODF are both being sent to standardisation bodies for different reasons and to solve different problems. I think that *intent* will shape how the formats are used more than their current levels of implementation and conformance. And I think which one you’ll want to adopt will depend on things like the size of your existing corpus of documents, how important fidelity is to you compared with cross-platform portability (not just automatic manipulation, but document creation, editing and exchange), and what problems you want a standardised document format to solve.

    Brian, thanks for the time you’ve taken to explain the process MS has gone through here, and the reasons they’ve had to go through it. It has certainly made me more appreciative of why the spec (at its current stage) is 4000 pages long, and why that _is_ the right thing for many of MS’s customers. It’s so easy to forget that not everyone else wants exactly the same things as I do.

  19. Doug Mahugh says:

    I recently mentioned on this blog that the Ecma TC45 committee had released Working Draft 1.3 of the…

  20. Hey Folks,

    I have purposely held back from adding additional comments over the last week or so because I found myself getting a bit too worked up in regards to various topics, and realized I simply needed to step back and reevaluate why XML is such a wonderful invention in the first place.

    There’s only one comment from above that I want to quickly add a few bits to:

    > If Brian cannot come up with a positive answer, then all the claims "100% xml", "full xml", "went entirely to xml", … made since last year are lies. <

    I can take ANY XML document and run it through a variety of programs that would turn it from XML to mush, and everything in between. To suggest "once XML, it must ALWAYS stay XML, and if you attempt to suggest anything else you’re a liar!" is to forget what XML is all about in the first place. In fact, it forgets what a computer is even capable of, for that matter.

    XML was designed as a cross-platform data exchange format. Once the XML reaches the platform, it is up to the platform to decide what to do with it from there. Data is data no matter what format it happens to be in, and for a computer to process data, it needs to be in a format the computer understands. Computers don’t understand XML. XML is text. Computers understand binary: two digits, 0 and 1. That’s it. From that point forward, everything is an abstraction. In fact, in nearly every case, we have to trick the computer into providing what appears to us as multiplication, division, and subtraction, which is really just a combination of clever hacks that turn LOTS and LOTS of addition into the other three.

    Microsoft, Brian, the overall group of folks working on this format from MS, Ecma, and the list of Ecma members taking part in the development of this format have done a bang-up job with Open XML. This is truly more than I ever expected would even be possible, and this is only the first public release.

    Adam, you spoke to the notion that people would like to have a simplified format rather than several copies of War and Peace (obviously I added the W&P, but I think the general idea is still intact 🙂). Well then… Why don’t we, the community, take it upon ourselves to help simplify things into smaller, bite-sized (I was tempted to use "byte," but held back 😉) pieces, so that those who are only interested in smaller, more manageable pieces can more clearly understand how to go about this?

    In fact, I have already begun developing just such a project, which Brian is both aware of and supportive of. I plan to also encourage people to help simplify the OpenDocument format, but let’s be honest… ODF is pretty vanilla already. If you can’t read through the spec during your lunch break, I’m guessing that maybe you should take more than a five-minute lunch break. 😉

    Nonetheless, if folks feel that ODF is overcomplicated for their needs, then I think it’s a good idea to help out as needed. We need to remember that this isn’t about document formats; it’s about building tools that enable us, and the people we serve with the products we develop, to be more productive, more capable, and freer to spend their valuable time on the task at hand, whatever that might be, rather than being kept from accomplishing it. We need to help make things as efficient as possible, while at the same time providing all the pieces necessary to make the experience as rich as anyone might want it to be.

    I will be making an announcement on my XML.com/O’ReillyNet blog when the project I mentioned above is ready to go. It won’t be long: getting it up and running and ready to begin work on the tasks mentioned requires nothing more than existing, prebuilt software components with a little customization to add a bit of style.

    If you are interested in getting involved (and this goes for anybody who reads this), you can access the Atom or RSS web feeds for my blog via http://www.oreillynet.com/pub/au/2354

    If you folks are truly interested in making a real difference for users of OpenXML and ODF, then I would encourage you to get involved with this project when the time comes to do just that.

    Thanks everyone!  This really is important work… I look forward to being a part of it with any of you who find interest.

  21. Ecma has published an updated draft of the spec for the Office Open XML Formats Standard. Here’s a link…

  22. I’ve had a lot of folks ask me to provide more information on what features are missing from ODF and…

  23. As we move forward with the standardization of the Office Open XML formats, it’s interesting to look…

Comments are closed.
