Draft 1.3 of the Ecma Office Open XML formats standard

Wow, we finally have an updated draft of the Ecma Office Open XML formats standard! http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50.htm I’ve been waiting for a long time to be able to share all the great work that’s been going on in Ecma TC45, and it’s so awesome that we have a new public draft. I can’t wait to hear what everyone thinks. If you go to that site, you’ll see three different downloads:

  1. Draft 1.3 of the spec – The big download is the spec itself in PDF form. It’s about 25 megabytes and is around 4000 pages.
  2. Draft 1.3 of the spec in the Open XML format – Alternatively, you can download the .docx version of the spec. Once Beta 2 comes out, you can open it that way (although opening 4000 pages of content with beta software may be slightly problematic <g/>)
  3. Schemas – The schema files are also available for download. They come in a ZIP file that also contains an index.htm file describing each XSD.

We’ve been working really hard over the past 5 months bringing this standard along. There is still a lot of work to do, but you’ll see pretty clearly that we’ve made a ton of progress over the initial submission from last year. We have weekly 2-hour phone conferences (they are actually at 6am my time which is not ideal <g/> ), as well as 3-day face-to-face meetings about every 2 months. The contributions from everyone have just been outstanding. It’s so awesome to work with such a diverse group of people. While the initial submission was made by Microsoft, it’s now completely in Ecma’s control and we’ve had a lot of help from Apple, Barclays Capital, BP, The British Library, Essilor, Intel, Microsoft, NextPage, Novell, Statoil, and Toshiba.

***Note*** Remember that this is just a draft. Some sections of the spec are much further along than others, so keep that in mind while you are looking through the spec. If you are in an area that looks like there isn’t much information, odds are we just haven’t gotten to that yet.

While I’m sure we’ll be able to spend the next several months talking about all this, some of the big things I wanted to point out are:

  1. Public feedback – While the Ecma organization is completely open and anyone can join, I understand that some people just aren’t able to make that commitment. That’s why I was really excited that we have a mechanism set up now so that anyone can give feedback on the spec: ecmatc45feedback@ecma-international.org
  2. Technical discussion – If you are looking for technical discussions around the formats, you can also go to the openxmldeveloper.org site where there is a forum for a wide range of technical issues for developers who want to implement the formats.
  3. Navigating the PDF – The PDF file was actually generated using Word 2007. Bring up the Bookmark pane and you can easily navigate through the document structure (it’s over 4000 pages, so that helps a lot!). You will also notice that in the reference sections, you can easily navigate through element and type references just by clicking on the section number next to the element or type’s name.
  4. Spreadsheet Formulas – Check out 15.5 (starts on page 247). There are about 160 pages of content describing the formula syntax and about 360 different functions. You’ll notice that there is still a ways to go, but this is already a huge amount of really useful information.
  5. Depth of documentation – I know we’ve said this a million times, but this is a huge project. Migrating all the existing Office documents into an Open XML format and then providing full documentation is a ton of work. Many people don’t realize how large these applications are, and how much there really is to cover. If you want an example, download the spec and look at the documentation for the simple type “ST_Border” which starts on page 1617 (it’s in the WordprocessingML reference section under simple types). That shows a list of almost 200 legacy border patterns that you can apply to objects in a Word document. Tristan Davis, the Word representative on the Technical Committee, had to work on every single one of those and provide images so anyone else could reproduce them. He created almost 200 documents, took screenshots of each one, and then provided the description and image representation in the spec. This format is 100% compatible with the existing base of Microsoft Office documents, so nobody will need to worry about losing features, even if it’s the “Maple Muffins” border style (page 1643) 🙂
    1. Want some more depth? – Check out section 14.5 starting on page 135

I’m so excited right now, I’m really rushing just to get this blog post out. I can’t wait to hear from people about what kinds of questions they have, or what they hope to do with the formats. We’re going to have a lot of fun over the coming months (especially once Beta 2 is out the door and everyone can start to experiment with the files). More information to come, but that’s it for now.


Comments (48)

  1. The Open XML document standard is progressing through [ECMA]. I think

    Microsoft using [ECMA] for…

  2. Adam says:

    Ummm…..what version numbering are you guys using here? How come this isn’t version 0.x?

    When you get the first version of the spec finished, what version do you expect that to be?

    /me is confused.

  3. Version 1.3 of the Ecma spec has been published as of today. It’s a huge document, with lots of…

  4. BrianJones says:

    Actually Adam, it’s officially "WD 1.3", meaning it is the 1.3 version of the working draft.

    Once it is finished and approved by Ecma, it would be version 1.0 of the Ecma standard.

    From there it would go to ISO, and if there are any changes suggested it would be version 1.x of the Ecma standard and 1.0 of the ISO standard.


  5. Alex says:

    It would be better if the page numbers matched up. Page 1 doesn’t occur until ~93 pages in due to the contents, and it makes using the contents to look stuff up pretty hard. People surely aren’t going to use this in dead tree format….

  6. Adam says:

    Hmmm…..pity it’s so open for misinterpretation. The [trackback?] above your comment claims it’s "Version 1.3 of the Ecma spec".

    Whose designation was "1.3"? I can’t quite figure that out from your reply.

    By analogy with Ecma/ISO numbering, it sounds like "1.3" is someone else’s (Microsoft’s?) designation, and ISO 1.0 will be Ecma 1.x, which will in turn be 1.x+y of the original line, where 1.y was the WD that became Ecma 1.0.


    Aaaaaargh! And I thought C++ programming was hard. My brains are trying to escape out of my ears! 🙂

  7. JayV says:

     Tuesday’s blog discussed Office 2007 performance, but I noticed that section properties for Word are still output at the end of a section according to Draft 1.3 of the ECMA docs.  A SAX parser would normally be great for performance because you can parse tags immediately as they are encountered, but the location of the section properties forces a consumer to read the entire section’s content before knowing how to display the first page correctly.  This eliminates the benefit gained by the SAX parser since it delays the initial display of a document (especially for huge files that are composed of only one section).  Is there a reason why it was decided to put the section properties at the end of a section?  It seems like it would make more sense to start the section off with the section properties.
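The buffering concern raised in this comment can be illustrated with a minimal sketch using Python's standard `xml.sax` module. The markup below is a deliberately simplified, hypothetical stand-in (the element names `body`, `p`, `sectPr` and the attributes `pageWidth`, `pageHeight` are assumptions for illustration and are not taken from the Draft 1.3 schemas). It only shows that a streaming consumer sees every paragraph of a section before it sees the page setup for that section.

```python
import xml.sax

# Simplified, hypothetical markup: element and attribute names are
# illustrative stand-ins, not the names defined in the draft schemas.
SAMPLE = b"""<body>
  <p>First paragraph</p>
  <p>Second paragraph</p>
  <p>Third paragraph</p>
  <sectPr pageWidth="12240" pageHeight="15840"/>
</body>"""


class SectionWatcher(xml.sax.ContentHandler):
    """Counts the paragraphs seen before the section properties arrive."""

    def __init__(self):
        super().__init__()
        self.buffered_paragraphs = 0
        self.paragraphs_before_props = None
        self.page_size = None

    def startElement(self, name, attrs):
        if name == "p":
            # Without the page setup, a renderer can only buffer this paragraph.
            self.buffered_paragraphs += 1
        elif name == "sectPr":
            # Only now does the consumer learn how the section is laid out.
            self.paragraphs_before_props = self.buffered_paragraphs
            self.page_size = (int(attrs["pageWidth"]), int(attrs["pageHeight"]))


def scan(document_bytes):
    """Stream the document once and report what the handler observed."""
    handler = SectionWatcher()
    xml.sax.parseString(document_bytes, handler)
    return handler


result = scan(SAMPLE)
print(result.paragraphs_before_props, result.page_size)
```

With a single-section document, `paragraphs_before_props` equals the total paragraph count, which is the delay the comment describes.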

  8. BrianJones says:

    Adam, sorry about the confusion 🙂

    Within the Ecma TC45, we have produced 3 versions of the working draft so far. We agreed last month that it would be great if we made the next draft public so that we could let people outside the committee see what we’d been up to. There is no requirement within Ecma for the committee to publish working drafts, but the members of TC45 decided that it would be worthwhile.

    Once Ecma approves it as a standard, then the working draft numbering won’t really be relevant.

    JayV, that’s actually a great question. There are a number of reasons for the location of the section properties. You are right that there would be advantages to putting them at the beginning of the section, but those benefits actually start to go away once you deal with the fact that a section break can be located in just about any location. I think a blog post specifically focused on sections would be worthwhile. I’ll add it to the list (a list that seems to just keep getting bigger) 🙂


  9. Brad Corbin says:

    Wow, 4000 pages and only 25 MB? Not too shabby.

    Concerning "Maple Muffins" and other border styles:

    Why are you "hard-coding" 200 specific enumerations by name, instead of simply providing a method to store a small image in the document that could then be programmatically tiled to encircle the page?

    Sure, for some types, you’d need a "top" image separate from a "corner" or "left", but that’s still more flexible than hard-coding these types.

    You’d save about 60 pages of spec document!

    The tradeoff, of course, is that documents containing these types of borders would be *slightly* larger than in the current spec, but the flexibility advantage is significant.

    I guess the real question is: WHO USES THESE BORDERS?? And when was the last time someone used them? 1998?

    (I know, full fidelity, etc…)

  10. You have me bustin’ a gut, Brad 😀

    I do wonder the same thing, but also have come to discover that there are some areas of any given specification, application, or any implementation of any "tion" what-so-ever that what you and I may find as dated fashion of folks who don’t even understand what the word fashion means, much less how to implement it properly, to them is something they simply are not concerned with, nor are they with our opinions as well.  To them, they want to make sure their borders from 1998 look the same in 2098 and beyond.

    I agree… the choices of style can defy what seems like reasonable justification to wonder what on earth they were thinking… To them, they worry more about consistency, and focus more on what they feel to be most important, which in most cases is content, but in some cases can, in fact, be the borders themselves that they just can’t live without.

    Such are the wonderful ways of the world we call home 🙂

    That said, what my opinions might be, and what the reality actually is, are more than likely at odds with one another.  Sometimes I get lucky, but luck and I have a love/hate relationship, so when I do get lucky, I tend to bunker down for whatever storm is brewin’ that’s about to strike wherever it is I happen to be when it happens.

    As luck would have it… that’s usually the way it is 🙂

  11. BrianJones says:

    🙂 I hear you Brad. That’s why I pointed it out as an example of how much work it is to document these things. There are so many features and we need to represent them all in XML, and document it all.

    We could have done as you suggest and just used resource files for them, but that would have actually been more work in the applications. I don’t really think it was worth an investment to make the borders more extensible. We don’t really have any customers asking us for that. If at some point there does appear to be a need for it though, I’m sure that TC45 could look at adding it into a future version of the standard.


  12. orcmid says:

    Great news!  I’ve been waiting for this too, especially for checking out my favorite bits on the packaging.  So, now you know the kind of technogeek I am, I care more about the packaging concept and its reuse than all of those hairy details that content involves.  Heh.

    It would have been cool to Zip the PDF as was done with the initial Working Draft.  It makes for a nice comparison with the DOCX, though.

  13. Hi Brian,

    first off, thanks for the nice error detection in Word! I couldn’t resist and tried opening the .docx in Word 2007 B1TR.

    Just flipping through the spec, I finally understand why other companies never fully managed to duplicate the Office binary formats: they are just way too complicated to reverse engineer.

    I agree with the other poster that you should fix the page numbering so that page 1 falls on the very first page of the document, even if the current numbering is more correct for publishing purposes. It’s annoying to not be able to just go to the table of contents, find what I want and type that page number into Adobe.

    Patrick Schmid

  14. orcmid says:

    Funny.  The *.docx is delivered via the ECMA web server as a Zip file type, or that is at least what IE6 wants to save it as.  I played along, looked at it with WinZip, then renamed it to .docx.

    I must admit, this is exciting.  The expansion of the schemas looks great too, so I guess I can’t stay that rusty with XSD that much longer.

  15. Adam says:

    I’m with Brad. I’ve heard discussion, and M.’s comment about using MOOX in 2098 implies I’m not the only one, that some people are looking to use either MOOX or ODF as an archival format that will still be widely supported, at least for reading, for 100 years or so.

    You want to make everyone who supports reading MOOX implement those 200 specific borders for the next century? Is there even a single independent implementation that does that at the moment? AFAIK, the best alternative reader for Word documents at the moment is OpenOffice.org. How does that cope with those borders so far?

    M. said "To them, they want to make sure there borders from 1998 look the same in 2098 and beyond.".

    You said "We could have done as you suggest and just use resource files for them, but that would have actually been more work in the applications."

    I call baloney on M.’s statement as _no-one_ uses MOOX at the moment, there are no legacy borders to support in it. And I call baloney on your statement as well, as for _most_ applications written between now and then, adding specific support for these 200 borders would, IMO, be more work for less gain than adding support for a more generic border definition in the document. (As Brad suggested, allowing the document to specify images for "top left", "top right", etc…, and repeating images for "top", "left", … in either SVG or raster (BMP, GIF, PNG) formats could well allow reproducibility with scope for custom borders in the future)

    Yes, people have legacy documents, _but none of them are in MOOX yet_. The only legacy document types with "Maple Muffins" borders are those that happen to have been created with some previous version of Word or other. So, yes, Word 2007 needs to be able to read them. And, yes, Word 2007 probably needs to be able to save those borders *somehow* in MOOX, if only for MS’s Marketing/PR purposes.

    But _only_ Word 2007 (+?) needs to be able to do that conversion from legacy Word documents to MOOX with 100% fidelity. Having a more generic spec will be more useful and less work to _all_ the other bits of office software written over the next 100 years by anyone who isn’t Microsoft.

    For a format that’s going to last a century, I don’t think that copying Word ’95’s *implementation* of borders in a generic, _standardised_ file format is necessarily a wise thing.

    In a previous post, you mentioned in answer to a question about who else had input into the spec process, that you had "partners and MVPs" having input into the spec. IMO, _this_ is the kind of thing where you need other groups who are producing independent implementations to have input. Yes, having partners and MVPs who are making sure they can read the actual text out of MOOX, and manipulate it with XSLT, and put it onto a website, etc…, is really useful. But how many groups of people have tried to re-implement this particular bit of spec? Have the OpenOffice.org folks, or *anyone* other than MS _who are writing Office software_, had a chance to say "This is a stupid thing to put into an application-neutral document format"?

  16. gwb says:

    This whole MS Office Open XML is such a colossal waste of time.  There is already an ISO standard for this purpose, ODF, that fits 99.999% of the needs of the world.  To introduce another is just, well, silly.  No one cares.

    It is tempting to compare this to the VHS/Betamax wars and we all know how that one turned out.  Betamax was (and still is) technically superior, yet lost out in the marketplace and is relegated to high-end, professional use (some call it fringe) today.  However, there are several differences:

    1. The video formats came to market around the same time.  OOXML is at least a year behind ODF and will forever be playing catch-up.  Ironically, I can read and write ODF from MS-word (any version) today thanks to the ODF plugin, but there is not a single MS product on the market today that can handle OOXML.

    2. Betamax is technically superior to VHS but it’s the other way around in the OOXML/ODF comparison.  OOXML commits a cardinal sin by mixing the message with the medium whereas ODF maintains strict separation (between what the tags are and what they do).

    3. ODF is freely available to be implemented by anyone, including Microsoft.   OOXML cannot be combined with some licenses, including the GPL, severely limiting its chances for ubiquity.  We will not see OOXML supported by OpenOffice, KOffice, Abiword, etc. for just this reason.

    OOXML is not even out of the starting gate yet, but the race is over.  It’s time for Microsoft to forget it.

  17. BrianJones says:

    Adam, it’s just not as simple as that. There are cases all over the place where people choose to use enumerations vs. resources. This is especially the case if there is no demand to increase the number of borders. You say that it’s harder for a consumer if they have to know about the different enumerations. That may be true, but it’s just as difficult for the producer if the approach was to use resource files. It means that rather than writing a few characters to declare it, you instead need to store glyphs so you can embed them in files you create. There are billions and billions of documents out there that were created in Microsoft Office. We are taking on the task of migrating all those into an open XML format that will fully preserve everything. ODF does not do this, Open XML does. ODF doesn’t even specify how spreadsheet formulas should be stored. Come on! That’s one of the most crucial pieces of any spreadsheet, and it’s just left unspecified.

    gwb, have you been paying attention to this area much, or are you just now getting started? I ask because you made a number of statements that seem to imply that you don’t know the history.

    1. The Open XML formats are based on XML formats that started in Office 2000 (where development started around 1997 or so), and have continued to evolve to the point where we are. There were already two XML file formats (SpreadsheetML and WordprocessingML) well before OASIS standardized ODF, and certainly well before it was ever submitted for an ISO standard. We had documentation and a royalty-free license around the formats that was available for anyone. We were then told by governments that they would like us to submit our format to a standards body, so last November we submitted the format to Ecma International and they now own it.

    2. Are you talking about this in regards to formatting? I’m sure you don’t mean how customer XML is used because that isn’t supported in ODF. We actually do have data/view separation because of our customer-defined schema support. If you are talking about formatting, this is pretty lame. Any style that is applied to text is stored separately and the text then references that style. If a style is not used however and direct formatting is applied, then those formatting properties are stored on the text run itself.

    3. In November we moved to a covenant not to sue, which essentially said that you can do whatever you want with our formats, as long as you don’t try to sue us for what’s in the format. You don’t need attribution or anything. While there has been a ton of FUD spread by the people pushing for ODF in this area, most open source folks I’ve talked to really like this new approach. OpenOffice already supports Word 2003’s XML format, and we have people from Novell who work directly on the OpenOffice project working with us to help design this standard. There is already a prototype implementation of opening and saving files in the Open XML format for Gnumeric which is an open source spreadsheet application: http://gnumeric.org
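The split between referenced styles and direct formatting described in point 2 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary layout, the style names, and the helper `effective_formatting` are hypothetical and are not how the format itself serializes styles or runs.

```python
# Hypothetical style table, stored separately from the text.
# Names and properties are illustrative only.
STYLES = {
    "Heading1": {"bold": True, "size_pt": 24},
    "Normal": {"bold": False, "size_pt": 11},
}


def effective_formatting(run, styles=STYLES):
    """Merge the referenced style with properties stored directly on the run."""
    # Start from the style the run points at (empty if it references none).
    resolved = dict(styles.get(run.get("style"), {}))
    # Direct formatting lives on the run itself and overrides the style.
    resolved.update(run.get("direct", {}))
    return resolved


# A run that only references a style carries no formatting of its own.
styled_run = {"text": "Overview", "style": "Heading1"}
# A run with direct formatting stores the overriding properties inline.
direct_run = {"text": "Important", "style": "Normal", "direct": {"bold": True}}

print(effective_formatting(styled_run))
print(effective_formatting(direct_run))
```

Changing the `Heading1` entry once would change every run that references it, while the inline `bold` on `direct_run` stays with that run only.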


  18. BrianJones says:

    Hey Patrick, I’ll talk with the editor about the page number thing. You’re right that there is a lot of content. That’s why we really want to make sure that the spec is constructed in such a way that people can choose to just use parts of it if they want and not the whole thing. It’s really up to the implementer to decide what level of support they want to build. They may choose to ignore things (like the borders for instance :-).

    Dennis, it looks like the Ecma web server doesn’t have the proper MIME type associated with the .docx extension. So as a result, the server just returns this:

    HTTP/1.1 200 OK
    Proxy-Connection: Keep-Alive
    Connection: Keep-Alive
    Content-Length: 6976940
    Via: 1.1 RED-PRXY-03
    Date: Thu, 18 May 2006 22:36:18 GMT
    Content-Type: text/plain; charset=ISO-8859-1
    ETag: "d708af-6a75ac-dafe3380"
    Server: Apache
    Last-Modified: Thu, 18 May 2006 13:17:18 GMT
    Accept-Ranges: bytes
    Keep-Alive: timeout=15, max=100
    Content-Language: en

    See how the content type is just "text/plain"? So your browser probably sniffs it to figure out what it is and determines that it’s a ZIP 🙂
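As a rough illustration of the kind of sniffing described above, the sketch below checks the first bytes of a download for the ZIP local file header signature (`PK\x03\x04`), which a .docx package starts with because it is a ZIP archive. The function name `sniff_container` and the fallback behaviour are assumptions made for this example; how any particular browser actually sniffs content is not documented here.

```python
# ZIP archives (and therefore .docx packages) normally begin with this
# local file header signature.
ZIP_LOCAL_FILE_SIGNATURE = b"PK\x03\x04"


def sniff_container(first_bytes, declared_type):
    """Return a guessed media type, falling back to the declared one.

    Hypothetical helper: it only recognises ZIP and trusts the server's
    Content-Type for everything else.
    """
    if first_bytes.startswith(ZIP_LOCAL_FILE_SIGNATURE):
        return "application/zip"
    return declared_type


# The server declared text/plain, but the payload starts like a ZIP file.
print(sniff_container(b"PK\x03\x04\x14\x00\x06\x00", "text/plain"))
# Ordinary text keeps the declared type.
print(sniff_container(b"Hello, world", "text/plain"))
```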


  19. Lars Hansen says:

    "That shows a list of almost 200 legacy border patterns that you can apply to objects in a Word document. "

    "This format is 100% compatible with the existing base of Microsoft Office documents, so nobody will need to worry about losing features, even if it’s the "Maple Muffins" border style (page 1643) :-)"

    As a format for Microsoft Office 2007 it’s probably fine and dandy to include all this legacy information. But seriously, didn’t you ever consider that this would make your open format overly tedious and complex? Did you ever consider that as a standard, simplicity might be preferable, compared to 100% backwards compatibility?

    "3. In November we moved to a covenant not to sue, which essentially said that you can do whatever you want with our formats, as long as you don’t try to sue us for what’s in the format."

    To what extent does this include opensource use?

    I am not trying to be hostile, but I am honestly doubting the usefulness of this format as an open standard.

  20. BrianJones says:

    Hey Lars, no need to worry about sounding hostile. Those are great questions.

    I love XML, and I would have loved to have been able to start from scratch to build a new clean XML format without having to worry about backwards compatibility. That’s the sacrifice you have to make though when you have an existing customer base. The reality is that there is no point in building a format if people aren’t going to use it (I’m talking about end users, not developers). Look at SpreadsheetML from Office XP, it was pretty clean and verbose, but there was no way anyone would have used it as their default format. It didn’t support all their features.

    If we didn’t build a format that supported all the existing documents out there, then the majority of our customers would not want to use it. This would have made all the work of moving to XML pointless. We need to migrate all those existing documents and existing users into the new world, and we need to make it as easy as possible.

    What we are doing is really tough. It’s been years and years of development and has stretched out over multiple releases. There were a few folks early on who believed we could get to this point, and now we’re almost there. Those of us who’ve gone through it all are finally starting to see the light at the end of the tunnel (and please no jokes about it being a train) 🙂

    The covenant not to sue is non-discriminatory. The covenant currently applies to the Office 2003 XML schemas and we’ve also committed publicly to providing the same thing for the Ecma schemas. It says:

    "Microsoft irrevocably covenants that it will not seek to enforce any of its patent claims necessary to conform to the technical specifications for the Microsoft Office 2003 XML Reference Schemas posted at http://msdn.microsoft.com/office/understanding/xmloffice/default.aspx (the "Specifications") against those conforming parts of software products. This covenant shall not apply with respect to any person or entity that asserts, threatens or seeks at any time to enforce a patent right or rights against Microsoft or any of its affiliates relating to any conforming implementation of the Specifications.

    This statement is not an assurance either (i) that any of Microsoft’s issued patent claims cover a conforming implementation of the Specifications or are enforceable, or (ii) that such an implementation would not infringe patents or other intellectual property rights of any third party.

    No other rights except those expressly stated in this covenant shall be deemed granted, waived or received by implication, or estoppel, or otherwise. In particular, no rights in the Microsoft Office product, including its features and capabilities, are hereby granted except as expressly set forth in the Specifications."

    There are already open source solutions out there today that support our XML formats. Go look for yourself. I know there has been a huge FUD campaign around our formats from some of the folks that have invested significantly in ODF, but hopefully you’ll see that it’s really just a bunch of misinformation. There have been a number of folks in the open source community that have looked at the CNS and been extremely positive. OpenOffice today already supports opening the Word 2003 XML format. Gnumeric has a prototype implementation of reading and writing SpreadsheetML.


  21. Adam says:

    Brian> "There are billions and billions of documents out there that were created in Microsoft Office. We are taking on the task of migrating all those into an open XML format that will fully preserve everything."

    OK, there’s a fundamental question here that I want to ask:

    Why is MS sending MOOX to Ecma/ISO?

    (Warning: long post follows. Sorry. It didn’t start out this long (Duh! 🙂) but just kind of evolved. As with all posts, only moreso this time, please feel free to ignore it. Or at least brusquely direct me to a FAQ. Google revealed little when presented with this question, but it may have been asked before. Blast! Now my *warning* is getting overly long. I have to learn brevity at some point…)

    If the answer is "because our customers want a standard." (or "So we can tick boxes on customers’ requirements documents", or "For marketing purposes", which are all the same answer) that’s not good enough. In that case, the question becomes:

    Why do MS’ customers want an Ecma/ISO standardised document format?

    If the answer to that is "So they can manipulate the documents themselves programmatically, ideally with tools like XSLT and produce summaries and create web pages from their documents and all that kind of stuff." then that’s not good enough either. All MS would need to do for that would be to document the format and possibly provide an API that doesn’t require loading Word/Office. _You don’t need Ecma/ISO standardisation for that._

    (Loading Word to play with Word files has all kinds of automation issues that you don’t want to deal with if the manipulation is going to be happening on an unattended server.)

    Really, I’m not kidding, and I’d like to know the answer. Why is MS sending MOOX to Ecma/ISO?

    I know why *I* want a standardised document format. I realise that I do not talk for everyone who wants a standardised document format when I explain this, and that’s OK. But neither do I think I am alone. Let me continue…

    I want to move away from the 3(?) different legacy Word formats, legacy WordPerfect formats (which I believe are still used a fair amount in the legal profession, although please correct me if this belief is misplaced), AmiPro documents, MS Write documents, RTF documents, legacy KWord documents, etc., etc., etc…

    I want a document format that can be used as a first-class document format on Windows machines, Macs, Linux boxen, Solaris workstations, and JVMs. I want it supported by as many different manufacturers as possible so that this multi-platform support is available – I don’t see MS releasing Word for Solaris anytime soon, and while the KDE folk are hoping to have KDE4 running on Windows, I’m not holding my breath – and that people on any system can create, read and write documents that can be read and modified by people on any other system without caring _at all_ about where the document originated or where it’s going next.

    And I want the format widely supported enough that it’s going to be _viable_ as a "main" format for at least the next 20 years, and that I won’t have to care about my documents in 10 years time if I decide to move my entire operation to a Mac.

    As part of this, I realise that 100% fidelity when upgrading *all* my current document formats to "the standard" may not be possible.

    You say: "There are billions and billions of documents out there that were created in Microsoft Office. We are taking on the task of migrating all those into an open XML format that will fully preserve everything."

    So, what about the millions of WordPerfect files out there? Do they get 100% conversion? Why not? Just because they’re not Word files?

    It may surprise you, but I don’t care about *everything* in my document. I don’t care if the borders aren’t perfect. I don’t care if things get a bit repaginated. I don’t care if Comic Sans MS isn’t available on Linux, and Bitstream Vera Sans will have to do, even if things get moved around a bit because the kerning is all different. I don’t care if when moving from XHTML+CSS to the new format the "h1 { font-size: 200%; }" rule that means that whenever I change the default font size the level 1 headings are always double that and don’t need re-adjustment gets dropped.

    All of that is in danger of happening _anyway_ at the moment every ~3 years, every time I’m forced to upgrade my version of Office to read the documents produced by everyone else with the new version. Yes, MS are pretty good at maintaining fidelity, but they’re not 100% perfect. But it’s OK, as the few things that do mess up a tiny bit are things like pagination. And hey, most of those things are document defaults anyway, and if they were different to start with then I wouldn’t have changed them to what they are now.

    I care about the _content_ of my documents. That’s what you need to preserve. Text, tables, figures, lists.

    Keeping track of headings is good (Although most word documents don’t have headings, they just have "normal + 24pt bold" paragraphs, but I digress…) as is formatting. Keeping track of _which_ bullet mark I’ve used is _much_ less important than keeping track of the fact that a paragraph is part of a bulleted list.

    Content, semantics == important. Style == less important.

    That’s it.

    Like I said before, I realise that not 100% of everyone will agree with this. There will be some people who are using Word to do things like create flyers, despite there being more appropriate tools they could use for that particular job. Arguably though, they don’t need an Ecma/ISO standardised format to do that.

    Like I also said, I don’t think I’m alone. I think that a lot of the people who want a standardised format are going to be people who want to move from a large number of disparate systems that don’t talk well to each other, to a few that do.

    And for that, I think that many of those people will understand that 100% compatible conversions from _all_ their current formats might be impractical, and prefer a solution that gets them merely "very good" conversions. Especially if the _content_ can be 100% converted, and the presentational features that get dropped from some of the arcane formats they were using might cause pain in reading those documents 25 years from now if they were left in.

    And, I think it’s those people that _really_, _really_ want the Ecma/ISO standardisation.

    So: Why is MS sending MOOX to Ecma/ISO? What problems does Ecma/ISO standardisation solve that also _requires_ a 100% upgrade path for "Maple Muffins" borders, but _does not require_ a 100% upgrade path for document formats other than Word?

    I think this is probably the biggest cause of me (and possibly a lot of other people) going "WTF?!?" at some of the stuff that comes out as part of this process.

  22. BrianJones says:

    Hey Adam, thanks for taking the time to get all your thoughts down. It definitely has helped me understand where you are coming from.

    It sounds like you understand that from our point of view, in order to use an XML format as the *default* format for Office it needs to be 100% compatible, right? I think your point is more that we should also have an optional format that is more basic and doesn’t necessarily have 100% of the features covered. That smaller, more basic format would then be the one that should be standardized. I think that’s what you are saying.

    Based on your description, the format you desire sounds a lot like HTML. HTML is a great format for basic interchange. It doesn’t support everything that is present in an Office document, but as you said, that isn’t always desirable. We’ve supported HTML for quite a while, although we took the approach of trying to have our cake and eat it too when we attempted to make our HTML output support 100% of our features. The result was an HTML format that had a ton of extra stuff in it that many of the people who just wanted HTML didn’t really care about (and it just got in the way).

    Our primary goal this release with the formats was not to try and re-implement HTML, but instead to move everyone over to using XML for all of their documents. Let’s talk about the motivations for what we are doing with Open XML since that was the main point of your question:

    1. The reason we’ve spent the past 8 or so years moving our formats toward a default XML format is that we wanted to improve the value and significance of Office documents. We wanted Office documents to play an important role in business processes where they couldn’t before. We wanted to make it easier for developers to build solutions that produce and consume Office documents. There are other advantages too, but the main thing is that Office documents are much more valuable in just about every way when they are open and accessible.

    2. The reason we fully document them is exactly the same. We need developers to understand how to program against them. Without the full documentation, we don’t achieve any of the goals I stated above. The only benefit would be that other Microsoft products could potentially interact with the documents better (like SQL or SharePoint), but that doesn’t give us the broad exposure we want. That would be selling ourselves short. We want as many solutions/platforms/developers/products as possible to be able to work with our files.

    3. The reason we moved to the "Covenant not to sue" was that a number of people out there were concerned that our royalty free license approach wasn’t compatible with open source licenses. Again, since the whole reason for opening the files was to broaden the scenarios and solutions where Office documents could play a role, we moved to the CNS so that we could integrate with that many more systems. Initially we’d thought the royalty free license just about covered it, but there was enough public concern out there that we decided we needed to make it even more basic and straightforward. We committed to not enforce any of our IP in the formats against anyone, as long as they didn’t try to enforce IP against us in the same area. No license needed, no attribution, we just made a legal commitment.

    4. The reason we’ve taken the formats to Ecma for standardization is that it appeared that a number of potential solution builders were concerned that if we owned the formats and had full control, we could change them on a whim and break their solutions. We also had significant requests from governments who wanted to make sure that the formats were standardized and no longer owned by Microsoft. Long-term archivability was really important, and they wanted to know that even if Microsoft went away, there would still be access to the formats. We were already planning on fully documenting them, but the Ecma standardization process gave us the advantage of going through a well-established formal process for ensuring that the formats are fully interoperable and fully documented. It’s drawn a lot more attention to the documentation as well, so I’m sure we’ll get much better input, even from folks who aren’t participating directly in the process.

    I hope that helps to clear it up a bit. It really is just as simple as that. Any application is free to implement as little or as much of the format as they wish. If you really want every application operating on a more limited set of features, that isn’t as much of a format thing as an application thing. You would need to get every application to agree that it will not add any new features or functionality, and will disable any existing functionality that the other applications don’t have. That wasn’t our goal. Our goal was to open up all the existing documents out there, and then anyone who wants to build solutions around those formats is free to do so. In addition, anyone is free to innovate on top of the formats, as I believe there is still a lot of innovation to come. The formats are completely extensible, so if someone wants to use the formats (or parts of the formats) as a base and build on top of that, they can do so as well. They can even join Ecma if they want and propose to add those new extensions to the next version of the standard.


  23. Biff says:

    Adam, I’ll lend you a hand and maybe the next time around you will not miss the boat by a mile 😉 Here: the Office 2007 file format must persist *all* Office 2003 and earlier format features, but none of the other suites or applications have to do the same. That means the standard has to be complex, yet there is no need to support all of it, not for read, not for write; a reasonable subset will do.

    Ok, that was my best try and I’ll let Brian correct it as he sees fit.

  24. carlos says:

    I think that Biff and Adam largely have it right. It is clear that a little bit of thought and intelligent discussion up front would have culled most of the needless detail from the proposed spec. Part of the problem is that (I imagine) each of the employees diligently working on a section of this draft has a vested interest in keeping their section going, not in eliminating it. Further, as a fully funded project within MS, more employees can glom onto a steady paycheck. Thus the group becomes a positive feedback device with no one really pushing for moderation or the removal or reduction of sections.

    Besides that, Brian Jones is no small potato at MS and all the employees in his vicinity are going to do their darndest to meet his requirement of "100% Compatibility."  It just doesn’t help one’s career to oppose someone so near the top.  Thus the problem with this (working draft) document seems to rest squarely on the shoulders of Brian Jones as its head honcho.  Whether it was his bad idea or he didn’t push back on what he thought a bad idea, still his fault.

  25. Francis says:

    Why is MS sending it to ECMA? I would wager as a) to harness the power of unpaid (open-source) programmers and b) legitimation.

    a) Windows already contains a good deal of open-source code. When others do the work, it saves MS development time and money. If somebody invents a new use/feature for Open XML, MS will be free to bundle it into Office.

    b) Lots of people are jumping ship in favor of OpenOffice, often because of the purported superiority of "open" standards. (Unfortunately, they are confusing ideology with technology.) This is bad for MS Office users: the advantage they enjoyed because of their file format is evaporating (now I am in the minority with MS Word and not being able to read OO files.)

    In the end, I fear the new standards will be impossible to implement. Look at the much-simpler HTML: NO browser renders it 100% correctly. MS will get it right, however. For them, it’s merely new file coding, not a feature set.

  26. As nice as a clean and lean spec would be, Microsoft Office would be out of business if it could not provide 100% upgrade fidelity with the new file formats. Imagine an organization deciding whether to upgrade their files to the new file format or not. If the people testing the upgrade (no serious administrator would take MS’s statement that the upgrade is flawless at face value without testing) discover just ONE single thing that didn’t get upgraded correctly, that might be it for the new file format in that organization. After all, it is human nature to assume that if one thing doesn’t work correctly, who is to say that something else, much more important, doesn’t work either? Therefore Microsoft has to guarantee 100% fidelity.

    The standard might get bloated because of that, but as Brian pointed out, you don’t have to implement it all. If you are implementing it, you can look at the standard and decide that going with 20 border styles you think are the most common is sufficient for your application. No one will force you to implement the other 180 (except maybe your user base).

    In the whole discussion about ODF and backwards compatibility for current binary formats from different Office programs, we should not be kidding ourselves about reality either. Whether we like it or not, the Office 97-2003 binary formats are the de facto standard worldwide for Office file formats. I don’t have the numbers of Microsoft’s competitors, but I am sure that the number of users who don’t use MS Office pales in comparison to the 400 million MS Office users. I’d guess that probably 80-90% of all of the world’s spreadsheets, presentations and documents are in the MS file formats. That the company which controls these file formats is offering a lossless upgrade path for all these documents to a free, fully documented and open standard is in my opinion an effort from which we all can only benefit.

    If you hate Microsoft, then think about the following: One of the biggest reasons that is always brought forth to explain the dominance of MS Office is the file format issue. MS managed to establish a sizable market and then managed to keep (and grow) it by making it quite hard for others to compete with them, because their documents were never fully interchangeable with MS Office. With these new XML formats, Microsoft is giving that competitive edge away. The competition between Office products is from now on no longer going to be about who best implements the file format that most of the existing documents are in, but rather who offers the better features and usability. I think as consumers of Office products, we can only benefit from that shift in the competition.

  27. Biff says:

    Patrick FTW! 🙂

    One reason ODF zealots hate Open XML’s guts so much is because it takes their favorite toy argument away. I predict 2007 is going to be a very sad year for zealots.

  28. Hi Adam,

    > Why is MS sending MOOX to Ecma/ISO?

    I think you already answered the question: The customers of MS are demanding it. Why are they demanding it?

    Simply put, by getting MOOX standardized, the file format won’t be subject to the whim of a single company (Microsoft). By getting it standardized, the file format also won’t be changed frequently (every 2 years with a new Office version), because standards bodies move *very* slowly. By asking MS to send MOOX to Ecma/ISO, MS’s customers are asking for predictability and long-term security of their investments. By having a standard, businesses can calculate the return on investment they’ll get from MOOX and hence can make a sound and stable business case for why they should invest in it. What kind of investments are they going to make though? Let me first remind you that the main customer base for MS Office is businesses, not consumers.

    If you look at your typical company today, you’ll see an assortment of IT systems that are to varying degrees integrated with each other. Payroll might be linked with accounts payable/receivable. An ERP system might connect to all other systems, etc. Also, the company will try to have its IT as integrated as possible with its suppliers and customers. Why? Integrating IT systems, whether internal ones or external ones, saves money (less manual work, e.g.), reduces errors and speeds up processes. So information integration is good business for businesses.

    However, in this landscape of integrated systems, you have an island of crucial systems that just don’t want to integrate: spreadsheets, presentations, documents. Lots of crucial business information is contained and maintained there. I remember a case at a client’s where Excel spreadsheets contained the detailed information about transactions, whereas the massive assortment of multi-million dollar back-end systems (SAP was part of them) only contained dummy records for them. At the end of each month, the company had to spend quite some money to manually correct the balance sheet and income statement with the data from those Excel sheets (if I remember correctly, the corrections were ironically done in another Excel sheet…). The company had no other option, because the Excel sheets were not accessible to the back-end systems.

    Now picture the situation with an open XML format. Suddenly, all those isolated islands of presentations, spreadsheets and documents can be integrated into the workflow processes of existing systems. Doing that will save a lot of money, and hence companies will want to do it, if they can be assured their investment won’t be a waste of money. Hence an Ecma/ISO standard would go a long way for them (everyone still remembers that 97 had new binary file formats and the nightmare that created).

    So in my opinion, open file formats are good business for MS’s main customers. And what is good business for them is definitely also good business for MS.

  29. Biff says:

    "the file format also won’t be change frequently (every 2 years with a new Office version)"

    Which format are you talking about? Take Excel: Office 97, 2000, XP and 2003 all used the same primary format. That is an 11-year window until Office 2007 comes out.

    I agree wholeheartedly with the rest of the comment.

  30. Adam says:

    Patrick: Thanks for the reply. Has certainly given me a couple of things to think about. Just a couple of points:

    "The standard might get bloated because of that, but as Brian pointed out, you don’t have to implement it all. If you are implementing it, you can look at the standard and decide that going with 20 border styles you think are the most common is sufficient for your application."

    I don’t think that’s realistic. If you’re looking to implement an Office application, people will be looking at how well it works with "the standard file format", and I don’t think that "partial support" is going to cut it.

    If you’ve got an organisation producing, say, web sites and doing hosting, etc., and you’re looking to use MOOX as the document format for writing internal specs, authoring end-user documentation, passing round newsletters, etc., you might want a pretty heterogeneous network. Your business & accounting people may need to use Windows, your PR, marketing & graphic design people may want to use Macs, and your server admins may want to use Linux on their desktops.

    A *standard* file format should allow everyone to do that, and to create and exchange documents on an equal footing.

    However, if OpenOffice & KWord only support 10% of the borders in the spec, because it’s "optional", the suits could look at that and say "Only Word supports MOOX properly, therefore everyone must have Windows desktops." – which almost completely destroys the point of having a "standard" file format.

    "The standard might get bloated […]"

    If creating a standardised file format, that people have said they want to be able to use for years to come, is not a good time to drop all the crufty corner-cases, _when will be_?

    Come on. As an industry, we’ve got about 20 years of experience with creating file formats. Let’s use it. Over the last 10 years of MS Office, I’m sure there have been some things that got into the file formats that either turned out to just be pointless, or were special cases of something that we could do a better job of with hindsight. Make them more generic.

    Do you really want to have to support every unfortunate decision made in the last 10 years for the next 50? Or, if we continue with _never_ taking out the occasional misfeature, forever?

    Sometimes, you _should_ break backwards compatibility in order to move forward more effectively. Why is now, during _this_ change, when we’re specifically looking _forward_ to extensibility and interoperability, not a really good time to do so?

  31. Hi Adam,

    from a technical point of view I totally agree. We really could afford breaking some backwards compatibility in order to get a better standard moving forward.

    From a user/business point of view, we can’t afford it though. If, 5 years from now, I need to send a Word document written in 1995 to someone using MOOX, all I want to do is hit a convert button and attach the file to an email. I just wouldn’t want to check whether the file was converted correctly. I really just would want it to work. Every single Office user has been taught over the years that no matter what, his or her documents will look fine in any new version of Office. Microsoft essentially made a promise to its users on that, and this is a promise that can never be broken.

    I read a great article a while ago that argued that Microsoft is a synonym for never-ending backwards compatibility and that the company will do everything it can to ensure that for decades to come, even to its own detriment. The article talked about this in relation to Windows Vista and how it still supports running programs written for the very first version of the first operating system MS ever produced and how this huge backwards-compatibility effort was affecting Vista’s (forward) development all across the board.

    I agree with you that if you are OpenOffice & KWord, you’ll implement 100% of the standard (and I expect them to do so within a year). But if you are just a software company writing stuff to integrate Office into some process workflow, you really can choose to implement only 10%, namely the needed 10%. If you were to count all the software that implements MOOX and uses it 2 years from now, I’d think that the handful of Office products (there aren’t really more out there) would be far outnumbered by all the other implementations.

  32. Biff says:

    What Adam repeatedly misses is: for every wannabe Office in development there are thousands upon thousands of projects, many internal and of limited scope, which need to generate simple documents that Office can read; or load and modify libraries of existing documents; or simply load and tag a bunch of documents. In addition there are projects, take Gnumeric, that understand that they cannot provide all the functionality of their MSO counterpart and have different goals in mind. The Open XML format will benefit them immensely.

  33. BrianJones says:

    Wow, that was weird. I actually had replied to one of your earlier questions, Adam, but the comment was blocked. I just now realized that it never made it up there, so I went into the blog admin site and unblocked it.

    Patrick and Biff both did a great job of helping to explain a number of the "why" questions you had, but you should also check out my response from Friday night: http://blogs.msdn.com/brian_jones/archive/2006/05/18/601150.aspx#602305

    There really is a strong business reason for opening our formats and it has nothing to do with the recent politics. We’ve been working on this for a long time.


  34. orcmid says:

    This is a very interesting thread.  I particularly like the questions and the response that pulls together so many of the considerations summarized by Brian at: http://blogs.msdn.com/brian_jones/archive/2006/05/18/601150.aspx#602305

    This also clears up some things for me around strictly conforming and [not-strictly] conforming applications in the TC Draft 1.3. I love the attention to conformance and the intention to leave no unspecified elements (although I notice there is now also a place to identify all [and hopefully few] behaviors that are left to an implementation, and must be defined for each implementation).

    I foresee a great variety of repurposings and integrations/interchanges that will not require OOX-conforming applications.  (All conforming applications support the full specification, which is a floor for conformance.)  For those arrangements, some sort of profiling will be needed, just as we need profiles for different applications of XML itself.  This will be important for the interchange and repurposing cases that don’t involve the full format.  (Users of ODF will have to deal with this issue too, especially because there is really no "floor" for minimum conformance that I’ve been able to find in the ODF specification.)

    It will be nice when we have our hands on enough bits to see how applications that involve office-productivity software integration can keep things straight and obtain effective interchange in and out of OOX and/or ODF.  When Microsoft Office is used as a component of a specialized application, it would seem that templates can help keep everything within the confines of a schema subset and within the constraints of an application.  It will be interesting to see if there is an interchange form that might evolve for specifying these constrained cases too.

    I’m looking forward to OOX coming to fruition so that legacy applications are secured and we can move on to consider the range of ways the open formats will be used in specialized applications and business processes.  

    Thanks for carrying this on so publicly.

  35. This is shaping up to be a pretty cool month. Last week we finally had the first public draft of Ecma’s…

  36. There has been a great overall reaction to the news last week of Ecma’s first public draft for the Office…

  37. gwb says:

    Brian, sorry for the longish absence.  My concerns regarding the technical aspects of OpenXML are covered better than I can express in the article: Format Comparison Between ODF and MS XML ~ by Carrera, D’Arcus, Eisenberg, which you can read here: http://www.groklaw.net/article.php?story=20051125144611543

    The bottom line: OpenXML is clumsy and requires workarounds to handle common problems and reinvents the wheel for many functions for which comparable (or superior) standards are already available.

    Regarding the availability of Open XML-supporting products (not its earlier incarnations, which are not relevant here), I followed your lead on gnumeric, but could find nothing on that site that mentions Open XML support — prototype or otherwise.  Do you have a more specific URL?

    Regarding licensing problems, a covenant not to sue is commendable, but doesn’t in itself resolve the issues that arise when combining Open XML with GPL’d software.  As I understand it, there is no way for this combination to be done while respecting both licenses.

  38. Doug Mahugh says:

    I recently mentioned on this blog that the Ecma TC45 committee had released Working Draft 1.3 of the…

  39. Ecma has published an updated draft of the spec for the Office Open XML Formats Standard. Here’s a link…

  40. There were a lot of great comments from last week’s announcement about the creation of an open source…