Update on Open XML’s ISO progress


I wanted to provide a bit of an update on how things are going in TC45 as we look through the various National Body comments that came in with the Open XML ballot. The ballot resolution meeting (BRM) is going to be the last week of February, and by January 14th the editor of the spec is tasked with pulling together responses to all the issues that were raised. There are 3,522 comments in total, but when you group similar ones into buckets the list narrows down pretty quickly into something more manageable… though it’s still pretty impressive!


In TC45, we’ve been hard at work helping to sort through all the issues and come up with good resolutions based on all the feedback. It’s a lot of work, but it’s really progressing well. There were some really good suggestions, and I think we’ll see that this round of review will result in an even better spec than we had at the end of last year.


Just this weekend, we posted the first collection of responses to the 3,522 issues. http://www.ecma-international.org/news/TC45_current_work/First%20group%20of%20662%20proposed%20dispositions%20of%20comments%20posted.htm


There are currently 662 responses, and the plan is to update this list every few weeks. We still have almost 2 months until the deadline, but given how many issues we have to work through, we thought it would be best to provide responses earlier than the January deadline to allow more time to discuss them.


One thing I was really hoping we’d get to do was provide a public view of the progress, but ISO rules say that the national bodies’ comments and the responses to them should be kept private, viewable only by the other national bodies. I know there have already been some public postings of the comments, but since we want to follow the ISO rules, access to the actual responses and to the list of original comments will have to be restricted to the National Bodies. I would think that at some point the access information will get out, though, and maybe at that point folks will decide to just open it up and allow everyone to view the progress (I hope).


So far I think we’re doing a pretty good job of doing what the national bodies are asking for. Most of the comments were accompanied by a proposed resolution, and most of those are great suggestions, so our response is often that we’ll do exactly what they ask.


There is still a long way to go though leading up into the meeting in Geneva. It’s been fun to get back into the swing of things with the other TC45 members though. We had a bit of a break last spring, but have been hard at work since the comments started pouring in over the summer.


-Brian


OpenXMLCommunity.org Quote of the Day:


Alcuadrado S.A. – Colombia


“In today’s world, there is a variety of standards for each technology. And in the document, spreadsheet and presentation physical storage formats it’s the same, with ODF, HTML, PDF and Open XML. We consider it convenient to be able to choose which one to use depending on the task at hand. Open XML should become an ISO standard as a very complete, open and documented standard.”


– Andres Fontan – Chief Architect

Comments (40)

  1. hAl says:

    I guess IBM will have started its team of scrutinizers to dissect the responses, and then I would expect Rob to launch a series of critical articles, probably starting close to January 14th and running up to the BRM: a timeframe in which the issues he mentions cannot be dealt with in Ecma responses before the BRM meeting.

    Too bad that the responses are not publicly available, because as a very interested observer I would have liked to see them.

    But Brian, even though you might not be able to share the Ecma responses, could you comment on my guesses for some of them?

    * Borderart => moved to annex and in a vector graphics format ?

    * VML => moved to annex ?

    * Legacy compatibility item => either more info on them, moved to annex, substituted by generic solution

    * Spreadsheet dates => added support for ISO (subset) dates, made date_1900 format deprecated ?

    * Bitmask items => identified relation to ISO standard 14496-22 and/or Panose, changed values to decimal or other appropriate values.

    * All examples made to validate using XML schemas ?

    * Spreadsheet bugs as mentioned in http://blogs.msdn.com/brian_jones/archive/2007/07/12/spreadsheet-formula-bugs.aspx all corrected ?

  2. Wu MingShi says:

    "Just this weekend, we posted the first collection of responses to the 3,522 issues."

    Only problem is we are not NB or big guns so have no way to see the content. Why the objection to put the comments in public, especially taking into account that opposition had been accusing OOXML process of not being transparent?

  3. jones206@hotmail.com says:

    hAl,

    That was one of the reasons we wanted to get the comments up there sooner. Hopefully if people disagree with our response it will give us a chance to look into changing it again before Jan 14th.

    Those specific issues listed don’t yet have their responses posted, so you aren’t missing anything yet. 🙂 I think for those types of issues, though, I’ll just blog about what our general response is, since those are some of the more controversial ones and were raised by a number of different countries. The key thing that ISO wants to keep private is the original National Body comments (I think). If that’s the case, I can still talk about what our replies would be to the ones you mention, since they are already so widely discussed.

    ——————–

    Wu MingShi,

    It’s not an Ecma decision to keep the site locked down. If it were up to me, the site would indeed be public (at least that’s my opinion). It’s actually an ISO decision, and they have rules in place saying the countries’ submissions are supposed to be kept private. It’s a general rule they have around this whole process, and unless they make an exception, Ecma needs to stick to those rules.

    -Brian

  4. Jirka Kosek says:

    Hi Brian,

    are ECMA responses really already published? I see only the following files on ECMA site:

    000 txt No Title 0

    004 zip No Title 1385

    003 xls Combined Comments on ISO/IEC DIS 29500 from all MBs 1719

    002 pdf ISO commenting template – Electronic balloting application 225

    001 pdf DIS 29500 Project Editor’s Report: 2007-10-01 488

    None of them contains responses to NB’s comments.

    Jirka

  5. hAl says:

    [quote]The key thing that the ISO is wanted to keep private is the original National Body comments (I think).[/quote]

    I could read those on http://www.dis29500.org if I wanted to…

    I am glad you plan to discuss some of the more interesting issues on this blog.

  6. Andy says:

    You liked the comments so much that you tried to suppress them in the national committees by all means, right? And sure, you will provide fixes to all the comments your proxies let through in the national committees stuffed with your gold partners…

  7. jones206@hotmail.com says:

    Jirka,

    They are available, but you need to have the password and only National Bodies have access to it.

    ——————–

    hAl,

    Yup, I know it’s already out there on other sites. That’s why I was so disappointed that we couldn’t make our site public. At the end of the day, though, Ecma needs to follow the ISO rules, even if other folks out there do not.

    ——————–

    Andy,

    Give me a break man. How about the fact that you have the exact same negative comments (spelling mistakes and all) showing up in a bunch of different countries? IBM had their lobbying and we had ours.

    Now we’re working on actually addressing the issues and we’ll try to make the spec even better. I wish ODF had undergone such scrutiny. They didn’t even have a BRM to address the comments that folks had raised.

    -Brian

  8. Mike Brown says:

    >> I wish ODF had undergone such scrutiny. They

    >> didn’t even have a BRM to address the comments

    >> that folks had raised.

    Come off it, Brian.  You know full well why ODF had no BRM: because it didn’t *need* one!  BRMs are not required for standards that pass the ISO vote unanimously; 23-0 in ODF’s case.  The comments that were generated for ODF – a fraction of your "impressive" count for OOXML – all came from "folks" that had voted Approve With Comments.

    You can pretend that OOXML’s rockier passage through ISO is down to IBM’s blocking tactics all you want, but you can’t hide the truth. OOXML is simply too broken to become an ISO standard. The kind of fixing that you’re attempting now should have been done by Ecma before submission for ISO Fast Track.

    And if you seriously do manage to fix all the comments that were raised by the ISO process, then you’ll end up with a spec that is *very* different from the one you first submitted. And where can you point for reference implementations of this new beast?

    Cheers,

    – Mike

  9. jones206@hotmail.com says:

    Mike,

    Dude, give me a break. 🙂

    Have you tried to implement ODF? As other folks (non MSFT) have mentioned many times before, it has huge gaping holes, tons of unspecified information, and various design flaws. It’s a fine format overall, but it does have flaws, just like Open XML.

    I’ve never said Open XML is perfect, but neither are the alternatives.

    If you don’t think Open XML is being held up to a different bar than ODF was, I’ve got a bridge to sell you. <g/>

    -Brian

  10. Mike Brown says:

    >> Have you tried to implement ODF?

    No, I haven’t; and I didn’t realise that this was a pre-requisite for comment!

    I have, however, picked through the XML of files generated in both ODT (ODF) and DOCX (OOXML), and it’s not hard to tell them apart: one format looks like it’s been carefully thought through, while the other looks like a traffic accident. For example, here’s a short extract from my test document of choice (my CV/Resume!). Below are the ODF and OOXML representations of a table listing the address of a previous employer. (In both cases, I’ve changed angle brackets to square ones so the browser doesn’t parse them.) The table is five lines long on my Resume.

    First the ODF representation, which is seven lines long and reasonably easy to follow:

    [table:table-cell table:style-name="Table6.B1" office:value-type="string"]
     [text:p text:style-name="P96"]Human Resources Dept*[/text:p]
     [text:p text:style-name="P96"]Boundary Way[/text:p]
     [text:p text:style-name="P96"]Hemel Hempstead[/text:p]
     [text:p text:style-name="P96"]Herts HP2 7YU[/text:p]
     [text:p text:style-name="P96"]United Kingdom[/text:p]
     [/table:table-cell]

    Now the OOXML version, or part of it anyway.  The entire representation of it is over 100 lines of OOXML!  Here’s just one line of the table in OOXML; note it takes 17 lines of OOXML to represent it (although the lines *are* shorter):

    [w:t]Hemel Hempstead[/w:t]
    [/w:r]
    [/w:p]
    [w:p w:rsidR="000E2F7F" w:rsidRPr="00636E9A" w:rsidRDefault="000E2F7F"]
      [w:pPr]
        [w:snapToGrid w:val="0" /]
        [w:rPr]
          [w:kern w:val="1" /]
          [w:lang w:eastAsia="ar-SA" /]
        [/w:rPr]
      [/w:pPr]
      [w:r w:rsidRPr="00636E9A"]
        [w:rPr]
          [w:kern w:val="1" /]
          [w:lang w:eastAsia="ar-SA" /]
        [/w:rPr]

    It’s gobbledy-gook, IMHO.  What is w:rsidR, I wonder?  Not a clue myself; and even less clue as to what the values "000E2F7F", "00636E9A" or "000E2F7F" might be.  And w:eastAsia="ar-SA"?  where did *that* come from?

    Yes, I know: that’s what specs are for, right?  Look at the spec and I’ll understand.  Except I could understand the ODF version without even looking at the spec.

    Cheers,

    – Mike

  11. dmahugh says:

    Mike, that’s not a row of a table in the Open XML example you gave.  I’m not sure what went wrong here, but something that starts with a <w:t> tag isn’t a row of a table, as anyone working with Open XML knows.

    If you’d like to post the actual markup of what you’re talking about (or better yet, links to two documents), I’d be glad to discuss the details, but in this case you’ve posted two different portions of these documents.  They don’t even have the same text in them, so it’s pretty hard to say anything meaningful about how they compare.

  12. jones206@hotmail.com says:

    Mike,

    A quick look at the spec will tell you that rsidR is an optional property applications may place onto a run of text that allows them to label unique points in time when an edit was made. This allows documents that get forked to be easily merged together in the future (you can tell if something was added to one as opposed to deleted from the other). This functionality does not exist in ODF, which is why you don’t see it in those files.

    The value w:eastAsia="ar-SA" is using the ISO definition for languages to specify what the language of that text is. Your document has the language directly applied to that text, so it gets saved out into the file that way.

    Also, if the user decides to use character and paragraph styles rather than direct formatting, you’ll have the styling information separated out from the content. Your example had some direct formatting, so that shows up directly on the run.

    In order to parse the ODF file, you’ll also need to look at the character properties defined towards the top of the file, so you should take that into account.

    If you look at part 4 of the Ecma spec, it gives you a very detailed reference of every element and attribute. You can quickly find the tags you aren’t sure about and you’ll see a description.
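To make the rsidR and language discussion concrete, here is a minimal Python sketch (standard library only) that reads those attributes back out of a WordprocessingML fragment. The fragment is hand-trimmed and modelled on the markup Mike quoted, not taken from a real file; the helper names `q` and `describe_runs` are hypothetical, and the namespace URI is assumed to be the one Office 2007 writes.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (assumed: the URI written by Office 2007).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Illustrative fragment modelled on the markup quoted in the comments.
FRAGMENT = f"""
<w:body xmlns:w="{W}">
  <w:p w:rsidR="000E2F7F" w:rsidRPr="00636E9A" w:rsidRDefault="000E2F7F">
    <w:r w:rsidRPr="00636E9A">
      <w:rPr>
        <w:kern w:val="1"/>
        <w:lang w:eastAsia="ar-SA"/>
      </w:rPr>
      <w:t>Hemel Hempstead</w:t>
    </w:r>
  </w:p>
</w:body>
"""

def q(local):
    """Hypothetical helper: Clark-notation name {namespace}local for a w: attribute."""
    return f"{{{W}}}{local}"

def describe_runs(xml_text):
    """Yield (paragraph rsidR, run text, eastAsia language tag) for each run."""
    root = ET.fromstring(xml_text.strip())
    for p in root.iterfind("w:p", NS):
        for r in p.iterfind("w:r", NS):
            text = "".join(t.text or "" for t in r.iterfind("w:t", NS))
            lang = r.find("w:rPr/w:lang", NS)
            east_asia = lang.get(q("eastAsia")) if lang is not None else None
            yield p.get(q("rsidR")), text, east_asia

for row in describe_runs(FRAGMENT):
    print(row)
```

The attribute lookups use the full namespace URI rather than the `w:` prefix, because ElementTree resolves prefixes at parse time.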

    -Brian

  13. Mike Brown says:

    @dmahugh

    >>They don’t even have the same text in them

    The OOXML example section that I posted maps to the following line of the ODF example:

    [text:p text:style-name="P96"]Hemel Hempstead[/text:p]

    I just copied and pasted what I saw.  Maybe I grabbed the contents of that one cell, instead of the table formatting tags themselves.  It really was not that easy to tell.

    @Brian

    >> if the user decides to use character and

    >> paragraph styles, rather than direct

    >> formatting, you’ll have the styling

    >> information seperated out from the content

    I repeat that this is the same document, saved in two different formats.   I made no formatting changes to it beforehand, so this user didn’t "decide" anything in the way that you say.  I think that the original MS Word file did use styles.

    Cheers,

    – Mike

  14. Ricus says:

    Mike, where did you save from: MS Office 2007 for the OOXML file and OpenOffice for the ODF file? Or did you use OpenOffice with a plugin to save the OOXML file?

    So what I’m basically asking is whether the output came from an Office-generated source or not.

  15. Mike Brown says:

    @Ricus

    It was either OpenOffice 2.3 or Lotus Symphony Beta 2 to save the ODF file.

    For the OOXML file, it was MS Word 2003, with the OOXML add-on pack that I downloaded (from Microsoft, I think).

    Cheers,

    – Mike

  16. neerajsi says:

    @Mike:

    Uh, why are you looking at the XML of the document, anyway?  Don’t you have a good editor for it, like Office, or iWork, or Abiword?  OOXML comes with a lot of legacy, a lot of features, and a design that’s geared towards expressiveness and performance.  It is clearly not designed to be edited by hand in any major way.

  17. Mike Brown says:

    @neerajsi

    I wasn’t trying to edit the XML "by hand in any major way", or in any minor way for that matter. I just wanted to see what the damned thing looks like! The first question for anybody wanting to implement either file format is whether it makes any kind of intuitive sense at first viewing.

    One man’s "expressiveness and performance" is another man’s historical baggage; as you said it, "baggage".

    Cheers,

    – Mike

  18. jones206@hotmail.com says:

    Mike,

    OpenXML allows you to put character formatting directly on a run, *or* you can define a style and reference it that way.

    In ODF, you must declare a style for the character formatting (even if the style may be somewhat "fake").

    So the file you are looking at has some direct formatting, and the application you chose (Word 2003) decided to write that direct formatting onto the run rather than using a style. That’s an application decision, though, and it could have written the content out in a very similar way to the ODF file.

    With ODF, if you have a run of text that you want to apply formatting to, you have to first create a style in a separate location of the XML markup and then reference that style from the run of text. So you should also take a look at the properties for style "P96", and you’ll probably see similar values for kerning and language (unless ODF doesn’t support those).
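Brian’s point that direct run formatting and a style reference are two ways of expressing the same properties can be sketched in Python. This is a deliberately simplified resolver, assuming only two layers (direct run properties first, then a referenced character style); real WordprocessingML resolution also involves paragraph styles, document defaults and `w:basedOn` inheritance, all ignored here. The style ID `AddressChar` and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def q(local):
    """Clark-notation name {namespace}local for a w: attribute."""
    return f"{{{W}}}{local}"

# Hypothetical styles part with one character style (ID invented for this sketch).
STYLES = f"""<w:styles xmlns:w="{W}">
  <w:style w:type="character" w:styleId="AddressChar">
    <w:rPr><w:kern w:val="1"/></w:rPr>
  </w:style>
</w:styles>"""

# Two runs: the first carries direct formatting, the second references the style.
RUNS = f"""<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:kern w:val="1"/></w:rPr><w:t>Direct</w:t></w:r>
  <w:r><w:rPr><w:rStyle w:val="AddressChar"/></w:rPr><w:t>Styled</w:t></w:r>
</w:p>"""

def kern_for(run, styles_root):
    """Return the run's w:kern value: direct formatting wins, else the character style.

    Simplified on purpose: no paragraph styles, defaults or basedOn chains."""
    direct = run.find("w:rPr/w:kern", NS)
    if direct is not None:
        return direct.get(q("val"))
    ref = run.find("w:rPr/w:rStyle", NS)
    if ref is None:
        return None
    for style in styles_root.iterfind("w:style", NS):
        if style.get(q("styleId")) == ref.get(q("val")):
            kern = style.find("w:rPr/w:kern", NS)
            return kern.get(q("val")) if kern is not None else None
    return None

styles_root = ET.fromstring(STYLES)
para = ET.fromstring(RUNS)
for run in para.iterfind("w:r", NS):
    print(run.find("w:t", NS).text, kern_for(run, styles_root))
```

Both runs resolve to the same kerning value; the difference is only where the property lives, which is the application choice Brian describes.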

    -Brian

  19. hAl says:

    I do still wonder why MS Office produces such ugly looking OOXML though.

    An enormous amount of unnecessary spacing and line spacing, and weird indentation in the XML.

    Although that is of course very implementation-defined and purely an Office issue, I would sure ask the Office developers to make the XML easier to look at.

    For our developers starting to use OOXML, we often use an Office-produced template as a basis for automatically generated OOXML content. It takes me forever to edit the layout of the MS Office-produced XML so that it is usable as such a template (i.e. readable for our developers, who seem unable to work with the rough, ugly MS Office-produced files).

  20. jones206@hotmail.com says:

    hAl,

    I’m curious about what you’re referring to. We try to avoid any unnecessary line spaces and line feeds to improve parsing times. Where are you seeing extra spacing?

    -Brian

  21. hAl says:

    Forget I said that. It is the extraction tool we use that tries to reformat the XML but actually ruins its layout.

    The original Office 2007 files just seem to be long data strings of XML.
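Because the original files are written without whitespace, one way to get a readable working copy without disturbing the package itself is to pull the part out of the ZIP container and indent only the copy. A minimal Python sketch, assuming the conventional `word/document.xml` part name; a robust reader would locate the main document part through the package relationships (`_rels/.rels`) instead. The tiny in-memory package stands in for a real .docx, and `read_part` / `pretty_copy` are hypothetical helper names.

```python
import io
import zipfile
import xml.dom.minidom

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def read_part(package, part_name="word/document.xml"):
    """Return the raw bytes of one part of a .docx (ZIP) package.

    Assumes the conventional part name rather than following _rels/.rels."""
    with zipfile.ZipFile(package) as zf:
        return zf.read(part_name)

def pretty_copy(xml_bytes):
    """Return an indented copy for human reading; the package is not modified."""
    return xml.dom.minidom.parseString(xml_bytes).toprettyxml(indent="  ")

# A tiny stand-in package built in memory so the sketch runs without a real file.
DOC = (f'<w:document xmlns:w="{W}"><w:body><w:p><w:r>'
       '<w:t>Hemel Hempstead</w:t></w:r></w:p></w:body></w:document>')
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", DOC)

raw = read_part(buf)
print(raw.decode("utf-8"))   # one long line, as written
print(pretty_copy(raw))      # indented copy for reading
```

Keeping the pretty-printed text as a separate copy means a template built from it can still be written back compactly.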

  22. jones206@hotmail.com says:

    Yeah, it’s actually surprising how much of a perf and file size hit pretty-printing an XML file can add. That’s why we just write out the XML without line feeds or spacing.
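The size side of that trade-off is easy to see with a rough Python experiment: serialize the same made-up fragment compactly and pretty-printed, then compare byte counts. This only illustrates size; parts inside a .docx are ZIP-compressed, so the on-disk difference should be smaller than the uncompressed one, and parsing time is not measured here. The fragment and the repeat count are arbitrary choices for the sketch.

```python
import zlib
import xml.dom.minidom

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# One made-up run of WordprocessingML, repeated to mimic a longer document.
RUN = ('<w:r><w:rPr><w:kern w:val="1"/><w:lang w:eastAsia="ar-SA"/></w:rPr>'
       '<w:t>Hemel Hempstead</w:t></w:r>')
compact = f'<w:body xmlns:w="{W}">' + RUN * 1000 + "</w:body>"

# Same content with line feeds and two-space indentation added by minidom.
pretty = xml.dom.minidom.parseString(compact).toprettyxml(indent="  ")

compact_bytes = compact.encode("utf-8")
pretty_bytes = pretty.encode("utf-8")

print("uncompressed:", len(compact_bytes), "vs", len(pretty_bytes))
print("deflated:", len(zlib.compress(compact_bytes)), "vs", len(zlib.compress(pretty_bytes)))
```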

    -Brian

  23. Brian Jones, Office Program Manager, describes the progress on his blog and why it is currently so

  25. Bruno says:

    "I would think that at some point the access information will get out though…"

    I guarantee that some of the info will get leaked somehow, and it will be OOXML opponents who do the leaking. Only, they’ll leak half-truths accompanied by plenty of spin and FUD. By keeping the site private, you are merely playing right into IBM’s hands. Open up the site, so that when Rob Weir spouts his FUD, people can check the real info on the official site and see that Rob’s words are indeed nothing but FUD.

  26. jones206@hotmail.com says:

    Bruno,

    I agree that is a risk, and will most likely play out as you suggest. Ultimately though, we need to stick to the rules. It’s up to the ISO folks to decide if an open site is ok.

    -Brian

  27. hAl says:

    Rob is unlikely to comment on items not yet publicized. He is a member of the US standards committees and as such should act responsibly with ISO’s confidential info.

    Also, the period to Jan 14 is very short, so there is not necessarily a need to leak information, as doing so provides little advantage.

    I would expect opponents with access to the spec just to use this period to analyse the responses and try to find items in them that can influence the votes negatively, especially if those items are blown up to huge proportions.

    But we’ll see. I would expect the OOXML front to be fairly quiet for the next 7 weeks or so…

  28. Alex says:

    My guess is that Mike created this document in OOo and then copy-pasted it into Office 2003. If the document had been created from scratch in Office 2003, it would (a) have styles and (b) have global language settings.

    Besides, OOXML is not the native Office 2003 file format. Creating and saving the document in Office 2007 would make for a better comparison.

  29. luke says:

    Now that you have to modify the Office XML spec, the documents created by so many users of Office 2007 will by definition be of a different standard.

    Why not work towards merging the best of ODF and Office XML into a unified standard instead of more sabre-rattling games?

  30. jones206@hotmail.com says:

    Luke,

    What sabre-rattling are you talking about?

    The underlying goals of the two formats have not changed since their creation. The core goal of compatibility with the existing base of binary documents still exists for Open XML, and was not part of ODF.

    There is work, though, in the standards world to understand the differences between the two formats. If folks want to work on merging the two formats, the results of these comparison efforts would obviously be very useful.

    -Brian

  31. hAl says:

    BRM meeting convenor Alex Brown on his blog stated that Ecma identified 1030 distinct comments.

    How would the number of current responses stack when measured to those 1030 distinct comments ?

  32. jones206@hotmail.com says:

    hAl,

    I don’t remember exactly. I think we were around 250 or so.

    -Brian

  33. Brian,

    As some of the outcries from the anti-OOXML lobby are getting louder by the minute about the secrecy of the disposition of comments, don’t you think it would create a bit of breathing room if ISO/IEC made a statement that they have asked ECMA to keep the dispositions available to NBs only? Maybe a statement from Rex could do it, since he is the ISO/IEC-appointed editor of the dispositions.

    As you might have noticed in my discussion with Rob, it very quickly gets very "JTC1-directive-technical", so a statement would allow us (Rob and I) to focus on other, more important stuff than this … i.e. the comments themselves.

    :o)

  34. Luc Bollen says:

    Brian, I second the request of Jesper.

    We have been discussing in Charles-H. Schulz blog (http://standardsandfreedom.net/) the confidentiality (or not) of working documents as stated in the ISO/IEC rules, and my understanding is that confidentiality is not imposed by the rules (but Jesper disagrees with me :-).

    To better understand the situation, could you let us know who exactly from ISO/IEC made the request, and the reason given for it ?  Thanks.

  35. Dave S. says:

    Brian,

    How does "The key design goal of the Open XML format was that it coudl (sic) faithfully represent the existing binary documents from Microsoft Office. There are billions of those documents out there, and now they can be moved into a format that is an open standard without any negative impact on the owners of those documents.

    Once those documents are moved into this open format…"

      answer the question->

    "How does (MS)Open-XML becoming an ISO standard offer choices for accessing billions of legacy documents?"

    Are those billions of legacy documents inaccessible now? Is Office 2007 incapable of faithful representation of the legacy documents until MSO-XML is ratified by ISO?

    Having a single source for the MS-legacy-to-MSO-XML converter -is- a negative impact, especially when the original application is not available to check the faithfulness, or when the desire is to move from MS-legacy to ODF. It certainly doesn’t seem to offer more than one choice, at least not any more than are currently available.

    "There are already a large number of applications that support the open xml format…"

    There are a few applications that support fragments of the MSO-XML format and one that might support most.

  36. jones206@hotmail.com says:

    Jesper and Luc,

    It was a communication between Rex (the editor) and some of the folks in ISO (not sure who). I would love for it to be public, and as I said I’ll blog about all the big ones (without giving away the original national body comments). I doubt you’ll get ISO folks to make any sort of public statement on this, but maybe they would.

    If folks like Rob Weir from IBM don’t think ISO would mind, they should go ahead and post the content publicly. They have the access info (since they are on a number of national bodies). I’m sure someone will do it, but it will probably not be on a site directly affiliated with IBM… 🙂

    ———————–

    Dave S.

    What do you want? I have an opinion, and you disagree. Fine. Show me an app that supports all of ODF… I don’t know of any.

    What would you like to see us do that would make you happy?

    -Brian

  37. juan says:

    When you say "Open XML", what do you mean? Open Office XML? Office Open XML? XML that is open? (Is there any closed XML?)

    This is so confusing.

  38. Dave S. says:

    Brian,

    Why always an answer to a question not asked?

    What would make me happy is to see answers that are on-topic, complete, and accurate.

    I ask how ISO approval leads to accessing billions of documents and you answer that accessing documents was a goal of the MSO-XML effort.

    The answer is orthogonal to the question.

  39. hAl says:

    @Dave S

    You actually ask a different question now.

    First you asked about choice, and now you have narrowed your question to accessibility.

    To answer your current question: the binary formats will slowly disappear in the future. New applications will be created that use the new XML functionality, for instance to retrieve data from archives. These won’t necessarily work on old files. Depending on your use, a need to convert your files to a new format can arise.

    With Office Open XML, companies get the ability to faithfully convert their current binary formats (now or even in the distant future) to an XML-based format in such a way that no information is lost. So it allows access to ALL the information in the binary files, even after conversion to the new format.

    This is not 100% possible with, for instance, ODF. The inability to faithfully convert the current Office document base to ODF was a main reason why the OpenDocument Foundation abandoned ODF format development. So maybe you should direct your earlier question on choice in accessibility to OASIS and ask them why they never chose compatibility with current Office documents as one of their goals.

  40. Having looked at both specs, and the XML of both -because XPath and XSL is why you’d bother with XML in the first place- I’m somewhat underwhelmed by ODF and appalled by how awful OOXML content is. It has a look that radiates ‘work in progress transition from OLE based content to XML’.

    At the same time, the two products are fairly close to convergence in feature sets. Why isn’t ISO pushing back on both groups to say: come up with a format that is a proper superset of both, a decent format with the stricter format model of ODF, and normative test documents?