Working Draft 1.4 of the Ecma Office Open XML formats available


Last week we held the 5th face-to-face meeting of Ecma TC45, and this time it was hosted by Microsoft. It was nice being able to stay out here in Redmond for once <g/>. At the meeting, we all agreed to make working draft 1.4 publicly available, which is awesome news. You can now get the latest draft from here: http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm


Some key things to note in this new draft are:




  1. Document reorganization: We got a lot of feedback from the Ecma Coordination committee and ISO/IEC JTC 1/SC 34 members that a significant reorganization could help improve readability. The earlier drafts were basically one large (4,000+ page) document, and we’ve now broken the spec out into five separate parts.



  2. Spreadsheet Formulas: The SpreadsheetML formula definition is almost completely filled in at this point, and it has been moved from the front matter into the Reference section (Part 4). In addition:



    • The rest of the missing function definitions have been completed


    • The vast majority of undefined behaviors are now well-defined


    • Some editorial improvements were made to make the argument list easier to read



  3. Primer section: A lot more tutorial material was added:

    • WordprocessingML: Annotations, Custom Markup, Fields and Hyperlinks, Fonts, Glossary Document, Mail Merge, Document Settings, Styles, Tables.

    • SpreadsheetML: Calculation Chain, Comments, Custom XML Mappings, External Connections, External Links, Metadata, PivotTable, Query Tables, Shared String Table, Shared Workbooks, Tables.

    • PresentationML: Animations, Slide Synchronization.

    • DrawingML: 3D, Diagrams, Coordinate Systems and Transformations, Picture, Shape Definitions and Attributes, Styles, Text.

    • General: Equations, Extensibility, Metadata.

  4. Conformance: The conformance clause was significantly simplified, which should make it much more flexible for folks who only want to implement a certain portion of the spec.

  5. Reference Material: A lot more of the reference material has been filled in. The WordprocessingML section is now complete, and the other MLs are getting close.

  6. Schema Changes: There have been a number of schema changes as well, based on issues raised over the past 8 months or so. One of them is that we took further steps to remove dependencies on specific platforms. For example, there used to be a number of tags for “Active X” controls, and those have now been changed so that they allow for any type of control (Java, ActiveX, etc.). These points were raised by a number of people in the technical committee, as well as people outside of Ecma.

    You’ll also see that all 5 parts are available as .docx files again. With the 1.3 draft, the .docx files followed the 1.3 version of the format and were also supported by Beta 2 of Office 2007. These new documents follow the 1.4 version of the spec and are not supported by Beta 2. There have been a number of changes made to the spec over the past 4-5 months, and that’s why they won’t open in Beta 2.


    I was supposed to be taking the day off today, but I really wanted to make sure to let everyone know about the updated spec as soon as it was posted. Time to get back to my day off… :-)


    -Brian

    Comments (27)

    1. Francis says:

      Unfortunately, the page numbers printed in the draft spec and those in the PDF still do not match up.

    2. Francis says:

      The specification looks much cleaner now, but I am disappointed to see that a single relationship cannot support both an absolute AND relative path.

      The ability to store both forms of paths is important for document longevity, interoperability, and security.

    3. BrianJones says:

      Hi Francis, what is the problem with the page numbers? Do you mean that the pages with the TOC aren’t numbered? I think that’s actually an ISO style guideline, and I don’t think we can get away from that. The good thing about the PDF, though, is that the TOC and all of the other references are linked, so you can just click on an entry and it will automatically go to that page.

      The issue around absolute and relative paths is something that could be solved by leveraging the extensibility mechanisms. When we investigated how the applications were used, we never found an instance in which both a relative and an absolute path were stored, so the current version of the spec doesn’t allow for that either. If someone else actually had a case where they wanted to support that, though, they could easily extend the format to do it. Of course, applications like Office wouldn’t know how to preserve both paths, but that isn’t a format issue… it’s an application issue.

      Let me know what else you think of the spec, and if you have any other suggestions!

      -Brian

    4. Francis says:

      Thanks for the response!

      I understand that "there never was an instance" of using both paths. That, however, is the problem. Links that use one path are inherently non-portable.

      Say I have a Word document with linked objects. If I use absolute paths, I cannot:

      1. move the parent document AND linked objects to another local directory, drive, or system

      2. e-mail or otherwise transmit the parent document AND linked objects to a remote system and open it there

      3. open the parent document over a network (while the linked objects still reside in the paths where they were created)

      In all cases, the absolute paths will point to non-existent files. The relationships/links will break. This is bad news for electronic dissemination, groupwork, archival, and system migration.

      The application could skirt this problem by using relative paths. But that leads to analogous problems, namely that I cannot:

      1. move the parent document BUT NOT linked objects to another local directory, drive, or system

      2. e-mail or otherwise transmit the parent document BUT NOT linked objects to a remote system and open it there

      Again, the relative paths will point to non-existent files, thus breaking the relationships/links.

      However, if the application refers to both paths, these problems will not occur. Absolute paths may be broken, but as long as the positional relationship between the parent document and linked objects is preserved, relative paths will work. For instance, say I zip my "BP" directory (and all the directories under it, including Parts) and send it to a colleague. He could then open BigProjectParent.docx on his system and see/edit Parts\BigProjectObject1.xlsx.

      Likewise, relative paths may be broken, but as long as the linked objects remain at their original location, absolute paths will work. For instance, I could send BigProjectParent.docx alone to a colleague of mine. He could then open BigProjectParent.docx on his system and see/edit \\centrallocation\BigProjectObject2.xlsx.

      It seems that relative paths are preferable for most situations. However, they require users to have the foresight to move/transmit ALL linked objects along with the parent document. Furthermore, in some situations relative paths are inappropriate (such as referring to a file database or live statistics at an invariable, central location). No matter how "intelligent" the producing/consuming application is programmed to be, it will not be able to determine when relative paths or absolute paths are better for the user. In a given document, absolute paths may be more suitable for some relationships, while relative paths are more suitable for others!

      This is a real problem. I have scores of Word files with linked objects (mostly to large Excel files and Access databases). I cannot touch these files, lest I spend hours reconstructing broken links (a nightmare when you have hundreds of links in a single file).

    5. Francis says:

      As for Office not knowing "how to preserve both paths," if it were a part of the format, as I suggest, it should be easy. Here is how it could work (in pseudo-code):

      1. On File|Open, application looks for linked object at absolute and relative paths.

      2. If both paths resolve to same linked object, and object exists, go to 10.

      3. If linked object found at absolute path but not at relative, prompt to update relative path (Fix path: Yes/No) and go to 10.

      4. If linked object found at relative path but not at absolute, prompt to update absolute path (Fix path: Yes/No) and go to 10.

      5. If linked object found at neither, alert the user, possibly ask the user for a new path, and go to 20.

      6. If different linked objects found at both, alert user and either default to absolute or prompt user to choose, then go to 10.

      10. Update linked objects.

      20. End.
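Francis's steps can be sketched as a small Python function. This is only an illustration under stated assumptions, not anything defined by the draft spec or implemented by Office: the `LinkedObject` record, the `resolve_link` name and the `ask` callback (standing in for the Yes/No "Fix path" prompts) are all hypothetical, and "same linked object" is approximated with `os.path.samefile`. Returning a path corresponds to step 10; returning `None` corresponds to step 20 after the caller alerts the user.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LinkedObject:
    """Hypothetical link record that stores both path forms."""
    absolute: str  # absolute or UNC path to the linked file
    relative: str  # path relative to the parent document's folder


def resolve_link(link: LinkedObject, doc_dir: str,
                 ask: Callable[[str], bool]) -> Optional[str]:
    """Return the path to load the linked object from, or None if not found.

    `ask` is a hypothetical prompt: it receives a message, returns Yes/No.
    """
    rel_full = os.path.normpath(os.path.join(doc_dir, link.relative))
    abs_ok = os.path.isfile(link.absolute)
    rel_ok = os.path.isfile(rel_full)

    if abs_ok and rel_ok:
        if os.path.samefile(link.absolute, rel_full):
            return rel_full  # step 2: both paths agree
        # Step 6: two different files; ask, falling back to the absolute path.
        if ask(f"Use {rel_full} instead of {link.absolute}?"):
            return rel_full
        return link.absolute
    if abs_ok:
        # Step 3: only the absolute path works; offer to fix the relative one.
        if ask("Fix relative path?"):
            try:
                link.relative = os.path.relpath(link.absolute, doc_dir)
            except ValueError:
                pass  # on Windows, no relative path exists across drives
        return link.absolute
    if rel_ok:
        # Step 4: only the relative path works; offer to fix the absolute one.
        if ask("Fix absolute path?"):
            link.absolute = os.path.abspath(rel_full)
        return rel_full
    return None  # step 5: found at neither path
```

A real application would replace `ask` with its own dialog and persist the updated `LinkedObject` back into the document.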

    6. Escamillo says:

      For what it’s worth, OLE links use both relative and absolute paths.  An OLE link stores both the relative and absolute monikers referring to the source object.  When binding an OLE link (activating the link or updating the data, etc), IOleLink::BindToSource is called, which calls  IMoniker::BindToObject on the relative moniker, and if that fails, then calls IMoniker::BindToObject on the absolute moniker.  If IMoniker::BindToObject succeeds for either the relative or absolute moniker, then the other moniker is updated accordingly.

      See "Notes on Provided Implementation" section of the IOleLink::SetSourceMoniker documentation for details:

      http://msdn.microsoft.com/library/default.asp?url=/library/en-us/com/html/85fe1d28-d9c6-46b4-abff-6afce9ff3cd0.asp

      Now, I don’t know what "links" you guys are talking about in the OpenXML specs, but the same technique could be used for these links.
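The binding order described here (try the relative path first, then the absolute one, and refresh whichever one failed) can be modelled with ordinary file paths. This Python sketch only mimics that fallback order; it does not call the COM `IOleLink` or `IMoniker` interfaces, and `bind_link` and its dictionary keys are hypothetical names chosen for illustration.

```python
import os
from typing import Optional


def bind_link(link: dict, doc_dir: str) -> Optional[str]:
    """Try the relative path, then the absolute one; on success, update the other.

    `link` is a hypothetical record with "relative" and "absolute" keys.
    Returns the path that was bound, or None if neither resolves.
    """
    rel_full = os.path.normpath(os.path.join(doc_dir, link["relative"]))
    if os.path.exists(rel_full):
        # Relative binding succeeded: refresh the stored absolute path.
        link["absolute"] = os.path.abspath(rel_full)
        return rel_full
    if os.path.exists(link["absolute"]):
        # Absolute binding succeeded: refresh the stored relative path.
        try:
            link["relative"] = os.path.relpath(link["absolute"], doc_dir)
        except ValueError:
            pass  # on Windows, no relative path exists across drives
        return link["absolute"]
    return None
```

With this order, renaming the folder that holds both the parent document and its linked file breaks the stored absolute path, but the relative path still binds and the absolute path is rewritten to the new location.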

    7. A says:

      Does anyone else have trouble viewing the Markup Language Reference document in Acrobat 7.0?  

      It takes about 10 seconds to page down (or scroll down between pages) although jumping directly to a page is pretty fast.  It also takes a lot longer to search than did the previous version of the Schema, though the length of the document is about the same.

      Just wondering if I’m the only one, though a coworker was able to reproduce it on her machine too…

      Otherwise the document is much better now, a lot more complete.  Looking good :)

    8. Francisv says:

      Escamillo: I am referring to all links contained within Office documents.

      Your point on OLE is interesting, but, alas, it does not apply here. If you open an Office XML document containing linked objects in a text editor, you will only find absolute paths. You can also replicate the problem by doing this:

      1. create directory X, subdirectory Y in X, Word document in X, and Excel document in Y

      2. insert object from Excel document as link in Word document

      3. rename X

      4. open document and attempt to update links

      Incidentally, OLE links are not the only relationships used by Office documents. In Word, field codes often do this (e.g., INCLUDEPICTURE and INCLUDETEXT). These are subject to the same problems, as they contain either absolute or relative paths (at the user’s discretion, unlike OLE links, which in Office default to absolute).
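One way to check which form of path a package stores is to list the external relationship targets in its `.rels` parts, since an Office Open XML file is a ZIP package. The sketch below assumes the relationship namespace and the `TargetMode="External"` attribute used by the Open Packaging Conventions in the final ECMA-376 edition; the 1.4 draft may differ in detail. `external_targets` is a hypothetical helper, and paths written inside field instructions may also appear in the document XML itself, which this sketch does not scan.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Package relationship namespace (as in ECMA-376 OPC; assumed for the draft).
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def external_targets(package_path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (rels_part, relationship Id, Target) for external relationships."""
    with zipfile.ZipFile(package_path) as pkg:
        for name in pkg.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(pkg.read(name))
            for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
                # Only external targets point outside the package.
                if rel.get("TargetMode") == "External":
                    yield name, rel.get("Id"), rel.get("Target")
```

For example, `for part, rel_id, target in external_targets("BigProjectParent.docx"): print(part, rel_id, target)` prints each external link, making it easy to see whether the stored targets are absolute or relative.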

    9. The Ecma TC45 working group has released an updated version of the draft spec.  You can download it…

    10. Doug Mahugh says:

      If you were intimidated by the large, dense Ecma draft spec for Open XML, things have changed: the latest…

    11. Wesley Parish says:

      "For example, there used to be a number of tags for "Active X" controls, and those have now been changed so that they allow for any type of control (Java, ActiveX, etc.). These points were raised by a number of people in the technical committee, as well as people outside of Ecma."

      Ah, excellent!  Well done!

      This makes reimplementing Office Open XML on any one of the *nixes a much more realistic proposition.

      It also means there’s a much greater possibility that it will be usable on browsers such as Firefox, which has its own programming language/extension/environment.

    12. Steve says:

      These documents are not printable with "Print to Kinkos". They were created with Acrobat 6 (I believe), which causes grief (can’t MS afford to upgrade?)

      The workaround is to print from the Windows application that created the documents.  Of course because they are formatted according to the latest version of the spec, they are not readable by the public beta version of Word.

      It might make sense for these documents to be available from a "publish on demand" service. Reading hundreds of pages in Acrobat is no fun.

    13. The 1.4 public draft of the Office Open XML formats is available over at ECMA. Read up on Brian’s blog

    14. Rob says:

      Brian, Is there any chance of getting the "reference schemas" posted as was done with the 1.3 draft?  Or are they unchanged in 1.4?

    15. Bruce says:

      Comments on the bib support in the link.

      You need, in order of priority, to:

      1. use a more international-friendly personal name model; look at the vCard spec (given, family, honorific-prefix/suffix, sort-string, etc.); right now you are assuming Western users for what you want to be an international spec

      2. rationalize your bibliographic types

      3. define an extension model, at least to allow different types and properties

      Will submit to TC45 as well.
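To make the first point concrete: vCard's structured N property splits a name into family names, given names, additional names, honorific prefixes and honorific suffixes, and vCard 3.0 adds SORT-STRING for collation. A bibliography name model along those lines might look like the Python sketch below. `PersonName`, its methods and the `family_first` flag are hypothetical illustrations, not part of the draft spec or of vCard itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonName:
    """Hypothetical name model built from the components of vCard's N property."""
    family: List[str] = field(default_factory=list)
    given: List[str] = field(default_factory=list)
    additional: List[str] = field(default_factory=list)
    honorific_prefix: List[str] = field(default_factory=list)
    honorific_suffix: List[str] = field(default_factory=list)
    sort_string: Optional[str] = None  # modelled on vCard 3.0 SORT-STRING
    family_first: bool = False         # hypothetical: family name shown first

    def display(self) -> str:
        """Join the components in the order appropriate for this name."""
        if self.family_first:
            core = self.family + self.given + self.additional
        else:
            core = self.given + self.additional + self.family
        return " ".join(self.honorific_prefix + core + self.honorific_suffix)

    def sort_key(self) -> str:
        """Prefer an explicit sort string over guessing from the components."""
        return self.sort_string or " ".join(self.family + self.given)
```

Keeping the components separate lets a citation style decide the display and sort order per name instead of assuming "given family" everywhere.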

    16. Jean Goffinet pointed out that he and the folks working on the ODF to Open XML converter project now…

    17. BrianJones says:

      Hey Rob, the TC is working on getting the XSD files uploaded as well. Hopefully they’ll be up there later this week.

      -Brian

    18. Jeff Bell says:

      Steve – the PDF documents Brian links to were actually created by a recent internal build of the Microsoft Save as PDF add-in for Office 2007. (We mark these as PDF 1.5; Adobe’s Reader adds the comment that this is the Acrobat 6.x version of the spec.)

      Thanks for flagging the issue with submitting these to be printed through the Kinko’s online submission tool. Kinko’s has confirmed that this is a problem on their side that will soon be fixed.

      Jeff Bell, http://blogs.msdn.com/jeff_bell

    19. A says:

      Seeing some of the other comments, I tried opening the file in Acrobat Reader 6.0, and the result was much more responsive. The file actually became usable, as one could scroll without waiting 10 seconds between pages as when opening it in Acrobat 7.0. Search, however, is still much slower than in the previous version of the document.

      MS might want to take a look at the problem in Acrobat 7.0, since if users upgrade to that, it will make the add-in look like it’s doing a very poor job of creating PDFs because the files will be so slow to use. Since Office doesn’t have a viewer, the PDFs created by Office will be painful to work with.

    20. Difficult decisions between loose conformance and true interoperability – Rick Jelliffe had a great post…

    21. Those of you who are using Beta 2 probably noticed that the .docx versions of the Ecma working draft 1.4…

    22. A comment was posted today that had a lot of thought put into it and rather than just replying to it…