Office Open XML final draft!!!


As I already mentioned, at the last face-to-face meeting in Trondheim, Norway, we unanimously voted to approve the final draft of the Office Open XML spec as ready to submit to the Ecma General Assembly. The GA will then review the draft, and in December it will vote on approving it as an Ecma Standard!


This is a huge milestone, and the entire technical committee has worked extremely hard over the past year. We had an amazing collection of contributors to this standard, and it shows if you take a look at the draft: http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50_final_draft.htm


For those of you who are interested, here is the list of all the organizations contributing to the standard:



  • Apple

  • Barclays Capital

  • BP

  • The British Library

  • Essilor

  • Intel

  • Microsoft

  • NextPage

  • Novell

  • Statoil

  • Toshiba

  • The United States Library of Congress

We posted the draft in three separate formats: a PDF version, a tagged PDF version (for accessibility), and a DOCX version.


The final draft is still broken out into 5 separate parts:



  1. Fundamentals – gives an overview of the structure of the formats, and describes all allowed parts, content types, and relationship types.

  2. Open Packaging Conventions – describes the basic conventions used for storing the parts of the file within a ZIP package. *

  3. Primer – gives an accessible description of all the markup languages and how they work, and serves as a great tutorial.

  4. Markup Language Reference – contains detailed descriptions of each and every element, attribute, and simple type. It serves as a great reference when you want to look up what an element means. **

  5. Markup Compatibility and Extensibility – describes how additional markup can be added to the format while still conforming to the spec.

* Part 2 has a couple of additional electronic resources: a few XSD files, as well as the equivalent RelaxNG files (we were lucky enough to have Rick Jelliffe help create these).


** Part 4 has a collection of XSD files and the equivalent informative RelaxNG files. There is also a collection of predefined cell and table style references for SpreadsheetML, as well as a collection of predefined shape and text warp geometries for DrawingML.
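For anyone who hasn't looked inside a package yet, here is a rough sketch of what the packaging conventions from Part 2 (and the content types and relationship types listed in Part 1) look like in practice. It is only an illustration, assuming Python's standard zipfile and xml.etree.ElementTree modules; the helper names describe_package and build_minimal_docx are hypothetical, and the tiny package it builds in memory is just enough to show the [Content_Types].xml part, the package-level _rels/.rels relationships part, and a main document part. It is not guaranteed to open in any particular application.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the Open Packaging Conventions for content types and relationships.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def describe_package(data: bytes) -> dict:
    """Hypothetical helper: list the parts, content types and package relationships."""
    with zipfile.ZipFile(io.BytesIO(data)) as pkg:
        parts = pkg.namelist()
        # [Content_Types].xml maps file extensions (Default) and
        # specific part names (Override) to content types.
        ct = ET.fromstring(pkg.read("[Content_Types].xml"))
        defaults = {d.get("Extension"): d.get("ContentType")
                    for d in ct.findall(f"{{{CT_NS}}}Default")}
        overrides = {o.get("PartName"): o.get("ContentType")
                     for o in ct.findall(f"{{{CT_NS}}}Override")}
        # _rels/.rels holds package-level relationships, e.g. to the main document part.
        rels = ET.fromstring(pkg.read("_rels/.rels"))
        relationships = [(r.get("Type"), r.get("Target"))
                         for r in rels.findall(f"{{{REL_NS}}}Relationship")]
    return {"parts": parts, "defaults": defaults,
            "overrides": overrides, "relationships": relationships}


def build_minimal_docx() -> bytes:
    """Hypothetical helper: build a tiny illustrative package in memory."""
    content_types = (
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/word/document.xml" ContentType="application/'
        'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
        '</Types>'
    )
    package_rels = (
        f'<Relationships xmlns="{REL_NS}">'
        '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
        'officeDocument/2006/relationships/officeDocument" '
        'Target="word/document.xml"/>'
        '</Relationships>'
    )
    document = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        '<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body></w:document>'
    )
    buf = io.BytesIO()
    # Closing the ZipFile writes the central directory, so read buf only afterwards.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("[Content_Types].xml", content_types)
        pkg.writestr("_rels/.rels", package_rels)
        pkg.writestr("word/document.xml", document)
    return buf.getvalue()


if __name__ == "__main__":
    info = describe_package(build_minimal_docx())
    for name in info["parts"]:
        print(name)
    print(info["relationships"])
```

Pointing describe_package at the bytes of a real .docx (one of the posted drafts, say) should list its parts the same way, though real documents carry many more parts and relationships than this toy package.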


I’ve been giving pretty frequent updates on the progress of the spec, so most of the content at this point won’t come as a surprise. We spent the last few weeks in the committee nailing down potential interoperability issues. One result is a new schema that lets applications declare additional characteristics of a file that may help consumers handle it better. For example, it’s possible to specify the level of arithmetic precision used for spreadsheet formula calculations, so that a consuming application can display the same results.


We’re already seeing hundreds of developers working with earlier versions of the draft, and this final version will really help everyone who’s been waiting for it to solidify. If you go over to the openxmldeveloper.org site, you’ll see there are almost 600 registered members and an extremely active discussion forum. There’s also talk of starting up a blogging collection so that members can blog about the solutions they are building. It’s exciting to see such a diverse set of solutions, from document assembly on a Linux box to mind manager solutions that output WordprocessingML.


I’m already getting excited about what we’ll do with version 2 of the spec (though I could use a little break between now and then). Here are a few fun facts about the work that’s gone on over the past year:



  • 72 presentations were given to the technical committee explaining the existing behavior of features, so that the committee could then discuss how best to structure and document them.

  • 66 hours of live meeting discussions (starting at 6am every Thursday for those of us on the west coast of the US)

  • 88 schema files

  • 128 hours of face-to-face meetings held in Brussels (Ecma); Cupertino, CA (Apple); London (British Library); Sapporo, Japan (Toshiba); Redmond, WA (Microsoft); Trondheim, Norway (Statoil)

  • 6,000 pages of documentation between the 5 parts of the standard

  • 9,422 different items to document (3,114 attributes, 2,500 elements, 3,243 enumerations, 567 simple types)

-Brian    

Comments (24)

  1. Wolfgang says:

    Any chance that the standard will also gain non-European, more worldwide approval?

  2. jones206@hotmail.com says:

    Is there a particular standards body you are thinking of Wolfgang?

  3. Chyld says:

    Awesome!  It’s great to see Microsoft working on and advocating open standards, congratulations on all the hard work!

  4. D. Wang says:

    Hi Brian, can you tell me why Word 2007 uses two formats (VML and WordprocessingDrawingML) to represent drawings? Do you have any advice on this for designing a .docx reader, and where can I find documentation on it?

  5. orcmid says:

    Congratulations once again!

    I wondered what was keeping you so quiet lately.  Good work.  

    The downloads are going great while Europe snoozes and it’s late in North America. Whew.

  6. orcmid says:

    Wow!  I downloaded a couple of what I thought of as small files from the .docx list so I could have some samples to explore.  

    I wasn’t expecting to see such a high degree of compression.  You really manage to squeeze those document.xml parts until the zipper hurts.  The only .docx that is larger than the corresponding .pdf is section 3, the primer, and that seems to be on account of all of the uncompressed (already compressed?) media parts.

    I went back and got Part 4 in .docx just to be more impressed.

  7. John says:

    Brian,

    Despite all the criticism I’ve made here on this blog, thumbs up and thank you. For the first time in history, Microsoft is really opening its Office formats. One can only hope that you will stay on this path and won’t do anything in the future that could compromise the positive effects of this step. It is now up to other manufacturers to make use of your format and achieve true interoperability.

    Sincerely,

    John

  8. jones206@hotmail.com says:

    John, Orcmid, and Chyld: thank you!

    We’ll definitely try to continue to build on this positive momentum and provide a ton of great tools for developers that want to build on top of the formats.

    D. Wang:

    Word was only able to do the work to upgrade to the new DrawingML framework for charts, SmartArt, and pictures. All other drawings still use the legacy VML architecture. That’s why we needed to include both VML and DrawingML in the standard; otherwise folks wouldn’t have been able to fully interoperate. The hope, of course, is to move to DrawingML for everything in future versions, as Excel and PowerPoint did.

    -Brian

  9. Stephane Rodriguez says:

    "Word was only able to do the work to upgrade to the new DrawingML framework for Charts; Smart Art; and Pictures. All other drawings still use the legacy VML architecture. That’s why we needed to include both VML and DrawingML in the standard; otherwise folks wouldn’t have been able to fully interop. The hope of course is to move to use DrawingML for everything in future versions, as Excel and PowerPoint did."

    Excel 2007 uses VML to represent comments and the containment layer around OLE objects.

  10. jones206@hotmail.com says:

    Stephane, you’re right, I forgot about that. That’s another example where we would want to move towards DrawingML eventually, but weren’t able to for Office 2007.

    -Brian

  11. Stephane Rodriguez says:

    The past incarnations of DrawingML have been chaotic. Out of curiosity, it would be interesting to get an accurate history of what changed over time, perhaps to better understand what is supported where.

    Here is my take, I am pretty sure I got at least 50% of it wrong 🙂

    – Pre-Windows 95 era: Word, Excel and PowerPoint each use their own vector drawing layer to draw shapes, pictures, diagrams, art and charts. PowerPoint, acquired by Microsoft in 1987, has by far the most advanced drawing layer (bi-linear gradients, opacity, …), codenamed Escher (in reference to the famous artist).

    – In Office 95, the decision is made to reuse the PowerPoint vector graphics layer in Word and Excel. Migration begins.

    – Migration ends with Office 97, where Word, Excel and PowerPoint all use the same vector graphics layer, publicly known as MSO (mso97.dll).

    – In Office 2000, the internet is all the rage and Word tries to export WYSIWYG HTML. To that end, markup extensions must be added to account for the MSO drawing layer; hence VML (Vector Markup Language). Excel and PowerPoint don’t support it. Internet Explorer natively supports VML (Internet Explorer’s DirectAnimation vector drawing layer was dismissed for performance reasons).

    – In Office XP, VML migration ends and Word, Excel and PowerPoint all support VML whenever a document is saved as a "Single web page archive" (.mhtml extension).

    – In Office 2003, nothing changes.

    – In Office 12, MSO gets rewritten with backwards compatibility in mind. The vector drawing layer uses more sophisticated drawing functionality, which makes it easier to draw themed, realistic 3D objects. Technically, the differences are akin to the differences between GDI and GDI+. This new shared library is known as E2O and the corresponding markup language is known as DrawingML (Ecma TC45 specs).

    – In Office 14, ??? Perhaps the drawing layer is rewritten again to 1) use WPF and 2) allow plugins, enabling much more sophisticated do-it-yourself scenarios. Use cases: custom charts, BI analysis tools.

  12. Trondheim, Norway, 09/28/06. At the latest meeting of the Ecma TC45 technical committee, the final draft…

  13. I just saw this on Doug Mahugh’s blog and it’s really cool. Stephane Rodriguez has built a tool that

  14. D. Wang says:

    Thanks for the history, Stephane.

    There must be a big reason for using two formats. But it’s really a big defect for a file format, and a nightmare for a document reader/writer.

  15. hAl says:

    Isn’t the next step for the drawing format a move to XAML, which will be the basis of the upcoming Windows web developer toolkit?

  16. D.Wang says:

    XAML? No more changes, please. Is that really what users want? No! They don’t care about it.

  17. jones206@hotmail.com says:

    No, we won’t be moving from DrawingML to XAML. The use cases behind XAML are much different from those of DrawingML. You may see areas where we use VML eventually move to using DrawingML, but I don’t see a move away from DrawingML for quite some time…

    -Brian

  18. Finally I have posted the updated (and final) mirror of the Office Open XML format specification to the

  19. Today we added Ecma Office Open XML final draft to the list of specifications covered by the OSP. Now

  20. SSiTE News says:

    Today we added Ecma Office Open XML final draft to the list of specifications covered by the OSP. Now users who implement solutions based on the Open XML format can choose to use either the language of the Covenant Not to Sue (CNS) or the language provided

  21. To quote Brian Jones : As I already mentioned, in the last face to face meeting in Trondheim, Norway

  22. Doug Mahugh says:

    One of the key benefits of the Open XML file formats is that they support all of the things you can do

  23. It’s finally official. Today the Ecma General Assembly voted almost unanimously to approve the Office

  24. Drejers Vue says:

    Agreement on the Office Open XML format!