A few updates on the OpenXML formats

Sorry I’ve been offline for the past couple of weeks. I’ve been meaning to post some content for a while but I’ve been swamped with Office 14 planning. I was trying to stay up on the comments from my last post, but still ended up slacking there as well (I’m sorry I didn’t reply to everyone’s comments). I’ve also been receiving a bunch of e-mail lately from folks, and I wanted to apologize for not replying yet. Hopefully I’ll get some free time soon.

I just realized that I never finished my “Intro to SpreadsheetML” posts, so I’ll try to get something up early next week to close that off. I also haven’t posted many examples of PresentationML so hopefully I’ll be able to get to that as well. Let me know if anyone has some examples they’d like to see in order to help them get started with Open XML development.

National Bodies comments on DIS 29500 online

If you’re looking for some good weekend reading though, you should check out the official comments submitted by the various National Bodies during the DIS 29500 contradictory review: http://jtc1sc32.org/doc/recent/JTC001-N-8530.zip (if you’re looking for some fun conspiracies, check out the metadata of J1N850-12 and J1N850-13 and see who the authors were).

J1N850-22 is the official Ecma response to the comments, which was posted a few weeks ago. This is the first time the official comments from the various national bodies have been posted. You can see that there are a number of shared themes in the documents, which was why the Ecma response tried to group many of the comments rather than replying to each one individually.

James Governor takes a closer look at archival formats

OpenXML vs ODF: does the archiving argument stack up?

“The industry needs to move beyond good vs evil, manichaen black vs white, beyond the single answer to a problem. Our monoetheism does us no favours … One true format? What do we need that for and what god are we worshipping? What are the problems we’re trying to solve?”

I think as we see more and more applications pop up that support OpenXML (besides those built by Microsoft), you’ll start to see the anti-OpenXML folks calm down a bit. The ideal in any archival format is that it allows for long-term access with as little disturbance as possible. That was the whole point of OpenXML. Give the world a fully documented format and pass the ownership of that format over to a standards body for safekeeping and future development. OpenXML allows anyone to build tools that read and write the formats, and at the same time is designed to cause the least amount of disruption possible. You can move all your existing documents into OpenXML, and you won’t lose a thing. :-)

Open XML workshop in Sweden


There have been a number of workshops going on around the world to help educate developers on how to build solutions leveraging the OpenXML formats and I’m pretty excited to see the types of solutions they build.

You can follow Doug’s blog if you’d like to find out more about the workshops: http://blogs.msdn.com/dmahugh/default.aspx

Have a great weekend.


Comments (23)

  1. Anon says:

    The problem with posting pro-Microsoft only links is that you post only once a week.

    There is a truth you can’t deny even with your typical spin. And this truth speaks for itself in this astounding silence.

  2. jones206@hotmail.com says:

    I think for most folks out there, silence on one’s blog is more of an indicator that they are too busy. I would love to post more often, but haven’t had a chance to.

    If there is a truth out there that I’m denying though, please educate me…


  3. jwilhelm says:

    Does Microsoft have any plans for Word or other products to be capable of single-source publishing? I am thinking of something like AuthorIt that can be used to author in chunks, and published to multiple formats at once (PDF, DOC, etc).

  4. jones206@hotmail.com says:

    The new formats in combination with the custom schema support are a great first step in this direction. It’s definitely a scenario we’ve thought heavily about.

    Once you get the files out as .docx, and you’ve marked up the structure with custom XML, it becomes much easier to transform not just into other formats, but into different layouts and presentation schemes.

    That’s actually how we built the Ecma spec. We had a single database with about 10,000 rows for each piece of the schema we were documenting. Each row would contain the name of the schema element, a unique ID, and a chunk of WordprocessingML for the rich content. We could then easily generate on demand granular sections of the spec to review, edit, and comment on. Any of those sections we would generate were marked up with custom schema though, so we could quickly re-shred that section back into the master database.

    This allowed us to easily manage the 6,000 pages of content, since at any point we were only really editing around 100-200 pages. It also allowed us to work on multiple sections in parallel so that members of the TC could focus on the areas that were of most interest to them.

    The other great piece of this was that we were storing the rich text description for each piece of the schema separately from the template. So we could quickly decide that we wanted to present the information in a different way (different heading styles, ordering, fonts, colors, etc.) and we would only need to change the master template. Then the next time we generated the full spec, it would reflect that new look.
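
    As a rough illustration of that round trip (this is not the actual Ecma tooling; the in-memory store, the `section`/`entry`/`desc` tag names, and both function names are hypothetical stand-ins), a sketch in Python might keep each description keyed by its unique ID, wrap it in markup when generating a section, and use that markup to shred an edited section back into the store:

```python
import xml.etree.ElementTree as ET

# Hypothetical master store: unique ID -> (schema element name, description fragment).
# The real solution used a database and WordprocessingML; plain XML stands in here.
master = {
    "id001": ("w:p", "<desc>Defines a paragraph.</desc>"),
    "id002": ("w:r", "<desc>Defines a run of text.</desc>"),
}

def generate_section(ids):
    """Build a reviewable section, tagging each description with its ID."""
    root = ET.Element("section")
    for uid in ids:
        name, fragment = master[uid]
        entry = ET.SubElement(root, "entry", {"id": uid, "name": name})
        entry.append(ET.fromstring(fragment))
    return ET.tostring(root, encoding="unicode")

def shred_section(xml_text):
    """Write each tagged description in an edited section back to the store."""
    for entry in ET.fromstring(xml_text).iter("entry"):
        fragment = ET.tostring(entry[0], encoding="unicode")
        master[entry.get("id")] = (entry.get("name"), fragment)
```

    Because the ID travels with the content, only the entries present in the edited section are touched when it is shredded back; everything else in the store is left alone.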


  5. jwilhelm says:

    I guess what I’m looking for is sort of a re-engineered Word that allows me to write in chunks and recombine them in end formats without knowing anything about DITA or XML.

  6. jones206@hotmail.com says:

    Not sure if it was clear from my description, but that’s just a simpler form of the solution we created for generating the Ecma spec.

    We don’t have anything that works out of the box today, but the OpenXML formats and custom schema support make it much easier for solution developers to build something.

    In the longer term, document assembly is something I’ve looked at at the start of each product cycle. We continue to take steps in that direction, as you can see with all the work we’ve done around XML. We’re just now getting started on Office 14 so it’s a bit too early to talk about what we’ll deliver there. In the meantime though, I’ll try to point out third party solutions that extend Office 2007 to enable scenarios like yours.


  7. Francis says:

    Neat solution! The automatically-generated bookmarks in the spec led me to suspect that you were storing the attributes in a database. Obviously, the benefits were much greater than the cost (of setting up the DB and custom schema and gluing it all together with code.)

    The same probably can’t be said for more moderate collaborative projects, though (e.g. only a couple hundred pages in length.) Have you given any thought to non-DB solutions? E.G. directories full of marked-up files, all pulled together with an enhanced (XML-aware and reliable) master/subdocument feature? This might not scale as well as DB but would be much less costly to set up (and thus beneficial sooner.)

  8. jwilhelm says:

    Thanks for the feedback, it is helpful. Please put in my 2 cents that this type of functionality in Word would be killer in the business world going forward!

  9. jones206@hotmail.com says:


    We’ve definitely looked at solutions like you describe for general document assembly scenarios. In fact we had an intern project last summer where they played around with using SharePoint as a document assembly repository. Basically you could put a bunch of documents up on SharePoint in a doc library. There was then a master document that would essentially rebuild itself based on the documents in the library (and the metadata on each document).

    It’s a pretty cool prototype, and at some point here we may clean it up a bit and post it as example code for folks to play around with. In its current form it only works with the earlier Betas though, so either way it would need some more work before it could be used.


    I’ll definitely take your feedback into account. Thanks!


  10. jwilhelm says:

    BTW, I would love to try your SharePoint example if it is possible. Any links or info you have would be great. I’m at jwilhelm@athenati.com

    Also, I suppose that master and sub docs can in some way function as reusable content – correct?

  11. jones206@hotmail.com says:

    Unfortunately I’m not sure when we’d be able to pull the SharePoint solution together into a useful form, but I’ll take a look. It may be something we can get someone to write up as an article and post the source (I’ll look into it).

    You’re right that master and sub docs solve a similar scenario. I’ve been trying to explore ways we can do it in the cloud though, rather than the heavy client side dependency that currently exists with the feature.


  12. Fernando says:

    >(if you’re looking for some fun conspiracies, check out the metadata of J1N850-12 and J1N850-13 and see who the authors were)

    The fact that IBM undeniably wrote Kenya’s NSB response makes Rob Weir’s latest rant (http://www.robweir.com/blog/2007/04/sometimes-i-need-to-remind-myself.html) look even more cynical.

  13. Hydrogen says:

    Please show examples of how to unzip the docx files in a PowerShell script or a Visual Basic program.  I know that I can manually change the docx file extension to zip and double click on the zip file, but I want to get at the XML files without going through the extra step of manually unzipping the docx files.  Does Visual Studio provide methods for unzipping the docx file?

  14. jones206@hotmail.com says:


    I agree it’s a real shame to see those types of tactics used and unfortunately, this isn’t an isolated case. Oh well, hopefully in the end more level heads will prevail. :-)



    Check out this tool (it’s pretty slick): http://blogs.msdn.com/brian_jones/archive/2007/04/03/visual-tool-for-developers-working-with-the-open-xml-formats.aspx
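
    For the scripted route, a .docx file is an ordinary ZIP package, so a ZIP library can open it directly without renaming the extension. A minimal sketch, in Python rather than PowerShell or Visual Basic (the function names are placeholders for illustration, not part of any Office or Visual Studio API):

```python
import zipfile

def list_parts(path):
    """Return the names of all parts stored inside the package."""
    with zipfile.ZipFile(path) as package:
        return package.namelist()

def read_part(path, part_name):
    """Return one part of the package as text, e.g. an XML part."""
    with zipfile.ZipFile(path) as package:
        return package.read(part_name).decode("utf-8")
```

    With a Word document saved as, say, `example.docx`, calling `list_parts("example.docx")` shows what is inside, and `read_part("example.docx", "word/document.xml")` would typically return the main document XML.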


  15. Jeffrey says:


    The names in the Kenyan responses look German and Korean. How would those be seen as IBM related ?

  16. Umm, your comment, Brian:

    "(if you’re looking for some fun conspiracies, check out the metadata of J1N850-12 and J1N850-13 and see who the authors were)."

    What I’m interested in is, why should this "conspiracy" alter the fact that some of those objections are quite substantial?

    I refer among others, to these paragraphs in J1N850-12:

    "ECMA 376 does not include the specifications for the Macro language nor its features, as such the claim of supporting billions of documents is in question as the main issue of backward compatibility in the real world is the problem of incompatible Macro functions. Additionally since the “covenant not to sue” does not cover features outside the current specifications, as such, efforts to reverse engineer Macro features may put independently developed implementations at legal risk.

    So is the alleged claim of supporting and upgrading the billions of legacy documents is not justifiable as the exact features in ECMA 376 which provide this backward compatibility is not specified even in the large document."

    MS Office Macros are one of the major reasons businesses give for remaining MS Office users, and this functionality isn’t considered valuable enough to make it into the MS Office 2k7 specification?  A bare implementation of the official specification – eg, the celebrated ECMA376-to-ODF converter – won’t be able to support most of those most valuable documents, so that’s a significant part of those few billion MS Office documents tossed out as obsolete.

    I have made attempts to make the ODF people see that once we use ODF as an HTML wrapper – that OO.org contains quite a good HTML editor shows that they’re aware of some aspects of it – the issue of PHP or Perl or TCL or so as a Macro language, raises its ugly head.  I think that will take some time to sink in.

    But, I think you’re playing the politics game as much as IBM is; you’re just not admitting it.

  17. Fernando says:


    Google the German name…

  18. nksingh says:


    Perhaps the macro language is not included in the spec for the document format because the macro language really CANNOT be specified.  It is just a bunch of COM Automation interfaces onto the codebase of Office.  People complain that OOXML is too big to be implemented, but to do Macros, you’d have to basically reimplement Office.  This is not a winning scenario for anyone.  

    Macros are not portable by a longshot.  They don’t even seem to be totally consistent between Mac and Windows versions of Office.  "Active" documents (which use Macros) are a relative rarity in most cases anyway.  Transitioning to ODF or to any other office suite will necessarily require the rewriting or retesting of these active documents which are so important to the business flow.  Hopefully, with the specification of OOXML, some of the tooling that’s built in VBA will be moved to external apps that modify the XML directly.

    I think the ODF’s tactics on this issue are quite despicable.  I can’t see what they’re trying to do through the words of Rob Weir and Sam Hiser other than riling up the easily-enchanted OSS sheep.  Look, I like OSS and the philosophy of sharing as much as the next guy, but I will say that there is a large body of people who love it to the point of intellectual dishonesty.  That’s Sam Hiser.

  19. Wesley Parish says:

    nksingh, I can see your point.  It’s something that nobody seems to have considered, though, from the earlier DOS, and Mac (and other OSes).

    As far as I can see, however, there are two things we’re talking about when we talk about Office Suite macros.  One is the recorded keystrokes thing, when the application itself interprets a set of keystrokes and records them for replay later; the second is when you use a more formal specialist programming language – VBA IIRC is the MS Office one (I’m not a regular MS Office user – I need something crossplatform).

    That’s what makes Macros so messy – which one is the one we’re talking about?  I’m talking about the programming language – and that can be specified, reasonably accurately.  Microsoft has done it with C#, so doing the same thing for VBA should not be too difficult.  In effect, they could even get it ported to Linux – there’s several Basics that run quite happily on linux, some of which are Open Source, and the maintainers would blink perhaps at receiving patches from @microsoft.com addresses, but would weigh them on their merits.

    And you mention COM Automation interfaces.  I think I raised a question like this in relation to ActiveX a few months ago (I help maintain a community centre’s Community-based Technology and Learning Centre, and ActiveX has proved to be a source of grief.), where I argued that its functionality should be abstracted, so as to be applicable to anything Unix as well.  If Microsoft hasn’t done this in relation to ECMA 376, it’s not my problem, but it is a problem, and the less attention is paid to it, the bigger it will get.

    As far as reimplementing MS Office goes, one might argue that the process of reverse-engineering the MS Office file formats, is the first step on the path to doing precisely that.  Nobody’s bothered to go any further – so I guess it’s not that enthralling a project when you can write your own Office Suite and show the world how it really _should_ _be_ _done_. 😉

  20. hAl says:

    So IBM partly wrote the Kenyan response to ISO. I already wondered why Kenya of all countries would have the longest responses of all the national bodies.

    Has anybody from IBM already responded to this ?

  21. I admit that I wrote the Kenya paper.

  22. Pamela Jones says:

    Bob’s Avatar: I’m upset that you are taking credit for the Kenya ISO "standards" paper.  I am the original IBM composite fictional character and I deserve the credit for sneaking around and trying to deceive the international standards community more than you.    

  23. I don’t know whether you have been following this soap opera around the ratification / standardization of Open XML (OOXML, as it is popularly