Microsoft Office Open XML Format does not require upgrading to Office "12"

This is old news to a lot of you, but I wanted to call attention to this again because I’ve read some articles where it appears that some folks still aren’t aware of the work we are doing to make the new XML formats backwards compatible. I’ve already talked about how the existing legacy XML formats will continue to be supported going forward. Even more importantly though, the new Office “12” XML file formats will work in existing Office versions as well. That’s right, you don’t need to upgrade to Office “12” to use the new XML formats.

We will provide free updates for Office 2000, XP, and 2003 that allow them to both open and save the new XML formats. This is great news for solution developers, IT admins, and end users. I know that a lot of you who have been reading my blog for the past few months are probably already aware of this. I just wanted to call attention to it directly in case you were worried about the costs of moving to the new XML formats.

This work is something I’ve been really proud of since we first started on it. It was a really big investment on our part to port this back to the past three versions. Supporting just reading the formats would be one thing, but supporting both read and write was a pretty big task. It’s something I’d always viewed as a must-have though. I think there is so much value in these new formats, and getting them to as many people as possible is really huge. It also makes it a lot easier for people who have moved forward to Office “12” to share their documents. They can either use the old binary format, or they can use the new XML formats.

Of course, anyone else is free to come along and build a tool that reads and writes the new formats too, so if you aren’t currently an Office customer there are still other possibilities. We’ve talked at length about the royalty-free licenses that we provide that basically allow anyone to build on top of the formats. There has been some discussion around a specific license that is not compatible with ours, but the large majority of licenses out there are 100% compatible so there are a lot of choices. I thought it was worth pointing out again what we at Microsoft are providing directly though.


Comments (65)

  1. Darryl Hover says:


    Can you comment on how the older versions will handle some of the newer features?

    For example:

    1) Will the older versions simply read/write the new format, or

    2) Will the Office 2000 interface, for instance, be updated to enable editing XML?

    3) How will Office 2000, for instance, handle documents built upon custom schemas? Will the documents be editable in accord with the definitions laid out in the custom schema?



  2. scot says:

    This seems like a good thing overall. As a customer I would be interested to know how long and how well you will support each XML version (for, let’s say, Office 13, 14, 15...) on, say, Office 2000.

    Will there be a new version of Office Open XML for each iteration of Office?

  3. Scot, the format will not have any breaking changes made going forward. We don’t want to be put in a spot where we need to provide updates to Office "12" in order to support opening files from Office 14 for instance. Of course, as new features are added, we need to come up with ways to represent those in the format. We won’t change the way existing features are represented though, which is really important to maintain true compatibility.

    We have designed into the format a future extensibility mechanism so that as new features are added in future products, that functionality can be persisted in the format without breaking its backwards compatibility. Of course, we can’t build those new features into the older products, so the older version may not be able to display the new feature, but it could show everything else. We’ve actually done work though to allow the future versions to also embed alternate representations for the new features so we could at least provide a fallback for the earlier version to display.

    Darryl, the support will be handled the same way it is today when you collaborate on a .doc file. The older version can read and write the new format, but it will of course not be updated to support all the other new features. The new features will require the new product. So in your example of custom defined schema support, you would of course be able to open the file and save it back, but the schema validation would not be there in Office 2000. There are only a handful of features like this though, as most functionality can either be mapped back to existing functionality, or it isn’t a feature persisted in the format (such as the new User Interface).
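One way to picture the fallback idea described above is a reader that looks at a list of alternative representations and takes the first one it understands. The element names (`alternatives`, `choice`, `fallback`) and the `requires` attribute below are made up for illustration and are not taken from the format documentation; this is only a rough sketch of the general technique.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a newer application writes the new feature plus a
# fallback that an older reader can still display.
SAMPLE = """
<doc>
  <para>Quarterly results</para>
  <alternatives>
    <choice requires="databar"><databar min="0" max="100"/></choice>
    <fallback><para>[data bar shown as text: 42 of 100]</para></fallback>
  </alternatives>
</doc>
"""

def render(xml_text, supported_features):
    """Return the tags a reader supporting `supported_features` would display."""
    shown = []
    for node in ET.fromstring(xml_text):
        if node.tag != "alternatives":
            shown.append(node)
            continue
        # Take the first choice whose required feature is understood.
        picked = next(
            (c for c in node.findall("choice")
             if c.get("requires") in supported_features),
            None,
        )
        if picked is None:
            picked = node.find("fallback")
        if picked is not None:
            shown.extend(picked)  # the children of the chosen branch
    return [el.tag for el in shown]

old_reader = render(SAMPLE, supported_features=set())
new_reader = render(SAMPLE, supported_features={"databar"})
print(old_reader)
print(new_reader)
```

The older reader never needs to know what a `databar` is; it only needs to know how to skip a choice it does not recognize.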


  4. Inquisitive says:

    So Brian, to provide a more concrete example:

    Let’s say in 30 years time I suddenly get the urge to stop sysadminning and get back to doing some programming.

    Question (1)

    Will you guarantee that I will be able to take out a license for (the 2005) Word XML then (assuming I DON’T take a license now)?

    Question (2a)

    Will you guarantee that I will be able to take out a similar license for the Word 2035 formats then?

    Question (2b)

    What happens to the licensing program if Microsoft goes bankrupt in 2020?

    Obviously feel free to substitute any positive N instead of 30 above…

  5. publius says:

    What about Office for Mac? Will this be available for v. X, 2004, or future versions?

  6. anon says:

    Well, take the Excel 12 conditional formatting improvements. Some of these are brand new and need to be persisted in such a way that older Excel versions ignore them. The downside is that for anyone really taking advantage of the features, instead of writing custom VBA macros or layers of formulas, the workbooks will degrade to poor ones in the older Excel versions.

    Let me give an example: colors. Let’s say you create new conditional formats with colors. Since older Excel versions don’t support the new conditional formats, no color gets applied to cells. Unless I have missed something completely obvious. Can you spend some time explaining the downward migration path? This matters as much as the new features.

  7. roman modic says:


    Support for Office 2000 is news to me and it’s great. Does this mean only Office 2000 in a Windows XP/2000 environment? It would be great if this would work in Windows 98SE …


  8. Rick from the Mac BU has stated that there will be XML converters for Mac Office v.X and Mac Office 2004. It will also be the native format for the new Mac Office (2006 or 2007?)

  9. Jim Rech says:

    >>existing legacy XML formats will continue to be supported


    If an entry is made past column IV in Excel 12 will that save in the binary format? If so, will older Excel versions be able to open that file now? After the patch? (Obviously older Excels will not be able to do anything with such entries. But where they might abort loading now they might fail more gracefully with the patch, loading what they can for instance).


  10. Darryl Hover says:


    Without schema validation (at the very least, when the document is saved) in the older versions, I don’t really see what’s so exciting about back-porting the new formats. Sure, it’s nice that the older versions can read the documents, but it certainly will lead to a lot of goofed-up documents, confused users, and worthless data.

    Scenario 1:

    Entity A sends an Office 12 contract document with custom schema to User A, who has Office 2000, and asks him to review the information and make necessary changes. User A proceeds to make changes to portions of the document that the schema would otherwise protect, saves it and sends it back to Entity A. Entity A can no longer use their backend automation to accept the changes because the document does not validate. So, Entity A either needs to send a new copy back to User A with instructions on which parts of the document can/cannot be changed, or manually try to extract the properly changed info.

    Scenario 2:

    A large corporate IT department is in the midst of a staged rollout of Office 12. New system specification templates are created using a custom schema. Sometime down the road, a poor user on Office 2000 is browsing through documents (created using the new Office 12 templates) for a system he’s working on and makes changes to portions of the specification that would otherwise be protected by the schema.

    Later, an automated build/verification process fails because the specification documents do not validate.

    I understand that the older UIs could not be updated to include real time schema validation and the other new features (after all, who would then need to upgrade, right?). But at the very least, validating the document against the custom schema before the document is saved could easily be done and SHOULD be done. In the case where validation failed, a message box could be presented telling the user that the document is not valid (and possibly what part of the document, specifically) and give the option to either continue or cancel saving.

    I _was_ getting very excited about the back porting, and was beginning to design a new document system for our organization against it, but this is really making me think twice about it.
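The save-time check suggested in this comment could look roughly like the sketch below. Python's standard library has no XML Schema validator, so a simple required-element check stands in for real schema validation here. `REQUIRED_PATHS`, `check_before_save`, and the contract element names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for a custom schema: paths that must exist and be non-empty.
REQUIRED_PATHS = ["party/name", "terms/amount", "terms/dueDate"]

def check_before_save(xml_text):
    """Return a list of problems; an empty list means the save can proceed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    for path in REQUIRED_PATHS:
        node = root.find(path)
        if node is None:
            problems.append(f"missing element: {path}")
        elif not (node.text or "").strip():
            problems.append(f"empty element: {path}")
    return problems

good = ("<contract><party><name>Entity A</name></party>"
        "<terms><amount>100</amount><dueDate>2006-01-31</dueDate></terms></contract>")
bad = ("<contract><party><name>Entity A</name></party>"
       "<terms><amount></amount></terms></contract>")

good_problems = check_before_save(good)
bad_problems = check_before_save(bad)
print(good_problems)
print(bad_problems)
```

A caller would show the returned problems in the proposed message box and let the user continue or cancel. Note that this only detects damage at save time; it does nothing to keep the structure intact while the document is being edited.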



  11. Eduardo says:

    Interview with Gary Edwards on ODF and MS XML. The general impression I get is that MS XML is about to get run over by a Mac truck.

  12. Eduardo, most of the links you’ve been leaving are from OpenSource folks and OpenSource news sites, so it’s not surprising you’re seeing this as a one sided issue. I’m glad to see your interest and enthusiasm, but I think it’s a bit misguided to make your assumption about the Mac truck (I assume there was no pun intended on the whole "Mac" thing). I’ll be sure to look both ways though before I cross the street just in case. 🙂

    By the way, we had our Office developer conference last spring and at that time there was a study done by our research group to look into how many Office developers we had out there. The numbers were around 1 million people involved in developing some type of solution on top of Office. Of those folks about 1/3 of them were using the XML functionality. That’s about 330,000 developers building on top of Office XML support. That hardly sounds like a closed format that’s about to be overrun…

    Darryl, I understand your concern there. The custom defined schema support is extremely valuable for solution builders and is really an important part of the overall vision for opening up our documents using XML. The "reference schemas" (WordprocessingML for Word; SpreadsheetML for Excel; OpenDocument for OpenOffice) are really valuable if you want to understand the application level data. Custom defined schema is where you actually are able to truly open up the formats since you can mark them up with your own data, rather than being limited just to our predefined structures. I really do wish it was as easy as you suggest to back port the custom schema support to older versions of Office, but it would actually be a huge project in itself. It would essentially be the same as rewriting much of the application if we really wanted to preserve the custom defined tag structure. All the logic necessary to maintain well-formedness while still keeping the same editing behaviors was a lot of work that we took on in Office 2003.

    I think the best way to think about this is to look at the type of functionality we’re talking about. The support for reading and writing the format is essentially an update to our File I/O code. It’s a very isolated piece of functionality (essentially a translation), that was a lot of work, but still manageable. To support Custom defined schema would mean we’d have to actually modify the run time behaviors. We would need to change the way we read the text stream to understand the tags, and update the edit behaviors. This is really changing a huge portion of the Word .exe. I know it appears we should be able to just worry about this at save time, but that’s actually not the case. We need to do work to ensure that the structure is preserved while the file is being edited. I would love to have the ability to backport it, but this is really one of those pieces of functionality that you’ll need a newer version of the product for.


  13. Inquisitive – Steven Sinofsky (Senior VP for Office) sent a signed letter to the European Union directly addressing that very concern. We will continue to provide the licenses going forward and continue to represent everything in XML (other than obvious binary type structures like pictures, Active X controls, etc.).

    So the answers to your first two questions are yes and yes. I would assume that the answer to the last question is the same for any format that’s out there. I don’t see it being a problem (other than the fact that I’d be out of a job).

    Anon – Each new feature will have different behaviors when roundtripping through the older versions. This is directly tied to what we are able to roundtrip, or map back to a similar feature. Colors for instance will be mapped back to the closest match. We also allow the user to put the application into a "throttled mode", so that any new functionality that won’t work properly with older versions is disabled, which allows you to ensure that you don’t introduce something that won’t work with an older version. If you don’t care about how it works with an older version, then you can of course upgrade the document to get out of throttled mode.
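As a rough illustration of "mapped back to the closest match", the sketch below picks the nearest entry from a small fixed palette using plain squared distance in RGB. The palette contents and the distance measure are assumptions; the actual matching rule used by the older versions is not described here.

```python
# Hypothetical subset of a fixed legacy palette, as (R, G, B) tuples.
LEGACY_PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}

def closest_legacy_color(rgb):
    """Return the palette name with the smallest squared RGB distance to `rgb`."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, LEGACY_PALETTE[name]))
    return min(LEGACY_PALETTE, key=distance)

warm = closest_legacy_color((250, 30, 20))
leafy = closest_legacy_color((10, 120, 15))
print(warm, leafy)
```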

    We’ve done a lot of work on this over the past year. We’ve been meeting with a collection of about 20 customers for the past 6 months to work through the issues that will come out of this and to make sure that the collaboration experience is as smooth as possible. We have an even larger group of folks we are going to work with directly after we ship Beta 1 to see what issues we missed and what there is that we need to address before we ship. Of course, in every new version of a product you introduce new functionality that may not work with a previous version. We also knew though that in changing the default format, a lot of folks would have a more critical view of this type of behavior. There is only so much you can do, but we decided early on that if we were going to change the default formats, we had to go above and beyond what we’ve done in the past to make sure the transition was smooth.

    Jim – The limitation on the number of rows and columns was directly tied to the architecture of the old binary formats. So if you have a file in Office "12" that exceeds that previous cap on rows and columns, and you save into the old binary format you will be warned that the extra data will be removed. The same is the case if you save in the new formats and send it to someone on Office 2000. If they have the update they can open the file, but they will be warned that not all the data was preserved.
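The warn-and-drop behavior for oversized sheets can be sketched as follows. The limits used are the well-known grid size of the pre-"12" versions of Excel (65,536 rows and 256 columns, the last being column IV); the function names and the cell dictionary layout are hypothetical.

```python
LEGACY_MAX_ROWS = 65_536   # rows 1..65536 in the older versions
LEGACY_MAX_COLS = 256      # columns A..IV

def column_number(letters):
    """Convert a column label such as 'IV' to its 1-based number."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def fit_to_legacy_grid(cells):
    """Split {(row, col_label): value} into cells that fit and cells that do not."""
    kept, dropped = {}, {}
    for (row, col), value in cells.items():
        fits = row <= LEGACY_MAX_ROWS and column_number(col) <= LEGACY_MAX_COLS
        (kept if fits else dropped)[(row, col)] = value
    return kept, dropped

sheet = {(1, "A"): 10, (1, "IV"): 20, (1, "IW"): 30, (70_000, "B"): 40}
kept, dropped = fit_to_legacy_grid(sheet)
if dropped:
    print(f"Warning: {len(dropped)} cell(s) outside the legacy grid were not preserved")
```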


  14. Eduardo says:

    Microsoft may decide to add ODF support if its customers want it:

    Brian, this fits with what you said a while back.

  15. Eduardo says:

    Redmonk’s Stephen O’Grady:,289142,sid39_gci1132351,00.html

    Regarding my Mack truck comment: I read David Berlind’s blog post "Could ODF be the Net’s new, frictionless document DNA?" and now I think MS XML will be run over by a **fleet** of Mack trucks.

  16. Craig Ringer says:

    I can understand concerns about documents with custom schema being mangled by old versions. I wonder if it’d be possible to put a flag in the document that could tell old versions’ load/save code "don’t load this document, its author said it requires Office 12" or "this document should be treated as read-only, and can not be saved, in versions of Office older than 12".

    That way, backward compat would be preserved, but document authors could still ensure that documents with custom schema they must preserve would not get damaged.
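A minimal sketch of such a flag, assuming it lived as attributes on the document root: the `minVersion` and `olderVersions` attribute names are invented for illustration and do not exist in the format. The version numbers follow the usual internal numbering, where Office 2000 is version 9 and Office 2003 is version 11.

```python
import xml.etree.ElementTree as ET

def open_mode(xml_text, app_version):
    """Decide how an application of `app_version` should open the document.

    Returns "edit", "readonly", or "refuse". Both attributes are hypothetical.
    """
    root = ET.fromstring(xml_text)
    min_version = int(root.get("minVersion", "0"))
    policy = root.get("olderVersions", "edit")
    if app_version >= min_version:
        return "edit"
    return policy

protected = '<doc minVersion="12" olderVersions="readonly"><para>Contract</para></doc>'
unflagged = "<doc><para>Memo</para></doc>"

office_2000 = open_mode(protected, app_version=9)
office_12 = open_mode(protected, app_version=12)
print(office_2000, office_12, open_mode(unflagged, app_version=9))
```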

  17. Chuck says:

    When will the updates for office 2000, XP, and 2003 be available?

  18. Craig – We’d thought about doing something similar to that, but after talking with some folks decided it was too confusing of an experience. We can look into it again though after Beta 1 if folks think it’s something we should solve.

    Chuck – The updates will be made available at the same time as Office "12".

    Eduardo – Thanks for the links (even though they were pretty biased). None of the folks in those articles seem to be aware that most customers that are interested in accessible and open data care more about their own schemas than they do about a schema originally defined by StarOffice or Microsoft. I think the articles are still pretty closed-minded when looking at the differences between our XML formats and the one from StarOffice (that madpenguin article had a number of inaccuracies for example), but even then it’s only part of the story. Like I’ve said before, it’s the combination of the reference schemas (WordprocessingML) and the custom defined schema (whatever the customer’s data is) that really gives the true interoperability with business processes (aka document DNA). You don’t need to worry about the Mack trucks. 🙂


  19. Todd Knarr says:

    Brian: I think the custom-schema stuff, while initially attractive, fails on the business level for the reasons Craig and Darryl pointed out. It basically closes the format down: if you use Office 12 custom schemas, you can’t safely let anything other than Office 12 touch the documents lest they be rendered unusable by Office 12. This is not, from a business standpoint, a good thing. If I can afford to upgrade my entire business to Office 12 and can either keep the documents internal or require everyone I do business with to use Office 12 I can use custom schemas safely, otherwise I have to avoid them completely. I think that’s why traditionally custom schemas have been reserved to custom applications, not general-purpose ones like word processors.

    What I’d do in the plug-ins for older versions is deliberately avoid preserving features I couldn’t support. Office 2000 could read and write the format, but if it couldn’t support for example custom schemas correctly it wouldn’t write custom schema information back into the saved document. That causes loss of information, but at least it avoids silently creating a broken document.
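The "don't write back what you can't support" approach might look like the sketch below, which removes elements from a custom namespace while keeping the ordinary content they wrapped. `CUSTOM_NS` and the sample markup are hypothetical, and the sketch assumes custom elements wrap other elements rather than bare text.

```python
import xml.etree.ElementTree as ET

CUSTOM_NS = "urn:example:contract"  # hypothetical customer schema namespace

def strip_custom_markup(xml_text):
    """Remove elements in CUSTOM_NS but keep the child elements they wrapped."""
    root = ET.fromstring(xml_text)

    def unwrap(parent):
        new_children = []
        for child in list(parent):
            unwrap(child)  # handle nested custom tags first
            if child.tag.startswith("{" + CUSTOM_NS + "}"):
                new_children.extend(list(child))  # promote wrapped content
            else:
                new_children.append(child)
        parent[:] = new_children

    unwrap(root)
    return ET.tostring(root, encoding="unicode")

doc = (
    '<doc xmlns:c="urn:example:contract">'
    "<c:amount><para>100</para></c:amount>"
    "<para>Signed</para>"
    "</doc>"
)
saved = strip_custom_markup(doc)
print(saved)
```

The custom tagging is lost, as the comment says, but what remains is an ordinary document rather than one that claims a structure it no longer has.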

  20. Craig Ringer says:

    Todd, I find the custom schema support really interesting. I think it’d be *nice* to be able to protect documents against being saved over by older Office versions, but there are non-technical and alternate technical approaches available too. Nothing stops me just limiting who can write to those files using standard filesystem permissions and groups, or if possible perhaps using a macro to do version detection and refuse to save. I don’t think the lack of any sort of lockdown in the support being added to the older versions is a fatal flaw or failure.

    On a side note – I do hope the next Publisher gets XML support. I *really* like the idea of being able to dynamically generate job templates with predefined PDF export settings. My employer doesn’t use or intend to use Publisher, but some of its customers do. Built-in PDF support will be wonderful when accepting jobs from those clients, but we still need to actually get users to use the right settings – always a challenge.

  21. Eduardo says:

    Replies to foxnews article:,2933,172063,00.html

    Brian, if you don’t like the links I post here, why don’t you post some, from people who don’t work for Microsoft or an organization funded by it, that support your position?

  22. Eduardo says:



    check out especially the long Gary Edwards comment.

  23. Inquisitive says:

    Hey Brian, any chance of an answer to my earlier question?

    If I didn’t phrase something clearly enough then please let me know, though I tried to come up with usage scenarios that are amenable to simple yes/no type answers.

    I think it’s important that my (future)grandchildren will be able to write software to read the documents I produce today… but as things currently stand, I understand that you won’t guarantee they will be able to. This seems to me to be a huge shortcoming in your licensing and a very valid concern!

  24. Craig Ringer says:

    Inquisitive: I don’t think that really makes sense (re future readability by grandchildren etc). The schema license is entirely fuss free, it’s only the patent license that’s mildly troublesome. Any patents that currently exist and apply will expire in a few decades on the outside. Additionally, nothing can stop you from simply ignoring the patents and writing a suitable XSLT filter to read the documents. Not unless someone invents the automatic patent rights enforcement brain-stem implant, anyway 😛 .

    That said, I don’t imagine writing a custom filter to convert from some ancient format – especially if you want full layout, not just content – would be at all fun. On the other hand, is there much chance that apps will support any current format by then anyway? My bet is on "no".
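For content alone, a filter of the kind described can be quite small. The sketch below uses Python rather than XSLT, and assumes the Word 2003 WordprocessingML namespace with text stored in `w:t` runs inside `w:p` paragraphs; the Office "12" formats may differ, and layout is ignored entirely.

```python
import xml.etree.ElementTree as ET

# Namespace of the Word 2003 XML format.
W = "http://schemas.microsoft.com/office/word/2003/wordml"

SAMPLE = f"""
<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>
"""

def paragraphs(xml_text):
    """Return the plain text of each paragraph, ignoring all formatting."""
    root = ET.fromstring(xml_text)
    ns = {"w": W}
    return [
        "".join(t.text or "" for t in p.findall(".//w:t", ns))
        for p in root.findall(".//w:p", ns)
    ]

text = paragraphs(SAMPLE)
print(text)
```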

    I think issues with future readability stem more from concerns about changes to the format and software over time, such that we end up in a situation where we’re looking at these new formats like we’d now look at a Word 2.0 for Mac file – "huh?".

    How valid these are remains to be seen – they were certainly very reasonable with regards to the Office binary formats, though I understand that’s despite the best efforts of the Office folks. XML is hardly perfect, but well documented XML *is* more amenable to manipulation and conversion, and since the specs can be saved there’s little reason to worry about the documentation. I’m personally not all that worried, though I’d *prefer* an open standard core for a document format.

    Do remember that some standards fall into disuse and are superseded, so an open standard document format is still no perfect solution. A well designed one that was suitably extensible would sure be a nice start, though.

    Really, I think the patent licensing is more of a question for the short/medium term, in terms of how it affects what others who wish to use the formats can do. I’ve explained my reasoning on this as well as I think I can earlier, and won’t repeat myself.

  25. Eduardo says:

    He is posting so much on this topic, I wonder if he is getting ready to write a book:

  26. orcmid says:

    Eduardo may be right. The book would be interesting.

    I found that latest, lengthy account from David Berlind to be pretty thoughtful and informative about the possible disconnects that occurred.

    I think Berlind’s approach is pretty balanced, to the degree that I can compare it against experiences of my own. With regard to the extrapolations for the future, who knows?

    I also found Mike Miller’s comments to be usefully constructive, including the one at

    This won’t get us the whole picture but it does provide a sense of the foibles and agendas that cloud complex undertakings like this. I admire that Berlind is willing to live with gray and look for fog-clearing, confirmable detail, rather than going polar black vs. white, good guys vs. bad guys, etc.

    There are others who don’t ground their speculations so well. I am impressed by Berlind’s accounts generally.

  27. Patrick says:

    Is it true that MSXML has a binary key?

    From the IBM website:

    Among other things, it says: "Because there is a binary key in the header file of every MSXML document, it is impossible to perfect a basic XSLT XSL-FO transformation of MSXML to ODF and back. The binary key holds critical layout definition information, and the only way to crack it is to reverse engineer it."

    Could anyone explain more to me about this binary key?

  28. Patrick says:

    There is another source which mentions the binary key.


    "That binary key holds a great deal of the information that we need about the layout definitions of the Microsoft XML file format. We can do a content-based transformation very well. Microsoft’s content is in perfect XML file format. Their styles, though, are locked up in that binary key. To make any kind of exchange possible with Microsoft XML documents, we have to first figure out how to cope with that binary key."

  29. Eduardo says:

    More history of the Massachusetts decision:

    One point I found interesting from the long Berlind piece is that Massachusetts says that MS XML would go back on the list if it met their openness requirements, in particular full documentation, no patent hindrances, and joint stewardship. Either that or Office could add ODF support.

    Massachusetts’ position is that there is a continuum of openness, and it set the dividing point where it did because it believed that was the minimum needed for it to achieve its IT goals. What is odd about Microsoft’s response is that it is not really arguing, for the most part, that MS XML’s lesser degree of openness is sufficient for Massachusetts’ IT goals. Instead it seems to be arguing that Massachusetts has an anti-Microsoft bias.

  30. Eduardo says:

    Brian, I have some questions about MS XML as compared with ODF.

    ODF, from what I understand, is designed to be a universal file format. That is, the goal is that all the information in any file format should be able to be stored in ODF without loss. This would allow it to be used as the native format in many applications, and, most importantly, as a universal translation method between any two different formats.

    To that end, the ODF TC spent years studying many hundreds of different file formats running on applications on various OS’s. ODF cannot yet handle all file formats, but it has moved a very long way toward that goal.

    Now the questions. Was it Microsoft’s intention that MS XML also serve as a universal file format? Or is it designed to handle the data from a narrower range of formats, and if so, what is covered and what isn’t? And during development, how broad a range of formats was investigated? In particular, what about formats that don’t run on Windows applications but rather only on other OS’s?

  31. I don’t really know where the talk of a "binary key" is coming from. I won’t speculate whether that comes from lack of information or if it’s just malicious, but you can look at the formats yourself and you’ll see that the reports are 100% incorrect.

    I challenge anyone to find a Word XML file that has a binary key containing "crucial layout information." If you do find such a file, remove the binary information and reopen the file. What’s changed in the layout? Of course things like pictures are stored as binary objects, as well as embedded OLE objects since we don’t control what formats the OLE object wants to store itself.
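The experiment described above can be approximated in a few lines. This sketch assumes that binary payloads such as pictures appear as base64 `w:binData` elements, as they do in Word 2003 XML; the sample document is made up, and `layout_markup` is a deliberately crude stand-in for "what the layout looks like" that only inspects paragraph properties.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
BIN_TAG = f"{{{W}}}binData"

SAMPLE = f"""
<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:jc w:val="center"/></w:pPr>
      <w:r><w:t>Centered caption</w:t></w:r>
      <w:r><w:pict><w:binData w:name="wordml://01000001.png">iVBORw0KGgo=</w:binData></w:pict></w:r>
    </w:p>
  </w:body>
</w:wordDocument>
"""

def drop_binary_parts(root):
    """Remove every binData element in place and return how many were removed."""
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == BIN_TAG:
                parent.remove(child)
                removed += 1
    return removed

def layout_markup(root):
    """Collect paragraph-property elements and their attributes."""
    return [(el.tag, dict(el.attrib))
            for ppr in root.iter(f"{{{W}}}pPr") for el in ppr]

root = ET.fromstring(SAMPLE)
before = layout_markup(root)
removed = drop_binary_parts(root)
after = layout_markup(root)
print(removed, before == after)
```

In this sample the paragraph alignment is plain XML and survives untouched; only the picture bytes are gone.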

    It looks like there is just a misunderstanding out there that is unfortunately causing folks to jump to some pretty extreme conclusions. All you have to do is look at the facts. Our files are stored in XML, and all the XML is fully documented. You can freely download the documentation, so there is no need to "reverse engineer" anything.

    Eduardo, I’m surprised that you would conclude the ODF format is designed to handle all forms of documents. It was an effort to standardize the existing StarOffice format. I’m sure they will continue to evolve it going forward, but I don’t see it being anywhere close to a universal solution. In that article you referenced Gary Edwards said that they were looking to replace HTML, which has been evolving for over a decade, so I’m not really sure what their true target is. We already saw that a feature as basic as formulas in a spreadsheet isn’t represented. That’s a pretty common one, and if that’s not there, who knows what else is missing. What about custom defined schema support? How can you support all document types if there isn’t support for everyone’s schema? We allow anyone to add their own schemas to our files, which is true extensibility.


  32. Eduardo says:

    Gary Edwards on the binary key, and much more:

    Brian, regarding ODF being a universal file format: You say that it was just a standardizing of the StarOffice format. That is completely false. The ODF TC started with the OO format, and then spent three years looking at hundreds of other formats, including everything they could figure out in Microsoft formats. The goal is a universal file format, at least for all types of documents.

    See for instance this from Berlind:

    "ODF isn’t just for front office productivity applications (word processors, spreadsheets, etc) as has been often implied by the way it is so often tied to (sidebar: it will be supported by other MS-Office substitutes as well; for example solutions from IBM and Corel). There’s no reason, for example, that, regardless of what proprietary markup languages the different wiki solution providers use to put a pretty face on Web authoring, that they cannot natively store those documents in the XML-based ODF. Come to think of it, what documents can’t be stored in ODF? What about browser-generated documents that are authored in GMail, Yahoo Mail or even blogs? Once a few key providers of these different document authoring tools decide to natively store their documents in ODF, then the ODF format could enter a viral stage that turns ODF into the underlying DNA to anything capable of generating text. Were this to happen, Microsoft would have no choice but to support ODF (something it’s apparently considering) since at that point, it would not only be odd man out, the number of ODF-compliant documents being generating by all the ODF-compliant authoring tools in total would begin to catch up to Microsoft’s file formats."

    read the whole post:

    Brian, I think you are running scared on this. You know how fantastically useful a universal document format would be, how it would catch on like wildfire, and how it would be an absolutely mortal threat to Microsoft, so you try to trick people into believing ODF isn’t just that, or at least something pretty close.

    OK, I know I will never get you to admit that publicly. However, everyone else can read Berlind’s post and see what I mean.

  33. SlashDotJunkie says:

    Eduardo, you need to buy yourself a clue. Heck, a couple dozen of clues might not be sufficient <g> Here is one free of charge:

    ODF cannot be (usefully and universally) employed for something for which it does not have a well-defined ontology. So far it contains fairly well-defined ontology for OOo/SO supported subset of file formats. If you try to use it for storing something other than that, guess what, nobody would be able to correctly interpret your data before you define ontology for it. And even if you do, it would be useless until included in the standard – the same way as spreadsheet formulas currently are.

    As for Berlind’s post, my suggestion to use TAR/GZIP container stays. To paraphrase: "Come to think of it, what documents can’t be stored in .tgz?" None, really, yet I don’t see "how it would catch on like wildfire, and how it would be an absolutely mortal threat to Microsoft".

    Love, SDJ

  34. Eduardo says:


    You wrote, "ODF cannot be (usefully and universally) employed for something for which it does not have a well-defined ontology. So far it contains fairly well-defined ontology for OOo/SO supported subset of file formats."

    This is incorrect. See this interview with Gary Edwards:


    "When the Open Document Technical Committee talks about legacy systems, we’re talking about at least 30 years of legacy information systems that cross an incredible spectrum of information and file format types. Boeing is an excellent example, and ODF TC member Doug Alberg was a most important driver in the first 18 months of ODF TC work, a period I always refer to as the “universal transformation layer” period because interoperability with legacy information systems was our primary concern. So during that period the legacy needs of large publishing and content management systems like Stellent, Documentum, and Arbortext drove the specification work. It really had very little to do with the ideals of an application independent desktop productivity file format."

    "Boeing is a great case in point. Doug Alberg had to make sense of so many different CAD systems, drawing systems, report systems and federally-mandated filing systems, there was no other way of dealing with the problem other than to come up with a common XML transformation layer. Many of these different information systems were long ago orphaned or outright abandoned by their vendors. Even though they’re no longer supported, they’re still on line, doing exactly what they were designed to do. Much of the world is like this, especially in governments. Which is why SOA is all the rage. You have older systems still on line, still doing what they were meant to do for some business process that’s important. That legacy data needs to be brought into the information flow, where it’s available to global Open Internet systems. XML can do that, and do it easily.

    The first 18 months of the Open Document project were to perfect the Open Document XML as a transformation layer, where all of these legacy systems could be connected to the transformation layer. Once it’s in the common transformation layer, then you can pick and choose which publishing and content management system you would want. You have much more choice. Indeed, many of these next gen systems are excellent for certain solutions, but lacking in other areas. Interestingly, Boeing used each one of the OpenDocument enterprise publication and management systems; Stellent, Documentum, and Arbortext. Imagine having to figure out how to connect all three of those systems to all of the legacy information structures that were still producing their intended functions. With a common transformation layer, you only need write your connectivity once."

    So no, it is not just for the OOo/SO subset. You should read the article and get yourself up to speed on ODF.

    You said, "If you try to use it for storing something other than that, guess what, nobody would be able to correctly interpret your data before you define ontology for it."

    Well yes, but once one of the thousands of users of a particular file format (or the ISV that supports it) develops an ontology for it and contributes it to the OCF standard, then it is there for everyone else to use. You seem to think this will happen only rarely, and even then only after many years. I disagree. Extending the OCF standard to include more file formats brings big benefits, and so we can expect that people will be motivated to go ahead and do it.

    Wouldn’t it be fantastically useful if this was done for all file formats, or at least a very large proportion? Or do you think it would be better for enterprise IT users, ISVs, and open source programmers to stay away from working on this?

  35. Eduardo says:

    A detailed report by OASIS on the Massachusetts decision:

    Read this and you will get a much better sense of Massachusetts’ reasoning.

  36. orcmid says:

    I think it is awful to keep repeating allegations and claims for which no one makes a demonstration. If there is naughty stuff in file formats, it can be confirmed by inspection. Where is that stuff? And what is its impact?

    For example, there is this recurring claim that Office Open XML is just a wrapper for proprietary binary. I can’t find that. The only thing in binary is what needs to be in binary (e.g., PNG — an open standard — files, OLE persistent-objects because they have to be binary, etc.) So, do all of the GIF files in the SXW of the OpenDocument specification disqualify it as an open format?

    I’ll give you an example of how these crazy speculations get turned into nonsense. The OpenDocument specification comes in two flavors, one PDF and one "OpenDocument" (actually, an Open Office document).

    If you look into the Zip file and examine the XML files, you’ll see that each has a Document Type Definition as part of the XML Prolog. Now that’s interesting, because the ODF specification doesn’t say anything about the XML Prolog and what’s permitted there and how to do it, and the schema doesn’t address the prolog.

    If you open the XML files in a validating XML processor, such as Internet Explorer, you’ll get a failure because the processor can’t find the DTDs that are referred to. The only file locations are local paths like "office.dtd" and "manifest.dtd", and I have nothing like that on my computer. They are not in the ODF file and they are not at a globally-public URL. So I hunted around on the site and found them in the CVS tree.

    So, is this a conspiracy to prevent those documents from being usable by any other software product? Of course not. Carelessness maybe, but not malice.

    (I still haven’t checked them, by the way. They are broken up into multiple files and I haven’t done all of the work to install them properly so that I can get the validation to work without modifying the XML in the ODF. There are also a couple of things that seem odd about how those DTDs have XML prologs themselves and use parameter definitions, and I need to figure that out as well.)

    I did turn in a question on the ODF public comment list asking whether there are any rules about XML prologs that need to be added to the spec. For me, this is just an early example of the work that needs to be done before interchange of these formats becomes commonplace and smooth. This is just the start of a long journey.

    The key thing is to get the evidence and put it where others can verify it. And check the sources. You can check everything I just wrote about. (The DTD is hard to find. It is here:

    Now, with regard to universality, I don’t believe any claim to that, even by one of the 18 people whose name is on the ODF specification. If they had done all of the work that is speculated about, there’d be a non-normative appendix in the specification about how that was part of their charge and what the results are. There’s no mention of this in the charter, in the spec, or in any of the statements of work. Also, if it was done, and it means they have approaches to accommodating some well-specified and public formats, I have two questions. First, do they have a way to accommodate Open Document Architecture (ODA), the already-standardized ISO specification alongside of SGML? And if you think that’s not important, how about DocBook? Is there a mechanism for roundtripping DocBook?

    The thing about being universal, in the way I am using it, is that you have to somehow accommodate the other folks’ document model. I believe they may have cherry-picked features; I don’t believe they’ve dealt with document model coherence. It would be a big deal to point that out in the spec (if it had actually been their charter).

    Saying that you can shoe-horn something into ODF is not an accomplishment. Microsoft Office Open XML can make the same claim. If and when there is an XML format for interchanging Wiki content or micro-content, for example, I would be shocked if either schema were used. I’m sure people might bother to come up with mappings to word-processing document formats, but it is simply not the straightforward way to accomplish interchange of wiki content.

    Let’s have more facts and fewer claims with no evidence provided to support them.

  37. Eduardo says:

    Last year the EU looked at ODF and MSXML and decided ODF met its requirements and MSXML did not:

  38. orcmid says:

    Or, put another way. Where is this universal transformation layer described? There is no language or functionality like that anywhere I’ve looked in the ODF spec. I haven’t read it all, but if it is so fundamental to the architecture of ODF, where is it addressed?

  39. Eduardo says:

    ODF **is** the universal transformation layer. Once you have converted a file format to ODF, then it is in the same language as any other file format that has been so converted. See the Gary Edwards interview linked above. You also might want to check out the Wikipedia article.

  40. orcmid says:

    Eduardo accepts the claim: "ODF **is** the universal transformation layer" based on "Once you have converted a file format to ODF, then it is in the same language as any other file format that has been so converted."

    That’s a tautology and meaningless. It presumes that someone else has already arranged the conversion. I can say the same thing for XML without ODF, I can say the same thing for Office Open XML, and have added no content to this discussion (and my statement is just as accurate).

    I just did a search for all uses of the word "transformation" in the ODF specification. The majority of them are about how transforms are specified for drawings and images (and format objects).

    A few others are about some things that have been done to make XSLT go easier.

    A couple of those assertions seem to be mistakes, since you actually have to check attributes of elements for some information even though the idea stated elsewhere is that you can ignore the tags and just use the text. But then it seems that material marked as hidden will be revealed in a straightforward transformation. Heh.

    These few steps by which certain transformations "should be simpler" are about possible transformations *out* of ODF for repurposing purposes. There is nothing that demonstrates how well ODF serves as a universal carrier or any work that was done to assure that it is any more so than any other highly-functional office document format using XML.

    I don’t doubt that the member from Boeing did a lot of work, I just don’t know what it was and where the concrete results are. They are not reflected in the specification in any material way that has surrendered to my inspection so far.

  41. Eduardo says:

    The whole discussion is good:

    and start at the comment "Which Binary key?"

  42. SlashDotJunkie says:

    orcmid, it’s time for us to leave trolls to themselves. There is no chance they will start thinking critically or stop their copy-n-paste arguments, so I say let them gnaw on their own feet 🙂

    Love, SDJ

  43. orcmid says:

    Look, it is just another claim, extrapolated to problems about getting a particular kind of transform to work.

    It’s simple. Show us the key. Show us that it has anything to do with style preservation. But first, show us the key. Brian proposed a trivial experiment.

    Find what you think is the magic key, delete it, then see what difference it makes to Microsoft Office. That’s *really* simple.

    I just did the following thing. My M.Sc dissertation draft is a large Word Document. I just opened it in Word 2003 and saved it as WordML (not Office Open XML, but the current Word 2003 XML format). I notice that there are attributes so that the files saved in XML can be opened by the original application, but I got around that by opening the document in FrontPage (where I have that feature turned off). I then asked FrontPage to verify the formatting and it was happy. I then asked FrontPage to pretty-print the layout. The file got a lot bigger, but then I could scroll through it and know what I was looking at.

    All of the binary data (actually, hex and BinHex encodings of one kind or another) was in two kinds of elements: <w:BinData>, which in every case held images that I created outside of Word and pasted into the document, and <w:fldData>, which are tiny encoded elements I have no idea about.
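
    An inspection like this can also be scripted. The sketch below is an assumption-laden illustration, not the procedure used above: the sample document and the `urn:example:wordml` namespace are invented, and element names are matched case-insensitively on their local part because the exact spelling in a real file should be checked against the WordML documentation. It lists every element that carries an encoded payload, together with the payload length.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the namespace URI and the payloads are made up.
SAMPLE = """<w:doc xmlns:w="urn:example:wordml">
  <w:p>Visible text</w:p>
  <w:binData w:name="figure1.png">iVBORw0KGgo=</w:binData>
  <w:fldData>AAECAw==</w:fldData>
</w:doc>"""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def encoded_payloads(xml_text, names=("bindata", "flddata")):
    """Return (element name, payload length) for each element whose local name matches."""
    wanted = {name.lower() for name in names}
    found = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag).lower() in wanted:
            # Ignore line breaks and indentation inside the encoded text.
            payload = "".join((element.text or "").split())
            found.append((local_name(element.tag), len(payload)))
    return found


for name, size in encoded_payloads(SAMPLE):
    print(name, size)
```

    Anything reported by such a scan can then be looked up in the published schema documentation to see what it is for.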

    What I do know is that a 3rd-party PDF plug-in is able to navigate the document and make a perfect PDF of it, preserving everything that is visible, including the table-of-contents, cross-references, and hyperlinks in the thesis. This isn’t a complete test, but it suggests that maybe the breakdown the Groklaw people observed has a different explanation and the presumption of the worst trumped that. Perhaps.

    I can’t find anything that looks like a "key" or any other secret code sort of thing.

    You can do this yourself. Open in a non-Microsoft XML editor and have it fix the layout (so you can match up start and end tags) and see if there is a problem. Then we can look it up in the WordML documentation and see what we’re really dealing with here.

    How can I tell that Groklaw is telling the truth when they don’t say how to verify what they say? The formats are real tangible things. The WordML documents are available for inspection. Where’s the key? Show me a key (in context, please).

  44. Yuki says:

    Translation of this blog entry:

    This may be old news to you but, since the guys from Information Control told me to, I’ll tell you anyway.

    Since nobody’s going to change to Office 12 because it looks completely different and most Office and Windows users shit their pants if an icon changes shape, we’re going to use the Closed XML format to force them to upgrade. The first step is infecting all the other Office versions with it; then, after our customers have developed a dependency on that evil format, things will suddenly start to break here and there and everyone will have to buy Office 12 even if they never wanted it in the first place. Office 12 will also work much better under Windows Vista (another product only MS fanboys want) than under Windows XP (what everyone is using and what everyone will try to keep when they find out Vista looks completely different).

  45. Patrick says:

    If there is no binary key, as Brian mentioned, then I am willing to believe that.

    Interesting, though, is the following quote from Gary:

    "If the MSXML binary key and software bindings do not exist, then Microsoft (and everyone else for that matter) should be able to provide the marketplace with clean clear transformation filters enabling easy conversions from MSXML to ODF and back? If they did this, then their software would meet the Massachusetts requirements. But they don’t!"

    Indeed, why is MSXML not suitable for MA?

    They did mention something about missing documentation of some sort during the speech they gave.

  46. orcmid says:

    I worked up an example where I went searching for the binary key that is being talked about here. I think I choked the blog comment filter on the markup, so I recreated the comment as a post on one of my blogs:

    I did everything I could to cross-check with the reports about the binary key, but I got lost when Exchange Server and invisible conversions to XML and back were dragged into it.

  47. Patrick says:

    In your example (on your blog), if you replace the ‘’ with e.g. ‘’ your document gets scrambled.

    I don’t know which ‘binary key’ Gary is referring to, but changing the ‘microsoft’ name into ‘microsofty’ and having such an impact on the document is not something I would expect.

  48. Patrick says:

    If you have to use ‘’ in the xml file, which serves no further purpose at all, it is odd.

    The only reason I can think of for it doing that is that the format is patented, in which case your documents have to refer to a patented scheme.

  49. Patrick, I’m not sure if you’ve worked with XML much, but there is a feature in the XML standard called a "namespace." This is how you can uniquely identify what type of XML you are reading.

    Namespaces are really powerful, and without them it would be difficult to know what you are looking at and what schemas should be used to validate the files. When you change a namespace, you are essentially saying that XML is now a different type of XML. In Office, we support opening everyone’s XML files. If it’s a namespace that we don’t understand, we just treat it like any other custom defined schema. If you read my older blog posts that are all titled "Intro to Word XML", you can see more about how we work with custom defined schema.

    You could just as easily have changed the "http://" part of the namespace and had a similar experience. There’s no conspiracy there… just standard XML practices. 🙂
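
    The behavior described above can be sketched in a few lines. This is not how Word itself is implemented; the `urn:example:wordprocessing` namespace and the `KNOWN_NAMESPACES` table are hypothetical. The point is only that a consumer decides what kind of XML it is reading from the namespace URI of the root element, so altering a single character of that URI turns the file into "some other schema" as far as the consumer can tell.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of namespaces the application understands natively.
KNOWN_NAMESPACES = {"urn:example:wordprocessing": "native document"}


def namespace_of(tag):
    """Return the namespace URI from an ElementTree tag such as '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def classify(xml_text):
    """Recognized namespaces get native handling; everything else is custom XML."""
    root = ET.fromstring(xml_text)
    return KNOWN_NAMESPACES.get(namespace_of(root.tag), "custom schema")


original = '<w:doc xmlns:w="urn:example:wordprocessing"><w:p>Hi</w:p></w:doc>'
# Same content, but one extra character at the end of the namespace URI.
altered = original.replace("urn:example:wordprocessing", "urn:example:wordprocessingy")

print(classify(original))
print(classify(altered))
```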


  50. orcmid says:

    I wanted to add to Brian’s comment about namespaces. All of those xmlns:mumble="someURI" attributes in XML elements are very important.

    However, the someURI doesn’t have to refer to a real web page or even be in the form of a URL. What it has to be is a unique identifier that someone owns, usually by owning the domain used in a URL. I could make up "…" URLs and it wouldn’t be kosher. I’d get boo-ed by the XML community, at least. The other part is that rules of the format are determined by schemas that are tied to these namespaces. If you change it, and it isn’t a namespace that Word recognizes, it will do something else, as Patrick saw and Brian explained.

    The schemas get published and the namespace they go with (if they go with a namespace) is declared in the schema. Schema-aware software caches these schema definitions and then applies them where they see the namespace being used.

    Applications, like Word, may have namespace sensitivity built-in, but the schemas are published anyhow in support of interchange and interoperability.

    The same is true of the OpenDocument format, and the Relax NG schema (and ODF documents) use namespaces heavily.

    BIG TIP: The prefix (mumble in my example, above) doesn’t determine the namespace. The someURI does. The prefix is an abbreviation that is used for the namespace and the prefix can be changed to whatever’s useful in a given situation. It’s a kind of alias. Microsoft keeps theirs real short because they want to keep the XML file compact. Other people have theirs be more descriptive because they are intended for people to use or at least to understand.
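
    A small illustration of the tip, using an invented `urn:example:doc` namespace: a namespace-aware parser expands each prefix to its URI, so two documents that differ only in the prefix they chose produce identical element names.

```python
import xml.etree.ElementTree as ET


def expanded_names(xml_text):
    """Return element names in '{namespace-uri}local-name' form, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


# Same hypothetical namespace URI, bound to a short prefix and to a long one.
short_prefix = '<w:doc xmlns:w="urn:example:doc"><w:p/></w:doc>'
long_prefix = (
    '<wordprocessing:doc xmlns:wordprocessing="urn:example:doc">'
    "<wordprocessing:p/></wordprocessing:doc>"
)

print(expanded_names(short_prefix))
print(expanded_names(short_prefix) == expanded_names(long_prefix))
```

    The prefix never appears in the expanded names; only the URI does, which is why renaming the prefix is harmless while changing the URI is not.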

    I think Brian posted about this earlier in comments elsewhere on this blog.

  51. orcmid says:

    I took another look at the business about universal transformation. My post about it is at

    I don’t think this has anything to do with what Massachusetts was after, based on the accounts I read. Their conditions about open formats and open standards don’t require universality in any way that I noticed. Otherwise, why add PDF to the list? Why leave the door open to other formats?

  52. scot says:

    I am a Mac head and would like to know which versions of Office for the Mac will support any of the MS XML formats, and when?


  53. The Mac Office team has said that the next version of Mac Office will support the new XML file formats. They also plan to provide updates for older versions to support the new formats, but I’m not 100% sure which versions they will do that for.

    Here is a bit more information on that:


  54. Inquisitive, I’m sorry I never responded to your question on the long term availability of the licenses (ie can your grandchildren use them). As I’ve mentioned before, the head of Office (Steven Sinofsky) sent a signed letter to the European Union guaranteeing that we would continue to make our formats open and available under the royalty free license from this point forward. That’s about as much of a clear cut statement as I could imagine on the matter.

    You also asked what would happen if for some reason in the future Microsoft no longer exists. In that case there wouldn’t be the need for the license, so that’s not an issue either. This means that everyone will be free to use the formats going forward and we’ve guaranteed that.


  55. I’ve had a few folks ask me about the XML format from Word 2003, and whether or not it would be supported…