Interoperability of the Office Open XML formats


A comment was posted today that had a lot of thought put into it, and rather than just replying to it in the comments stream I thought it would be worth addressing directly. I apologize for not replying sooner, as it looks like John actually posted this originally a week or so ago (thanks for reposting it, John). Here is what John had to say:



Sorry for posting twice, but it kinda sank in your forum …


Hello, Brian,


I have been reading your blog nearly since it started. Let me first state that I have been using Office at work for nearly 10 years. Great piece of software, I like it. But there is one main nasty thing inside. That beast creates things that can be read only by itself.


Sad to say, my initial thoughts have come true. The game on interoperability which you play on behalf of Microsoft is very strange. Here is why.



  1. The main reason why you started with interoperability was extreme pressure from governments, companies and users, who were for years locked in your own DOC, XLS, and PPT formats, and nobody in the galaxy except your Office (moreover: the last version of it) could properly open the files created. I have been dealing with this as a network administrator for a long time.

  2. Your marketing guys know the power of this. They know that the main selling point for Office is not the interface, not its user friendliness, not its relatively low system requirements, but the fact that they and only they can offer their customers compatibility with their own created documents.

  3. But because of that extreme pressure, you had to react in some way. So you decided

    1. to create your own format, which will be somewhat documented, but so complicated that nobody in the world at least for next 2 years will be able to open files the same way as your last Office 2007 will. Really great window for competition. You will have open format, but your mantra – real world compatibility with documents – will be again possible –  only by your software. Aim achieved.

    2. constantly show the imperfections of ODF. Maybe it is true – but your format is also making step-by-step progress … I am not very much into this kind of stuff, but from what I have read on your blog, it is clear. Every draft of OpenXML is cheered with passion, and strange problems with ODF are analyzed. Even if they are real – why this approach? You know why … the reputation of ODF must be diminished. Aim achieved.

    3. put the conversion job to another company, which you sponsor. That is also great for you, because the users (when the converter won’t work perfectly – and it won’t, primarily because it CAN’T, your marketing does not want this) will not blame MS, but that poor CleverAge company or whatever their name is. So you will kinda support that bad nasty ODF format, just to respond to the requests. But you will not be responsible for the software that is doing it. Aim achieved.

  4. You are doing this in spite of the fact that you still can’t sell the software that does all this … call it vaporware; I know that Office 2007 will finally ship. But all this hype and type is here for a good reason … to make users forget that now, and for many more months, they are still stuck with the DOC, XLS and PPT formats of Office 2003, which are good everyday work tools but still continue to create files in closed formats, despite all that talk about new interoperable ones. That’s why you CAN talk about those gazillions of documents created and the responsibility you have for them – if you had not locked your users in, you would not have that responsibility. Your users did not want you to take it on. And by this talk, you are masking the fact that you gained one more year (at least) of creating them. Aim achieved.

Finally, I am impressed by what Microsoft has done. You are talking about everything your customers want, blah, blah. But the single thing you are capable of and really want to do is to deliver your next version of Office, which will be great and I will use it …. but it will still not be able to feed the interoperability beast. It would be fair to say that it was never your goal anyway. If it were, you would



  1. develop the two-sided OpenXML-ODF converter on your own, stop bitching about its imperfections and glitches, and seamlessly integrate it into Office without all this hype. If your developers are short of time, you could have bought a company or employed people who are doing it. You did it many times in history – Windows Defender being the last example.

  2. as a member of nearly all the file format committees in the world, be constructive and not obstructive. If you managed to create another “open” format, focus on its interoperability and not the glitches in other formats. Your goal is to work with them as well as possible, just as the rest of the world has had to deal with yours.

  3. make a version of Office viewer (at least!) or editor for other OSes, the same way as you did with Internet Explorer when you wanted to gain market share – you even did it for Solaris! and Windows 3.11! And stop telling the world things like it is not possible because of this and that. The case of IE shows that Microsoft can do anything to gain market share.

But, you will never do those things, and many more which you could for real interoperability. That word is just a cliché for Microsoft, only working as a one-path migration to your products. And then there is no way back.


John


I’ll actually reply to each point separately and hopefully this will help clear things up.


Reason for building an interoperable format



The main reason why you started with interoperability was extreme pressure from governments, companies and users, who were for years locked in your own DOC, XLS, and PPT formats, and nobody in the galaxy except your Office (moreover: the last version of it) could properly open the files created. I have been dealing with this as a network administrator for a long time.


This is definitely not true, and I’m sorry if I haven’t done a good enough job of explaining why we’ve been moving towards open formats for years now. The initial binary formats were designed back in a time when the performance of reading and writing from a floppy disk was more important than sharing documents via the internet. Things have obviously changed dramatically since then, and we’ve been working on HTML and XML formats since the late 90’s. The whole reason we wanted to move to an open format was that it increases the value of Office as a platform. That’s a direct benefit to us. Look at the example I gave earlier this week of MindJet using the Open XML format. That kind of interoperability with Office is a huge value add not just for our customers but for us. We didn’t move to open formats simply because there were a few of us who were morally opposed to closed formats. We are developers, and we get most excited when thinking about all the things people can build if we open our formats. We moved to open formats because it was the best business decision, and the more people building solutions and operating on Office documents, the better.


Now, if you were actually talking more about the standardization and licensing pieces, and not the open/transparency piece, then you are slightly correct. We wanted the license to be open because (as I said above) the more people building on top of our formats, the better. We went with what we thought was a good royalty-free license (meaning you don’t have to pay anything to Microsoft) back in 2003 for the Office 2003 reference schemas. We had also planned on using that same approach with the new Office Open XML formats, but we started to get feedback, specifically from those in the open source community, that the royalty-free license was still too restrictive. So last fall we changed our approach and moved to the CNS (Covenant Not to Sue), which is essentially a non-assertion statement where there is no obligation on the developer (no license required). We simply state that you can use the formats as you wish, and if we have any patents on the format we promise not to enforce them. This was very well received, and open source lawyers such as Larry Rosen came out and stated that the approach was good. We have since started to apply this same approach to other Microsoft technologies, as you may have read about last week.


In terms of standardization, that decision was definitely strongly influenced by governments. We’d always planned on having rich, complete documentation for the formats, because otherwise no one would use them and it would have defeated the purpose of all the work I’ve been involved with (I certainly didn’t want to waste the past 6 years of my life). A number of governments told us that they would be more comfortable if we gave the formats to an independent standards body, so that regardless of what happens to Microsoft, the documentation for the formats will always be available. It’s really a matter of stewardship of the documentation. Aside from the stewardship issue, though, we’ve actually benefited significantly from the standardization process, because it’s given us more opportunities to work with customers, partners, and competitors on ensuring that the formats truly are interoperable. There were some areas we’d completely overlooked, but the Ecma TC pointed them out and changes were made as a result. This is obvious just from looking at the differences between the initial submission and the most recent draft.


Compatibility with existing documents



Your marketing guys know the power of this. They know that the main selling point for Office is not the interface, not its user friendliness, not its relatively low system requirements, but the fact that they and only they can offer their customers compatibility with their own created documents.


You are absolutely 100% correct that compatibility with the existing base of Office documents is crucial when we sell versions of Office. We have entire teams dedicated to backward compatibility and migration issues that may come up. That was why we had to continue building on our own XML formats rather than switching over to another format like ODF.


Focus while developing the Open XML formats



But because of that extreme pressure, you had to react in some way. So you decided



  1. to create your own format, which will be somewhat documented, but so complicated that nobody in the world at least for next 2 years will be able to open files the same way as your last Office 2007 will. Really great window for competition. You will have open format, but your mantra – real world compatibility with documents – will be again possible –  only by your software. Aim achieved.

  2. constantly show the imperfections of ODF. Maybe it is true – but your format is also making step-by-step progress … I am not very much into this kind of stuff, but from what I have read on your blog, it is clear. Every draft of OpenXML is cheered with passion, and strange problems with ODF are analyzed. Even if they are real – why this approach? You know why … the reputation of ODF must be diminished. Aim achieved.

  3. put the conversion job to another company, which you sponsor. That is also great for you, because the users (when the converter won’t work perfectly – and it won’t, primarily because it CAN’T, your marketing does not want this) will not blame MS, but that poor CleverAge company or whatever their name is. So you will kinda support that bad nasty ODF format, just to respond to the requests. But you will not be responsible for the software that is doing it. Aim achieved.

Hmm, I’m afraid I may have appeared more critical of the ODF format than I really ever intended. Let me address each of these:



  1. Somewhat documented? Have you read the spec? I’ve actually heard the criticism that it’s overly documented, but not “somewhat.” We go into great detail on every single element in the spec. Not only that, we have a separate part of the spec that runs a few hundred pages on its own and is designed to be the ultimate “tutorial” for using the spec. We have bent over backward to document every single thing that could possibly exist in these formats. Is it complex? You bet! I don’t think anyone has the delusion that you could build the equivalent of Office overnight. Word, PowerPoint, and Excel are extremely powerful applications, and there is nothing else out there that comes even close to matching their feature set. So if someone wants to build an application that matches them, they will have some work ahead of them. You don’t have to implement the whole spec, though (look at the MindJet guys). There will be plenty of people who come along and don’t want to go as far as matching Office feature for feature. Instead they’ll want to build more lightweight applications, and that is absolutely attainable. That’s the key scenario we had in mind when we developed these formats in the first place.

  2. If you look at my blog, I probably spend less than 5% of my time discussing ODF. The only reason I talk about it is that people have asked me why we didn’t use it as our default format. A simple “it wouldn’t work” answer obviously isn’t good enough, so I had to show specific examples to help explain my view. ODF is perfectly fine for some scenarios and not for others. Open XML is perfectly fine for some scenarios and not for others. HTML is perfectly fine for some scenarios and not for others. DocBook is perfectly fine for some scenarios and not for others. The reason I get excited and talk up the progress on Open XML is that I work on the thing. It’s my baby, and my coworkers, my fellow Ecma members, and I are extremely proud of the work. Look at the IBM and Sun blogs discussing ODF and compare them to my blog. Who is being more critical? Most of the ODF supporters dedicate the majority of their time to criticizing everything we do.

  3. Here in Office, we get thousands of feature requests from our customers. Unfortunately we can’t do everything, and ODF support wasn’t even on the radar until late in the game. I think the fact that we’ve helped create this open source project and are offering resources to help drive it is awesome. The fact that it’s an open source project that allows for 100% transparency is even better. If people don’t like how the converter works, they can offer suggestions for how to make it better. I want it to work because it’s a great example of the power of open XML formats.
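To make the "lightweight application" point in item 1 concrete, here is a minimal sketch of a reader that pulls only the paragraph text out of a .docx package with the Python standard library. It is an illustration under stated assumptions, not a reference implementation: it assumes the final ECMA-376 WordprocessingML namespace URI (files from pre-release builds may use a different one), it hard-codes the main part's location as word/document.xml instead of resolving it through the package relationships, and the function name `docx_paragraphs` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the WordprocessingML namespace URI from the final ECMA-376 spec.
# Files written by pre-release builds may use a different URI.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(data: bytes) -> list[str]:
    """Return the plain text of each w:p paragraph in a .docx package.

    Hypothetical helper. Assumes the main document part is stored at
    word/document.xml; a fuller reader would locate it via _rels/.rels.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Concatenate the text of every w:t element in this paragraph,
        # ignoring all formatting.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs
```

For example, `docx_paragraphs(Path("report.docx").read_bytes())` (with `pathlib.Path` and an illustrative file name) returns one string per paragraph. Everything this sketch ignores, such as styles, numbering, layout, and embedded objects, is the part of the spec a full-fidelity application would still have to implement.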

Office 2007 vaporware?



You are doing this in spite of the fact that you still can’t sell the software that does all this … call it vaporware; I know that Office 2007 will finally ship. But all this hype and type is here for a good reason … to make users forget that now, and for many more months, they are still stuck with the DOC, XLS and PPT formats of Office 2003, which are good everyday work tools but still continue to create files in closed formats, despite all that talk about new interoperable ones. That’s why you CAN talk about those gazillions of documents created and the responsibility you have for them – if you had not locked your users in, you would not have that responsibility. Your users did not want you to take it on. And by this talk, you are masking the fact that you gained one more year (at least) of creating them. Aim achieved.


What do you want me to do here, John? We’re working as fast as we can to get Office 2007 and the Open XML formats out the door. We’re providing free updates to older versions of Office that allow them to read and write the formats. We’re standardizing the formats and providing thousands of pages of documentation that describe every last detail about how they work. Do I wish we already had this all out there? Of course! It takes time, though. Office 2000 users have the HTML and RTF formats available. Office XP users have the HTML, RTF, and SpreadsheetML formats. Office 2003 users have the HTML, RTF, SpreadsheetML, and WordprocessingML formats. And once Office 2007 ships, they will all have the Office Open XML formats as well. We’re working on it, John, but it takes time.


Interoperability as a key pillar of Office 2007



Finally, I am impressed by what Microsoft has done. You are talking about everything your customers want, blah, blah. But the single thing you are capable of and really want to do is to deliver your next version of Office, which will be great and I will use it …. but it will still not be able to feed the interoperability beast. It would be fair to say that it was never your goal anyway. If it were, you would



  1. develop the two-sided OpenXML-ODF converter on your own, stop bitching about its imperfections and glitches, and seamlessly integrate it into Office without all this hype. If your developers are short of time, you could have bought a company or employed people who are doing it. You did it many times in history – Windows Defender being the last example.

  2. as a member of nearly all the file format committees in the world, be constructive and not obstructive. If you managed to create another “open” format, focus on its interoperability and not the glitches in other formats. Your goal is to work with them as well as possible, just as the rest of the world has had to deal with yours.

  3. make a version of Office viewer (at least!) or editor for other OSes, the same way as you did with Internet Explorer when you wanted to gain market share – you even did it for Solaris! and Windows 3.11! And stop telling the world things like it is not possible because of this and that. The case of IE shows that Microsoft can do anything to gain market share.

I hope I’m not being too redundant here, but interoperability was a huge goal. That’s the reason for the Office Open XML formats as well as the custom defined schema support. Let’s address these three points individually:



  1. I already talked about this above. We get thousands of feature requests and we can’t do them all. The most important thing we focused on in terms of interoperability was the Office Open XML formats. There’s no way I wanted anything getting in the way of making that format the best it could be. Requests for ODF support didn’t come along until much later, and at that point it made the most sense to help start an open source project. We care deeply about the success of the project. I’ve personally helped answer questions on the formats so the transformation can be as good as possible. It’s an amazing example of the power of open XML formats.

  2. We haven’t done anything to interfere with ODF’s development and standardization. We were on the ISO committee that approved ODF, and we didn’t raise any objections. We’ve never been opposed to ODF. The only thing we’ve been opposed to is governments mandating ODF exclusively. The only times I’ve said negative things about ODF are when I’ve tried to explain why ODF was not an option for us in terms of a default file format.

  3. We have a free XSLT available that converts Office 2003 WordprocessingML to HTML. There are already projects up on openxmldeveloper that are trying to do the same thing with the Office Open XML formats. I’m sure that people will build all kinds of viewers. I don’t think we’re going to shift resources on the Office dev team around to build a Linux viewer, but we’ll definitely help out as much as we can if other folks want to build one. I think the first things we’ll see are Open XML -> HTML filters, which can be used on any platform. We’ll probably also see folks leverage the converter project to build standalone Open XML -> ODF filters. I’ve never said a viewer isn’t possible; I’m not sure where you got that from.
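The Open XML -> HTML filters mentioned in item 3 can start very small. The following sketch, written in standard-library Python rather than XSLT, turns each WordprocessingML paragraph into an HTML `<p>` and wraps runs that are directly marked bold in `<strong>`. It is not Microsoft's XSLT or any openxmldeveloper project; it assumes the final ECMA-376 namespace URI and a main part at word/document.xml, it ignores style-inherited formatting, and `docx_to_html` is a hypothetical name.

```python
import html
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: final ECMA-376 WordprocessingML namespace URI.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix used by ElementTree

# ST_OnOff values that switch a toggle property such as w:b off.
OFF_VALUES = {"0", "false", "off"}


def _is_bold(run: ET.Element) -> bool:
    """True if the run's own properties (w:rPr/w:b) turn bold on."""
    b = run.find(f"{W}rPr/{W}b")
    # A bare <w:b/> means bold; w:val can explicitly turn it off.
    return b is not None and b.get(f"{W}val", "true") not in OFF_VALUES


def docx_to_html(data: bytes) -> str:
    """Hypothetical minimal filter: paragraphs -> <p>, bold runs -> <strong>.

    Ignores styles, lists, tables as structure, images, and everything else.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    blocks = []
    for p in root.iter(f"{W}p"):
        parts = []
        for run in p.iter(f"{W}r"):
            # Escape &, <, > (and quotes) so document text cannot inject markup.
            text = html.escape("".join(t.text or "" for t in run.iter(f"{W}t")))
            parts.append(f"<strong>{text}</strong>" if text and _is_bold(run) else text)
        blocks.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(blocks)
```

Because it depends only on ZIP and XML parsing, a filter like this runs on any platform with Python; full fidelity would still mean handling styles, numbering, tables, and the rest of the spec.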

Thanks


John, thanks again for taking the time to write up all that feedback. I hope that my answers helped you out.


-Brian

Comments (21)

  1. Stephan Jaensch says:

    Brian Jones wrote:

    "This is definitely not true, and I’m sorry if I haven’t done a good enough job of explaining why we’ve been moving towards open formats for years now. The initial binary formats were designed back in a time when performance of reading and writing from a floppy disk were more important than shared documents via the internet."

    What a classic straw man argument setup: You refute his argument (no openness) by giving reasons why the past format was binary instead of text-based. The former (open, documented format) has nothing to do with the latter (binary/text format). Images are typically saved in a binary format, still there are many image formats which are considered open.

    You could have opened the binary formats for a long, long time, but you didn’t. You chose to keep them closed and closely guarded. If you were being honest, you would have just admitted that. Everybody knows it and you are moving to a new, open format anyway. Or is it the corporate doublethink culture at Microsoft which doesn’t let you see that? Honest question, I would really like to see your viewpoint on that.

    – Stephan

  2. BrianJones says:

    Stephan, I was actually tempted not to reply to your comment because you are focusing on a very small piece of the overall message I’m trying to get across. We’ve focused hard on making the binary formats a thing of the past. I can’t just ignore your points though, so here it goes 🙂

    I have to admit that I disagree with you on your point about there being two separate issues. From my point of view, moving to a text-based format and being open really do go hand in hand (at least in this case). The move to a text-based document format has made it actually manageable to both document and support interoperability. It would have been nearly impossible to achieve this with the old formats. If we really wanted to block interoperability, we wouldn’t have invested in RTF, HTML, and XML over the past 10 years.

    The legacy binary formats were extremely complicated, and even with the documentation (which we actually did provide to a number of people who requested it), they were very difficult to work with. Do you know that the majority of corrupt documents we get from customers are corrupted by 3rd party applications? It’s extremely difficult to work with the binary formats. They are essentially a dump of the internal memory structures of the applications, and that made them pretty hard for others to use. There isn’t anyone in Office who feels like our format allows us to win some type of competition, though; it’s the features and functionality that make Office so great.

    There are plenty of competitors out there who support the Office binary formats. We’ve never tried to block people from implementing them. We just weren’t able to put resources into supporting them as an interoperability solution because they weren’t designed that way. It would have been a huge support issue. But if you look at applications like Open Office, Gnumeric, etc., they do a great job working with the formats. In fact, when I use Open Office, it actually opens and saves spreadsheets in the XL binary format significantly faster than in the ODF format.

    Now, I know that the views in the software industry haven’t always been what they are now. It’s true that in the past, some viewed file formats as a way to win some type of competition. But I don’t think anyone has felt that way for quite some time (at least around here). File formats are just a way of persisting the data, and the easier those things are to work with (while still allowing for all the data to be preserved) the better.

    So, if you are upset that the formats weren’t documented sooner, I understand and I’m sorry. I think the work we are doing now is really going to help out there though, and to be honest many folks don’t realize yet that we are doing this. We’ve been talking about it for over a year now, but I still see people speculating that the XML formats are just a bunch of undocumented binary goo, which couldn’t be further from the truth.

    -Brian

  3. orlando says:

    IMHO

    most office software users _don't care_ about file formats.

    I predict this:

    four years after Office 12 ships, 95% of MS Office users (any version) will still be saving as .DOC, .XLS, etc.

    They don’t care about these things; they just want to save their work and let other people see it (mostly via e-mail attachment).

    If you want "openness", just document the legacy binary formats. Leave this XML thing to specialized users and be honest when you talk about "interoperability" (i.e., release a Linux .doc, .xls, .ppt viewer).

    ( sorry for the grammar/spelling/etc )

              -orlando

  4. Francis says:

    "The legacy binary formats were extremely complicated and even with the documentation… it was very difficult to work with. Do you know that the majority of corrupt documents we get from customers are corrupted by 3rd party applications?"

    Not just 3rd party applications! All versions of Office I have used (through 2003) periodically corrupt their own files.

    Interestingly, this has not once happened to me when using the Open XML format with Office 2007 betas. If that’s not a reason to move to simpler, open formats, I do not know what is! 🙂

  5. Patrick Schmid says:

    orlando,

    I highly doubt that this will be the case. The decision to have the Open XML formats be the default has already had significant impact. Despite Office 2007 still being in beta, there are already several messages a day asking how to open the 2007 formats in earlier Office versions, because users uninstalled 2007, or asking why someone can’t open the 2007 file they emailed to the person.

    I think that many, many users will (without making a conscious decision) end up using the Open XML formats with Office 2007. The only ones making a conscious decision will be administrators for their companies and power users.

    Patrick

  6. Bob says:

    Brian,

    Speaking of interoperability, are there any plans to provide DOCX import and/or export converters for WordPad?

  7. Juan R. says:

    Interesting post by John. I read it several times and am still unable to pin down what he is really stating there.

    Is John claiming that the earlier move to open formats by others was not caused by market pressure and by losing the Office competition with Microsoft? Are those others non-profit organizations? Are those ‘non-profit organizations’ promoting open formats in other fields of interest to governments and users?

    Microsoft uses its own format, whereas others reuse stuff from the W3C. Well, is John claiming that standards coming from the W3C are purely scientifically oriented? If they are not, then why are votes and core design options left only to consortium members (i.e., folks from paying organizations and invited experts)? Yes, you can make suggestions to a W3C WG (I made many on the MathML list), but they can accept or reject them based on their own criteria (often economic rather than purely technical).

    About “to create your own format, which will be somewhat documented, but so complicated that nobody in the world at least for next 2 years will be able to open files the same way as your last…”

    What does John mean? After 2 years, most MathML tools are unable to open the c-MathML format (no browser I know of supports it natively; Carlisle's client-side XSLT to presentation code is not native support 😉), and even the most sophisticated software (e.g. Mathematica 5.2) has problems processing certain c-MathML files that use the most sophisticated features, such as OpenMath extensions. Is John stating that the MathML format was designed to be complex for market-share purposes?

    Does John know that the latest XSLT 2 (rejected by Microsoft, was it not?) is criticized for being so complex that it is not being implemented, and that some folks even doubt it can be completely implemented in practice because of the extreme complexity of the W3C spec?

    It is true that ODF is imperfect (as Microsoft's new format also is!), but I do not see the point here. Why can't Brian criticize the ODF approach? Why can't I applaud some of the design options of the new Office Math format over the p-MathML reused in ODF? And maybe more important, why can W3C MathML people state, in both active and passive ways, through both formal and informal channels, the limitations of previous approaches such as TeX or ISO-12083 when defending their decision not to reuse either, while Brian cannot state here the limitations of ODF and the rationale for not reusing MathML in the next Office?

    About conversion issues, I predict many surprises for people using ODF. On some ODF blog (I do not remember which) I read about an experiment where a piece of math generated from Mathematica was copied and pasted into ODF and errors were generated.

    If my proposal for a profile attribute is finally implemented in the next MathML 3, the situation would improve in the future, but still, sorry to say this, conversion will not be guaranteed by using W3C standard formats.

    Juan R.

    Center for CANONICAL |SCIENCE)

  8. Anonymous says:

    For those of you mentioning that the binary format was undocumented, I suspect you never tried searching the web.  I’ve found the documentation for Word 97 and Word 6.0 by doing a search for these terms –

    "word file format dxagaphalf"

    Now I know no one would guess that last search term, that’s the name of one of the properties in the format and I only put it in to ensure I’d get some good hits.  And you’ll get different pages if you mix and match some other search terms such as "documentation binary" etc.  I even found documentation for Word 1.0 files once.

    Yes, I realize no one has information on anything after Word 97 but there isn’t that much that has changed, and a fair amount of it is more of the same, not that bad to reverse engineer.  I’ve seen reasonable non-MS viewers that worked with just this information.  Of course, they had their limitations.

    BTW, the comment about why Word was binary originally?  WordPerfect was binary, as well as AmiPro, it wasn’t like MS was being malicious by being the only binary format out there.

    The trick however is not just the format.  How does one implement a full blown text processor?  It is no easy task.  Even if you can read the format, as we can now with Office 2007, it will take a long time for anyone to create something that shows a file just like Word, there are just too many options that interact with each other in unexpected ways.

    Sometimes I get the sense that some people think this open format will magically allow them to easily create an application that will view Word files exactly like Word.  I’m not saying it won’t be much easier to implement a simple or even a pretty good viewer, for certain it is, but the format is really only the tip of the iceberg if you are looking for full-fidelity.

    Brian, this isn’t an attack on the format, nor its interoperability, that has all improved wonderfully (MindJet being an example).  This is just making the point that the format, even when binary, wasn’t always the main obstacle to having full-fidelity, non-MS viewers of Word files, and this isn’t MS’s fault either, it is the nature of such a complex text formatter.

    I don’t know how many applications support ODF, but I’d wager a guess that if you opened the same file in all of them, there will be many small, and in a complex document, several significant, differences between the display of the file.  

  9. A User says:

    I think John makes a lot of correct points, but misses the forest for the trees.

    Recall how the interoperability issue played out in the days of the OS wars. Everybody’s computers could interoperate when everybody used the same operating system. You could move information between applications reliably by using ASCII TXT. To be interoperable was to be reductionist.

    Something completely different is happening this time. Reductionism and monoculture are at an end. Is MonoSoft looking out for its own interest? You betcha: to avoid being left behind.

  10. Marbux says:

    Brian, I am intrigued by your references to the binary formats in the past tense and repeated references to "moving" from the binary formats to the XML format.

    Has something changed? My understanding is that the apps are too brittle for the surgery required to remove the binary formats and replace them with the XML formats, that the Ecma Office Open XML ("EOOX") format is part of a translation layer from and to the binary formats used for internal processing. Has a decision been made to perform the necessary surgery nonetheless?

    And isn’t the fact that EOOX is only part of a translation layer, not used for internal processing, the very reason that the ODF Translator cannot set ODF as the default file format in Office? As I understand the situation, the Translator performs a transformation between EOOX and ODF.

    And what is this about viewers for other operating systems? Has Microsoft decided to open source the Windows dependencies in EOOX such as OLE objects? Is it not a fact that EOOX implementation on non-Windows operating systems poses enormous development barriers for some document features?

    I do not wish to rain on anyone’s parade. But my evaluation of the situation is that the file formats used internally for Office are still not open and even EOOX is only partially open with closed binary code still incorporated for its full implementation.

    In that regard, I note that the patent license for the necessary Microsoft Open Packaging Conventions for EOOX has now progressed beyond a draft stage and is now in final form. See http://www.microsoft.com/whdc/xps/pkgpatentlic.mspx

    As I explained in my in-depth article soon after the Ecma standardization effort was announced, that license is absolutely incompatible with free and open source licensing requirements and specifically excludes licensing for "general word processing, spreadsheet or presentation features or functionality, operating system technology, programming interfaces, protocols, and the like." Moreover, the patent license prohibits sublicensing. Nothing seems to have changed in those regards. See http://www.groklaw.net/article.php?story=20051129101457378&query=donnybrook#A3

    I also note your statement on August 4th, "I think any of you folks who’ve been frightened by some of the FUD that has been spread about the Ecma Office Open XML formats should take a look" at a linked "legal opinion" on EOOX intellectual property issues.

    http://blogs.msdn.com/brian_jones/archive/2006/08/04/688932.aspx

    I hope that your statement about FUD was not directed at my article. I bent over backward to approach the subject from a neutral standpoint. And the "legal opinion" Microsoft commissioned does not seem to address a solitary point made in my legal analysis. So far as I know Andy Updegrove and I are the only people who have actually dissected the relevant documents as a whole and published our analyses. My own article was couched as suggestions for Microsoft to improve and clarify its relevant legal documents. I believe that both of us took a principled approach to our discussion. So I hope that you might clarify whether you intended your remark to encompass my article.

    Best regards,

    Marbux

  11. eshwar says:

    First of all, great blog. You are really doing a fine job of providing insights into the new Office 12 XML-based formats. I agree with you that it was in Microsoft’s interest to move to the new format: if you want developers building solutions on top of Office, you have to move away from the binary format, since supporting it would be a nightmare.

    But you have to agree that governments and many large corporations played a huge part in your decision as well. At least the Ecma standardization part wouldn’t have happened without pressure from some governments. But whatever the reason, I think Office is moving in the right direction. I hope to see many cool applications developed on top of these new formats, and I hope to find some time soon to play with them.

    Once again, great blog, and keep up the good work.

  12. davidacoder says:

    "Has something changed? My understanding is that the apps are too brittle for the surgery required to remove the binary formats and replace them with the XML formats, that the Ecma Office Open XML ("EOOX") format is part of a translation layer from and to the binary formats used for internal processing. Has a decision been made to perform the necessary surgery nonetheless?"

    Are you suggesting that Office should use EOOX as its internal data format at runtime in memory?!? If so, what a nonsensical request. I hardly know ANY app that uses its storage format as the data format for its runtime data structures. You would get the slowest app you could possibly dream of. Almost all apps use runtime data structures that differ from their storage format in order to get good performance. Just imagine what a slow dog you would get if, every time someone typed a character in Word, you had to actually parse and change your XML because you used it as your data structure… But once you accept that the runtime data structure simply HAS to be different from the storage structure (and by the way, this is almost 100% true for OpenOffice as well), then of course the storage format (i.e. EOOX) is different from the binary representation of the document in memory at runtime, and a translation happens when you open or save the document.

    You are not seriously requesting that any app claiming to support an open format HAS to use that format as its runtime in-memory structure, are you? 🙂 Loading and saving in the open format should surely do.
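    To make the runtime-versus-storage point concrete, here is a minimal Python sketch. The `Run`/`Paragraph` classes and the `load`/`save` functions are hypothetical, and the markup is only loosely modeled on WordprocessingML (`w:p`, `w:r`, `w:t`, `w:b`); this is not how Word or OpenOffice actually work internally. It only shows the shape of the idea: XML is parsed once on open and written once on save, while edits touch plain in-memory objects.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Namespace used by WordprocessingML; the markup below is a simplified subset.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


@dataclass
class Run:
    """Hypothetical in-memory run: text plus one formatting flag."""
    text: str
    bold: bool = False


@dataclass
class Paragraph:
    runs: list = field(default_factory=list)


def load(xml_text: str) -> list:
    """Translate stored XML into in-memory objects (done once, on open)."""
    root = ET.fromstring(xml_text)
    doc = []
    for p in root.iter(f"{{{W}}}p"):
        para = Paragraph()
        for r in p.findall(f"{{{W}}}r"):
            t = r.find(f"{{{W}}}t")
            rpr = r.find(f"{{{W}}}rPr")
            bold = rpr is not None and rpr.find(f"{{{W}}}b") is not None
            text = (t.text or "") if t is not None else ""
            para.runs.append(Run(text, bold))
        doc.append(para)
    return doc


def save(doc: list) -> str:
    """Translate the in-memory objects back into XML (done once, on save)."""
    body = ET.Element(f"{{{W}}}body")
    for para in doc:
        p = ET.SubElement(body, f"{{{W}}}p")
        for run in para.runs:
            r = ET.SubElement(p, f"{{{W}}}r")
            if run.bold:
                rpr = ET.SubElement(r, f"{{{W}}}rPr")
                ET.SubElement(rpr, f"{{{W}}}b")
            t = ET.SubElement(r, f"{{{W}}}t")
            t.text = run.text
    return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    stored = save([Paragraph([Run("Hello, "), Run("world", bold=True)])])
    doc = load(stored)           # parse the XML once, on open
    doc[0].runs[1].text += "!"   # "typing" edits plain objects; no XML is touched
    print(save(doc))             # serialize once, on save
```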

  13. davidacoder says:

    "And what is this about viewers for other operating systems? Has Microsoft decided to open source the Windows dependencies in EOOX such as OLE objects? Is it not a fact that EOOX implementation on non-Windows operating systems poses enormous development barriers for some document features?"

    There are a couple of points regarding OLE. With OLE, essentially ANY third-party app can embed data into a Word document. Word (or MS) does not control the format of this embedded data in any way. So you can ALWAYS end up with a Word document that has embedded OLE data (say, from some third-party graph editor) in a format no one but that third-party app understands. And of course many of those apps exist only on Windows. So how could a viewer on Linux show a Word document with an embedded OLE part when there is no Linux software to decode that embedded data? Well, fairly simply. For every embedded OLE object, Word also saves an image (I believe in WMF or EMF or some similar format, certainly something documented) of how the embedded object looked at save time. So to just view the Word document, you DON’T need to understand the binary format of the embedded object; you can just render the image that is saved along with it.

    I think this is as ideal as you can get it. Things can be viewed on different platforms, and if there is an editor for the embedded format for Linux or another platform, it could possibly also be viewed.
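    The fallback described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: `EmbeddedObject`, its fields, `render`, and the ProgID `ThirdParty.Graph.1` are hypothetical names, not the actual OOXML or OLE API. A viewer uses a native handler for the embedded format if the platform has one, and otherwise shows the preview image cached at save time.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical model of an embedded OLE object as a viewer might see it.
@dataclass
class EmbeddedObject:
    prog_id: str          # which (third-party) application owns the data
    native_data: bytes    # opaque payload only that application understands
    preview_image: bytes  # picture of the object cached at save time (e.g. WMF/EMF)


# A handler turns native data into something displayable; here just a string.
Handler = Callable[[bytes], str]


def render(obj: EmbeddedObject, handlers: Dict[str, Handler]) -> str:
    """Render natively if this platform has a handler, else use the cached image."""
    handler = handlers.get(obj.prog_id)
    if handler is not None:
        return handler(obj.native_data)
    # No decoder for this format on this platform: show the saved picture instead.
    return f"<image {len(obj.preview_image)} bytes>"


if __name__ == "__main__":
    chart = EmbeddedObject("ThirdParty.Graph.1", b"\x01\x02", b"\x00" * 64)
    print(render(chart, {}))  # <image 64 bytes>
    native = {"ThirdParty.Graph.1": lambda d: f"native graph ({len(d)} bytes)"}
    print(render(chart, native))  # native graph (2 bytes)
```

    The viewer never needs to understand `native_data` to show the document; a native handler is only an optional upgrade.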

    The one thing Microsoft could not possibly have done is rule out these binary, non-standard data islands coming from embedded OLE objects in the new EOOX format. They come from third-party apps, not MS apps, and MS can’t force those apps to use any particular format. So there were two choices: ban embedded binary OLE data (and thereby break compatibility with the millions of old documents that contain it) or do what they did.

    How would you have solved this problem differently?

  14. hAl says:

    @marbux

    It would be more useful to use OOXML or Ecma OOXML for the format, as that is the terminology used by Ecma itself.

    Firstly, I do not see why the internal processing of a format would be of interest to you. XML is in its essence a horrible internal format, as it is cluttered with tags that have no use in the internal structure of a program. Only idiots use internal XML structures.

    Secondly I found this in your writings:

    "Such problems are compounded by the lack of definitions for the word "conform" and its variant "conforming." If read restrictively, those terms could reasonably be interpreted as prohibiting implementation of subsets and supersets of the Office Open XML specification."

    However, the Ecma OOXML draft clearly has a passage on conformance to the format, and it clearly allows subsets of the format if they are documented as such. The draft also discusses extensibility, thereby allowing supersets as well.

    I lack similar information in the OpenDocument specs, so it is legally unclear when a document that is a subset or a superset can be called OpenDocument. This is potentially bothersome, as OpenDocument seems to be an OASIS trademark that might not be usable when you aren’t conforming (although it is not clear when a format does conform). So if anything, it seems rather unclear when a document can be called OpenDocument and a lot clearer when a document can be called OOXML.

  15. Anonymous says:

    @Marbux:

    "In that regard, I note that the patent license for the necessary Microsoft Open Packaging Conventions for EOOX has now progressed beyond a draft stage and is now in final form. See http://www.microsoft.com/whdc/xps/pkgpatentlic.mspx"

    I’ve seen this nonsense before. One of the classic FUD attacks being perpetrated by OpenOffice zealots is to state that while the OpenXML format itself *may be* open, developers would still have to separately license the patent for Open Packaging Conventions. The link for XPS is provided to give the FUD some veracity.

    As I understand it, the Open Packaging Conventions are an integral part of OpenXML (http://www.ecma-international.org/news/TC45_current_work/tc45-2006-335.pdf) and are fully covered by Microsoft’s Covenant Not to Sue.

    More Marbux:

    "the file formats used internally for Office are still not open and even EOOX is only partially open with *closed binary code* still incorporated for its full implementation."

    Oh no! Is it you Gary Edwards???

  16. Bryce Leo says:

    I’ll be honest, I just don’t like Microsoft Office. I happen to use AbiWord, or the online ZohoWriter, because I just don’t need that massive amount of *stuff* that’s inside of Word, and I don’t need all of that PowerPoint *stuff* either; I just use S5. Heck, I don’t even touch spreadsheets, and if I want a database I’ll use MySQL and a PHP front-end, thank you.

    You guys have a product that’s designed for a specific type of user, and if you look at OpenOffice, there are still tons of things the current MS Office versions can do that it can’t. I couldn’t imagine being able to squish all of that odd functionality and whatnot into the ODF format rather than one of your own formats. I don’t really like that you have your own format, but I can see why. You’ve got a great tool, too, but for the very small subset of features that I use I don’t need it, just like I don’t need Windows either. Hell, I use PuppyLinux and it does all of my development (PHP, Python, Perl, C++), which is just fine for me. It’s all about needs: some people need all of that, and some people don’t. That shouldn’t make what you do any less impressive. It’s a shame that it does. Keep up the great work.

  17. John says:

    Hello, all

    It was a huge surprise that my post appeared as a main article, and I have to say a big thank you to Brian for carefully answering my points. I admit that he really did explain some things thoroughly. Again, great work and a nice move from Microsoft and its employees, without irony.

    Let me also state again: I am a network administrator of an Active Directory domain and have known and used MS Office since version 97. I am currently running 2007 Beta TR, and hats off, it’s awesome. I also believe (after you explained that my concern about the formats being only "somewhat documented" was unfounded) that the documentation for OpenXML will really be complete, unlike past versions of Windows and DOS, which contained undocumented features that only your software could use. But …

    Brian, your arguments are perfectly valid from your point of view as a developer and evangelist of the new formats. Surely, they are a huge step forward – I completely agree.

    And, if

    – Microsoft did not have a near monopoly in Office applications

    – the history of your behavior concerning "interoperability" were not so well known all around the globe

    I would not argue any further. But this is not the case. I am simply afraid that the problems I experience daily on my network will continue: users are unable to work with their .DOC documents unless they were created in the same version of Word. Between OpenOffice and Word it is even worse. And this happens with simple documents containing just graphics, some WordArt, and formatting. They just do not look the same, and my users are angry about it. That is interoperability as Microsoft currently does it. And the result: when I want to properly read or edit a DOC received by mail, I have to buy MS Word and MS Windows. Hmm, that is the best marketing you could have. Will this be better with the new formats??

    Somebody here in the forum wrote that the key to interoperability is making things simple. I accept your point that making DOC, XLS, and PPT documents work is not an easy job, especially with all those documents floating around. But how can I know whether the complexity of the documentation is driven solely by those real needs, or whether it is also driven by a need to make those formats so complex that nobody in the world will be able to reproduce them as a whole?

    Well, I am afraid that, despite all the good effort being put into the new formats, the simple thing (making a document in Word or Excel and being able to work with it, meaning edit it, in software other than MS Office) will be as difficult as it is now. I bet that at the same time many companies will build solutions on top of the new formats that can be opened in MS Office. The issue is not viewing; the key is editing documents, and that is my main concern.

    I could argue much further. I hope I have made my main concerns clear. I am not nearly as experienced as Brian in this area; I simply have big problems with interoperability now, and I still doubt that they will be solved.

    Sincerely,

    John

  18. John,

    We had no difficulty obtaining the full spec for the Excel BIFFx file formats from an MS Press book and our customers have exported and read in many millions of Excel files over the years. Admittedly there were various errors in that documentation, but you would be insane to say they were put in on purpose to obfuscate the format.

    You figure the issues out pretty quickly (if you do QA that is!)

    There are also other third-party native BIFF writers on the component side, such as ExcelWriter, and their developers have gone through the same process.

    Even the <;)>unmentionable</;)> OpenOffice does a pretty good job in reading and saving binary Excel.

    MS also made it easy to read and write Excel files with DAO/ADO/OLEDB etc. Admittedly those are not multi-platform solutions, but MS was hardly trying to lock people in to the Excel app itself.

    So whatever some people and organizations come out with, you will ALWAYS at least be able to get all the underlying data out of any version of any Excel file, even if MS goes out of business next week and all its source code (and escrowed stuff) and internal documentation are destroyed in a fire, etc.

    We have already developed support for reading and writing OpenXML spreadsheets in the next version of our app and are awaiting the final nailing down of the spec and the documentation of the XLSB / XLSM formats, which MS have promised to deliver, as well as the documentation on encrypting XLSX.

    It was a hell of a lot easier than the old Excel BIFF, that’s for sure.

    We are not a huge company and have bunches of other features to deliver for our main product, so the fact that we were able to do this in a couple of months (even creating our own Packaging IO engine in C++, rather than using the .NET ready rolled one) speaks well to the new formats.  As Brian correctly points out, you don’t have to support the ENTIRE spec for it to be valuable.
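    For readers curious what a packaging layer has to do at minimum, here is a rough Python sketch using only the standard library (it is not Gareth's C++ engine, nor the .NET implementation). It assumes the usual Open Packaging Conventions layout: a ZIP archive containing `[Content_Types].xml` and a package relationships part `_rels/.rels` that points at the main document part. `main_part` and `build_minimal_package` are illustrative helper names, the workbook part is a placeholder rather than valid SpreadsheetML, and target-path resolution is simplified (no `../` handling).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/officeDocument")
WORKBOOK_CT = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"


def main_part(package: bytes) -> tuple:
    """Return (part name, content type) of the package's main document part."""
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        # 1. Package-level relationships tell us where the main part lives.
        rels = ET.fromstring(z.read("_rels/.rels"))
        target = next((r.get("Target")
                       for r in rels.findall(f"{{{REL_NS}}}Relationship")
                       if r.get("Type") == OFFICE_DOC), None)
        if target is None:
            raise ValueError("no officeDocument relationship in _rels/.rels")
        part = "/" + target.lstrip("/")  # simplified: target relative to root
        # 2. [Content_Types].xml tells us what kind of part it is.
        types = ET.fromstring(z.read("[Content_Types].xml"))
    for o in types.findall(f"{{{CT_NS}}}Override"):  # per-part override first
        if o.get("PartName", "").lower() == part.lower():
            return part, o.get("ContentType")
    ext = part.rsplit(".", 1)[-1].lower()
    for d in types.findall(f"{{{CT_NS}}}Default"):  # then per-extension default
        if d.get("Extension", "").lower() == ext:
            return part, d.get("ContentType")
    raise ValueError(f"no content type declared for {part}")


def build_minimal_package() -> bytes:
    """Build a tiny illustrative package in memory (placeholder workbook)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml",
                   f'<Types xmlns="{CT_NS}">'
                   '<Default Extension="rels" ContentType='
                   '"application/vnd.openxmlformats-package.relationships+xml"/>'
                   '<Default Extension="xml" ContentType="application/xml"/>'
                   f'<Override PartName="/xl/workbook.xml" ContentType="{WORKBOOK_CT}"/>'
                   '</Types>')
        z.writestr("_rels/.rels",
                   f'<Relationships xmlns="{REL_NS}">'
                   f'<Relationship Id="rId1" Type="{OFFICE_DOC}" Target="xl/workbook.xml"/>'
                   '</Relationships>')
        z.writestr("xl/workbook.xml", "<workbook/>")  # placeholder content
    return buf.getvalue()


if __name__ == "__main__":
    print(main_part(build_minimal_package()))
```

    Everything above is plain ZIP plus two small XML parts, which is part of why writing a packaging layer from scratch is a modest job compared with parsing the old binary containers.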

    Brian: I hope you or David G will let me know if you get any corrupt / odd files that were generated by our products.  I seriously doubt it, since we have not had a support issue on it since Excel had a smaller market share than Lotus;)

    I would also like to thank Brian and David Gainer for their excellent blogs over the Office 2007 release cycle.  There has never before been this level of quality information flowing about an unreleased version of Office in Microsoft’s history.

    Gareth

  19. hAl says:

    John, have you seen interoperability between applications using opendocument ???

    That is currently a disaster area as well.

    If you edit an ODF word-processing file created in OOo in KOffice, you are likely to lose parts of both the text and the formatting. Having the specs does not guarantee interoperability, especially not with complex formats.

    So the lack of interoperability is not just due to a lack of documentation (although that documentation certainly was not very good) but also to a lack of communication about the formats between the different implementations.

    And even today several parties do not communicate, with MS and OOo communication going mostly through weblogs like this…

  20. Chris Capossela (head of the Information Worker business group) was over in the UK today and I got to…
