MS Office Open XML Formats and OpenDocument XML format


I’ve had a number of questions and e-mails asking if the new Office open XML formats are going to be the same as the OASIS OpenDocument format. Rather than reply to the various comments and e-mails separately, I figured I’d attempt to summarize everything in a new post. Sorry it took so long to reply on this issue; I have been distracted by TechEd for the past week. Scoble actually swung by and talked with Jean Paoli and me about this just before I left for TechEd. You can watch that video here: https://channel9.msdn.com/ShowPost.aspx?PostID=76169


The primary question I’ve been getting is whether or not the two formats are the same. Even though both use ZIP and XML, the two formats are very different because they use different schemas. The basis for the OpenDocument format work was the OpenOffice.org XML file format (http://www.oasis-open.org/committees/office/faq.php), which originated, I believe, with the StarOffice product; the goal of that group was to create an open and interoperable format. Our goal at Microsoft has also been to create an open and interoperable format. That’s why we made such a big push to use both ZIP and XML: they are already so widely used. A lot of other people in the industry also combine XML with ZIP to create XML-based formats. In the CAD industry, for example, it works well because XML compresses so well with ZIP, and ZIP provides an easy-to-use container. That wide use makes it easier for people to take our formats and build on top of them. This is where the similarity between the two formats stops, though. Our primary goal at Microsoft was to create an open format that fully represented all of the features that our customers have used in their existing documents, documents that have been created using the existing Office products over the past couple decades.
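The point that XML compresses well inside a ZIP container is easy to check for yourself. The following is a minimal Python sketch, not anything taken from Office: it stores one hypothetical, repetitive XML part (the part name `content.xml` and the markup are made up for illustration) in an in-memory ZIP archive using the standard `zipfile` module, then compares the compressed and uncompressed sizes.

```python
import io
import zipfile


def package_xml(part_name: str, xml_text: str) -> tuple[int, int]:
    """Store one XML part in an in-memory ZIP archive.

    Returns (uncompressed_size, compressed_size) in bytes for that part.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(part_name, xml_text)  # str data is encoded as UTF-8
    with zipfile.ZipFile(buffer) as zf:
        info = zf.getinfo(part_name)
        return info.file_size, info.compress_size


# Hypothetical, highly repetitive markup; document XML repeats tag names in a similar way.
paragraphs = "".join(f"<p><run>Paragraph {i}</run></p>" for i in range(500))
xml = f'<?xml version="1.0" encoding="UTF-8"?><doc>{paragraphs}</doc>'

raw, packed = package_xml("content.xml", xml)
print(f"{raw} bytes of XML stored as {packed} bytes ({packed / raw:.0%} of original)")
```

The exact ratio depends on the document; real markup is less regular than this sample, so expect smaller savings in practice.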


Office has over 400 million customers, and we have a responsibility to continue to support all existing documents and all existing functionality. There are billions of documents that we are going to help move into our new XML formats, so a key constraint on all of our efforts was that these new formats had to support all of those existing files and features with absolutely no loss. To give you an idea of how big an undertaking that is, we have more than 1600 XML elements and attributes that reflect the features in Word alone in Office 2003. This is why we had to design a new format instead of shoehorning our features into another existing format (Jean Paoli explains this in the video on Channel9).


Let’s talk a bit about the interoperability of the two formats, since that’s an important topic to be clear on. Because both formats are open and documented, it is possible to create a transform (or filter) that goes between the two. Interoperability problems start to come up when a feature is present in one application but not in the other. You have to assume this will be the case, since every application has a different set of customers who request different features. From Microsoft’s point of view, we have built so many features over the years that it is extremely unlikely they all work exactly the same way in other applications. Believe me, there are *tons* of features in Word, Excel and PowerPoint, and we have a responsibility to our customers to continue to support them.
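To make the feature-mismatch point concrete, here is a hypothetical Python sketch of a lossy filter: features the target format can express are mapped, and everything else is reported as dropped rather than silently lost. The feature names, `FEATURE_MAP`, and `convert_features` are illustrative assumptions, not part of either schema; the right-hand values only mimic the XSL-FO style attributes that OpenDocument uses for formatting.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from source-application feature names to what a target
# format can express. Features missing from this table cannot be carried over.
FEATURE_MAP = {
    "bold": "fo:font-weight=bold",
    "italic": "fo:font-style=italic",
}


@dataclass
class ConversionReport:
    kept: list[str] = field(default_factory=list)
    dropped: list[str] = field(default_factory=list)


def convert_features(features: list[str]) -> ConversionReport:
    """Map each feature if the target supports it; otherwise record it as dropped."""
    report = ConversionReport()
    for feature in features:
        target = FEATURE_MAP.get(feature)
        if target is None:
            report.dropped.append(feature)
        else:
            report.kept.append(target)
    return report


report = convert_features(["bold", "italic", "hypothetical-app-only-feature"])
print(report.kept)     # ['fo:font-weight=bold', 'fo:font-style=italic']
print(report.dropped)  # ['hypothetical-app-only-feature']
```

A real filter would work on XML elements and attributes rather than flat names, but the design choice is the same: report what cannot round-trip instead of hiding it.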


I’m hoping that over time, as we publish these new schemas and provide documentation, people will start to build tools for going from our formats into other formats (and vice versa). We already did this with Word 2003’s XML when we built an XSLT that transforms it into HTML, which you can find here: http://www.microsoft.com/downloads/details.aspx?familyid=19676b18-1bcd-4852-93ba-0b5a203ea731&displaylang=en. There is also an example on the web of how to use XSL-FO with WordML. I’m also going to push hard for us to build more of these transforms and post them on the web. I’ll probably start posting some examples for the Word 2003 XML over the next few months, since that’s the format currently out there for people to play with. Let me know if there are any simple transforms you’d like to see. Also, if you have built converters for Office 2003 XML, please tell me about your experience and how we can make them easier to build.
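As an illustration of the kind of transform described above, here is a minimal Python sketch (not the XSLT Microsoft published) that turns Word 2003 XML paragraphs into HTML using only the standard library. It assumes the Word 2003 WordprocessingML namespace and handles just `w:p`, `w:r`, `w:t`, and bold (`w:b` inside `w:rPr`); every other feature is ignored, and it does not check `w:val` values such as `off`.

```python
import html
import xml.etree.ElementTree as ET

# Namespace of Word 2003 XML (WordprocessingML), in ElementTree's {uri} notation.
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"


def wordml_to_html(xml_text: str) -> str:
    """Convert the paragraphs of a Word 2003 XML document into simple HTML.

    Only paragraph text and bold runs are handled; all other features are dropped.
    """
    root = ET.fromstring(xml_text)
    html_paragraphs = []
    for para in root.iter(f"{W}p"):
        pieces = []
        for run in para.iter(f"{W}r"):
            text = html.escape("".join(t.text or "" for t in run.iter(f"{W}t")))
            # Treat the presence of <w:b/> in the run properties as bold.
            if run.find(f"{W}rPr/{W}b") is not None:
                text = f"<b>{text}</b>"
            pieces.append(text)
        html_paragraphs.append("<p>" + "".join(pieces) + "</p>")
    return "\n".join(html_paragraphs)


sample = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>
      <w:r><w:t>, world &amp; more</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""

print(wordml_to_html(sample))  # <p><b>Hello</b>, world &amp; more</p>
```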


-Brian

Comments (56)

  1. James Snell says:

    Hello Brian,

    Good post. I posted some comments over on my developerWorks blog @ http://www-128.ibm.com/developerworks/blogs/dw_blog_comments.jspa?blog=351&entry=83640

  2. Gene Myers says:

    I’m sure that I’m not alone when I say how substantive I found the sessions and talks hosted by you, Scott W., Shawn V., and Chad R.

    Thanks for presenting these well prepared and dynamic events. I hope that your team got as much from them as I (we) did.

  3. BrianJones says:

    Thanks Gene! I had a great time last week talking with everyone about the new formats. It was well worth the trip down there. I’m glad you found it as useful as we did.

    There will be plenty more information to come over the next several months. If you can, you should try to make it out to PDC in September. We’ll have some more sessions on the file formats, as well as individual sessions for the different applications. I hope to see you there! Also, let me know if there are specific topics you’d like me to drill into, and I’ll try to write some stuff up.

    -Brian

  4. Christoph says:

    hello,

    as far as I know both OpenOffice and the OpenDocument folks are an open group. You could have talked to them about your problems with the format, and I suppose they would have helped you.

    StarOffice/OpenOffice has also many customers that are using all the different versions and features but they’ve managed to integrate it with the new format. The real problem is NOT the format. It’s the will to lock people in on MS-Formats.

  5. Thomas says:

    Please don’t blame OASIS and the OpenDocument file format for not being able to represent Microsoft’s features. OASIS welcomed every entity to help define what is now known as the OpenDocument file format. But OASIS sponsor Microsoft chose not to, so now you can rant about how badly this file format represents Microsoft’s features, why you cannot use the OpenDocument file format, and why you ultimately have to define your own file format… bla

    ===

    I’m hoping that over time, as we publish these new schemas and provide documentation, people will start to build tools for going from our formats into other formats (and vice versa).

    […]

    Let’s talk a bit about the interoperability of the two formats, since that’s an important topic to be clear on. Because both formats are open and documented, it is possible to create a transform (or filter) that goes between the two.

    ===

    OK, let’s push the clouds away.

    This means that after acquiring or upgrading to Office 12, I can open/save the Microsoft *.doc formats and Microsoft’s XML file formats. But, like Microsoft Office today, I (or rather, Microsoft Office) will not have the ability to open/save (or import from / export to) the OpenDocument file format.

    All you say is "it is possible to create a filter …", so yes, it’s possible. *But* is it possible for you? Will you guys provide such a filter, either out of the box or as a separate component? Or will you just sit there and tell me that "it is possible" and that "people will start to build tools" for going from one format to the other?

    ===

    Our primary goal at Microsoft was to create an open format that fully represented all of the features that our customers have used in their existing documents, documents that have been created using the existing Office products over the past couple decades.

    ===

    Hmm, let’s see. OASIS welcomed others to help define what is now known as the OpenDocument file format, so that their features could be represented in it. Although Microsoft sponsors OASIS, it refused to take part in the standardisation. Instead, you guys once more implemented a proprietary solution instead of working together and implementing a standardized solution others have agreed on?

  6. orcmid says:

    Thanks Brian. That’s helpful. It also sounds like it is going to be complicated to deal with the feature sets in the schemas too.

    I made a pass at getting the Metro materials, but I was too tired to wade through the agreement page, so I passed on the download.

    I’ll give it another run, because I like the generic use of Zip as a packaging mechanism, and if you guys have worked out good interoperable conventions for that, why not play along. Makes me think of DocFiles II, but using public specifications (whatever that means) and existing standards (including the de facto Zip!).

  7. Anon says:

    Brian, while it’s true that MS Office and OpenOffice.org will differ on features, it’s also understood that most people only use a small subset of the features available. Does Microsoft have to have 100% feature compatibility before they will support the OpenDocument format? I argue that there is no need. You support the features that you can and you ignore the rest. It’s not like these features will make the document not readable. The whole point of XML is that it’s flexible. Please reconsider at least some support. If your format is truly open (meaning Open Source can read/write without license problems), OpenOffice.org will probably support your format, and if you want true interoperability you will do the same. After all, who has more resources available to them?

  8. ghibertii says:

    I would have to be a little skeptical of this proprietary MS format. Just another way to lock users in to your products?

  9. frindly says:

    Which format do Office 97 documents use? XML???

  10. Kaiwai says:

    Just a side note, the OpenOffice.org format isn’t zipped, it’s gzipped, which is a different format.

  11. orcmid says:

    I played hooky and ploughed through the Metro specification. Interesting. I satisfied myself that there are ways to anticipate this format in some "package" applications I have in mind, and they should be able to be Metro hybrids in the future. I know what names not to use to avoid future collisions, in particular [;<).

    I have a few comments and while the document seems to invite feedback, I don’t know where to provide them. Do you have any tips about that?

    I am also interested in any discussion area where private (per the license) but non-confidential discussions and crosstalk might occur, since there seem to be good prospects for heterogeneous interoperability and sharing use cases. It would also help avoid working up comments where that has already been done and where changes are already anticipated. I hate to waste energy in duplicative work.

    Finally, other visitors to your blog might find the general Metro materials at http://www.microsoft.com/whdc/device/print/default.mspx to be of interest, especially the FAQ, Fact Sheet, and the document lifecycle materials. This approach addresses a lot of problems in the scan/create – store/manage – present/print scheme of things, though I am not sure it works for giant documents and fire-breathing, tree-eating publishing engines. I’m sure others are worrying about that.

    On the other hand, I think it makes a great number of performance cases simply simpler under a nice framework. This looks like a packaging that could be every mid-range multifunction device’s dream-transfer vehicle. But hey, what do I know.

  12. Sean Clarke says:

    Good post Brian, however I am sceptical of your reasoning and justifications.

    As mentioned in previous posts, OASIS/OpenDocument is an "open" group, which means they could have been approached; you could have created a subset, used it as a basis, etc.

    To be honest, your comment about maintaining backwards compatibility is a bit of a joke. I’m sure most people reading this have had the same issues I have when opening documents between different versions of MS Office, where formatting gets screwed up, etc.

    Do I think this is another attempt at lock in? Yes I do, it is the long standing MS business practice that has served them well over the years.

    Like I said, good post Brian – don’t take my comments personally.

  13. Wesley Parish says:

    To tell the truth, Brian, what worries me the most is not what Microsoft has left out of Microsoft’s Own Reimplementation of Zipped XML, but what may sneak back in, under the guise of "Backward Compatibility".

    To wit, Microsoft Office’s file formats have generally been secret enough to prevent easy cloning, but not easy virus hosting. (A cute young woman once bestowed on me the best smile she had in her armory because I had, just by converting her resume from .DOC to .RTF, got it through the anti-virus protections Hotmail.com had adopted, thus wiping out a virus Hotmail accused it of hosting. 🙂)

    That’s one "legacy" application of Office file formats that you’ll be only too glad to see the end of – the only problem is that you have turned your back on the broadest industry collaboration that could have helped you bury it for good.

    I don’t want to be the one to rain on your parade, but this has evidently not been taken into account. I’d be more comfortable with Microsoft’s Office Suite if it had been.

    Thanks

    Wesley Parish

  14. Thomas says:

    ===cut===

    on Wednesday, June 15, 2005 12:28 AM Kaiwai wrote:

    Just a side note, the OpenOffice.org format isn’t zipped, its gzipped, which is a different format.

    ===cut===

    wrong

    please read the OpenDocument v1.0 specification

    http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office

    especially chapter 17

    ===cut===

    17.1 Introduction

    […]

    OpenDocument uses a package file to store the XML content of a document together with its associated binary data, and to optionally compress the XML content. This package is a standard Zip file, whose structure is discussed below.

    […]

    ===cut===
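The Zip-versus-gzip question can be settled by looking at a file’s first bytes: Zip archives start with the local-file-header signature `PK\x03\x04`, while gzip streams start with `\x1f\x8b`. The minimal Python sketch below builds a tiny, incomplete ODF-style package in memory (a real package also needs `META-INF/manifest.xml`) and checks its signature. Writing the `mimetype` entry first and without compression follows the spec’s guidance for that stream; the helper names are hypothetical.

```python
import gzip
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # signature of a Zip local file header
GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream
ODF_TEXT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def container_kind(data: bytes) -> str:
    """Classify a file's container format by its leading magic bytes."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


def minimal_odf_package() -> bytes:
    """Build a tiny, incomplete ODF-style package (no META-INF/manifest.xml)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr("mimetype", ODF_TEXT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr(
            "content.xml",
            '<office:document-content xmlns:office='
            '"urn:oasis:names:tc:opendocument:xmlns:office:1.0" office:version="1.0"/>',
            compress_type=zipfile.ZIP_DEFLATED,
        )
    return buffer.getvalue()


package = minimal_odf_package()
print(container_kind(package))                   # zip
print(container_kind(gzip.compress(b"<doc/>")))  # gzip
print(zipfile.ZipFile(io.BytesIO(package)).namelist())  # ['mimetype', 'content.xml']
```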

  15. Josh says:

    The reason I actually dropped Windows as a development platform is this kind of "open" doubletalk. It is perfectly legal to have a proprietary format, but please don’t call it "open" and then follow up with reasons why you technically couldn’t use the OASIS format (and god knows you wanted to). We are not morons!

  16. Russ Stemler says:

    And will the world actually care?

    With the admitted lag in pick-up for Win XP, the core gutting of the perennially delayed Longhorn, the GAW (Generally Accepted Wisdom) that MS’ business practices are at the least constraining if not actually detrimental to users’ best interests, and the wide availability of ‘good-enough’ (a previous MS strategy, BTW) free alternatives, who can MS cajole or beat into submission to utilize the newest and greatest lock-in?

    No, I don’t think I’m bitter. It is just that Microsoft has educated me and I have naturally moved on.

  17. BrianJones says:

    These are all great comments. I’ll try to pull together a larger writeup that addresses the different concerns at some point today (It’s a pretty busy day though, so it might not come until later in the day).

    Russ, thanks for your comments. Not really sure I have much to say in reply. Sorry you’re bitter. Hopefully as you read more about what we’re doing with the formats you’ll start to gain more interest.

    Josh – I know you guys aren’t morons. If I thought that, I wouldn’t be wasting my time on this blog. What are the specific issues you are concerned about with our formats that would make you say they aren’t open formats? What are some examples of formats you view as open and what are the differences between those formats and ours that make you feel that way? We have freely available documentation, royalty-free licenses, and we’ve listened to customer input when designing them. I’m assuming there is something else you are looking for. Most people I’ve talked to are pretty excited that we’re moving away from the current binary formats and giving them a new open format (that’s the default) they can take advantage of.

    -Brian

  18. Jared White says:

    OK, I’m not really a Microsoft fan, and in fact use a Mac and avoid MS products most of the time, but I believe in giving credit where credit is due. Microsoft moving to an XML-based format for Office is a tremendous achievement and one which will help Microsoft and competitors both in different ways. Microsoft will gain the benefits of integrating into an XML workflow, and competitors will be able to transform the Office XML (calling any XML format "proprietary" is pretty silly you know) into any other format they want. Everybody wins, except the whiners who like to dump on Microsoft for no real reasons other than spite.

    Folks, you must realize that while MS may have stomped on a lot of people in the past, they are improving their strategy and supporting a lot more open standards going forward. Let’s encourage that, instead of continuing to throw stones.

    Jared

  19. Josh says:

    Document format is "Open" when:

    1) Applications using this format should not have to worry about patent issues.

    2) Format full specification is available on the web

    3) (optional) A stand-alone C++ library for reading/creating documents is available and does not require additional proprietary technologies (like ActiveX).

    So, is your document XML format open?

  20. SMC says:

    I (and I suspect most other readers) DO actually appreciate you taking the time to discuss these formats online.

    Still, I can understand (and agree with) the reasons which are driving people to be somewhat abrasive in some of the posts. Microsoft has definitively proven that it will do whatever it can to lock its customers in and avoid competing on technical (and now price) merits. If Microsoft’s corporate attitude is actually changing for the better now (after all that rhetoric about the GPL being "cancerous" and "un-American" and so forth) it will take serious, prolonged, positive action on Microsoft’s part to prove it, in light of its previous behavior and criminal activity.

    Two issues have come up repeatedly about these new formats specifically – the ’embrace/extend/extinguish’ sort of approach MS appears to have taken to the OASIS formats, and the almost-certainly-intentional devising of license terms for the format that prohibit GPL licenced software from using them.

    On the first issue: Microsoft is evidently a MEMBER of the OASIS group, and as many people have pointed out, it ought to have been possible for MS to participate in the development of the standard to allow for whatever special "support for old proprietary features" data is necessary. Instead, Microsoft seems to have refused to work with others and instead took the concept that came out of OASIS and made their own special version – a common tactic MS seems to like to use.

    On the second issue: I previously commented on the license problem. The terms have been carefully chosen to require the addition of restrictions (the "advertising clause") to the licenses of any software that uses this format. Since the fundamental purpose of the GPL (as I see it) is to avoid having unscrupulous entities (individuals, corporations, whatever) poison a downstream derivative of a GPL-licensed product with additional restrictions as the price one pays to license the GPL’d product, this blatantly forbids the use of the GPL. Given the over-the-top ravings that have previously come out of Microsoft regarding how much Microsoft hates having to agree to such terms to use the software (unlike the BSD-style licenses which allowed Microsoft to, for example, get a ‘free’ TCP/IP stack which could be made proprietary, without giving anything back in return) one can only assume this was an intentional act. Is THIS the reason MS developed their own special standard? Because the existing, documented standard would not have MS-imposed anti-GPL restrictions? (This is a serious, fundamental question that I wish someone would honestly address…)

    Despite my well-warranted skepticism, though, thanks for trying to answer the questions that have been coming up.

  21. BrianJones says:

    Jared – Thanks for your comments. I can’t stress enough how big of a deal this new shift is, and how many new scenarios are now possible.

    Josh –

    1) There is a royalty-free license (as I discussed in this post: http://blogs.msdn.com/brian_jones/archive/2005/06/02/424517.aspx).

    I’m not going to give you my interpretation of that license since I am an employee and my interpretation could be seen as binding. There is a great FAQ though that was put together by the lawyers that tries to help people interpret the license.

    2) The schemas for Office 2003 XML can be freely downloaded here: http://www.microsoft.com/office/xml/default.mspx

    All the schemas are fully documented. I admit there are some of the more obscure areas of the format that aren’t documented as well as others. I’m going to push to make sure we get even more details and good documentation out for these new formats.

    3) Just use any available ZIP tool and XML parser to start with. If you want richer support, we can look into pulling together some tools.
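Following the "any ZIP tool and XML parser" suggestion, here is a minimal Python sketch that uses only the standard `zipfile` and `xml.etree.ElementTree` modules. It opens a ZIP-based package and reports the root element of every `.xml` part. The package built in the usage example, its part names, and the `urn:example:doc` namespace are hypothetical stand-ins, not the actual Office part layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def describe_xml_parts(package: bytes) -> dict[str, str]:
    """Map each .xml part in a ZIP-based package to the tag of its root element."""
    roots = {}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            if name.endswith(".xml"):
                root = ET.fromstring(zf.read(name))
                roots[name] = root.tag  # "{namespace}localname" when namespaced
    return roots


# Hypothetical package: part names and namespace are placeholders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("content.xml", '<doc xmlns="urn:example:doc"><p>Hi</p></doc>')
    zf.writestr("styles.xml", "<styles/>")
    zf.writestr("image.png", b"\x89PNG")  # non-XML parts are skipped

parts = describe_xml_parts(buffer.getvalue())
print(parts)  # {'content.xml': '{urn:example:doc}doc', 'styles.xml': 'styles'}
```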

    SMC-

    I already talked a bit in the initial post around why the two formats are different. There have been a number of questions though about why we didn’t work with OASIS to extend the OpenDocument format to work for us. It’s a bit odd because at the same time there are also complaints about not wanting us to "embrace and extend." Making the OASIS format fully compatible sounds like a nice idea if you are just looking at the surface, but that’s a more simplistic view. I’ll try to explain this better later.

    As far as the licensing issues go, like I already said I’m not actually able to give interpretations of the license. Instead, what I can do is work with our lawyers to help add more information to the FAQ that will hopefully help answer the questions people are asking. I can tell you that there is no attempt to be sneaky with the replies. It’s really more that it becomes a rather touchy issue if people want you to officially publish information that is interpreting someone else’s license. Like I said, I’ll try to dig into this more and get a more updated FAQ as soon as possible.

    -Brian

  22. Ben Hourigan says:

    I’d love to see a really good filter for converting Word XML to LaTeX and back again. I currently don’t use any Microsoft products, but I could consider going back to Word if it worked better with LaTeX than OpenOffice does.

  23. Thomas says:

    ===cut===

    What are the specific issues you are concerned about with our formats that would make you say they aren’t open formats.

    ===cut===

    for starters: take a look at your license for the existing office 2003 xml file format:

    Office 2003 XML Reference Schema Patent License

    http://www.microsoft.com/mscorp/ip/format/xmlpatentlicense.asp

    ===cut===

    Microsoft may have patents and/or patent applications that are necessary for you to license in order to make, sell, or distribute software programs that read or write files that comply with the Microsoft specifications for the Office Schemas.

    […]

    You are not licensed to sublicense or transfer your rights.

    ===cut===

    Need a little help seeing what I want to point you at? Then read the posting from Carsten Svaneborg from 11/17/2003. The URL is http://www.abisource.com/mailinglists/abiword-dev/2003/Nov/0262.html

    And another good read is "Patently ridiculous" from Judith Wusteman from 02/2004

    The article is available from http://www.ingentaconnect.com/content/mcb/238/2004/00000022/00000002/art00014

    Focus on the conclusion Judith comes to at the end:

    ===cut===

    Imagine a scenario in which a major digital library of several million documents is archived in Microsoft Office 2003 Word format. It can be saved as XML so, of course, it is future-proof and hence an appropriate archival format. After a couple of years, Microsoft upgrades to Word 2006. A couple of years later, it upgrades again, this time to Word 2008. At this point, Microsoft "sunsets" Word XP, that is, it ceases to support it. Word 2008 may be able to read Word 2006 files but history tells us that it may not be able to read Windows 2003 files. But the files are in XML so it should be easy enough to create a reader for them — except that there’s a patent on the format so this would be illegal until that patent has expired. The result is several million unreadable documents. Archiving documents in formats encumbered by patents will always be a bad idea.

    ===cut===

    ===cut===

    What are some examples of formats you view as open and what are the differences between those formats and ours that make you feel that way?

    ===cut===

    What’s wrong with actively participating in a standardization body, like the OASIS OpenDocument committee, to define a truly open file format that serves as the main file format for more than one vendor’s software product?

    As others have said in the comments here…

    Of course it would be nice if Microsoft adopted OpenDocument as its main file format, but if that is not possible, as you claim, because you think you need "more than 1600 XML elements and attributes that reflect the features in Word alone in Office 2003", then there’s nothing wrong with doing your own thing. No one can stop you from doing that.

    But how about implementing an import filter for the OpenDocument file format? If you are truly desperate, you could even implement an OpenDocument export filter. Never mind that some Microsoft Office features couldn’t be saved in OpenDocument; export as much information as you can. That would be a good start.

    And to quote you: "it is possible to create a transform (or filter) that goes between the two". The specs for the OpenDocument file format are freely available to you. You just have to implement them properly.

  24. BrianJones says:

    Thanks Thomas:

    You should probably check out the FAQ for the open royalty-free licenses we are providing. It answers your concerns about patents, and also shows that Judith doesn’t need to worry. One of the big benefits of our new format is the fact that people can now archive their Office files long term and have absolutely no lock in to Microsoft for accessing those files. Here is a clip from the FAQ:

    <start>

    Q. If Microsoft obtains a patent for the Office 2003 XML Reference Schemas, does that in any way affect the royalty-free license?

    A. No, the license is unaffected. Under the patent license for the Office 2003 XML Reference Schemas, Microsoft offers royalty-free rights both to its issued patents and patents that may be issued in the future.

    Q. The patent license associated with the Office 2003 XML Reference Schemas states that "Microsoft may have patents and/or patent applications that are necessary for you to license in order to make, sell, or distribute software programs that read or write files that comply with the Microsoft specifications for the Office Schemas." What does this statement mean and to what specific patents and/or patent applications does this statement relate?

    A. As an industry leader in the design and development of innovative computer technology, Microsoft has made a significant investment in research and development (R&D). With an annual budget of nearly $7 billion, Microsoft’s R&D commitment is among the highest of the world’s major technology providers, both on an absolute basis and as a percentage of sales. Like other major technology providers, Microsoft routinely applies to governments around the world to obtain patents on our inventions. A patent establishes ownership of an invention, enabling the patent owner to benefit commercially from investments in innovation. A patent is granted if government patent examiners conclude that an invention is a true innovation compared with existing technology. Microsoft has been awarded thousands of United States patents, and our worldwide portfolio continues to grow.

    Under the patent license for the Office 2003 XML Reference Schemas, Microsoft offers royalty-free rights both to its issued patents and patents that may be issued in the future as an outcome of the patent process. To learn more about Microsoft’s intellectual property policy and to find links to government patent offices, we encourage you to learn more about Microsoft Intellectual Property at the Microsoft Web site.

    We have chosen a simple and straightforward licensing approach that should appeal to a wide variety of potential licensees because it broadly covers all applicable patents and patent applications instead of only those that are enumerated.

    </stop>

    You also made a great point at the end of your post. Both formats are completely open and documented so anyone can build a transform to go between the two. That’s why we are talking about this so soon. We already have people building transforms on top of our Word 2003 XML format, as well as the SpreadsheetML format that we actually started building back in 1999. Anyone that wants to can build it! Microsoft has thousands of partners that we work with to build solutions on top of our applications. These new formats open up all kinds of possibilities for those partners, or anyone else out there, to build solutions to map from our formats into any of the other XML formats out there (there are tons of them!).

    -Brian

  25. Thomas says:

    Brian wrote on June 15, 2005 2:58 PM:

    ===cut===

    I already talked a bit in the initial post around why the two formats are different.

    ===cut===

    That wouldn’t be the case if you had actively participated in the OASIS working group…

    ===cut===

    There have been a number of questions though about why we didn’t work with OASIS to extend the OpenDocument format to work for us.

    ===cut===

    And that is a good question to ask. So *why* didn’t you?

    ===cut===

    It’s a bit odd because at the same time there are also complaints about not wanting us to "embrace and extend."

    ===cut===

    Then just don’t "embrace and extend".

    To stress it once again: if you had actively participated in the OASIS OpenDocument working group, then you would have had the chance to help design a file format that is "suitable for office documents containing text, spreadsheets, charts, and graphical documents", as http://www.oasis-open.org/committees/office/charter.php states.

    But you chose not to participate in that effort. Instead, you waited. You waited for the OASIS working group to settle on a file format, and now you whine because the bad, bad OpenDocument file format is not suitable for saving all of your Microsoft Office features.

    ===cut===

    Making the OASIS format fully compatible sounds like a nice idea if you are just looking at the surface, but that’s a more simplistic view. I’ll try to explain this better later.

    ===cut===

    Please do.

    I’m curious why you think that.

  26. John says:

    >Both formats are completely open and documented so anyone can build a transform to go between the two.

    True, but unless it comes built in it might as well not exist as people will not find it easy to go between the two.

    >Anyone that wants to can build it!

    See point above. In addition, people are afraid that they might get sued. Sure, you say "we won’t", but then you usually add that you’re not a lawyer. So how about having one of your lawyers definitively say: "The GPL [is|isn’t] compatible with our terms"? That should be easy enough for a FAQ entry. This way people can be sure that MS won’t go back on its word and weasel out through some lawyer-speak.

    Thanks for this blog and thank you for taking the initiative toward an open format. It sure is better than it was.

  27. orcmid says:

    You got that? Practically identical royalty-free patent license, except that Sun apparently doesn’t require notice to be carried, but it has the same limitations as the Microsoft RF license on the XML Reference Schemas. And yes, that would appear to make software that accesses OASIS OpenDocument formats as vulnerable as people seem to think they are with the Microsoft license.

    It’s all here: http://orcmid.com/blog/2005/06/microsoft-ox-vs-oasis-od-is-it-really.asp

    Now, you can still use it with the GPL, but you don’t want to envelop code that accesses the OASIS OpenDocument format under the GPL, because you don’t want people making derivative works in ignorance of the license stipulations. I spent some time fussing over that and figuring out how to stay clean here: http://orcmid.com/blog/2005/06/heavy-lifting-toward-open-formats-in.asp

    I could of course be wrong about what’s necessary to play safe. It won’t be the last time I’ve been off about something. Maybe compliance is easier. Fine, then it is for both of them because both licenses work the same way.

  28. Thomas says:

    To orcmid:

    ===cut===

    That OASIS declares a 706-page unimplemented specification as being an "OASIS Standard" is fairly amazing. 

    ===cut===

    OpenDocument will be implemented in OpenOffice.org Version 2.0 and in KDE KOffice Version 1.4.

    Unimplemented specification? Do we talk about OpenDocument or about Microsoft Office Open XML?

    If you think OpenDocument is unimplemented, then you are uninformed. Please take a look at OpenOffice.org, KOffice, IBM Workplace, StarOffice. OpenDocument is not a dream. It is a real format with enough support to present a real alternative.

    ===cut===

    It will take substantial effort to reality-check that specification, and it will be a little while before anyone confirms multiple, interoperable implementations.

    ===cut===

    Could it be that all members of the OASIS OpenDocument working group already did that? How do you think they agreed on that standard?

    ===cut===

    It seems to me that OpenDocument must be demonstrated to accommodate the Microsoft Office format, not the reverse. 

    Can OpenDocument accurately represent documents created in Microsoft Office, preserving all of the features of those documents?

    ===cut===

    If Microsoft cannot store its office document features with the existing OpenDocument file format, then you cannot blame OASIS. OASIS welcomed everyone, including Microsoft, to help design a new file format based on OpenOffice.org's XML file format.

  29. Yaniv Golan says:

    I’ve had very good experience with using ZIP as the packaging format for iXF, a more general purpose XML-based representation for a collection of entities (classes, behaviors and objects), relationships and associated files. Choosing ZIP made working with iXF archive files so much easier.

    One of the lessons learned as iXF evolved is the importance of defining the logical packaging model separately from the physical packaging model, allowing the same logical format to be represented in ZIP / gzip / folder on the disk / files on a web server.

    Note that when defining reference to external resources (Relationships, TargetMode=external in Metro, File Description behavior, location in iXF), you may want to add special syntax for handling files which currently reside externally to the package, but NOT in a URL-addressable location (e.g. on the local hard disk).

    You can find the specs for the original version of iXF at http://www.ixfstd.org/std/docs/1.0/ixf/IXFSpec1.0.doc.
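    Yaniv's point about defining the logical packaging model separately from the physical one can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: the `PackageStore`, `ZipStore`, `FolderStore`, and `copy_package` names are hypothetical and do not reflect the actual Metro or iXF APIs. Code that talks only to the logical interface does not care whether the parts live in a ZIP archive or in a folder on disk.

```python
import zipfile
from pathlib import Path
from typing import Protocol


class PackageStore(Protocol):
    """Hypothetical logical package: a set of named parts, independent of storage."""
    def write_part(self, name: str, data: bytes) -> None: ...
    def read_part(self, name: str) -> bytes: ...
    def part_names(self) -> list[str]: ...


class ZipStore:
    """Physical mapping 1: every part is an entry in a single ZIP archive."""
    def __init__(self, path: Path):
        self.path = Path(path)

    def write_part(self, name: str, data: bytes) -> None:
        # Mode "a" creates the archive if it does not exist yet, else appends.
        with zipfile.ZipFile(self.path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(name, data)

    def read_part(self, name: str) -> bytes:
        with zipfile.ZipFile(self.path) as zf:
            return zf.read(name)

    def part_names(self) -> list[str]:
        with zipfile.ZipFile(self.path) as zf:
            return sorted(zf.namelist())


class FolderStore:
    """Physical mapping 2: every part is a file under a root directory."""
    def __init__(self, root: Path):
        self.root = Path(root)

    def write_part(self, name: str, data: bytes) -> None:
        target = self.root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def read_part(self, name: str) -> bytes:
        return (self.root / name).read_bytes()

    def part_names(self) -> list[str]:
        # Part names use "/" separators regardless of the host OS.
        return sorted(p.relative_to(self.root).as_posix()
                      for p in self.root.rglob("*") if p.is_file())


def copy_package(src: PackageStore, dst: PackageStore) -> None:
    """Move one logical package between two physical representations."""
    for name in src.part_names():
        dst.write_part(name, src.read_part(name))
```

    Supporting another physical form (for example, files on a web server) would mean writing one more class with the same three methods; `copy_package` and anything else built on the logical interface would stay unchanged.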

  30. Gogs says:

    Brian,

    I'd like to ask you a simple question. Can you guarantee 100% (not 90, not 95, not even 99, but 100%) that Microsoft will not in ANY way in the future attempt to prevent non-Microsoft applications from opening, editing, or converting from your XML format, whether by patent enforcement, licence changes, or any other means?

    Maybe the Leopard IS changing its spots, moving from proprietary to completely open file formats, but please understand why people remain nervous when the words Microsoft, Licence and Patent appear in the same context.

  31. L Joe says:

    Brian,

    Thanks for the opportunity to discuss your blog posts.

    Recently, I’ve performed some *very limited* work with an XML format.

    You said "…because XML compresses so well with ZIP and provides an easy to use *container*." This is exactly what I understand XML to be. A system of containers.

    I note in your previous/initial article: http://blogs.msdn.com/brian_jones/archive/2005/06/01/424085.aspx

    Which states: "4. Backward compatible: There will be updates to Office 2000, XP, and 2003 that will allow those versions to read and write this new format. You don’t have to use the new version of Office to take advantage of these formats. (I think this is really cool. I was a big proponent of doing this work).

    5. Binary Format support: You can still use the current binary formats with the new version of Office. In fact, people can easily change to use the binary formats as the default if that’s what they’d rather do."

    This leads me to conclude that your "NEW" system is a format with many containers; however, each container holds a proprietary or patented document format. You will have a Word 2000 XML container tied to a proprietary/patented embedded binary which just happens to reside in an "open" XML container, a Word XP XML container tied to a proprietary/patented embedded binary which just happens to reside in an "open" XML container, and so on.

    I can understand the workload necessary to create and support this multi-format XML container that supports all of Microsoft's previous proprietary binary formats. I hope I'm wrong. I hope Microsoft is listening to us technical people who want to understand the details; however, my previous experiences with Microsoft set my disposition to be PESSIMISTIC. I would be extremely surprised if interoperability is allowed. That has never been Microsoft's goal.

    This appears to be nothing more than "open" marketing spin around a huge industry buzzword.

    Regards,

    L Joe

  32. orcmid says:

    My primary interest in this conversation is over interoperability and interchange among desktop and office productivity applications and the licenses that enable that.

    I have concerns about the Software Patent Miasma (the monster under the bed, as it were) and the associated chills. But this is by no means particular to Microsoft. The current approach to royalty-free licenses for essential claims is commonplace in W3C and OASIS and elsewhere.

    I also think we need to stop presuming that Microsoft could actually have influenced the OpenDocument activity in a useful way.

    It didn’t happen that way, and I don’t see how it could have without Microsoft first completing the work that went into Office 2003 XML formats and the remaining difficult effort now needed to get to full fidelity in Office 12.

    That would have been an interesting challenge to a modest-sized technical committee that proposed to complete the central part of the work six months after starting from an OpenOffice.org XML format offered by Sun Microsystems in December 2002 (with the usual reciprocal royalty-free conditions). Look how long it actually took without having to negotiate over Microsoft feature harmonization!

    Meanwhile, working out interoperability agreements is going to be challenging when the OASIS OpenDocument format has no requirement for even a minimum set of elements.

    What we have now is a timing problem. There will be a long road to convergence on portable formats and document-processing interoperability, the kind that preserves content that matters to people and provides assured access and re-use into the future.

    At the moment, we have too many people telling Microsoft what to do and how to do it, but only one hand in the air for accepting the heavy lifting that preserves the varied investments of 400 million users in multiple languages and cultures. Please don’t underestimate the importance of that and the difficulty of the effort.

    I am resolved to be patient and also cross my fingers in a desire to see the Microsoft Office XML Formats be successfully completed and a spur to wide use.

  33. Charles says:

    I still wish there was one standard and distinguishing features were added via namespaces. I suppose with the two standards, you can transform between the two and place features of one that are not in the other in a namespace for safe keeping.
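    Charles's idea of transforming between the two formats while parking unsupported features in a separate namespace might look roughly like this. It is a minimal Python sketch assuming two hypothetical vocabularies: the namespace URIs and the `TAG_MAP` entries are illustrative placeholders, not real WordprocessingML or OpenDocument names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces for illustration only; not real Office or OpenDocument URIs.
TARGET_NS = "urn:example:target"
EXT_NS = "urn:example:ext"

# Hypothetical mapping from source local names to target local names.
TAG_MAP = {"p": "para", "r": "span", "t": "text"}


def transform(src: ET.Element) -> ET.Element:
    """Map known elements into TARGET_NS; keep unknown ones under EXT_NS."""
    local = src.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
    if local in TAG_MAP:
        out = ET.Element(f"{{{TARGET_NS}}}{TAG_MAP[local]}")
    else:
        # No equivalent in the target format: preserve the element under the
        # extension namespace instead of discarding it.
        out = ET.Element(f"{{{EXT_NS}}}{local}")
    out.attrib.update(src.attrib)  # attributes are copied unchanged
    out.text, out.tail = src.text, src.tail
    for child in src:
        out.append(transform(child))
    return out
```

    Because unmapped elements are renamed into `EXT_NS` rather than dropped, a reverse transform could look for that namespace and restore them, which is the "safe keeping" described above. A real converter would also have to map attributes and handle features that have only partial equivalents in the other format.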

  34. Ian Thomas says:

    This is perhaps a bit off the topic, but since the media has determined that the new "Metro" XML formats are a competitor for Adobe Acrobat PDF files, this is relevant. Besides, it’s a question that stands on its own, too.

    There are (apparently) some TrueType fonts that have an embedding flag, which depends on the user application respecting it. For Adobe Acrobat's PDF files, I believe that this flag _is_ respected, and so some fonts cannot be embedded – though a subset of characters may be allowed. Other fonts (apparently) also employ a similar flag, but may implement the "convention" incorrectly so that Adobe Acrobat rejects them – i.e., they don't get embedded, even as a subset of the full font set.

    Now, I expect that Metro will have a similar mechanism, since some font makers are very protective of copyright or software licence agreements.

    Can you clarify that?

    Now, leaving PDF files aside, with MS Office products (and others, of course) there has always been the issue of font substitution in documents. A nicely formatted document sent to another user may not appear exactly as intended, because the recipient does not have the same fonts installed as the document's creator. Often this is predictable because of variations in Windows OS version (i.e., the set of fonts installed by default differs between OS versions), but it may be quite unpredictable unless font embedding is employed routinely – which causes severe file "bloat".

    Can you illuminate us about the way that Metro / Office 12 will manage font substitution? Is it any different than is used currently in the MS Office 2003 product line?

  35. Mark Baird says:

    I am a Microsoft Office power user and programmer.

    The OpenDocument standard, at first glance, seems to be a better format than WordML. I would have thought that Microsoft would have followed OpenDocument more closely and then added onto that standard when needed. For example, managing revisions in the OpenDocument format is much simpler than in WordML. WordML inserts an annotation and then splits the text run into two separate nodes, making it much more difficult to work with the XML. "Annotations" are much easier to work with in OpenDocument.

    Sorry, I just don't believe your argument for not adopting OpenDocument.

  36. Anthony says:

    The simple answer: People don’t need the features of the Microsoft Office XML format to make information available to the public.

    MS could just include support for the format in MS Office and allow users the choice to open their files in the Microsoft software.

  37. Wow, there were a ton of great comments on my last post. While there were a large number of them, there…

  38. gustl says:

    I have seen a lot of talk here about MS Office XML being as "completely open" as OpenDocument.

    That is simply not true!

    For the OpenDocument formats I can build an office application under ANY LICENSE ON THE PLANET! Even the GPL or BSD licenses are compatible, because the OpenDocument specifications are fully SUBLICENSABLE. And in order to publish source code, it has to be sublicensable.

    On the other hand, if I sign the XML license offered by Microsoft, I explicitly sign a contract which says I cannot sublicense the format specifications. Since the source code of an import or export filter itself IS a format specification, I am not allowed to simply put this source code on the internet. This effectively prohibits ANY open source project from using the MS XML specifications when writing a filter. This is why we will not see an MS XML format filter in OpenOffice.org any time in the next 20 years. Sun's StarOffice can include a closed-source import/export filter, since Sun does not have to sublicense its source code.

    And please, do not again try to confuse open with restricted. MS’ XML license is prohibitively restrictive if you are programming for OpenOffice.org, KOffice or AbiWord !!!!

    If Microsoft no longer wants to be viewed as a bully who uses lock-in tactics to keep its high market share, Microsoft would be well advised to try everything technically and LEGALLY possible, to let ALL of its competitors easily interoperate with its products.

    If you continue to come up with vague "patents we may have" sections combined with "sublicenses not allowed" license contracts, you will be seen as hostile to open source software.

  39. Emil Per. says:

    ===

    To give you an idea of how big of an undertaking that can be, we have more than 1600 XML elements and attributes that reflect the features in Word alone in Office 2003.

    ===

    Well, that does not seem so big… 100 elements with ~16 attributes each? I am not really impressed.

    "1600 XML elements" would be impressive …

  40. BrianJones says:

    Hey Emil, just a quick scan of the schemas show that there are about 780 elements and 885 attributes for Word 2003 XML.

    -Brian
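    For anyone who wants to reproduce this kind of count, here is one way to scan a schema, as a minimal Python sketch. We don't know exactly how the figures above were produced; this version counts only named `xs:element` and `xs:attribute` declarations and skips `ref=` reuses, so other counting rules (for example, counting references or expanding shared types) will give different totals.

```python
import xml.etree.ElementTree as ET

# Namespace of W3C XML Schema, as ElementTree spells it in tag names.
XS = "{http://www.w3.org/2001/XMLSchema}"


def count_declarations(xsd_text: str) -> tuple[int, int]:
    """Return (elements, attributes) declared by name in one XSD document.

    Declarations are recognised by a `name` attribute; `ref=` uses of an
    existing declaration are skipped so nothing is counted twice.
    """
    root = ET.fromstring(xsd_text)
    elements = sum(1 for e in root.iter(f"{XS}element") if "name" in e.attrib)
    attributes = sum(1 for a in root.iter(f"{XS}attribute") if "name" in a.attrib)
    return elements, attributes
```

    Summing the results over every schema file in a set would give a total comparable in spirit to the element and attribute figures quoted here.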

  41. Henry Stanaland says:

    Granted, your explanation may logically explain why you created a proprietary "standard" to depend upon, but it doesn’t explain why you would refuse to support OpenDocument at all…the same way you support .rtf, .html, and .txt.

    You say that you can’t support OpenDocument because of all the wonderful features of MS products…but you support .html, .txt, .rtf, and others which are much much worse! Furthermore, if you added support for Opening and Saving OpenDocument, then Massachusetts could use Office AND my entire company could use Office (like Massachusetts, the company I work for also stores all documents in OpenDocument, PDF, or other non-proprietary formats).

  42. Peter says:

    Microsoft defined the binary .DOC format(s) in use by hundreds of millions of people, and DOC files have been around for something like 15 years. When the OASIS OpenDocument folks set out to define yet another format, wouldn’t it be reasonable to expect them to take the steps to ensure their format supported all the features of all those existing .DOC files?

    I’d think that would be the primary goal, and the measure of whether OpenDocument was successful – if I can "Save As" all my thousands of .DOC files in the OpenDocument format, and retain full fidelity, then OpenDocument "passes". Otherwise, it fails.

    If OpenDocument does not allow this, and from what I’ve read here it does not, I must conclude that the goal of OpenDocument was something other than helping me, the consumer.

    In my book, OASIS screwed up with OpenDocument.

    Who needs yet another format that doesn’t support the document features I use? And who are any of you to tell me what document features I should live without, just to conform to whatever agenda drove the OpenDocument format definition?

    If a bunch of you whackos want to partition yourself off from the hundreds of millions of Office users with your incompatible document format, go for it – we won’t miss you. But you’re creating the problem, not Microsoft, and you have only yourselves to blame.

  43. Zach says:

    Peter

    <quote>

    wouldn’t it be reasonable to expect them to take the steps to ensure their format supported all the features of all those existing .DOC files?

    </quote>

    The point is that no one can ensure 100% compatibility with .DOC because it ain't open; it's a big secret file format.

    <quote>

    I’d think that would be the primary goal, and the measure of whether OpenDocument was successful – if I can "Save As" all my thousands of .DOC files in the OpenDocument format, and retain full fidelity, then OpenDocument "passes". Otherwise, it fails.

    </quote>

    The primary goal in a MS$ centered society maybe. The problem is that you don’t have complete ownership of your own data. This is a primary goal for the future freedom of users. MS$ took away your freedom when they locked you into their secret formats, they haven’t been a team player.

    <quote>

    And who are any of you to tell me what document features I should live without, just to conform to whatever agenda drove the OpenDocument format definition?

    </quote>

    You misunderstand the comments about supporting a subset of features. The point is to allow someone to open an OpenDocument document in MS Office and to allow you to share at least most of your documents' features with someone who does not have Office. Few people use those "extra features", and many of them cause compatibility problems with other versions in any case.

    <quote>

    users with your incompatible document format

    </quote>

    Incompatible from whose perspective? Legal decisions are causing organizations, e.g. the state of Massachusetts, to move their valuable documents to a format which is free, open, and will still be usable 200 years from now.

    <quote>

    But you’re creating the problem, not Microsoft, and you have only yourselves to blame

    </quote>

    Many will not lament M$ not coming to the table because it will be to their detriment eventually. Open source initiatives aren’t creating problems, they are solving them in an open way. You misunderstand if you think that people are going to be so sad that the big dog didn’t come to play. You should look at M$’s campaign against linux and tell me they are not afraid and not doing any blaming.

    You are misinformed if you do not see the steady gains made by open source software. More and more public sector organizations will be moving. I’m from South Africa and our government has put an open source strategy on the table. They will be moving to open and free standards in the next couple of years. It looks like they will be opting for linux as an OS for example. They are definitely going to be migrating away from MS products, due to licencing and openness.

    You are welcome to keep on making a select few rich, but don’t call people wackos for moving towards systems which are open to all and the adoption of which would benefit all people and not a handful of businesses.

    If Microsoft really wants to be "open" with this new format of theirs, it should make it as free as possible and hand it over to an independent standards body. I don't want to hear in two years' time that my 300 documents are suddenly locked in again because of M$'s control.

    Imagine someone patented the concept of a toilet and you had to be happy with whatever decisions the holder makes regarding YOUR future use of it. We’d have someone richer than Mr Gates.

    I do not trust M$’s intentions. They’ve applied for patents regarding saving wordprocessor documents in xml format in my country (and others), even though OpenOffice has been doing it for years. They do not play nicely.

    As a side note:

    Patents might have some place in the world, but imagine you have a unique idea and either don’t want to patent it or don’t have the money. Someone else hearing it can take your rights to YOUR idea away. That hardly seems just.

  44. I found your page through Google, and I like it very much.

  45. Andrew Sayers had a great suggestion that I should have a page set up that gives an overview of the blog