Mapping documents in the binary format (.doc; .xls; .ppt) to the Open XML format


I wanted to call everyone’s attention to a few interesting developments in Ecma’s proposed disposition document related to the Office binary formats. There were a few comments from national bodies that asked about the documentation of the Office binary formats and the availability of those documents. We had already been talking about these issues in TC45 where there were a number of existing experts in the binary formats (including Apple, Novell, and Microsoft). Based on the feedback from the national bodies, Microsoft decided last week to take some additional steps in this area.


The first issue national bodies were interested in was easier availability of the documentation of the binary formats (.doc; .xls; .ppt). It sounded like the main concern here was around the extra steps required to get the binary documentation. The current form of the documentation has been available since 2006: anyone could get it by sending an email to Microsoft, as described at http://support.microsoft.com/kb/840817/en-us. The documents were available royalty-free under RAND-Z. Hundreds of companies, including IBM and Sun, as well as government institutions, already have the documents. The new proposal we (Microsoft) made to Ecma TC45 was that we’d just get rid of the need to send an e-mail and provide the documentation for direct download under the OSP. TC45 thought this was a good solution, and here was the TC45 response to the national body comments:


Documenting the Microsoft Office “binary” file formats (i.e., .doc, .xls, and .ppt) (the “Binary Formats”) is not the intention or in the scope of DIS 29500.


However, Ecma International  discussed this subject with Microsoft Corporation. Microsoft indicated that the documentation of the Binary Formats has been available royalty-free under RAND-Z to anyone who requests it by sending an email to officeff@microsoft.com, as described at http://support.microsoft.com/kb/840817/en-us.  Microsoft indicated that many companies and public institutions have asked for and received the Binary Formats since Microsoft started providing access to this documentation. 


Nevertheless, in response to requests for even easier access to the Binary Formats, Microsoft has agreed to remove any intermediate steps necessary to get the documentation, and will post it and make it directly available for a direct download on the Microsoft web site.  Microsoft will also make the Binary Formats subject to its Open Specification Promise (see www.microsoft.com/interop/osp) by February 15, 2008.


The second issue we had feedback on was an interest in the mapping from the binary formats into the Open XML formats. The thought here was that the most effective way to help people with this was to create an open source translation project to allow binary documents (.doc; .xls; .ppt) to be translated into Open XML. So we proposed the creation of a new open source project that would map a document written using the legacy binary formats to the Open XML formats. TC45 liked this suggestion, and here was the TC45 response to the national body comments:


We believe that Interoperability between applications conforming to DIS 29500 is established at the Office Open XML-to- Office Open XML file construct level only.


Prescriptive guidance on, or tools to enable, transformation from Microsoft Office “binary” file formats (i.e., .doc, .xls, and .ppt) (the “Binary Formats”) to Office Open XML formatted files is not the intention or in scope of DIS 29500. As a result this request is outside the bounds of this process.


It is important to note that substantial use is being made of both the Binary Formats and Office Open XML in the marketplace today.  Many products (such as OpenOffice.org) support the Binary Formats. Microsoft has indicated that many companies and public institutions have received the documentation for the Binary Formats, and are working with it at this time, and can create mappings between the Binary Formats and Office Open XML. Translators from the Binary Formats  to XML formats such as ODF have already been developed and are in wide use. For example, the Sun ODF Plug-in for Microsoft Office (http://sun.systemnews.com/articles/112/3/sw/18208) states that  “The plug-in allows users the ability to seamlessly convert Microsoft Office documents to and from ODF. The ODF plug-in supports Microsoft Word, Excel and Powerpoint”.


Likewise, there is widespread use of Office Open XML in the marketplace today across platforms and applications.  A few examples include the implementations released by Apple (Mac OS X Leopard, iWork 08, iPhone), Adobe (InDesign), Microsoft (Office 2007, Office 2003, Office XP, Office 2000, Office 2008 Mac OS X), Novell (Suse Open Office), Google (Search / Preview), Mindjet (MindManager), Intergen, OpenXML/ODF Translator (Open Source project on Sourceforge), Dataviz (DocumentsToGo on Palm OS, MacLinkPlus on Mac OS X Leopard), NeoOffice, Altova (XMLSpy), MarkLogic (XML Content Server), Datawatch (Monarch Pro), QuickOffice  (QuickOffice Premier 5.0 on Symbian), Altsoft (XML2PDF Server 2007) and those under development by Corel (WordPerfect), AbiWord, Gnome (GNumeric),  Xandros, Linspire, Turbolinux and others.  These implementations are now available on many platforms, including Linux, the Macintosh, Windows, and handheld devices (PalmOS, Symbian, iPhone, and Windows Mobile).


The widespread use of both  Binary Formats and Office Open XML formats indicates that, at this time, 3rd party can use both formats and build mappings between them.


Nonetheless, Ecma International discussed this subject with Microsoft Corporation, the author of the Binary Formats.  To make it even easier for third party conversion of Binary Format-to-DIS 29500, Microsoft agreed to:



  • Initiate a Binary Format-to-ISO/IEC JTC 1 DIS 29500 Translator Project on the open source software development web site SourceForge (http://sourceforge.net/ ) in collaboration with independent software vendors.  The Translator Project will create software tools, plus guidance, showing how a document written using the Binary Formats can be translated to DIS 29500.  The Translator will be available under the open source Berkeley Software Distribution (BSD) license, and anyone can use the mapping, submit bugs and feedback, or contribute to the Project.  The Translator Project will start on February 15, 2008. 

  • Make it even easier to get access to the  Binary Formats documentation by posting it and making it available for a direct download on the Microsoft web site no later than February 15, 2008.  The Binary Formats have been under a covenant not to sue and Microsoft will also make them available under its Open Specification Promise (see www.microsoft.com/interop/osp) by the time they are posted.

We will modify DIS 29500 to include an informative reference to the SourceForge project.


I think that both of these items are great news for folks interested in documents and document file formats. There will be a lot more information around both of these pieces of work over the coming weeks, but I wanted to make sure people realized that this was already in the works.


-Brian

Comments (94)

  1. Doug Mahugh says:

    Brian Jones has some good news today for developers who want to work with both the binary Office formats

  3. Andre says:

    Why not attach it to the ISO document annex? I don’t want to sign a non-disclosure agreement to get a bloody spec.

  4. Oliver says:

    And therein is the beauty of the OSP… nothing to sign, free and available to all.

  5. Very good news, indeed!  Thanks for the information.

  6. lori says:

    The OSP is not sublicensable, thus forbidding GPL distribution:

    http://www.microsoft.com/interop/osp/default.mspx

    "This is a personal promise directly from Microsoft to you, and you acknowledge as a condition of benefiting from it that no Microsoft rights are received from suppliers, distributors, or otherwise in connection with this promise."

    Even Microsoft acknowledges the lack of sublicensability:

    http://www.microsoft.com/interop/osp/default.mspx#EYH

    "There is no need for sublicensing."

  7. Rajiv Shah says:

    Any chance Microsoft will want to take the next step and turn the binary formats into an open standard ala Adobe/PDF?

  8. OMG says:

    @Rajiv: Oh noes! Please let them die already!!

  9. consumer4beta@hotmail.com says:

    Awesome. Looks like MS is really opening up. Now we can finally bypass the email-to-obtain process. And will the translator project support batch conversion from binary to OOXML?

  10. Miguel de Icaza says:

    Lori,

    The OSP is just fine with the GPL, the rights that you have are also enjoyed by recipients of the software (section 7).

    Miguel.

  11. Now that the Open XML formats have picked up speed, what will we do with the "old" binary ones? The specification for

  12. Brian Jones carries the news that Microsoft will make the Binary Formats (.doc; .xls; .ppt) directly

  13. Brian Jones provides the text of one of Ecma’s responses to the comments received from National Standards

  14. Brian Jones writes in his blog about some new developments with the Microsoft Office binary file formats.

  15. Mike MacCana says:

    If the spec still says ‘bullet points like Microsoft Office version X’ and nothing more, without any guidelines about how that version of MS Office lays things out, then it’s still worthless to anyone but Microsoft.

    Many of those third parties can only use Microsoft’s binary formats due to reverse engineering – since the spec doesn’t actually contain information on how to do things, just references to older versions of Microsoft software.

    Brian: If this has been fixed, and references in the document to performing things in the manner of other Microsoft products have been replaced with actual specifications that do not require a license for Microsoft source code, please say so.

  16. Brian Jones writes in his blog about some new developments with the Microsoft Office binary file formats

  17. Nektar says:

    I don’t understand. Until now, Microsoft has claimed that access to the binary formats was difficult without a signed license because the formats are so closely tied to the way the Office applications lay out data structures in memory that it was impossible to specify them without also giving away the source code, or at least the inner workings of Office, which might change in the future anyway. This was the excuse, from what I remember, for not documenting the binary file formats for all these years (over a decade). I don’t understand this sudden change of mind. It is welcome, of course, but it raises the question of whether you were honest over the past decade in saying that the main reason you couldn’t provide the binary formats was not any unfair competitive practice (as your competitors were claiming) but purely technical difficulty, since revealing the binary formats would enable competitors to learn the inner data structures of Office, which were difficult to document anyway. I don’t understand.

    Also, it seems that all the noise nowadays is around Office document formats. However, Office does not only contain doc, xls and ppt; there are other binary formats in there that could also be useful for other applications to read. What about OneNote and InfoPath? Shouldn’t I be able to have access to my notebook on every device and forever? Or is it because, for the time being, OneNote and InfoPath are not very popular and thus there is no strong competitor to release XML specifications first and so prompt Microsoft to release its own? I don’t understand.

    And what about Access and Outlook? What about an open database format? Isn’t that important? If governments want their citizens to have access to their documents forever, then shouldn’t the same apply to the data stored in public databases? Or is Microsoft waiting for Oracle to announce an open database format first, wait for it to be standardized, and then "remember" that you should create your own competing open database format? And then again "remember" that you should document the binary Outlook PST and Access+MSSQL formats as well? I don’t understand. Where is the strategy, the ultimate goal? If the goal is simple access to information and easy portability, then you should at least document all the binary formats of Office and other Microsoft products and give guidance on their access, or at least give a roadmap or a formulated strategy.

  18. In the fuss around the standardization of Open XML, we somewhat forgot about

  19. Until now the specifications for the Office binary formats were not accessible? Not quite right. 2006

  20. Phew, you managed to read the title all the way to the end, so here are the details of these two announcements introduced

  22. hAl says:

    [quote]Many of those third parties can only use Microsoft’s binary formats due to reverse engineering – since the spec doesn’t actually contain information on how to do things,[/quote]

    So it is actually a lot like the ODF specification that also does not tell you how to do things. Amazing.

  23. hAl says:

    @lori

    [quote]The OSP is not sublicensable, thus forbidding GPL distribution:

    http://www.microsoft.com/interop/osp/default.mspx[/quote]

    GPL distributions are about copyrights on source code. Not about rights on a format specification. If you build source code based on the format specification that source code has its own copyrights and can of course be distributed fine under the GPL.

    And if in your source code comments you want to refer to the format specification, then it is common practice to do so by referencing the original source, and now that can also be done easily.

    So in fact the OSP licensing of this format is actually fully compatible with implementations in GPL.

  24. Anonymous Coward says:

    One thing in this story alerted me about what is going to happen in the times to come:

    "

    Likewise, there is widespread use of Office Open XML in the marketplace today across platforms and applications.

    "

    How can there be widespread use of OOXML when the spec is in turmoil in the ISO/ECMA process and nobody can say what will become of it? Who knows whether these suggested applications are implementing the first version, the current version, or some upcoming version of the spec?

    MS Office 2007 was listed but there’s just no way to get OOXML out of it, at least that dialect that is currently being evaluated in the ISO/ECMA process.

    And this is the real killer here, folks. There are so many dialects of the spec that in the end we have to revert to the old method of checking how the reference implementation (MS Office 2007) does things. And to add insult to injury, even this reference implementation does not follow the spec!

    But, of course, who really cares in the end? The whole point of this OOXML exercise was to give MS marketing folks the possibility to announce that their product implements an ISO documentation standard, so that governments feel relieved and continue with us.

  25. Mike Lieman says:

    HERE is the Gotcha.

    I don’t see *any* promises or commitments that ALL FILE FORMATS will ALWAYS BE AVAILABLE, much less in a timely manner to ensure interoperability.  Note the use of "existing versions".

    From the OSP page:

    Q: Does this OSP apply to all versions of the standard, including future revisions?

    A: The Open Specification Promise applies to all existing versions of the specification(s) designated on the public list posted at http://www.microsoft.com/interop/osp/, unless otherwise noted with respect to a particular specification (see, for example, specific notes related to web services specifications).

  26. Documentation for the Microsoft Office binary formats has been publicly available for quite some

  28. Ian Easson says:

    Mike MacCana said about ‘bullet points like Microsoft Office version X’:

    "Brian: If this has been fixed, and references in the document to performing things in the manner of other Microsoft products have been replaced with actual specifications that do not require a license for Microsoft source code, please say so."

    He did (in this blog).  So did the ECMA in an announcement.  

    The details of exactly how to implement "autowordspacinglike Word95" (or whatever) are specified in an appendix to the new recommended spec.

  29. Plugger says:

    Much better URL for the Sun ODF Plugin 1.1 is to use the official page:

    http://www.sun.com/software/star/odf_plugin/index.jsp

    One can only wonder whether the bizarre page linked was selected on purpose?

  30. Ian Easson says:

    Mike Lieman wrote:

    "HERE is the Gotcha.

    I don’t see *any* promises or commitments that ALL FILE FORMATS will ALWAYS BE AVAILABLE, much less in a timely manner to ensure interoperability.  Note the use of "existing versions"."

    You seem to be totally confused.  Once ANY file format specification (not just OOXML) is published, it is ALWAYS AVAILABLE (unless everybody in the world accidentally loses their copy of the spec — not likely!).  You can always refer in the future to the spec and write code that makes use of the spec.

    As an analogy, once the ASCII spec for character sets was written down decades ago, it became ALWAYS AVAILABLE.  Got it?

  31. Ian Easson says:

    Anonymous Coward writes:

    "How there can be widespread use of OOXML when the spec is in turmoil in the ISO/ECMA process and nobody can say what will become of it?"

    To answer your question:

    – The spec is not "in turmoil".  It is undergoing the usual process of updating that happens to pretty well ALL standards.  The world has always been able to deal with standards that are improved.  You seem to be under the misimpression that a standard never changes.  On the contrary, they almost always do.  That’s a normal part of the standards process.

    – You are technically correct that the ISO has not yet decided who will maintain the standard.  But, it is nearly certain that they will appoint the ECMA, who has volunteered to do this.  

    – There IS widespread adoption of OOXML.  Obviously, the people who have to make real decisions about its use (as opposed to just theorists like yourself), have no qualms about its long-term viability.

  32. Ian Easson says:

    Nektar writes:

    "I don’t understand."

    You are right, you don’t.

    The documentation for the binary formats has been freely available since Office 97.  You just had to ask Microsoft for a copy. Many people and companies (IBM, Sun, etc.) have taken advantage of this over the years. The only difference now is that Microsoft is making the process simpler.  Instead of asking for a copy, you will now be able to download it directly yourself.

    My guess is that you were probably just taken in by incorrect statements made by the anti-Microsoft folks about this matter.  That’s likely the source of your confusion.

  33. matthew says:

    Who cares about doc, xls and ppt.  The big disappointment is that it doesn’t cover Visio .vsd and template formats

  34. User says:

    Hey this is great news!

    Can you please point me to where can I get the URL for the documentation of binary Outlook .PST files for Outlook 2003?

  35. lori says:

    @hAl

    "GPL distributions are about copyrights on source code."

    Well, if you followed the discussions about GPLv3 and if you read the GPLv2, there are certain provisions for redistribution if you own a patent which is covering the software.

  36. JBooze says:

    @Nektar

    InfoPath does not use a binary format for its data files; they are just xml.  An InfoPath solution template file (.xsn) is simply a cabinet file.  If you want to see what’s inside, just change the .xsn to .cab.  You should now be able to open the file in Windows.

    You can also see an extracted .xsn by using "extract form files" in InfoPath 2003 or using "save as source files" in InfoPath 2007.

    See http://blogs.msdn.com/infopath/archive/2004/05/04/126147.aspx for more info
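As a rough illustration of the point above that an .xsn template is simply a cabinet file, here is a minimal Python sketch. It relies on the fact that Microsoft Cabinet files begin with the four ASCII bytes "MSCF". The function names `looks_like_cabinet` and `cab_copy_name` are made up for this example and are not part of InfoPath or any Microsoft tool.

```python
from pathlib import Path

# Microsoft Cabinet files start with these four ASCII bytes.
CAB_SIGNATURE = b"MSCF"


def looks_like_cabinet(path):
    """Return True if the file at `path` starts with the cabinet signature."""
    with open(path, "rb") as f:
        return f.read(4) == CAB_SIGNATURE


def cab_copy_name(xsn_path):
    """Return the .cab name for an .xsn path (hypothetical helper; touches no file)."""
    return Path(xsn_path).with_suffix(".cab")
```

If `looks_like_cabinet("form.xsn")` returns True, renaming the file to the name given by `cab_copy_name("form.xsn")` should let Windows open it as a cabinet, as described above.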

  37. S says:

    @Brian Jones,

    Congrats for making it to techmeme and slashdot.

    I’ll get back to you when the dust settles. Way too crowded here right now.

  38. Dave S. says:

    If the documentation has been so freely available, why hasn’t any industrious hacker provided a fix for the current Excel vulnerability?

    An independent parser should not fail in the same way the MS Excel parser apparently does, so it should be very easy to clean it up, or at least identify malformed Excel files without causing execution of undesirable software.

    On the matter of always-available. Does MS hold distribution rights as part of its copyright or does it allow others to place copies on the web? I know a book can be always-available, but it remains a copyright violation to make and distribute copies of it.

    I am presently using VBA with Excel. Is there a VBA compatible environment being offered for any of the other implementations or are they only fragmentary implementations?

  39. Troy says:

    As excited as I should be (you have "come a long way, baby") about this, I just cannot bring myself to be so.

    This is still a vendor controlled format from a vendor with a rich history of using file formats as a weapon (on customers: "upgrade", and other vendors: "lock-in").

    Because there are REAL open file formats in the world today, this offering from Microsoft is still clearly a substandard choice.

  40. Pete Austin says:

    Miguel,

    OSP GENERAL Q2: "You must agree to the terms [of the OSP] in order to benefit from the promise"

    http://blogs.msdn.com/brian_jones/archive/2008/01/16/mapping-documents-in-the-binary-format-doc-xls-ppt-to-the-open-xml-format.aspx

    GPL 3.0: Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License … You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License.

    http://www.fsf.org/licensing/licenses/agpl-3.0.html

    I think I cannot release a product under the GPL if it includes IP that requires the user to agree to the OSP in order to exercise his/her standard GPL rights to run, modify and propagate that work.

    You say that this does not matter and that the program could simply be "referencing the original source". But the OSP specifically covers rights over "making, using, selling, offering for sale, importing or distributing any implementation", not just rights to use the documentation. So I think you are missing the point.

  41. Access developer says:

    Please tell me honestly what about the binary format of MS Access with random bytes inside coming from memory dumps (!). For many of us it is a big problem.

  42. Pete Austin says:

    >> January 17, 2008 1:29 PM. That first link to the OSP GENERAL Q2 should be

    http://www.microsoft.com/interop/osp/default.mspx#EYH

    Also the previous replies I referred to are from Brian as well as Miguel.

  43. HopefulPedant says:

    Re: Pete Austin

    First of all, let me say that I’m very encouraged that Microsoft are making some of their old/legacy binary file formats more available/accessible and possibly even usable. Even if some people are not satisfied (and I can probably be included as one of those), I think Microsoft should be congratulated and encouraged to positively reinforce this kind of behaviour. Thank-you Microsoft.

    I agree with Pete Austin’s analysis and would hope that Microsoft can be persuaded/encouraged/cajoled into ensuring that  there is no question of there being legal problems with GPL (2 or 3) or BSD use. Perhaps dual licensing with OSP, GPL/GFDL and BSD or similar?

    I don’t know if this covers old binary formats for Microsoft Project, Visio and Outlook – I hope so, as there is an awful lot of important legacy information tied up in documents in those formats as well.

    I’ll say again – thank-you for this step, and I hope it is the first of many in the same vein of openness.

    HopefulPedant

  44. Ian Easson says:

    Matthew wrote:

    "Who cares about doc, xls and ppt.  The big disappointment is that it doesn’t cover Visio .vsd and template formats"

    It doesn’t at present.  They can’t do everything at once.

    But, I just tried a "save as" in Visio, and amongst the file types were some interesting ones:

    – XML drawing (*.vdx)

    – XML template (*.vtx)

    – XML stencil (*.vsx)

    – SVG (*.svg)

    – Compressed SVG (*.svgz)

    – Web Page (*.htm)

    That sounds like a good list to me, for someone who wants to build interworking solutions.  I could be wrong; I’m not a Visio wiz.

  45. Ian Easson says:

    Dave S writes:

    "If the documentation has been so freely available, why hasn’t any industrious hacker provided a fix for the current Excel vulnerability?"

    I don’t follow security issues that much, so I don’t know what vulnerability you are mentioning.  But, there is an obvious answer to your question:  Since you say it is easy to do given the documentation, why don’t you do it yourself and report back to this blog on your fix?

  46. Bob Bushman says:

    Hi,

    The appearance is that Microsoft is trying to move in the right direction with this. I think that would be an awesome thing on a broad range of topics from developer effectiveness to the efficiency of the global economy. It genuinely appears that you are trying to move in the right direction here.

    But one thing confuses me. It seems that because the OSP does not confer redistribution rights, there is skepticism on the part of many potential developers. I think that skepticism may be unreasonable, or it may be based on past experiences, and it doesn’t necessarily matter which it is. The skepticism exists and is a hindrance if the real goal is to make these standards work for all of us.

    The basis of that skepticism seems to be that it is theoretically possible for Microsoft to stop distributing the documentation, and to enforce their copyright thus preventing others from redistributing the documentation. I find it difficult to believe that that is the real intent of Microsoft, so it seems like a silly thing to have redistribution stand in the way of this very worthy effort.

    So my question is, why not use some equivalent of CC-by-nc-nd? A license which allows non-commercial, attributed, unmodified redistribution of the documentation. Using a license like that would eliminate the (perhaps unfounded) fear that developers feel regarding the future availability of the documentation.

    Here’s a link to CC-by-nc-nd:

    http://creativecommons.org/licenses/by-nc-nd/3.0/us/

  47. Ian Easson says:

    Troy writes:

    "This is still a vendor controlled format.."

    "…there are REAL open file formats in the world today…"

    On your first point, you are incorrect.  Once Microsoft was persuaded by the ECMA head to submit the file format for standardization, it came under the control of ECMA.  Then, when ECMA submitted it to the ISO, it came under the control of ISO.  It is likely that sometime in the next few months, ISO will turn to the ECMA to maintain further changes to the standard, which will mean in practical terms that the control will then revert to the ECMA.  

    On your second point, there are lots of file formats, yes, but there are none that meet the design intent of OOXML other than OOXML.  (That intent, of course, is to maintain compatibility with the older binary format files.)

  48. erik says:

    I have to agree with "Anonymous Coward" that no matter how this ends, OOXML/MSOffice is going to be just what we’ve seen with HTML+CSS/IE.

    The HTML spec is a walk in the park compared to OOXML. Yet web developers (companies, individuals, communities) around the world are frustrated with the current situation, where one implementation (IE) differs from the spec and forces developers to do double work: first checking the spec and then checking the de facto reference implementation. That means vast extra costs for those who just wanted to create, innovate, and provide information and content.

    Sure, IE7 improved things a bit (alas, it also broke some sites) and IE8 is promised to take things even further. But why did this happen, and why has Microsoft finally been forced to play by the book, as everyone else has been doing for a long time already? Only one reason: Firefox market share skyrocketing in the past few years while Microsoft showed nothing but complete lack of interest in IE.

    I’m dumbfounded to read some of the naive comments about how OOXML is somehow supposed to be vendor neutral. One can’t create a vendor neutral standard when there’s one vendor dominating the scene already. If the dominating application/vendor differs from the standard (no matter how much) then the rest of us have only one option: to do the double work and check our implementation both against the spec and against the quirks needed to be in line with the dominating implementation.

    I just thought about the previously mentioned HTML/IE saga again for a moment. There are two possible reasons why MS did not make IE compliant with HTML+CSS: 1) they were incompetent, or 2) they did not want to. Nobody sane believes the first option, really. That leaves only one explanation: MS deliberately chose to break the standard in IE in order to gain and hold a dominant market position. Too bad for them: they could have kept it for years to come, but neglecting IE development for too many years backfired and let competitors catch up, to the great benefit of all Internet surfers. But it is important to realize here that IE *seemingly* followed the standards, and those who were not familiar enough with these issues said that all was good. But it was not, as already said. Web developers, competing browser developers, tool developers, etc. have suffered horrendously due to these "slight omissions" or whatever they were called.

    Without getting too sentimental here, I must wonder how much better we, humanity, could have used all those countless man-months that we’ve wasted chasing the tail lights of one dominating but standards-neglecting application. For the sake of all that’s good in life, let’s hope we don’t need to spend the next 5-10 years on that dark road again until we all finally understand that we need to build on collaboration and equality, not on something that is seen and taken as a godsend by those who don’t know the history and are thus ready to repeat the mistakes already made.

  49. jones206@hotmail.com says:

    Do folks feel that OpenOffice does a good job of following the ODF spec?

    -Brian

  50. Ian Easson says:

    Erik wrote:

    "I’m dumbfounded to read some of the naive comments how OOXML is somehow supposed to be vendor neutral."

    No one (not me at least) EVER used that expression.  I think your dumbfoundedness is a result of your mistaken understanding of what was said here.

    OOXML is a standardization of a file format created by one vendor.  There is nothing wrong or unusual in that — go to the blog by the ex-head of ECMA where he explains that standardization of commercially-created technologies is extremely common.  (He gives the example of the DVD-R and the DVD+R standards.)

    What I am saying is that:

    – The future evolution of the standard is no longer in Microsoft’s hands alone.  The proof is that Microsoft already had to make a change to its Office 2007 software during the beta period, to accommodate changes demanded by the ECMA committee.  (Did you know that?)

    – Clearly, though, Microsoft has the leading role (but not absolute control) of the two dozen or so organizations in the ECMA committee (Apple, the British Library, etc.).

    I feel for you, based on your last few paragraphs, but you (and others) have but two choices:

    – Make the best of what exists (OOXML).  Join the large and growing list of people who are re-using the valuable information in their current archive of office documents.

    – Try and come up with your own better solution, and make it so compelling that they will choose to mine their own archive of documents using your solution rather than OOXML. (In my opinion, this is the "tilting at windmills" approach.)    

  51. Katbert says:

    "The OSP is not sublicensiable, thus forbidding GPL distribution"

    Others have addressed this, saying that this is not an issue in this case (I don’t know about that, or care).

    But I have a broader point to make on this, and that is that the GPL is the most unfriendly license in existence when it comes to "getting along" with other licenses.  Many OSI licenses are compatible with almost any OSI license except the GPL, and it’s the GPL’s fault for being so stringent.  And GPL fanboys demand that every other license bend to the GPL’s guidelines; never once has a GPL fanboy admitted that maybe the GPL should bend to the other licenses’ guidelines.

    My second point: Why is the ability to "sublicense" needed when anyone can get the license from the original vendor themselves?  If a vendor is giving "free" license to everyone, then sublicensing is not needed.  Yet this sublicensing issue seems to always come up as a reason for GPL-incompatibility, and it’s just another of the GPL’s red herrings.  Maybe GPL should be changed to read, "A license need not be "sublicensable" in order to be GPL-compatible if the license in question is freely obtainable from the original source", and that would solve this tired issue that comes up over and over.  Of course, it’ll never happen, because GPL wants all licenses to cater to it, never the other way around.

  52. Dan Kent says:

    I otherwise agree with erik’s lengthy post, but have two remarks:

    – minor: I definitely would not call it "chasing the tail-lights" when other browsers tried to mimic IE’s non-standard behavior. Surely IE6 was a good product when it came out, but (as said) its development stagnated so much that for a few years there have been no IE tail-lights to chase, only non-standard features of an inferior product to mimic (just think of Safari, Opera, Firefox).

    – in general: I think only time will tell how others do in this game. If, and only if, MS Office produces fully compliant OOXML by default rather soon after the final version of OOXML is blessed by ISO (if ever), then I think things are much different than with IE/HTML. But if it so happens that MS Office keeps producing OOXML with those "slight omissions" etc., then I too fear that history is repeating itself.

  53. Ian Easson says:

    Brian, you probably have not seen it yet, but the latest blog post from Jesper addresses this question, at least in regard to SVG:

    http://idippedut.dk/post/2008/01/Embrace-and-extend—SVG-revisited.aspx

    If you don’t have time to read it in detail, just skip to the bottom where he shows SVG "images" in OpenOffice.  

  54. Bruno says:

    "I’m dumbfounded to read some of the naive comments how OOXML is somehow supposed to be vendor neutral."

    ODF certainly isn’t vendor neutral, as it’s based on the "OpenOffice.org XML file format", the native format of OO.o 1.0.  OO.o’s own website even admitted as late as December 16, 2006:

    http://web.archive.org/web/20061216025929/http://xml.openoffice.org/

    And therefore ODF caters to OO.o’s feature set and code structure.  Three of the four major ODF implementations are simply rebranded OO.o ODF implementations.

    ODF is based on OO.o’s previous format, and OOXML is based on MSO’s previous format.  Neither has the "high road" on that score.

  55. Brian Jones posted yesterday about the availability of the docs for the binary file formats of Office

  56. jones206@hotmail.com says:

    Ian,

    Great link, thanks for posting. I had heard rumors that it wasn’t really true SVG support, but never had the time to look into it.

    ———————

    Bruno,

    It was even called the OpenOffice File Format up until a few months before the finished version 1.0 of the standard. 🙂

    ——————–

    Folk wondering about the OSP and GPL issues,

    The whole point of the OSP (and IBM’s ISP and Sun’s patent statement for ODF) is that they are not licenses, they are promises not to assert patents in specific situations.  Because they are not licenses there is nothing to sublicense.  Because they are unilateral promises, there is nothing anyone has to do or agree to in order to benefit from these promises.  The promise applies equally and simultaneously to the developer of the code, the distributor of the code and the user of the application that is an implementation of any of the specifications listed under these various promises.  In these instances no one has to provide anything to anyone because they have already been provided to them in advance.

    -Brian

  58. erik says:

    It is a bit strange that people begin to wonder about the original name of ODF when I did not even mention ODF in my comment at all. I was trying to concentrate on the big picture.

    OOXML (or ODF for that matter) is not anything unique in the world of standardization. All sorts of things have been standardized, many even successfully. Some have been vendor neutral, some not. But these things are not the point.

    This is all about a situation that is too much like the IE/HTML one for me to be confident that we’ll be seeing more even competition in the near future. Whether or not OOo fully follows ODF does not matter in this context, because its market share is nothing compared to that of MS Office. Although MS Office’s market share is probably not as high as IE’s was in its heyday, it is high enough to make the IE/HTML and OOXML/MSO situations comparable.

    Many might do themselves a favor by really checking how much web developers have been complaining lately about the current situation, where a double effort is needed in order to provide content and services to their customers due to the aforementioned facts. They want people to be able to access their services and content; they should not have to pay that much attention to the name of a browser or a dialect of HTML. But they need to do that as long as there is a de facto reference implementation in the market which does not follow the standards. And that means extra costs for both you and me.

    As long as MS Office is clearly dominant in the market and in practice defines how the standard should be interpreted, the same kind of double effort is needed from all other players. And, again, that means extra costs for both you and me.

  60. Francis says:

    Someone: Microsoft has already released a tool (Office Migration Planning Manager/Office File Converter) to search for and batch convert old binary files to OOXML. It’ll probably be updated to ISO DIS 29500 when the format is approved. See: http://www.microsoft.com/downloads/details.aspx?FamilyId=13580CD7-A8BC-40EF-8281-DD2C325A5A81&displaylang=en

    Brian, aside from the SVG non-support, I can’t say whether Star/OpenOffice does a good job of following the ODF spec. In fact, can anybody (outside of Sun)? If you’d really like an answer to that question, probably the best way to go about getting one would be to fork OpenOffice. Simply download the source code and replace all "OpenOffice/Sun" instances with "Microsoft." That’ll sure invite scrutiny!

  61. carlos says:

    >There were a few comments from national bodies that asked about the documentation of the Office binary formats and the availability of those documents.

    They asked for a complete, authoritative, Microsoft-generated and accountable mapping between the binary/legacy formats <-> DIS 29500 XML. That is the only way a software application not named Office 2007 could achieve the desired "goal" of backward compatibility with the gazillions of binary documents spread around the world.

    By the way, did you ask your boss for permission to show this stuff? Careful!:

    http://www.groklaw.net/pdf/Comes-3078.pdf

    😉

      carlos

  62. Fuzzyeric says:

    @Katbert

    What makes you think that the OSP will be available to any potential user, redistributor, or distributor in the future?  The GPL requires that whatever I need (the binary, the source, description of the build environment, *and* the right to use it) all come together in one neat bundle.  This is just sane practice.  The OSP specifically denies the ability to transfer the right to even use an implementation, much less the right to redistribute an implementation.  Under the terms of the OSP, Microsoft may assert Microsoft Necessary Claims against anyone I distribute to and anyone they distribute to unless those parties execute the OSP themselves.  There is no provision to transfer protection to those parties.  There is no promise of perpetual access to execute the OSP.  In short, the OSP is subject primarily to sudden withdrawal and at that point, no further distribution can occur under the promise "not to assert".  I’m not putting anything into this interpretation of the OSP; this is exactly what is written there.

    @BrianJones

    "This is a personal promise directly from Microsoft to you" does not magically apply to some random third party, either a redistributor or a user, who receives an implementation from me.  The promise doesn’t apply equally to anyone other than the second party of the promise, "you" in that document.  Note that the OSP (in whole, or as it applies to a specific item) may be withdrawn at any time and thus, by definition, no human born after that time could ever be covered by the promise.

    @…

    I get that the *intent* of the OSP is "Everyone can use our specifications.  A person will not be sued by us for interacting with an embodiment of our specifications as long as they don’t sue us.  This offer extends in perpetuity."  But that’s not what it says.

    Applies to me:  Check.

    Irrevocable to me:  Check.

    Applies to recipients of embodiments:  Nope.

    Transferable:  Nope.

    Currently available to recipients of embodiments:  Check.

    Guaranteed to be available to future recipients of embodiments:  Nope.

  63. Golodh says:

    It might be that Microsoft will actually make specifications on how to read older .doc formats available. It might be that they will backpedal, or simply release incomplete specifications.

    Let’s hold the applause until we actually see a working implementation of a translator into ODF or Adobe .pdf that gets 100% of the document formatting right. And not just any implementation. Only a GPL implementation will do.

    Oh, and Miguel,

    I applaud your legal acumen. Where the official Microsoft website explains that it cannot guarantee that the terms of release will be compatible with the GPL since, as it claims, those terms mean different things to different people, you give us that assurance. Brilliant!

  64. Katbert says:

    Fuzzyeric, it’s not like you’re going to do anything with this stuff, so whether it applies to you is irrelevant.  The fact is, 99.999% of the MS haters will do nothing with this stuff, just like they do nothing with the Linux code, or Firefox code, or OO.o code, or anything else.  The vast majority of MS haters are talkers, not doers, and couldn’t code their way out of a paper bag.

    As for those that are wedded to GPL, you chose that license so if any projects use licenses that are incompatible with it, that’s on you.  Stop letting RMS tell you what to do and explore the friendlier OSI licenses that aren’t crippled due to the RMS religious baggage.

  66. Just noticed some good news earlier this week on Brian Jones blog about new initiatives to make it easier

  67. John says:

    I notice that Access (mdb/adp) and the native data formats used by other applications that come under the "Office" umbrella – e.g. Publisher, Visio, Project, OneNote – aren’t included.

    Why is this?

  69. For some time it has been possible, even in older Microsoft Office versions (< 2007), to [...] the new

  70. Office Quiz: I’m liking Ian Moulster’s quiz on Office 2007. How many can you get – even I struggled

  72. Brian,

    "Do folks feel that OpenOffice does a good job of following the ODF spec?"

    Yes – I actually do think that. I think OOo sticks pretty much to the "technical reality" of ODF and the ODF-XML is pretty easy to understand and navigate through.

    I have three general critiques of OOo and the way it utilizes ODF:

    1. It really abuses (in the worst possible meaning) section 2.4 of the ODF-specification to a point where true interoperability is really hard to achieve. Just look at the values stored in settings.xml

    2. The naming of parts/objects in the ODF-package is really hard to figure out. Visual representations of embedded objects are stored as files with no extension, and it seems that OOo quite often saves these visual "thumbnails" as GDI+-files, which I personally believe is unnecessarily complex.

    3. OOo doesn’t seem to enforce the package-reference-model by using the manifest correctly.

    :o)
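
As a rough illustration of point 3 above, here is a minimal sketch (not from the original comment) of how one might compare the files actually present in an ODF package with the entries declared in its META-INF/manifest.xml. The function name `manifest_mismatches` is hypothetical, and the decision to exempt `mimetype` and the manifest itself from the check is an assumption made for this example; other META-INF files, such as signature files, are not treated specially here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by manifest:file-entry elements in ODF packages.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
MANIFEST_PATH = "META-INF/manifest.xml"


def manifest_mismatches(package):
    """Compare ZIP entries with manifest entries (hypothetical helper).

    `package` is a path or a binary file-like object holding an ODF file.
    Returns (unlisted, missing): files in the ZIP that the manifest does
    not declare, and files the manifest declares that are not in the ZIP.
    """
    with zipfile.ZipFile(package) as zf:
        # Ignore explicit directory entries in the archive.
        names = {n for n in zf.namelist() if not n.endswith("/")}
        root = ET.fromstring(zf.read(MANIFEST_PATH))

    listed = set()
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        # Skip the package root ("/") and directory entries ("Pictures/").
        if path and not path.endswith("/"):
            listed.add(path)

    # Assumption: these two files are not expected to be declared.
    exempt = {"mimetype", MANIFEST_PATH}
    unlisted = sorted(names - listed - exempt)
    missing = sorted(listed - names)
    return unlisted, missing
```

Calling `manifest_mismatches("document.odt")` on a file of your own (the file name is only a placeholder) returns two sorted lists. Two empty lists would suggest that the package and its manifest agree, at least under the assumptions stated above.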

  73. Dave S. says:

    @Ian,

    Try to keep up – http://www.microsoft.com/technet/security/advisory/947563.mspx

    The correct answer to my question is: no hacker will have a fix for this soon because the documentation has not been freely available. There are, apparently, no non-Microsoft parsers that can detect or repair the defect.

    An interesting point on the advisory is "Users who have installed and are using the Office Document Open Confirmation Tool for Office 2000 will be prompted with Open, Save, or Cancel before opening a specially crafted document that is attempting to exploit this vulnerability."

    Doesn’t this apply equally well to -any- file opened through the ODOCT? Yes it does.

    http://www.microsoft.com/technet/archive/office/office97/downloads/confirm.mspx?mfr=true

    "Microsoft has released a tool that, once run, will require confirmation before opening any Office document (Word, Excel, PowerPoint, or Access) launched from within Internet Explorer"

    I do recall, however, that MS was previously unhappy about other organizations offering fixes to vulnerabilities.

    The concept that, with the documentation, the fix would be easy is an interesting one. How can it be that fixing a single problem is hard and implementing a complete application is easy? Aren’t hundreds of implementations the current claim for MSO-XML? Hasn’t one of those implementors looked at their own parser?

  74. Dave S. says:

    @katbert

    Interesting argument- "it’s not like you’re going to do anything with this stuff, so whether it applies to you is irrelevant.  The fact is, 99.999% of the MS haters will do nothing with this stuff, just like they do nothing with the Linux code, or Firefox code, or OO.o code, or anything else."

    99.999% of everyone happens not to carve elephant tusks or eat endangered animals. It doesn’t mean they are unconcerned about what goes on in the world.

    In the case of software I am happier with the knowledge that the source code is available to me, because that means it is available to most everyone. Not having to rely on a single supplier is a pretty good thing. Perhaps a competitor would have brought out a Word-compatible processor that did not deliver the big red X. More likely, a fix would be posted rather handily by those motivated to repair it.

    If you don’t know about this crippling defect, try google.

    Think about this. Suppose GM really can produce ethanol at $1/gallon and opens filling stations with nozzles that only fit GM vehicles made in 2008 and later, pushing the other car companies and gasoline companies out of business. Oh yes, the filler interface is patented so no one else can legally duplicate it.

    Probably would not bother you. After all, you never built a car and don’t refine fuel, so the monopoly position occupied by a giant multinational car company that’s been starved for cash for years would not affect you at all. Never fear the starving giant that sees you as a source of sustenance?

  75. Marbux says:

    Brian Jones said: "Do folks feel that OpenOffice does a good job of following the ODF spec?"

    Yes and no. Like OOXML, ODF is pretty much a blank check for developers. Both allow pretty much unfettered extensibility, which creates a nightmare for interoperability and thus for competition, unless one were to define "interoperability" as one-way interoperability rather than two-way interoperability, an issue Microsoft lost in the September Court of First Instance antitrust decision.

    Both specs skip the basics like specifying "clearly and unambiguously the conformity requirements essential to achieve the interoperability," as required by ISO/IEC/JTC 1 Directives, pp. 11, 145. http://www.jtc1sc34.org/repository/0856rev.pdf

    ODF and OOXML are both vendor lock-in specs. In the context under discussion, interoperability, asking how well an application conforms to a standard — when the "standard" allows vendor-specific extensions to be classified as conformant — is asking the wrong question.

    The better question is when any of the big vendors are going to take the first step toward fulfilling the market requirement of non-lossy round-trip interoperability?

    Certainly, Sun, IBM, and Microsoft have not yet taken that step. The release of the binary format specs is a smoke screen. How about releasing the specs for the intermediary formats in the Office native file support APIs? Aren’t they subject to the injunction in U.S. v. Microsoft requiring disclosure of API specs?

  76. … and then what will /. have to write about? *NOTHING*, I tell you! *NOTHING*!!! Wait, that would be a good thing, huh?! *SWEET*! Please carry on… Brian Jones: Open XML Formats : Mapping documents in the binary format (.doc;…

  77. The news about the binary Office file formats has reheated the Open XML versus ODF debate

  78. Troy says:

    Ian wrote:

    "On your first point, you are incorrect.  Once Microsoft was persuaded by the ECMA head to submit the file format for standardization, it came under the control of ECMA."

    I disagree, and I think the rest of the comment ignores what is real: this is a Microsoft-controlled format defined by the output and input routines contained in the Microsoft Office product. You can say Microsoft will be bound to fastidiously implement whatever the ECMA process decides, but I regard that as an illusion.

    Ian also said:

    "On your second point, there are lots of file formats, yes, but there are none that meet the design intent of OOXML other than OOXML. (That intent, of course, is to maintain compatibility with the older binary format files.)"

    I care very little for the design intent of OOXML and concentrate more on what it is. I think that is the proper tack to take since I will not want my ‘file format’ position in the future to be dependent on what ‘maintained compatibility with the older binary format files’ today.

  79. jones206@hotmail.com says:

    Troy,

    Then don’t use the format. Open XML is very clear in what it was designed to do. If you don’t have a need for it, use something else.

    We were asked by the European Commission years ago to submit our XML formats to a standards body. That’s what we did. We then worked with Ecma on modifying it to ensure it was interoperable across platforms, and we had to change the Office product based on those changes.

    -Brian

  80. I want to share with you what I think is excellent news, which I am sure will benefit developers

  81. Troy says:

    Brian:

    Worry not, I will use the format whenever I have need for it.

    I simply think this change, despite being a change for the good, is oversold in dramatic fashion despite being "very clear in what it was designed to do". Not necessarily oversold by you, but oversold here, yes. *shrug*

  82. jones206@hotmail.com says:

    🙂

    I think that anytime you look at a technology that folks are trying to evangelize, it eventually will feel oversold to those following along closely. This is due to the fact that you still have so many people who are not yet fully aware and you need to reach out to them as well.

    I still have daily conversations (even with Microsoft partners) where they don’t realize we’ve moved to an XML based open file format (let alone the standardization, etc.).

    -Brian

  83. The elusive specifications of the Office binary format are here

  84. As promised last month, the binary documentation (.doc, .xls, .ppt) is now live. In addition to this,

  86. Some three or four years ago, I don’t remember for what reason, Microsoft decided to offer under

  87. Wictor Wilen says:

    The Microsoft Office Binary File Formats (.doc, .xls, .ppt…) are now available for everyone under the Open Specification Promise, OSP. This is good news for all of you working with the traditional b…

  88. The binary documentation (.doc, .xls, .ppt) is now live. In addition to this, the project to create an

  89. Brian Jones, Senior Program Manager just broke the news in his post today. Quoting from him: "As

  90. Two weeks back we made a commitment to open the binary formats and place them under our Open Specification

  92. I’m catching up with a bunch of Open XML blogging from ages ago, so apologies if some of these are old