Links 09-28-2007

Well, I thought I would have time to work more on the FAQ over the past couple of weeks, but that was naive of me. I’ve had zero free time and had to struggle just to find a few minutes to pull this post together. Hopefully things will slow down soon. (hAl, thanks for e-mailing me the suggested changes; I’ll get to them soon.)

I had a few interesting links I wanted to point out today. Some of them are a bit older, but I still thought they were worth calling out:

  • Jody Goldberg talks about implementing ODF and Open XML – I may have spoken a bit too soon in describing Gnumeric’s Open XML support as really rich, but that’s slowly changing. Jody describes how he was recently able to add Open XML charting support fairly easily. There are some great discussions in the comments of the post.
  • Article on Open XML interoperability between Linux and Windows environments – I saw this link on Doug’s blog the other day and thought it was good to call out.
  • Florian Reuter gives a brief update from the conference – I talked to Florian about the conference a bit beforehand but haven’t had a chance to sync up since. It sounds like it went really well though. Florian has been involved in a number of the ODF <-> OpenXML interoperability discussions, including the DIN work.
  • GemBox support for XLSX – Another product has built in Open XML support. According to their site, it “provides easy and high performance way to write, read or convert native Microsoft Excel files (XLS files in BIFF format, CSV files in text format or XLSX files in Open XML format) without the need for Microsoft Excel on either the developer or client machines. GemBox.Spreadsheet Free comes free of charge.”

Well, that’s it for now.

-Brian

Quote of the Day:

BT Sistemi s.r.l. – Italy

“The integration of information sources (internal and/or external), which will allow more streamlined processes, increased productivity, and the creation of new business opportunities, will be greatly enhanced by the adoption of Open XML as a standard.”

– Pasquale Faccaro – Managing Director

Comments (22)

  1. luke says:

    Hi Brian, I have a question you should add to your FAQ page, as I’m sure it’s on everyone’s mind:

    "Why does Office 2007 not support opening and saving ODF documents in the same fashion it does RTF, for example?"

    Thanks in advance!

  2. says:

    Good call Luke. I’ll add that to the list.


  3. John says:

    Hi Brian,

     About a month ago or so, you said that the .doc binary formats were freely available and gave a link.

     I’ve been trying since then to get them, but have been having some troubles. They seem to want to post paper copies to me, which is kinda useless.

     I’ve tried to ask them if I can get them in some non-paper format, but it takes a long time to get a reply.

     Can you please clarify whether it is actually possible to get access to the documentation for the .doc etc. binary formats in an online format?


  4. hAl says:

    [quote]They seem to want to post paper copies to me, which is kinda useless.[/quote]


    They might not really want people to easily publish them on the internet, which is probably what would happen if they distributed them digitally to everyone.

    However, for implementation purposes a paper version should generally suffice, and as far as I remember you are allowed to make copies.

  5. John,

    Well – consider yourself lucky … I have heard absolutely nothing yet since mailing the physical, signed letter to Redmond. Does anyone know what processing time to expect?

    I seem to remember something in the agreement saying that Microsoft reserves the right to send the documentation in binary form … boy, do I hope they actually make use of that right.


  6. hAl says:

    Brian, do you know if Microsoft’s new online Live Office web office suite will be using Office Open XML as its default format?

    I cannot find it in the FAQ:

  7. A says:

    We got the binary documentation (in paper format) very quickly after we sent in the request. Probably 1-2 weeks at most? Of course an electronic version would have been nice, if only for the ability to search, but I’m not complaining. Now we should be able to figure out what some of those undocumented SPRMs are for 🙂

  8. John says:

    Ah I got a reply from them:

    "The BIFF documentation is only hard copy.  It has to be tracked by serial number and received only after a signed agreement."

    Why am I never surprised at how low Microsoft will go? heh.

    Anyway, I’ve replied asking if I’m allowed to scan it and put it online.

    If it’s okay, I’ll try to use the university scanners and put it up somewhere.

    To "A" – is the documentation in a form that would be easily scanned and OCR’ed?  


  9. hAl says:


    The binary format isn’t an open standard and does not pretend to be.

    So I do not see what you are complaining about. It is just a format that you can use under a free license that gives you certain copyright and patent rights for implementing the format.

    A hard copy suffices for that. If it is not meant for you to publish it on the internet, why would you do that?

  10. John says:


     Why do you think that a hard copy suffices for that?

     In open source coding, it’s much better if all the coders have access to the documentation.  If you are implementing a file format, it’s best to be able to comment the code with a reference to the part of the spec that states it, and so on.

     If Microsoft does not want it on the internet just because they want it to be inconvenient to get at, then that seems like even more of a reason to put it on the internet.

  11. nksingh says:


    There’s an obvious (non-malicious) reason why MSFT doesn’t want the docs to be so freely available: it’s easy to make corrupted documents and to destroy user data by incorrectly editing Office files.  You might even be able to craft files that crash or compromise older versions of Office.

    It’s not particularly "low" of Microsoft to give away its format documents freely with conditions that prevent wide, uncontrolled dissemination. You might be breaking the terms by putting the docs online, so I’d read those terms carefully before making any moves that could get you in trouble.

  12. hAl says:

    <blockquote>If Microsoft does not want it on the internet just because they want it to be inconvient to get at, then that seems like even more of a reason to put it on the internet.</blockquote>

    I think the "how low can you go" comment does not exactly apply to MS in this case.

  13. Dave S. says:

    nksingh – how does having information make it more likely to damage documents than, say, trial and error poking and prodding?

    The more interesting question is this – does every developer get the same documentation from MS in hard copy? That would be a great reason for MS not to want proliferation from an online source – individualizing document content. Everyone looking online could find if the docs changed.

    Serial numbering is a way that documents are controlled – that is, if an update is issued one can determine who needs the update. Since these are ‘legacy’ formats, that should not happen often.

    So far this easily obtained documentation has not been obtained.

  14. nksingh – are you saying that the old binary formats should not be disclosed in public forums? This is just plain silly. How can that be referred to in an ISO wannabe standard? Dangerous to let people see and talk about these things, is it? They might get strange ideas or do things that might hurt them. We don’t want to let that happen, do we?


  15. We actually received the documentation on the day I made my previous post here (October 1st, 2007), but I have yet to see how much paper we received.

    We need it to get a better view of how the file format for the binary files is put together (not the document format, but the file format – OLE2 Compound Files). Granted, it would be convenient to have it in electronic form, but we don’t really have that requirement.

    And for those of you complaining about the terms for getting the specification: then just don’t get the specification. The spec is Microsoft’s intellectual property, so they actually get to decide what to do with it. Please remember that open source and free software are about being able to make choices … both as a software consumer and as a software maker. Here Microsoft chose not to put the specs on . I do not necessarily agree with them on this, but it is their choice … so live with it.

  16. John says:


     So by your logic, OOXML is even more dangerous because everyone can download the specification and read it online?

     The old adage is that your software will have fewer bugs if you can see the documentation. Or something.


  17. nksingh says:


    The binary doc format is not an ISO standard.  There’s a significant difference between OLE structured storage and OOXML.  

    Remember, the software world is not all about Open Source.  Microsoft could do something that’s helpful to competitors and others who wish to view Office documents without releasing something that’s amenable to Open Source development.  And this is not necessarily a bad thing.  That is their right as owners of the IP.  There is no right to Free Software; Stallman’s declarations are simply a manifesto and are binding upon no one.

  18. nksingh says:


    OOXML document parsing works in an entirely different way from parsing a binary document. These days, everything coming from a file is carefully validated, but there was a time at Microsoft (and many other places) when developers were content to just trust elements in the file like offsets and size fields.

    And before you accuse Microsoft of shoddy security practices, apply a file fuzzer to current and previous versions of OOo, WordPerfect, and SmartSuite… no one is immune from such practices.  

    This problem can’t happen so easily with XML because the data goes through much more parsing before being read into in-memory structures. Also, you can destroy Word’s ability to read a DOCX, but the user’s data is still reasonably recoverable just by opening the zip. If your DOC program messes up a binary file, it’s a lot harder to get the pieces of text out.

  19. John says:


     Right. So you are arguing that it’s better for programs to blindly guess at the file format and try to reverse engineer it, rather than having easy-to-access online documentation.

    That makes sense!

  20. Andrew says:


    That’s funny, because I can’t make any sense of your last statement.
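Two technical points from the thread above can be made concrete: nksingh's claim that a DOCX whose structure Word rejects still leaves the text recoverable "just by opening the zip," and the distinction drawn in comment 15 between the legacy OLE2 Compound File container and the ZIP-based Open XML package. The following is a minimal sketch, not a recovery tool. It assumes the main document part sits at the conventional path `word/document.xml` (a real package locates it through `_rels/.rels`) and that the file uses the transitional WordprocessingML namespace; the function names `sniff_container` and `recover_docx_text` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Leading "magic" bytes of the two containers.
# OLE2 Compound File: used by legacy .doc/.xls/.ppt files.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# ZIP local file header: used by .docx/.xlsx/.pptx packages.
ZIP_SIGNATURE = b"PK\x03\x04"

# Assumption: transitional WordprocessingML namespace (Strict files use a different URI).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def sniff_container(data: bytes) -> str:
    """Classify a file as 'ole2', 'zip', or 'unknown' from its first bytes."""
    if data.startswith(OLE2_SIGNATURE):
        return "ole2"
    if data.startswith(ZIP_SIGNATURE):
        return "zip"
    return "unknown"


def recover_docx_text(data: bytes, part: str = "word/document.xml") -> list[str]:
    """Pull plain paragraph text straight out of a DOCX package.

    Only the one XML part is read, so damage elsewhere in the package
    (styles, relationships, media) does not stop the text from coming out.
    Nested paragraphs (for example inside text boxes) may be reported twice.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read(part))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more w:t runs.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs
```

Usage would look something like `recover_docx_text(open("damaged.docx", "rb").read())` after checking that `sniff_container` returns `"zip"`. The sketch also illustrates the asymmetry nksingh describes: for an OLE2 file there is no comparable one-liner, because the text has to be located by following the binary format's own offsets and structures.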