The myth of the Binary Key


I don’t know where this myth began, but I have seen enough references to it at this point that I think it’s time to call it out directly. There is no such thing as a binary key that you need to unlock the Microsoft Office XML formats. They are just pure XML files that are fully documented (and have been for a while now). This isn’t something where I’m asking you to just trust me; instead you can go and look for yourself. Take Office 2003 and save any Word document you have as XML. Now open that file up in a text editor and take a look. (If you don’t have your own copy of Office 2003, try this free lab online that lets you play with the XML functionality.)


I’m trying to figure out how this rumor was started, and I have a couple ideas, so let’s try and track this down. Let’s talk a bit about the format so that you can understand what’s there. Take any XML file saved from Word 2003:



  1. Processing Instruction: As I discussed in this post (http://blogs.msdn.com/brian_jones/archive/2005/07/07/436647.aspx), if you try to open the file in IE, it will most likely be redirected to Word for opening because we put the following declaration in a processing instruction at the top of the file: <?mso-application progid="Word.Document"?> If you want to open the file in IE, you’ll need to delete that PI. Is that the mythic binary key folks are talking about? It doesn’t affect the way the file is displayed. All it does is tell the shell that Word can open the file.
  2. Pretty Printing: As I discussed in this post (http://blogs.msdn.com/brian_jones/archive/2005/06/23/432018.aspx), if you open the file in a text editor, you’ll see that it’s pretty hard to read because we don’t “pretty print” the file. You’ll either need to remove the PI and open in IE, or open in an editor that has pretty printing built in (like FrontPage or Visual Studio). Maybe this is what has confused people into thinking there is a binary key? It’s obviously not though, it’s just a way of laying out the XML to make the files more efficient.
  3. Objects: Word allows you to embed images; video; ActiveX controls; and OLE Objects. These are all foreign to Word though, and when they are stored they need to be stored in their native formats. In 2003, we base64 encode them and store them in a binary tag. For Office 12, since we are using a ZIP package to store the files, we can just keep them as separate binary files within the ZIP (so a JPEG will just be a separate .jpg file in the ZIP). I really doubt this is the “binary key”, since it really isn’t even owned by us. Any format you create will need to store foreign objects, unless the application decides it’s not going to support those features.
  4. Handful of obscure legacy features: There are a handful of obscure legacy features where certain pieces of the data are stored in a <binData> tag. We did this because of resource constraints when building the original XML file. An example of this would be some of our old legacy fields. We just weren’t able to get to them, but we only did this for features where the use of them was very, very low. For Office 12, we’ve done the extra work so that even these features are now represented in XML. So if this is the binary key, then it will go away, but I highly doubt this would be the “binary key” people talk about as it occurs so rarely.
  5. VB Project: If you have code that is embedded in your document, that would also be stored as a binary object. This is an area I can understand that some folks might want to see stored as text, but we didn’t go that route. In fact, we’re moving away from storing code directly in the files in general, as I’ve already discussed in an earlier blog post (http://blogs.msdn.com/brian_jones/archive/2005/07/12/438262.aspx). The default format won’t even have these objects so if this is the “binary key”, it’s going away. I highly doubt this is the “binary key” though as it has nothing to do with the document itself, just with solutions that run on top of the document, and the majority of documents out there don’t have it anyway.
  6. Namespaces: Someone commented in my last post that the Office files have namespaces in them and if you change the value of the namespace the file behaves in a goofy way. Anyone familiar with XML knows what’s going on here, but I understand that a number of you are new to XML. Namespaces are a very important part of the XML standard. They allow you to identify what type of XML you are working with. If it weren’t for namespaces, it would be very difficult to work with XML files unless you had control over everything (their creation, storage, and consumption). The point raised here though is really an interesting one. Notice that if you change the namespaces around, Word can still open the file. This is because we support opening all XML files as a result of our custom defined schema support. You can take a WordML file and add your own XML tags in your own namespace, and we’ll support opening them, validating them while the file is being edited, and saving them out. The namespace issue obviously isn’t a “binary key”, and it’s one of the major building blocks of XML.
  7. Byte Order Mark (Unicode) [10/18/2005 – I added this one after it was brought to my attention by Dare] Dare points out that it could be that some folks unfamiliar with Unicode are having problems with the Unicode BOM:



I wouldn’t be surprised if the alleged “binary key” was just a byte order mark which caused problems when trying to process the XML file using non-Unicode savvy tools. I suspect some of the ODF folks who had problems with the XML file would get some use out of Sam Ruby’s Just Use XML talk at this year’s XML 2005 conference.
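
Points 1, 2, and 7 above can be checked with a few lines of code. What follows is a minimal Python sketch, not anything that ships with Office: the sample bytes are a hand-written stand-in for a Word 2003 XML save, and the helper names (has_utf8_bom, strip_mso_pi, pretty) are made up for illustration. It detects a UTF-8 byte order mark, drops the mso-application PI, and pretty prints what is left.

```python
import codecs
import re
import xml.dom.minidom

# Hand-written stand-in for a Word 2003 XML file (an assumption for this
# sketch, not a real save): BOM, XML declaration, the PI, then one paragraph.
SAMPLE = (
    codecs.BOM_UTF8
    + b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    + b'<?mso-application progid="Word.Document"?>'
    + b'<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">'
    + b'<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>'
    + b'</w:wordDocument>'
)


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_mso_pi(text: str) -> str:
    """Remove the first mso-application processing instruction, if any."""
    return re.sub(r"<\?mso-application\b.*?\?>", "", text, count=1, flags=re.DOTALL)


def pretty(data: bytes) -> str:
    """Decode (dropping any BOM), strip the PI, and return indented XML."""
    text = data.decode("utf-8-sig")  # utf-8-sig silently skips a leading BOM
    text = strip_mso_pi(text)
    dom = xml.dom.minidom.parseString(text.encode("utf-8"))
    return dom.toprettyxml(indent="  ")


if __name__ == "__main__":
    print("BOM present:", has_utf8_bom(SAMPLE))
    print(pretty(SAMPLE))
```

Calling pretty(SAMPLE) returns an indented view of the same XML with the PI gone. Nothing in the file had to be unlocked along the way; the only "binary" bytes are the three-byte BOM, which the utf-8-sig codec skips.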


My theory is that the “binary key” idea came about because someone just took a quick look at the file format without really doing their homework. For example, if you combine #2 and #3, you would probably see a binary blob in most files that appears to be at the top. The reason is that if the file has an image or some other kind of object in it, then since the file isn’t pretty printed, the first line break would come from the base64 encoded data. That would mean that it would look like there is some binary data right at the top. The weird thing here though is that some of the folks that were saying there is a binary key supposedly spent a lot of time looking at all kinds of document formats and investigated them in order to create a universal file format capable of representing every document that ever existed. I would think they would have looked a little closer and seen that there really isn’t a “binary key” to unlock the documents. They are already unlocked.
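
The base64 blob described here is also easy to get at from code. As a hedged Python sketch: the sample document below is hand-built, the payload is fake, and extract_bin_data is a hypothetical helper name. It assumes the object data sits in binData elements in the WordML namespace carrying a name attribute, and simply base64-decodes each one.

```python
import base64
import xml.etree.ElementTree as ET

# WordML 2003 namespace (not quoted in the post; supplied for this sketch).
W = "http://schemas.microsoft.com/office/word/2003/wordml"

# Fake payload standing in for an embedded image's bytes.
PAYLOAD = b"GIF89a-not-a-real-image"

# Hand-built sample: one picture whose data is base64 text inside w:binData.
SAMPLE = (
    f'<w:wordDocument xmlns:w="{W}"><w:body><w:p><w:r><w:pict>'
    f'<w:binData w:name="wordml://01000001.gif">'
    f'{base64.b64encode(PAYLOAD).decode("ascii")}'
    f'</w:binData></w:pict></w:r></w:p></w:body></w:wordDocument>'
)


def extract_bin_data(xml_text: str) -> dict:
    """Map each binData element's name attribute to its decoded bytes."""
    root = ET.fromstring(xml_text)
    decoded = {}
    for element in root.iter(f"{{{W}}}binData"):
        name = element.get(f"{{{W}}}name", "")
        # Drop any line breaks or spaces inside the base64 text before decoding.
        b64_text = "".join((element.text or "").split())
        decoded[name] = base64.b64decode(b64_text)
    return decoded


if __name__ == "__main__":
    for name, data in extract_bin_data(SAMPLE).items():
        print(name, len(data), "bytes")
```

The decoded bytes are whatever the foreign object's own format is (here a made-up string), which is why this data is not something the document format itself can describe any further.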


To learn more, go check out the documentation. It’s up there for free and anyone can download it. Or play around with the free labs. Or read my “Intro to Word XML” posts. The easiest way for us to have good discussions on these topics is for everyone to actually look into it themselves rather than relying on random news stories. I understand not everyone has the time to look into it, but unfortunately there is a lot of false information out there.


-Brian

Comments (45)

  1. I think one of the things that bothered people about the XML format – though not, perhaps, this mysterious ‘binary key’ – is that the reference schemas you point to are available only packaged as an MSI file – these aren’t executables or a full-scale software deployment, so why should they be packaged in a proprietary, Windows-only format? This sends a rather odd "You can only read the documentation if you’re running Windows" message, which could be easily avoided by packaging the files in a platform-neutral format.

    I’m guessing the reason for this packaging is to allow the files to be signed and verified, but I think it just generates pointless hostility without providing much value.

  2. BrianJones says:

    Hey Avner, I hear you on that. Did you see that for the Office 12 previews we actually provided two alternatives for the documentation? We provide the .msi file, but we also have a ZIP file that contains the XSDs and HTML files. Check that out and let me know if you like the format better: http://www.microsoft.com/downloads/info.aspx?na=46&p=2&SrcDisplayLang=en&SrcCategoryId=&SrcFamilyId=15805380-f2c0-4b80-9ad1-2cb0c300aef9&u=http%3a%2f%2fdownload.microsoft.com%2fdownload%2fb%2f5%2fb%2fb5b64679-4d6b-43ec-ba50-5891ca11cf15%2fOffice12XMLSchemaReference.zip

    The main reason they had decided to go with that other approach back in 2003 was that most documentation we provide in Office had been done that way. They create a .chm file which is what the msi installs (in addition to the .xsd files). It’s typical for all of our help, and it gives you a pretty cool UI for navigating through all the topics. It of course has the negative impact of not working on all platforms which is ironic given the fact that we want to allow everyone the ability to work with these files. So as you said, it does send a rather mixed message. I’m sorry about that.

    I’ll look into it some more and see if we can backport the solution we used for the O12 schemas and provide it for the Office 2003 schemas as well.

    -Brian

  3. Todd Knarr says:

    Is it possible the "binary key" they’re referring to is the UUID stored in the XML headers? It doesn’t appear to have anything to do with the document, but if it’s not used for something it wouldn’t have been included. If that’s the case, then as far as I can tell it can be removed or just ignored safely.

  4. BrianJones says:

    Hey Todd, that’s actually just another namespace declaration. It’s saying that "dt:" prefix maps to that specific namespace. You’ll see that we don’t really use that prefix on many (if any) elements.

    Namespaces are just essentially URIs, so you can use a URL as we do for some of our namespaces, but it isn’t required.

    BTW, in Office 12, we actually aren’t using that namespace anymore anyway, so if folks did have an issue with it (which I can’t really imagine), it will go away.

    -Brian
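
To illustrate the point about prefixes, here is a small Python sketch under stated assumptions: the two documents are made up, dt_of is a hypothetical helper, and the namespace URI is the uuid value that the dt prefix is bound to in the 2003 files. It shows that a namespace-aware parser keys on the URI, so the prefix itself can be anything.

```python
import xml.etree.ElementTree as ET

# The URI that the "dt" prefix is mapped to in the 2003 files.
DT_NS = "uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"

# Two made-up documents that differ only in which prefix is bound to DT_NS.
DOC_A = f'<root xmlns:dt="{DT_NS}"><item dt:dt="string">foobar</item></root>'
DOC_B = f'<root xmlns:types="{DT_NS}"><item types:dt="string">foobar</item></root>'


def dt_of(xml_text: str):
    """Return the namespaced dt attribute of the first item element."""
    item = ET.fromstring(xml_text).find("item")
    # ElementTree spells qualified names as {namespace-uri}local-name.
    return item.get(f"{{{DT_NS}}}dt")


if __name__ == "__main__":
    print(dt_of(DOC_A), dt_of(DOC_B))
```

Both calls return the same value, because the uuid string is only an identifier that the parser compares; nothing is looked up or decrypted with it.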

  5. Hi Brian, I’ve recently written about a problem with undocumented binary data which I believe will continue to exist in the Office 12 XML formats:

    http://sixlegs.com/blog/java/please-document-emf-plus.html

    It would be great if you could look into this.

  6. Eduardo says:

    Phil Windley on Microsoft, ODF, and state governments:

    http://blogs.zdnet.com/BTL/?p=2024#comments

  7. Eduardo says:

    Berlind:

    http://blogs.zdnet.com/BTL/?p=2020#more-2020

    Hey you guys, I am posting all these pro-ODF links, why don’t you post some anti-ODF links? I mean, let’s make things even.

  8. n4cer says:

    Until Brian gets the 2003 documentation in zip format like the Office 12 format documentation, interested parties can check the online SDK for information on Word 2003’s XML format:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wordxmlcdk/html/welcomewordcdk_hv01147170.asp

  9. Todd Knarr says:

    Brian: that would explain it, because outside of the Windows-specific world a UUID *is* a binary key (AIUI used to look up the actual item referred to in the registry when there may be multiple entities with the same name that need to be uniquely identified). I’ve seen the "dt" namespace used in Office 2003 documents, usually it seems to be attaching a datatype to things so I’d immediately conclude interpreting it correctly was fairly important. There’s no references to that UUID in the Office 2003 XML docs I can find, though. So to someone not part of Microsoft, it appears to be a binary key referring to an actual data-type schema but there’s no documentation anywhere on how to find the schema this key refers to. Without explicit documentation to the contrary I’d assume I needed to know the schema, otherwise it wouldn’t be referred to in a namespace declaration in the document.

    I’d have to look at an Office 12 document to tell whether there’s variations on this same thing in them.

  10. mystere says:

    The MSI file format is really just a cab file, and there are tons of open source cab extractors out there. For example cabextract:

    http://www.kyz.uklinux.net/cabextract.php

    The MSI file isn’t preventing anyone from extracting the files, unless they just want to complain.

  11. BrianJones says:

    Chris, thanks for bringing that to my attention. I’m talking with the teams who own those formats to see if I can get some more information for you. I’ll try to get back to you in the next week or so.

    Todd, the only place we use that namespace is for custom document properties. For the 2003 schemas we decided to stay consistent with what we had done in HTML for properties. We’re actually changing this in Office 12 though.

    We use it for specifying what the data type is for a custom document property. For example, if I created a custom property called "Brian" whose value was "foobar", the XML would look like this:

    <o:CustomDocumentProperties>

    <o:Brian dt:dt="string">foobar</o:Brian>

    </o:CustomDocumentProperties>

    -Brian
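
Reading those properties back takes very little code. A minimal Python sketch, with assumptions flagged: the fragment wraps the example above in namespace declarations that are not shown in the comment (the o and dt URIs are supplied here), and read_custom_properties is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs supplied for this sketch; the comment above shows only prefixes.
O_NS = "urn:schemas-microsoft-com:office:office"
DT_NS = "uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"

# The example property, wrapped with the declarations it needs to parse alone.
FRAGMENT = (
    f'<o:CustomDocumentProperties xmlns:o="{O_NS}" xmlns:dt="{DT_NS}">'
    '<o:Brian dt:dt="string">foobar</o:Brian>'
    '</o:CustomDocumentProperties>'
)


def read_custom_properties(xml_text: str) -> dict:
    """Map each property name to a (datatype, value) pair."""
    root = ET.fromstring(xml_text)
    properties = {}
    for element in root:
        # Tags look like {namespace-uri}Name; keep only the local name.
        local_name = element.tag.split("}", 1)[1]
        properties[local_name] = (element.get(f"{{{DT_NS}}}dt"), element.text)
    return properties


if __name__ == "__main__":
    print(read_custom_properties(FRAGMENT))
```

A consumer that does not care about the declared datatype can skip the dt attribute entirely and still read every property value as text.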

  12. Wondering.... says:

    I am curious if this comment on Groklaw explains where the myth of the binary key originates:

    http://www.groklaw.net/comment.php?mode=display&sid=20051016105739574&title=What%20Binary%20key%3F&type=article&order=&hideanonymous=0&pid=369059#c369217

    As I read it, his point (less the rhetoric) is not that you need a binary key just to access the document, but that one cannot do useful transformations of the information in the document without the alleged binary key.

    ???

  13. Todd Knarr says:

    Brian: what you say only convinces me further that that UUID is the "binary key" being referred to. A simple publication of the schema and notes in the Office 2003 XML documentation about how to identify the schema for the "dt" namespace should suffice to clear up the problem.

    I’d also note that one of the conventions behind using URLs to identify schemas associated with namespaces is that canonically the XSD itself can be retrieved from that URL. This lets programs that don’t natively know the schema retrieve and process it. MS doesn’t appear to do this.

  14. BrianJones says:

    Wondering – I think you’re right that the myth is you can’t do transformations without the "binary key." The reason that’s not true though is that there is no "binary key." Anyone can come along and build transformations, as I’ve been posting about for the past few months. Everything is represented as XML and fully documented. I think unfortunately this myth is being spread because people just haven’t taken the time to look into it and instead are going off of assumptions.

    -Brian

  15. BrianJones says:

    Todd, I don’t really see how that namespace could be the "binary key" seeing as how it’s fairly irrelevant for parsing, consuming, transforming, and generating documents. According to the articles this "binary key" somehow prevented people from writing transforms into and out of our format. The dt namespace just describes what the datatype is for custom document properties (that’s hardly a "key" to the document). Custom document properties only exist for a document if you go to File -> Properties -> Custom Tab, and then add a property yourself.

    You’re right though that we should document it. Here’s basically what the schema would be:

    <?xml version="1.0" encoding="UTF-16" ?>

    <xsd:schema targetNamespace="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"

    xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"

    attributeFormDefault="unqualified">

    <xsd:simpleType name="dtType">

    <xsd:annotation>

    <xsd:documentation>Defines the datatypes of custom properties</xsd:documentation>

    </xsd:annotation>

    <xsd:restriction base="xsd:string">

    <xsd:enumeration value="string"/>

    <xsd:enumeration value="dateTime.tz"/>

    <xsd:enumeration value="float"/>

    <xsd:enumeration value="boolean"/>

    </xsd:restriction>

    </xsd:simpleType>

    </xsd:schema>

    In response to your second point, you are right that many folks use URLs for their namespaces and then either have the XSD, or links to resources at that site. We use URLs for most of ours and will continue doing so as we move forward. We haven’t yet set up a site though with resources for those URLs. Instead the schemas are all on msdn right now. I’ve been somewhat pushing for us to do this for some time (it’s never really been at the top of my list though), but we just haven’t pulled together the resources yet for managing the site.

    -Brian

  16. Todd Knarr says:

    Brian: the UUID may not be a "binary key" the way you’re thinking of it, but it sure looks like one to someone outside the Windows internals world. It’d be similar to you seeing ‘xmlns:dt="xlu:F37A019D"’ in the xsd:schema tag. That number obviously means something or it wouldn’t be there, but you’ve no idea what to *do* with that number unless I happen to tell you it’s just a random number that always refers to a certain schema, or that you use it as a 4-byte unsigned integer (x = hex, l = longword (32 bits), u = unsigned) key in a particular database and the value for that key’s the URL to the schema.

    I’d add one really important thing to the documentation of the "dt" namespace: what it’s for. If it’s just for attaching type information to user-defined properties, say so in the docs. That’ll make it clear to people when they need to worry about that namespace and when they can safely just ignore it. I’d note that this is a regular complaint about Microsoft formats: there’s *always* something in there that isn’t documented anywhere, and you can’t tell if you can just ignore it or if it’s actually important. Eg. the Windows credentials in the optional credentials field in the Active Directory Kerberos implementation: it was filled in with something, there wasn’t anything anywhere on what was being put there (at least until MIT’s lawyers hauled out the LARTs), and ignoring it like every other Kerberos implementation did (it was an optional field, after all) caused Windows clients to fail for no readily-apparent reason.

  17. BrianJones says:

    I hear you Todd. It’s definitely important to make it as clear and straightforward as possible. We aren’t going to have this namespace in the O12 schemas, so it shouldn’t be an issue going forward. I just wanted to make it clear that even for 2003 it isn’t really an issue as it has a small presence and it’s only used as I described.

    Going forward I’ll also talk to folks to make sure if there are other things like this they get properly discussed and documented so that it’s clear there isn’t a "key". 🙂

    -Brian

  18. orcmid says:

    Todd, that is the way schemas have been found for years, and there is no requirement that a URL with a real schema-related resource be used. Look at the namespaces used for OpenDocument, for example, especially the urn-based ones.

    [And I agree with Brian it is really great to have a resource that provides something authoritative about the namespace, such as the schema and anything else useful.]

    It would be splendid if someone who ran into the problem of getting styles to convert would show what it was that they couldn’t find that they thought was in a binary key. I’d like to see the key too. I don’t understand why that’s so hard. It would allow this to be cleared up.

  19. Eduardo says:

    Gary Edwards has another comment on the binary key. His position seems to be that there is one in 2003 ML. He thinks maybe it is going to be gone for Office 12 XML, but he doesn’t care because he is doing SOA, and the vast majority of Microsoft desktops are not even on Office 2003.

    http://www.groklaw.net/comment.php?mode=display&sid=20051016105739574&title=Arent+the+Open+XML+spec+available+on+the+MS+website%3F&type=article&order=&hideanonymous=0&pid=369237#c369577

  20. James says:

    Are you suggesting the new MS XML format will be free for all to use? Will I be able to read/write to this "innovative" format with no patent or license worries?

    OpenDocument is much better. With OpenDocument, I don’t need the permission of MS to read/write a freaking document that I OWN the copyright on!

    How can you stand by this MS "innovation" when MS didn’t innovate at all. MS office is not the first office application to use an XML format (they are basically the last). MS is not the first company to use XML (they are basically the last). Yet the lawyers and executives at MS have found a way to patent using a basic technology and try to continue to lock people out of competition.

    Is MS that afraid of competition that it uses lawyers instead of programmers?

    I don’t hate all the products of MS, however I do hate all the business practices of MS which causes me to not want to use _any_ product of MS.

    If the MS execs have ever wondered why people get behind OSS, it is because it is something they can believe in and something that belongs to everyone. It is something that most importantly follows OPEN STANDARDS and doesn’t bastardize the standard so that only MS software can use it. For MS to get a loyal following like the OSS community, it is going to take a lot more than seeing a fat, sweaty Ballmer dancing on a stage.

  21. Marcos Valerio says:

    Did you read this paper

    http://www.ccianet.org/modules.php?op=modload&name=News&file=article&sid=566

    http://72.14.207.104/search?q=cache:I3OAWmMjWfcJ:www.ccianet.org/papers/CCIA-XML.pdf+ccia-xml&hl=en

    http://www.ccianet.org/papers/CCIA-XML.pdf

    ?

    CONCLUSIONS

    XML can be a powerful tool for achieving interoperability. The support of XML as a data

    description language and the use of XML schema for application file formats is gaining

    widespread acceptance throughout the computing industry.

    While Microsoft has released a definition of the XML schema used by their Word 2003 and

    Excel 2003 applications, these disclosures clearly lack information which is necessary for

    interested parties to achieve complete interoperability with Microsoft Office 2003’s entire

    feature set. Despite the fact that Microsoft promotes these disclosures as a prime example of

    their interest in supporting interoperability, the disclosures are incomplete and therefore

    effectively unusable; as a result, they have very little value as interoperability tools. Further, if

    these disclosures are being promoted as interoperability tools, but if in reality they cannot be used as such, one might wonder about the true motivations behind the disclosures, and indeed if

    those motivations have anything to do with interoperability at all.

  22. BrianJones says:

    Eduardo, based on those articles, it would appear that Gary Edwards has not fully looked at the XML formats. As I’ve repeated numerous times, there is no such thing as a "binary key" that needs to be reverse engineered in order to support Word documents. I’m not talking about only Office 12, this is also true with Office 2003. Please go have a look for yourself and give me some examples of what doesn’t work. I’d really like to get your feedback. If there is something that you don’t like, please let me know!

    -Brian

  23. BrianJones says:

    James, I’ve never suggested we were innovative in deciding to use XML to represent our file formats. The whole reason we’re doing it is that XML is a widespread standard and that allows people to easily access our files. You should know though that we’ve been using XML to represent pieces of our files since back in 1997 when we first started working on the Office 2000 HTML support. The latest move to default formats is just part of an evolution.

    On top of that, I don’t know of any other Office software packages out there that have even close to the level of support for custom defined schema. That’s pretty powerful functionality, since it allows *you* to use your own schema definitions to mark up the files that contain your data.

    There is a royalty free license that allows you to freely work on any of your files. That license is perpetual and we’ve publicly committed to providing the license from this point forward, so you’ll always have access to it.

    -Brian

  24. BrianJones says:

    Marcos, did that article lead you to believe that our formats aren’t interoperable? The examples of what is inaccessible are actually pretty weak (other than the obvious point that we haven’t yet fully XMLized Excel, and there is no XML format for PowerPoint yet). I completely admit there wasn’t a complete story for Excel or PowerPoint in 2003, but Word’s XML support was close to 100% and was fully documented. The examples raised in the article are pretty obscure. The only two things mentioned are embedded macro buttons, and embedded objects.

    Let’s talk about the first one. What percentage of documents out there have embedded macro buttons? We’re going to be even better about representing everything as XML in 12, but in 2003, you already have almost everything there, with only a handful of the more obscure features missed.

    The second example of embedded OLE object is just how OLE embedding works. Since Excel’s XML support wasn’t full fidelity in 2003, it would make no sense to persist an embedded Excel object as XML. An embedded object’s persistence is determined by the OLE server, not the container. That said, in Office 12, we’ve actually done the work so that embedded Word, PPT, and Excel files actually will be stored in XML, so this will be a non-issue.

    I’ll continue to explain that there is nothing preventing interoperability with pretty much any document out there. Like I said there are a couple minor things that are being added for Office 12, but for someone to claim that’s a "binary key" breaking interoperability in Word 2003 is just showing that they haven’t actually looked into it (or they are really stretching).

    If anyone has examples of serious interoperability problems they’ve come across, please let me know. I really want to dig into this issue.

    -Brian

  25. Yuki says:

    I don’t understand myself why this key thing is becoming enormous. Everyone knows the problem with anything by Microsoft are the licenses and business practices that are designed to lock you in Windows and Office, excluding anyone else. There has never been any other problem with anything else. Of course MS is widely known to produce horrible software that is _still_ prone to malware infection but who cares? Writing code that sucks should go straight into the list of the human rights. That counts also for Windows’ poor scalability and ridiculous uptime. Who cares about MS code? Who cares about binary keys? I understand everybody loves a conspiracy but let’s focus on what’s important, namely figuring out a way to stop them from excluding everyone else from the market. Save in ODF, use Linux/BSD/whatever and trash hardware that works only with Windows.

  26. Eduardo says:

    Brian, I don’t have a Windows computer, and I don’t have the technical knowledge to figure this one out myself. I’m not sure who is right on the issue, so I’ve asked Gary Edwards to look over the discussion here and make a comment.

  27. BrianJones says:

    Eduardo, if you have the time, you should check out some of the example files out there, and also play around with things like the XSLT we have that transforms from WordML into HTML. It’s a pretty useful tool.

    I’m not sure if you are able to get the online labs working, but if you have the time you should try those out. That in combination with the "Intro to Word/Excel XML" posts I’ve made should really help improve your knowledge on the subject.

    Also, let me know if there are some more "Intro to XML" type topics you’d like me to post on. As I’ve mentioned before these are still relatively new technologies and I want to help everyone understand what’s going on. It’s easy to talk at a really high level about the benefits… I’d like to help everyone understand the actual details.

    -Brian

  28. Ralph says:

    To mystere,

    I tried to download it, and it is not a cab file. It is an msi file. I don’t have a supported system for the msi, so I just draw the natural assumption. Microsoft only wants to pretend the schema is open. Otherwise, why would they make the documentation available in such a manner? You seem to be missing a point. Microsoft is trying to claim that its format is open. They have to convince the world. The traditional Microsoft tactic of requiring that people come grovel before them and take what they wish to share just won’t work. Brian has been told many times about this problem and no changes have been forthcoming. I conclude that the pretense of an open format is just that, a pretense.

    Good day,

  29. Ralph says:

    To mystere,

    Even if I knew how to transform an msi file into a cab file (Is it just a name change or is there more?), the author’s own web site for cabextract would stop me. He says it may be illegal to do so. Maybe he is wrong, but if he thinks that, why should I chance it?

    Good day,

  30. Ralph says:

    to BrianJones,

    Great, thank you. I missed that before somehow. I downloaded it and I will review it.

    Have a great day,

  31. BrianJones says:

    No problem. Have a look and let me know if you have any questions. They are an early preview of the Office12 schemas so there is still a ton of information that we’ll be filling in as we get closer to shipping, but it should serve as a great start.

    -Brian

  32. orcmid says:

    Good for Dare. I thought of that and shrugged it off as not something that fit into the "explanation" offered in the Groklaw account. But it seems Dare has found a deeper source for the problem, too.

    Although the practice of using a BOM on UTF-8 is not widely known, and I always thought it was a mistake, the XML 1.0 3rd edition specification of 2003-10-30 recognizes it in section 4.3.3 and in non-normative Appendix F.

    What’s funny, of course, is that any creation of an OpenDocument XML file in UTF-8 and saved from Notepad will have that very same "binary key," assuming that is what happened. And of course, Office Open XML (Office "12" flavor) can get it that way too.

  33. Dean Harding says:

    I was just reading that comment by Gary Edwards, and this sentence:

    "although i sometimes wondered if people know there is a difference between the traditional MS "binary formats", and the "binary key" that is in the header file of every MSXML file."

    The use of the term "header file" makes me think he’s talking about the BOM. I don’t think the XML spec actually mentions whether a BOM is permissible in an XML document (certainly it doesn’t say anything about UTF-8 XML documents, which clearly don’t need a BOM anyway) but at the same time, it also doesn’t say they’re *not* permissible…

  34. Dean Harding says:

    Oh, I stand corrected, orcmid. Apparently it does explicitly say that a BOM is OK. Well, that just reinforces my point anyway.

  35. orcmid says:

    Yes Dean, makes you wonder what happens if you use UTF16 and BOMs with OpenDocument XML files. I don’t think I’ll be trying that.

    I think I will add that to my request for clarification about XML prologs on the OpenDocument comment list though.

  36. BrianJones says:

    Eduardo, any luck yet on contacting Gary Edwards and finding out what he was talking about? I really want to know what this binary key is that he’s been talking to so many different folks about. Just this weekend I was looking around at some other sites and saw a bunch of references to this "binary key", yet no one had more detailed information.

    It was actually pretty funny. On one site (that one run by the guy with a paralegal background) someone even posted a fake Word XML document and folks jumped on that as being the binary key. I think the initial post was just as an example of what a binary key would look like, but not everyone got that.

    Anyway, I really do want to find out where this myth started since it’s being referenced all over the place and I’d like to find out if there is something I’m missing (so we can fix it).

    -Brian

  37. David says:

    Why don’t you just write Edwards an email yourself and ask him to leave a comment here? I found his email address via Google, it is gary.edwards@OpenStack.us. Funny enough it was in an email in a mailing list archive where he mentioned your blog, so I assume he knows who you are.

  38. Eduardo says:

    Brian, no reply yet from Gary, or Florian Reuter, who I also e-mailed.

  39. BrianJones says:

    Thanks Eduardo. I just sent Gary an e-mail as well to see if any of the suggestions in the post were what led to the confusion.

    -Brian

  40. BrianJones says:

    Last week I sent this to gary.edwards@OpenStack.us but haven’t heard back:

    Hey Gary, I was wondering if you could help me understand what the Binary Key in the MS Office XML formats you’ve been referring to is. Is this something you’ve seen in the WordprocessingML format for Word? Or is it in the SpreadsheetML format from Excel?

    I posted some thoughts on my blog about what the misunderstanding might have stemmed from and I was wondering if any of those sounded like the culprit: http://blogs.msdn.com/brian_jones/archive/2005/10/17/481983.aspx

    I’d like to get this resolved soon so that we can make any corrections needed. Obviously the goal when we first started moving towards XML back in Office 2000 was to represent our data in an open and interoperable way. I feel like we’ve finally achieved that and would hate it if we somehow overlooked something that’s as big as what you’ve been saying.

    Thanks for your help.

    -Brian

  41. David says:

    Brian,

    I think this was a very reasonable, if not to say friendly, email. Just the right tone. I also really hope you will get an answer back at some point, since this seems to be such a silly discussion, that one should be able to settle in minutes, if everyone is interested in resolving this.

    If you ever get an answer, could you please post it as new blog article? I am getting a bit tired of checking the comments on this one, but would like to know if something comes up here.

    Best,

    David

  42. BrianJones says:

    Hey David, if I hear back I’ll create a new post, so you don’t have to keep checking back here. 🙂

    -Brian

  43. no one of consequence says:

    Could this be (close to) the origin of the binary key meme?:

    http://theequityexchange.com/OpenStack/docs/XML%20Security%20and%20XMP%20Metadata%20Header.html

  44. OK, forgive the random Sneaker Pimps reference and I promise we will move off this topic of ODF politics…