A few interesting links


Sorry for just posting links the past week or so. I haven’t had much time to pull anything new together. Hopefully I’ll have some more free time when I get back into the office next week.

Can anyone be objective about Open XML?

Great blog entry from Kyle McNabb of Forrester where he drills into the real motivations behind some of the anti-OpenXML pressures you’re seeing out there. He asks folks to see through the motivations of specific companies and form their own opinions. Here’s what he says about the main players:

  • IBM has a vested interest NOT seeing OOXML adopted as an ISO standard.
  • Microsoft Office 2007 needs OOXML.
  • Without ODF as the leading standard, Sun and OpenOffice.org have little to stand on.

Convert your paper documents into Open XML files

I was really excited earlier this week when I saw the news from Nuance that OmniPage can now scan paper, recognize shapes, text, etc., and then generate an Open XML document from the result. This is really awesome. One of the great benefits of the Open XML formats is what they bring to long-term archiving of documents. It’s important that we can access the documents we are creating hundreds or thousands of years in the future, regardless of what happens to Microsoft. That’s why the stewardship of the Open XML formats by ISO could prove to be very valuable.

Now with this technology from Nuance OmniPage, archivists can scan existing print documents and store them as Open XML for future access. The British Library and the Library of Congress both served on TC45 and brought the perspective of long-term archiving into the standardization process.

More on Apple’s Open XML functionality

Here is some interesting follow-up to the news of Apple’s iWork supporting the OpenXML format. It looks like the anti-OpenXML folks are upset and have now started a petition to ask Apple to also support ODF (Apple currently supports OpenXML on the Mac as well as on the iPhone). Bob Sutor posted a link to the petition, and there was an interesting comment from Klaus-Dieter Naujok:

Guess Apple’s market research has shown that there are more MS Office documents (old and new formats) around than there are ODF based documents. I am not at all surprised that they are supporting what is mostly used today and therefore needs to be imported into their iWork application. As an iWork user that receives clients documents which are all in MS formats, not ODF, I have no problem with their decision. As I see it, ODF may be the only International Standard for XML based Office formats, and may be technical superior to OOXML, but that is not what bean counters take into consideration. Bottom-line for any software company is what the product will need to support based on market penetration, not how good a competing standard maybe, especially if its usage is minor at this time. I am sure should ODF become the dominating used format, causing sales in iWork to drop, that Apple will support it.

I think this is probably what you would expect to see from any company. They will add functionality based on what their customers want.

-Brian

 

OpenXMLCommunity.org Quote of the Day:

Patheon, Inc., Canada

“As a leading publicly listed provider of drug development and manufacturing services to the international pharmaceutical industry Patheon relies on electronic documents for records management, government submissions and our internal policies. For drug development, all electronic submissions have to be in XML format as it is the defined approach for the FDA & Health Canada. Having OpenXML as an internationally recognized document standard would be very beneficial to us now and in the future. We look forward to OpenXML becoming another recognized international standard with ISO.”

– Tom Ferguson

Comments (38)

  1. Andrew Sayers says:

    I completely agree that people need to be more willing to learn about Office Open XML before they form their opinions, but I’ve been trying to learn for over a month now, and to be honest it’s a lot harder than it ought to be.

    The Office team seems to have gone a different way to large parts of the community at several points in the past (e.g. we’ve discussed the DocBook vs. Office traditions of XML before).  People like me haven’t even heard Microsoft’s arguments before now, we just absorbed the other position by osmosis, and can’t imagine things being any other way.

    I realise you work very hard at getting your message across, but there’s a large chunk of the population where you really need to go back to first principles if you want to talk usefully with them.

    But anyway, in my continuing quest to get a clue, I’d like to ask: in your opinion, what does Microsoft want from the ISO process?  What does it stand to gain?  What would a win look like?

    Obviously you’re fulfilling a legitimate request from your customers, but for example, does Microsoft expect to enter niche markets springing up around the feet of the word processors?  Could greater competition in word processors themselves bring some youthful vigour back into a maturing market?

    – Andrew

  2. Dave S. says:

    Wouldn’t scanning and storing documents to PDF, a format known for layout preservation be better than OOXML where layout preservation is not guaranteed?

    It is likely the convertors will get better in the future, so why embalm the document content now? Instead, use a PDF overlay and, when the OCR process improves, reuse the PDF bitmap and re-OCR the document.

    The medical submissions use XML-schemas, not OOXML. Perhaps Tom Ferguson would be just as excited to learn that Word supports English and the FDA wants documents in English – a breakthrough?

  3. Ian Easson says:

    Dave S:

    Scanning and storing to PDF has been available for years now, from several vendors.  If it works for your needs, fine.

    However, consider that PDF is not open, not a standard, not XML-based, and the file is not editable without proprietary software.  None of this is the case with OOXML, which is open, standardized, XML-based, and editable by a growing number of pieces of software.

    It all comes down to what your needs are.  Adding OOXML to the list of possible outputs from scanning only serves to widen the possible choices, and is therefore, in my opinion, a good thing.

  4. Ian Easson says:

    Dave S:

    Your comment about the FDA and Health Canada that "The medical submissions use XML-schemas, not OOXML" suggests that you are not clear on how XML works.  

    An XML document is one that is compliant with an XML schema that is used to verify its compliance.  So, OOXML documents are (by definition) XML documents which are compliant with the OOXML schema (as documented in the OOXML standard from ECMA).  So, if someone sends an OOXML document to the FDA or Health Canada (or anywhere else, for that matter), they are sending a document in XML format, just as it says in the quote from Tom Ferguson.
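    A note on terminology for readers following this exchange: a parser can accept a file as XML (that is, as well-formed) without consulting any schema, while checking it against one particular schema, such as OOXML's or eCTD's, is a separate validation step. Below is a minimal sketch of the first check only, using Python's standard library, which does not perform schema validation. The function name is illustrative and not taken from any of the specifications discussed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML. No schema is consulted."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<doc><p>ok</p></doc>"))   # properly nested
print(is_well_formed("<doc><p>broken</doc>"))   # <p> is never closed
```

    Passing this check says nothing about which schema the file follows, and that is the point Dave S. and Ian are arguing over.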

  5. Ian Easson says:

    Andrew,

    It’s hard with a blog (any blog) to bring someone up to date with all the discussions and explanations that have taken place over the last few years.  When someone joins a blog for the first time, they are joining a long-standing discussion.  In the case of this blog, they are also joining what is in effect a course on a specific set of Microsoft technologies and their business motivations.

    In due course, there will be books and formal courses.  In the meantime, however, perhaps Brian could consider setting up a FAQ for new users, whose answers are just links to previous blog issues.  

    For instance, a question like "Why did Microsoft submit OOXML to ECMA?" would have as its answer the link to the blog issue that discusses it.

  6. Dave S. says:

    Ian,

    The FDA is looking for specific XML schemas. The Tom Ferguson quote gives the impression of a link between FDA-required schemas and OOXML, a link that does not yet exist.

    http://www.fda.gov/cber/gdlns/esubapp.htm specifies the use of eCTD.

    This link is to the eCTD format – estri.ich.org/ectd/eCTD_Specification_v3_2.pdf

    You’ll note that the common file type for the narrative part of a conforming document is PDF, one of the graphics formats is PDF and only the structural part is XML. The eCTD specifically discourages non-common file types – xls, rtf, et al, to be used only on prior agreement, and then only for transition purposes.

    For ECG submissions there is the HL7 aECG XML format, which replaces the Philips ECG XML format, so even for a single use, XML formats are not simply interchangeable.

  7. Dave S. says:

    Ian,

    If PDF is not open then it is interesting that solutions are available from multiple suppliers – unlike OOXML which is truly available from only one supplier.

    Practically, the documentation for most PDF formats has been available for a long time, without notable encumbrance. It might not be open, but it hasn’t been entirely closed either.

    Google finds a lot of PDF editors, some free. Not many OOXML editors, are there?

    As to being a good thing – it’s rare that OCRs get everything correct. The best way to repair an OOXML file is to read it with the only OOXML editor, which can already read .doc files – a format long supported by the OCR software. This new output format doesn’t really add a choice, it parallels an existing one. At least with the doc format, already supported, there are other editors that can be used.

  8. Andrew Sayers says:

    Ian,

    A FAQ sounds like an excellent idea – Brian, why not put a link to it underneath "Email" in the "This Blog" section at the top of the sidebar?  It wouldn’t be that time-consuming if you just added questions to the FAQ as you replied to questions in future.

    That said, I’m not sure that it solves the problem I’m referring to on its own, as I don’t think the issues I’m talking about have been addressed.  My point is that people misunderstand OOXML not just because they’ve missed out on arguments from the last few years, but because they’ve missed out on arguments from decades ago.

    Think about it this way: if Microsoft proposed a standard that only worked on computers that used trinary instead of binary, or was designed to be implemented in a language that ditches functions for an elaborate GOTO-based system, you’d be horrified.  It wouldn’t help if they explained that it’s what customers want or that feature X is the most efficient trinary algorithm – you’d probably just assume that Microsoft were being moronic and/or deceitful.  In order to get you to understand (let alone agree), they would have to explain that trinary GOTO-based programming is a whole different model of computing, rather than just an abomination in a binary world, then discuss the practical advantages and disadvantages of trinary GOTO-based systems, compared to the only model of computing you’ve ever known.

    – Andrew

  9. Anonymous says:

    Dave S., Open XML is available from several suppliers, including Apple. This information was posted in this blog this very same week! My spider sense tells me you are a troll.

    Now, can scanning to PDF preserve spreadsheet formulas?

  10. Ian Easson says:

    Dave S:

    About PDF, you seem to be under the misimpression that scanning software which produces PDF output from a text document produces it in bitmap form embedded in the PDF.  It doesn’t.  (I know this personally; I use the Nuance program mentioned in the blog to do just that!)  Instead, it scans and interprets the input document or PDF file, and produces a document containing text (plus optionally images in bitmap form).

    This output document can be in any number of forms.  The list for the Nuance Omnipage program mentioned is:  OOXML (for Word, Excel, and PowerPoint), PDF, rtf, xls, doc, txt, csv, WordPerfect, XPS, HTML, XML, audio (yes!), and InfoPath.

    So your idea that a PDF format is superior to OOXML because it preserves the bitmaps is all wrong.

    You can also see how this product’s support of OOXML allows repurposing of existing paper or electronic documents, so that the OOXML output can become part of a whole document production system (using software from Microsoft, other companies, or built in-house).  Once you realize that, you can see what a revolution OOXML is going to enable.

    Also, I would like to second the comment from Anonymous that OOXML editing is available from multiple suppliers.  This editing can be manual, or programmatic — that is in fact one of the main reasons for OOXML rather than the old doc format.

  11. Ian Easson says:

    Andrew,

    I meant that in this blog in the last few years, issues have been covered that include design decisions that go back to the beginning of WORD for DOS.  

    So, a FAQ could cover things like why OOXML doesn’t use other document structure approaches that may (or may not) be technically superior for their customers.

    As for a discussion about why Microsoft doesn’t ditch everything it has ever done in the way of documents and start all over, it isn’t going to happen.  One of the main reasons that Microsoft succeeded is that it doesn’t abandon its users; instead, it makes revolutionary changes, but one step (release) at a time, so that it doesn’t get ahead of its users.  Consider that it began down the XML road for Office a decade ago now, and that with each release it has taken the next step in the voyage.

  12. Andrew Sayers says:

    If there’s been blog posts that cover all the stuff I’m thinking of and I’ve just missed them, fair enough.  A FAQ would be a good way of making that information more accessible – I think we’re in agreement on that.

    I didn’t mean to suggest that Microsoft should ditch everything it’s done and start over – I agree that’s implausible.  My point is that there’s a significant chunk of the population that can’t understand Microsoft’s position because they don’t understand the fundamental ideas that it’s based on.  Explaining Office Open XML to that group means not just explaining each tag slowly and clearly, but starting from "what is trinary, and when should you use it instead of binary?".  Again, a FAQ would be a good place to start there.

    Having said all of that, I’m not suggesting that everyone will immediately agree with Microsoft when things are properly explained to them, I’m just giving my thoughts about the steps that I’d like to see Brian take in order to let people form their own (informed) opinions.

    – Andrew

  13. Ian Easson says:

    Dave S:

    I checked out your statement that the FDA requires specific schemas. They say:

    "The following file formats should be used:

    – PDF for reports and forms

    – SAS XPORT (version 5) transport files (XPT) for datasets

    – ASCII text files (e.g., SAS program files, NONMEM control files) using txt for the file extension

    – XML for documents, data, and document information files

    – Stylesheets (XSL) and document type definition (DTD) for the XML document information files

    – Microsoft Word for draft labeling (because Microsoft Word can change, check our Web site for the current version)"

  14. Greg says:

    Still the patent uncertainty problems of OOXML remain. No one trusts the CNS or the OS-‘promise’. Microsoft gave us Baker&McKenzie but it does not really cover the legal issues.

  15. Dave S. says:

    Ian,

    Your list of formats is from http://www.fda.gov/cber/gdlns/esubapp.htm

    Looking at http://www.fda.gov/cder/guidance/index.htm#electronic_submissions one sees somewhat different information.

    For product labeling they request SPL, a specific schema.

    For general submissions it is http://www.fda.gov/cder/guidance/2867fnl.pdf, which, in section III "WHAT FILE FORMATS SHOULD I USE FOR ELECTRONIC DOCUMENTS?" says PDF and SAS (not XML or OOXML)

    The eCTD says to use a specific XML schema for its structure. eCTD describes XML – "XML files are read by a parser found in Internet browsers." Which does not sound like OOXML at all.

  16. Dave S. says:

    Ian,

    I have seen many documents where the graphics of the original scan overlay the OCRed text. In many cases the text is incorrect, but that is only known because of the graphic.

    The Nuance page lists "searchable image" as one of the PDF outputs. It appears to be the same function as I mentioned but, if not, then it’s called something else or Nuance is missing an important feature.

    OOXML support from Apple looks limited to importing. Apple does not generate OOXML, so it can not be referred to as an OOXML editor.

    Here’s a link to a blog post on what MS is doing for their own OOXML support on the Mac: shebanation.com/2007/05/16/ooxml-and-the-mac-more-bad-news-from-microsoft/ The Microsoft Mac pages don’t add much more.

    The Anonymous comment about PDF scans not preserving spreadsheet formulas is interesting. Perhaps OOXML has the ability to recognize information that’s not on a page and make it appear.

  17. Ian Easson says:

    Dave S:

    I think your confusion about OOXML vs XML may be because OOXML documents are packaged as a ZIP file, i.e., using the Open Packaging Conventions (OPC).  Within that ZIP file are the actual XML files.  The main one is called document.xml, inside the word folder.  It is the XML document that follows the standard OOXML schema.  (There are other XML files there that are also defined in the schema).

    To check this out yourself:

    – Generate or obtain a ".docx" file (for Word 2007)

    – Save it somewhere

    – Rename its file extension from ".docx" to ".zip"

    – Open the zip file

    – Open the word folder and double-click the document.xml file

    The file opens in your internet browser.  It has been parsed by your PC’s XML parser, according to the standard OOXML schema, just as eCTD describes XML as quoted in your post.

    I hope this clears up any confusion.  There is no proprietary or unusual thing going on here; it’s all using XML and XML schemas, just as required by several bodies.  That is why you are beginning to see endorsements like the one originally quoted.
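    The manual steps above can also be done programmatically, with no rename needed, because a .docx is an ordinary ZIP archive. Here is a minimal sketch in Python using only the standard library. It assumes the main part lives at word/document.xml, as it does in files written by Word 2007. The helper names are illustrative, and the stand-in package built here holds only that one part, so it is not a complete .docx that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_stand_in_package(text):
    """Return the bytes of a ZIP holding only word/document.xml.

    Illustration only: a real .docx also carries [Content_Types].xml
    and relationship parts, which are omitted here.
    """
    body = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f'<w:t>{text}</w:t></w:r></w:p></w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body)
    return buf.getvalue()


def read_paragraphs(docx_file):
    """Open a .docx (path or file-like object) as a ZIP; return paragraph texts."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is the concatenation of its w:t runs
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


package = make_stand_in_package("Hello, Open XML")
print(read_paragraphs(io.BytesIO(package)))
```

    To try it on a real file, pass the path of a .docx saved from Word to read_paragraphs instead of the in-memory stand-in.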

  18. ? says:

    [Inline PNG image (base64-encoded MIME attachment); no text content.]

  19. Stephane Rodriguez says:

    Ian said "There is no proprietary or unusual thing going on here; it’s all using XML and XML schemas, just as required by several bodies."

    I beg to differ,

    http://www.codeproject.com/cs/library/office2007bin.asp

    In addition to this, since XML attributes are described, not specified, I wonder how many apologists have actually implemented it.

    Let’s be clear on one thing : it’s very easy to start a project that reads and writes XML. But a project that does anything of substance, like instantiating documents, is a different animal.

    So please, stop the bullshit and get back to work. You’re already a decade late compared to Microsoft Office’s fire and motion.

  20. Ian Easson says:

    Stephane,

    Please understand the context in which this discussion is taking place.

    It is not about legacy .DOC documents converted to OOXML, which might possibly contain macros (and which thus are not covered by the OOXML spec.).  Neither is it about accurately instantiating such documents outside of Office 2007.

    It is about newly created OOXML documents, probably in Word,  with no macros, and whether a small number of government programs (like the FDA drug program) would welcome such documents as being in line with their preference for XML-based documents.  By their own statements quoted in the blog above, they welcome OOXML.  That is their opinion.

    You may have your own opinion about OOXML’s suitability for this purpose, but keep any discussion clean and avoid ad hominem attacks.

  21. Stephane Rodriguez says:

    Ian said "It is not about legacy .DOC documents converted to OOXML"

    Unfortunately for you, it is EXACTLY the scope of OOXML. Microsoft’s claim is that OOXML is fully compatible with the legacy documents, that it fully represents them.

    Ian said "It is about newly created OOXML documents, probably in Word"

    Again, you haven’t learned from past discussions we’ve had. Haven’t I shown one counter-example with VML for instance? VML is not specified in ECMA 376, it is merely described. Can you please give me a URL of a third party that implements VML?

    Ian said "You may have your own opinion about OOXML’s suitability for this purpose"

    As a vendor, I have more than an opinion. Microsoft’s poor implementation choices (and lock-in strategy, for those naive enough to think Microsoft are good guys) have an impact on my daily work. Only because it is so poorly designed, poorly described, poorly implemented (tons of regressions: my customers have spreadsheets that open and render well in Excel 97/XP/2000/2003 and don’t open and render well in Excel 2007).

    Ian said "avoid ad hominem attacks"

    It’s Microsoft who is the aggressor here. Open your eyes. They have missed the opportunity to come up with something useful.

  22. hAl says:

    @Stephane

    As macros and OLE objects are not part of the specification they can be embedded as binaries.

    So what.

    That is the same with ODF.

    That format also does not specify macros or OLE objects. So any implementation of ODF can also add those binary elements to implement macros and OLE objects within it.

    And if Microsoft wants to add a ton of binaries to a format they can do that just as well with ODF as with OOXML.

    That is their implementation decision.

    As long as it does not affect the OOXML information in the files that is not really a problem.

    Would you consider it better if Microsoft adopted ODF and started adding those binary objects to ODF files? As those are implementation decisions, you can complain that Microsoft should implement OOXML with less use of binaries. That, however, is not due to the format but to Microsoft’s way of implementing it. As other implementations are not likely to create files the same way, Office will still have to be compatible enough for other implementations to use its files, and able to consume documents created by other implementations.

  23. Ian Easson says:

    Stephane,

    Let’s try this again.  Please read the comments above on this blog thread (about FDA drug submissions) before you come out with your out-of-context blasting about how Microsoft has complicated your life.

    If you are indeed involved in submitting drug documents to the FDA (which I doubt), then I will pay attention to your comments here.  Otherwise, you are only good for offbeat diversion.

  24. Stephane Rodriguez says:

    "As long as it does not affect the OOXML information in the files that is not really a problem."

    How can it not affect OOXML when you know that for VBA macros for instance, there are attributes such as "codename" which are saved and attached in worksheet parts themselves, and govern it back and forth ?

    This stuff is so poorly implemented there is no separation between the raw file format and application bits.

    Of course, anyone who has implemented some of this stuff can also figure out that perhaps as much as 10% of XML attributes are application related, they have nothing to do with the data itself. Those application related values are used to pre-check boxes, or pre-select values in lists. It’s 100% application stuff, and shall at least be taken out of ECMA 376, or at least made a separate part.

    I’m interested to know who you guys are. You talk a lot, but I still don’t know on behalf of who or what company you are talking, and whether you have implemented any of this stuff.

  25. Stephane Rodriguez says:

    Ian said "before you come out with your out-of-context blasting"

    Hmm, who said "It is not about legacy .DOC documents converted to OOXML, which might possibly contain macros (and which thus are not covered by the OOXML spec.)."

    This is wrong. OOXML contains VBA macros, just that Microsoft merely references this stuff instead of specifying it. OOXML cannot NOT contain VBA macros otherwise their claim of "full backwards compatibility" goes under.

  26. n4cer says:

    The OOXML standard supports macros, but their use is not required to generate or process an OOXML document. A .docx file, as discussed above, doesn’t include macros. Its .docm counterpart does.

  27. Dave S. says:

    Using the browser on the document.xml file merely generated a run-on text. I would not want to read a large document this way.

    In looking for a docx file I used Google. It found <1000 files. A further search turned up ~71,000,000 doc/xls/ppt files and ~225,000,000 pdf files. I’d say that pdf is the most popular.

    You suggested there was an FDA opinion in the blog. There is only an unsupported statement from a rep of a drug manufacturer. None of the information I’ve found indicates "all electronic submissions have to be in XML format." The structure of some submissions must be XML, but nothing OOXML can help with.

  28. Darkalias says:

    You’re not being fair, at least in my case. Quote:

    "It looks like the anti-OpenXML folks are upset and have now started a petition to ask Apple to also support ODF",and talking about Apple: "They will add functionality based on what their customers want."

    No, I’m not anti-OpenXML, I’m pro-ODF, and yes, I HAVE asked Apple for ODF support in their products, because I really do work with, and share, ODF files right now. Many people I know do so as well on an increasing frequency.

    This, and not what you wrote, is my point.

    Might I also place my doubts that Apple included Office Open XML support to iWork 08 now because of customer demand only, while supporting ODF in TextEdit/Leopard later this year: I see the reason for this behaviour more in playing nice with Microsoft, and having MS Office for Mac also in the future.

  29. jones206@hotmail.com says:

    Darkalias, I wasn’t referring to pro-ODF folks, just anti-OpenXML. Sorry if you felt like I was going after you in any way.

    I am a pro XML formats guy, and in many ways consider myself a fan of ODF. I have no issue with people who are pro-ODF. My big gripe is people who try to use ODF as a weapon and are working very hard to block any adoption of Open XML.

    -Brian

  30. Andrew Sayers says:

    Hi Brian,

    This thread has gone a long way since it started, so I’d like to restate two important questions that have got a bit lost in the discussion:

    First, would you consider creating a FAQ or something like it, in order to collect all the posts on a particular topic, for newcomers to read in order to get caught up?

    Second, could you talk a bit about what Microsoft expects to gain from ISO standardisation?  For example, does helping Gnumeric somehow help create a new market for MS Office to move into?  (If you’ve answered this question before, please add it to the FAQ 😉

    These questions came from a discussion between Ian Easson and me (he deserves credit for the idea of a FAQ), and they’re discussed in more detail further up the thread.

    – Andrew

  31. Darkalias says:

    Brian, thanks.

    And my gripe is exactly vice versa, with people, or companies, who are suppressing or maybe even sabotaging the adoption and evolution of the existing ISO standard 26300.

    If Open Office XML, sorry, I’m always confusing this: Office Open XML delivers on its promises as truly open, broadly implementable, and technically sound format, I have no problems welcoming that on the other hand.

    -Darkalias

  32. Andrew Hilton says:

    Andrew,

    How about you read previous posts on this blog and others and create a proposed FAQ?  You’ll be doing yourself, and us, a favour.

    To answer your second question, search for ‘ISO’ in Brian’s blog.  The first to read might be ‘Thoughts on Open XML in ISO’.

  33. Andrew Sayers says:

    Andrew Hilton,

    Thanks for the lead about searching for "ISO".  That clears up a lot of my questions, although I’m still not sure how (for example) it’s in Microsoft’s interests to make Gnumeric a stronger competitor to MS Office.  Is it just because it’s a sign of a truly vibrant ecosystem around Office Open XML?

    Moving on to the idea of a FAQ.  To be clear, when I talk about a FAQ, I’m referring to the original use of FAQs: saving everybody time by removing the need to write a new answer for every instance of a question that’s asked frequently.  The modern use of FAQs (a list of questions the author thinks everybody ought to be asking) went out of fashion because they serve the same purpose as every other form of documentation, and are a bit harder to read.

    By building up a classic-style FAQ as he’s asked questions, Brian could give better answers in less time for common questions.  I don’t think it would work for me to build a modern-style FAQ on my own, but if Brian would be prepared to accept suggestions, I’d be happy to go digging on the issues I’m interested in.

    – Andrew Sayers

  34. jones206@hotmail.com says:

    I’m heading out of town for a week, but will definitely look into building up an FAQ when I get some time. I’d appreciate your suggestions for the first set… :-)

    -Brian

  35. jones206@hotmail.com says:

    Andrew, I missed your other question.

    For Microsoft, the main reason we voted in Ecma to take Open XML to ISO is that we had been asked to do so by various customers (mainly governments). They wanted to have our formats in the domain of the international community.

    In addition to that, recently we’ve seen IBM lobby governments to build policies requiring the use of only ISO standard formats. This is why they want to block Open XML from being an ISO standard, it doesn’t fit in with their strategy for their Workplace product. http://www-142.ibm.com/software/sw-lotus/products/product4.nsf/wdocs/productivitytools

    -Brian

  36. Dave S. says:

    http://www.marketwatch.com/news/story/state-state-microsoft-responds-assault/story.aspx?guid=%7BC0D943C4-4ADC-471C-8F87-9181A4EC3E7B%7D

    And that Massachusetts thing.

    Isn’t the legal term for a similar situation in a lawsuit "unclean hands?"

    I think the governments would have been just as happy had the binary formats been well documented and put in the domain of the international community as, for example, there is likely little change to the Office 97 formats or application, et al, and there is no such limitation on Office 2007 or the not-yet-ISO MS-OOXML.

  37. Andrew Sayers says:

    Welcome back Brian.  Here are my suggestions for questions I’d like to see in a FAQ, as well as the style I’d like it to take, and in some cases even an outline of answers I’d like to see.  Where I’ve given answers, they’re mostly my take on your position, so I don’t expect you to use them as anything more than inspiration.

    Reading through your old posts, there’s one question I expect to be very uncommon, but which I’m dying to know the answer to: you said that short tag names are better because you parse tags with a trie.  Why not just use a suffix tree and have names as long as your arm?

    – Andrew

    <h3>About the site</h3>

    <h4>Who are you?</h4>

    <p>I’m Brian Jones, a program manager in Office. I’ve been working on the XML functionality and file formats in Office for about 6 years now.</p>

    <!-- I realise this is at the top of every page.  I’ve included it here for the people that blank such things, and for completeness -->

    <h4>What topics are discussed here?</h4>

    <p>I mainly focus on XML in Office and the Open XML File Formats coming in the 2007 Microsoft Office system.</p>

    <!-- Again, this is a copy/paste job, but I’d like to see this expanded.  What specifically do you talk about?  It seems to be mainly how-to type stuff with a bit of news thrown in.  You might want to link to Matusow’s Blog for people that want to talk politics -->

    <h4>What topics are not discussed here?</h4>

    <!-- … -->

    <h3>Why the name "Office Open XML"?</h3>

    <p>We were originally going to call it "Microsoft Office Open XML", to make the difference clear between this and Microsoft Office’s older XML formats.  We dropped the word "Microsoft" when we decided to make it an international standard.  This is much the same as the way OpenDocument Format used to be called "OpenOffice XML Format" before it was decided that it wasn’t going to be tied to a single product.</p>

    <h3>Why isn’t Office Open XML "proper" XML?</h3>

    <h4>Why doesn’t it follow XML’s deeper design model?</h4>

    <p>There are some people who’ve played with other formats like HTML or DocBook that are curious why WordprocessingML doesn’t use the same model as either of those formats, and there’s actually <a href="http://brian_jones/archive/2007/07/11/wordprocessingml-document-model.aspx">a pretty straightforward reason</a>.</p>

    <h4>Why does it include non-XML metadata?</h4>

    <!-- A few weeks ago, we talked about writing a post that outlined the little non-XML formats that go into documents.  Personally, I found the comparison with a web page to be a fairly convincing justification, but I’d like to see a post explaining the whole issue with a complete list of mini-formats if you’re going to mention it in the FAQ. -->

    <h4>Why does Office Open XML look like previous Office formats, wrapped in angle brackets?</h4>

    <p>Because it works better that way.  Office Open XML needs to be as compatible as possible with older versions of Microsoft Office’s file formats, and the best way to do that is to use a format with a similar design model.  Since there are several million Office documents for every developer that’s ever worked on Microsoft Office, the only practical way of ensuring compatibility with that huge corpus is by making cautious, incremental changes.  Angle brackets are the increment we’re working on right now.</p>

    <!-- Note: we talked about this a while ago, and this is how I think you should make your case, but I realise you might well disagree.  I’ve written the above in my own words, although I haven’t personally decided whether I agree with the position yet -->

    <h3>Why create a standardized file format?</h3>

    <!-- Note: IMHO, there are a whole range of questions that belong in this heading, and they all actually mean "please give me a model I can use for explaining Microsoft’s past behaviour and predicting its future behaviour, without resorting to elaborate conspiracy theories".   IMHO, this is the most important question to answer in the entire FAQ -->

    <h4>Why not use ODF?</h4>

    <p>This is a topic I’ve been coming back to since I started this blog back in 2005.  I’ve talked about <a href="http://blogs.msdn.com/brian_jones/archive/2005/06/13/428655.aspx">how the two formats differ</a>, how we needed better support for <a href="http://blogs.msdn.com/brian_jones/archive/2005/10/04/477127.aspx">formulas</a> and for <a href="brian_jones/archive/2006/07/20/673323.aspx">tables in presentations</a>, <a href="http://brian_jones/archive/2005/12/14/503642.aspx">design requirements that OpenDocument doesn’t share</a> (<a href="http://brian_jones/archive/2006/05/24/605461.aspx">twice</a>), and political considerations about why Microsoft <a href="http://blogs.msdn.com/brian_jones/archive/2006/07/14/666273.aspx">couldn’t take a seat at OASIS</a>.  The ODF folks have done some great work since I wrote those posts, and it looks like ODF 1.2 will solve many of the technical problems.  However we <!-- don’t like ODF’s solution?  Don’t think it’s compatible enough?  Think it’s too late?  I suspect one of these will apply -->, and anyway we still feel that ODF’s design requirements are too different to Office Open XML’s.</p>

    <h4>Why go to ECMA?</h4>

    <p>I covered this way back with <a href="http://brian_jones/archive/2005/11/21/495466.aspx">my original announcement that we were taking the format to ECMA</a>.</p>

    <h4>Why go to ISO?</h4>

    <p>Even though Office Open XML was already an ECMA standard, we felt that there was value in taking it on to ISO.  This was mainly because we had been asked to by various customers (mostly governments).  They wanted to have our formats in the domain of the international community.</p>

    <h4>What is your answer to GrokDoc’s <a href="http://www.grokdoc.net/index.php/EOOXML_objections">Objections</a> to standardisation?</h4>

    <!-- I’m not sure what you’d say here, but you might want to include a link to the <a href="http://www.computerworld.com/pdfs/Ecma.pdf">responses to NB comments</a> -->

    <h3>What can you tell me that will help me write better Office Open XML applications?</h3>

    <h4>What are the issues around licensing?</h4>

    <!-- You might want to deal with this because you seem to get a lot of these questions, but it’s not something I’m interested in. -->

    <h4>What guides are available?</h4>

    <!-- Again, this isn’t something I’m that interested in.  As well as your own how-to posts, you might want to include links to places like openxmldeveloper.org -->