Open XML in Science and Nature; Deploying Office 2007; and more…


Here are a few interesting links I came across this week:



  • Open XML in Science and Nature – Murray Sargent gives another update on the discussions we’ve been having with the folks from Science as well as Nature. They have some really cool publishing processes, and unfortunately we’re only now talking to them about how to integrate the new file formats and the new math functionality into their existing process. In the end we should see some pretty cool functionality, but it’s still a bit too early.

  • Deploying Office 2007 – The Word team blog has a post with some details on how to ease the transition over to Word 2007.

  • More on the IBM support of OpenXML – Stephen McGibbon has a couple of posts discussing the IBM article that describes how you can easily build solutions on top of OpenXML, as well as some information on potential OpenXML support in Lotus Notes. For some reason the article on building solutions on top of OpenXML is no longer available, but Stephen has a copy you can still get to: http://notes2self.net/pages/stephen-s-cache-of-google-s-cache-of-ibm-s-openxml-document.aspx

  • Open standards advocate comes out in favour of Microsoft – The Bangkok Post has an interview with Rick Jelliffe where he disputes the arguments currently being used against OpenXML.

  • Microsoft opens up its data format – Interesting article from the National Business Review that talks about the shift we made a couple years ago from proprietary binary formats to open XML formats.

  • Package Explorer 3.0 – Wouter van Vugt has updated his Package Explorer tool with added functionality for creating new documents and parts, signing documents, and viewing signatures, along with some localization work.

  • More on Word’s mediocre XML – Good writeup, and super valuable discussion in the comments section. I need to pull together some information on this one for a future post. Bob’s done a lot of thinking here and put down some really good information on the design of the WordprocessingML format. I think that a key area of confusion though comes from the initial design goals. The WordprocessingML format wasn’t designed to be the ultimate XML format for representing documents. There are already tons of formats out there that do just that. The purpose of WordprocessingML was to be an open XML format that could fully represent the existing base of Word binary documents. We wanted to take everything from that world, and bring it into the new world. That’s why I’ve always said that we have no issue with ODF, or any other format for that matter. ODF was designed to achieve different goals, and it’s perfectly acceptable to have both formats exist as standards. The work that DIN is doing to create translations is super critical for this same reason.

Comments (51)

  1. Stefan KZVB says:

    It seems you’re a real Open XML expert 🙂

    Could you please have a look at the DOCX-files on the following page?

    http://pschmid.net/office2007/forums/viewtopic.php?p=657

    Why do these documents have the wrong font when opened with the compatibility pack on Word 2000 but do open ok with Word XP?

  2. finite says:

    2 years… But it’s still very early.

    Sole purpose is 100% backwards compatibility… And yet interesting workflows need special help to upgrade.

    meh

  3. Doug Mahugh says:

    Professor Flavio Soares da Silva from the University of Sao Paulo has an interesting article on "Arguments

  4. Stephane Rodriguez says:

    "More on Word’s mediocre XML"

    For the life of me, I thought I would never see an admission of truth like that. By the way, who worked on Word’s XML?

    Let me add a few points.

    You can extend ‘mediocre XML’ to Excel and Powerpoint. In fact, you guys took the easy road (very strategic) and simply added angle brackets around binary formats. 15 years of legacy are not going away anytime soon. Just saying.

    So the formats were never designed with XML in mind. I wonder if this has anything to do with last week’s US ISO CHAIR COMMITTEE comments that are now made public (http://www.ibiblio.org/bosak/v1mail/). It’s an interesting event. Should such things happen only after you are being called out by authorities ?

    Also, when you said the new formats were created to be backwards compatible, you forgot two things :

    – the new chart drawing engine is incompatible with the old one. A ton of properties are lost when you migrate to the new file format.

    – the migration is new code and has plenty of bugs. I personally have to create workarounds for files that open well in Excel 97-2003 and not well in Excel 2007. And of course customers think I am the one being faulty here.

    In a nutshell, I have nothing against OOXML as a file format, but it should not be allowed to become an international standard since barely more than 5% is documented. (5% = 6,000 pages)

  5. Stephane Rodriguez says:

    You also forgot to mention that Office 2007 documents are extensions of ECMA 376 documents. And those extensions are not documented.

  6. Francis says:

    Interesting links… a thought on fields:

    I was disappointed to see last year that fields are not represented as true XML constructs. This will make it that much harder for other developers to capitalize on Word’s actually rather powerful fields (and integrate them, e.g., with content controls.) However, an even greater problem with fields is not that they are hard for developers, but for users. Most users don’t use them, even for long, complex, and should-be-structured documents.

    One reason may be that even when users know about fields, they are hard to use. The way fields work now is like Excel before auto-recalculation: you have to press CTRL+A, F9 to update them. This means a lot of errors in printouts: from incorrect page numbers in TOC, to wrong headers in STYLEREFs, to broken/old content in INCLUDETEXT/PICTURE, to "Error!" messages in cross-references. In fact, the linear nature of fields means that sometimes pressing CTRL+A, F9 once is not enough (because updating one field can throw, e.g., page numbers in all others off.)

    I don’t see any reason to XML-ify fields now. If/when, however, Word does start to represent fields internally with a live, relational data structure, changes in the file format will probably be necessary. That would be a great point to XML-ify fields.

  7. jones206@hotmail.com says:

    Stefan,

    I’ll take a look. I think I remember there being some issue around updating the fonts in older versions but I’m not positive.

    ——————–

    finite,

    The product has only been available for about 6 months. If you read Murray’s blog you’ll see that the issue right now isn’t the file formats, but it is the new math support. There is a brand new math engine in Word, and that functionality isn’t supported in older versions of Word. Since both Science and Nature have their workflows built around Word 2003, they’re still trying to figure out how to best support the new math.

    ————–

    Stephane,

    I was one of the folks who worked on the original WordprocessingML from Word 2003. It was absolutely designed with XML in mind, but not with the traditional DocBook/SGML view of document publishing. Traditional Word documents are not at all structured. Even things like Headings are just formatted paragraphs. The wordprocessingML format is an XML format that maps to that flat structure. Content controls and custom defined schema are the way we enable people to add additional structure.

    To your other point, other than the things we’ve already talked about (VBA, document encryption, and some markup around application-specific UI), everything is in the Ecma spec.

    ————————-

    Francis,

    That’s the way we thought of it. The field support has been around for a long time, and has behaved the same way release to release. The XML storage of those fields is pretty similar to how they are in the document. They aren’t a structured region, but instead a place where field code instructions can be inserted that specify the behavior.

    If at some point in the future we decide to make fields more structured, I would imagine that being something we’d work with Ecma to update in the file format as well.

    -Brian
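
To make the point about field storage concrete, here is a minimal Python sketch of how field code instructions appear in WordprocessingML and how a reader might collect them. The sample paragraph is hand-written for illustration and the helper name `field_instructions` is hypothetical; the sketch assumes the two storage forms (a `fldSimple` element carrying an `instr` attribute, and runs bracketed by `fldChar` markers around `instrText`) and ignores nested fields.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (assumed here to be the ECMA-376 one).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written sample: one complex field (PAGE) and one simple field (STYLEREF).
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">Page </w:t></w:r>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText xml:space="preserve"> PAGE </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>3</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
  <w:fldSimple w:instr=" STYLEREF &quot;Heading 1&quot; ">
    <w:r><w:t>Introduction</w:t></w:r>
  </w:fldSimple>
</w:p>"""


def field_instructions(paragraph_xml):
    """Return the field instruction strings found in one paragraph.

    Nested fields are not handled; this is only a sketch.
    """
    root = ET.fromstring(paragraph_xml)
    instructions = []
    current = None  # pieces of the instruction for an open complex field
    for elem in root.iter():  # document order
        if elem.tag == f"{{{W}}}fldSimple":
            instructions.append(elem.get(f"{{{W}}}instr", "").strip())
        elif elem.tag == f"{{{W}}}fldChar":
            kind = elem.get(f"{{{W}}}fldCharType")
            if kind == "begin":
                current = []
            elif kind in ("separate", "end") and current is not None:
                instructions.append("".join(current).strip())
                current = None
        elif elem.tag == f"{{{W}}}instrText" and current is not None:
            current.append(elem.text or "")
    return instructions


if __name__ == "__main__":
    print(field_instructions(SAMPLE))
```

The instruction is an opaque string such as `PAGE`; the cached result (`3` in the sample) sits in ordinary runs, which is why a stale value can survive until the field is updated.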

  8. nksingh says:

    Stephane:

    The US ISO committee chair (the chair of INCITS V1) is Patrick Durusau, a founder of the OpenDocument Foundation and one of the architects of ODF, so you would expect him to have a thing or two to say about the "quality" of OOXML.  

    I can guess that you know this (seeing how vocal you are with regards to the whole standardization effort), but chose to omit this piece of information to make him seem more "authoritative."  I have yet to really understand why you’re so rabidly anti-OOXML considering that you are personally familiar with the pain of parsing the old formats.  Are you that afraid that your product will have no market??

  9. Stephane Rodriguez says:

    nksingh,

    "you would expect him to have a thing or two to say about the "quality" of OOXML. "

    Indeed. Isn’t it what I’ve said?

    Mr Durusau has recently explained that OOXML is documented at the syntactic level, not at the semantic level. This is enough to stop the ISO standard process.

    And by the way, this is one thing I have been saying for almost a year. It’s just that, since I am a third party, Microsoft chose to ignore it. I have also complained about the fact that Microsoft generally ignores third parties to their own advantage: they’ll happily say that automating Word/Excel/PowerPoint on a server is perhaps not reliable (or too slow), but what they don’t say is that there is a ton of third parties out there to fill this need. (I am not talking about me specifically here). A reality check does not hurt sometimes…

    "I have yet to really understand why you’re so rabidly anti-OOXML considering that you are personally familiar with the pain of parsing the old formats."

    Because I am one of but a few who have really IMPLEMENTED the specs. You know, I have two products, and one of them does UNDERSTAND the format; it’s not only reading angle brackets, contrary to many of the projects starting here and there (that’s why XML is such a marketing joke).

    Even Mr Brian Jones who will speak at length about this thing hasn’t. (correct me if I am wrong)

    In fact, none of the public voices from the MS Office blogs have. That’s what makes it so ironic.

    But in his latest post where he admits "mediocre XML", he’s admitting the sin. That’s a start.

    "Are you that afraid that your product will have no market??"

    You don’t understand. Microsoft is leaving huge gaps for me and others to make business in niches.

  10. Stephane Rodriguez says:

    Also on "I have yet to really understand why you’re so rabidly anti-OOXML".

    Didn’t I say just the opposite in my first comment? I said I was not opposed to OOXML as a file format. Only as an international standard, because it has no technical merit (no XML in mind), and only the fact that there is no other product than Office 2007 which can instantiate those documents in full fidelity should be a hint.

    I note that it’s the second time in your comment that what I said is taken as exactly the opposite. I don’t know what your level of English comprehension is, but let me tell you : "potato is a fruit". Hope that helps.

  11. Bruno says:

    Stephane, when you spout nonsense like ‘the 6000 page OOXML spec only documents 5% of OOXML’, you lose all credibility.

  12. Stephane Rodriguez says:

    What is your counter-argument?

    Can you provide evidence that 95% of what’s actually needed to implement the specs is there?

    How much of the specs have you implemented?

    Also, when you attack people, put your real name. If you have balls, that is.

  13. hAl says:

    @Stephane

    Actually I found that ODF seemed to lack a lot more of what is needed to implement its spec than OOXML does. Its basic spec allows for very weird combinations of code to implement.

    Although OOXML certainly isn’t ideal and could still use a lot of improving, it seems much more structured for implementation than ODF actually is.

    As it stands, for instance, I can foresee that even in 5 years it won’t be a problem to produce ODF which conforms to the specification but which won’t translate to a document correctly in most implementations, because implementations are not capable of dealing with the complexity.

    Basically what both specs need is reference implementations and examples, because frankly, with just a spec to go on, there is no way either format will lead to interoperable implementations.

    At the moment it looks like ODF implementors are using OOo as a reference (at least for the parts OOo already implemented) and that OOXML implementers will use MS Office as a reference.

  14. Stephane Rodriguez says:

    hAl,

    Why bring ODF in the game? All my comments are related to OOXML. It’s amazing that a number of apologists (correct me if I am wrong) always use ODF when it comes to finding excuses to counter the crap that OOXML is. Come on, you are ready to do anything to avoid talking about OOXML. In a way, I understand you, it’s ugly, and gets only uglier (15 years of legacy). Have you read Bob Ducharme’s comment?

    Now that we are into this, here is how it works IMHO, if we digress a bit into ODF versus OOXML.

    The resulting file formats are Word/Excel/Powerpoint/Access formats for Office related activities, aka productivity software. Although Microsoft will insist that what the open source projects do (OpenOffice, AbiWord, …), and what they do, is different, it is really targeted to the guys in the IT department who don’t know better. Of course, OpenOffice and MS Office ARE competitors.

    With that said, the underlying file formats themselves were not designed the same way. OOXML is just binary surrounded by angle brackets. It is as simple as that. It’s easy to migrate binary formats to OOXML for this reason. In case you wonder, that’s what I do in one of my products. And, with no surprise, that’s all Microsoft is saying when they mean backwards compatible (even though, I pointed out this is not entirely true either). What they don’t say is that they are not starting from scratch, they are using the actual codebase. Anybody starting now from scratch is in for a number of years of work. The syntactic level is documented (pretty much, with many typos), the semantics isn’t. Note that this comes from my own experience first (diffopc, xlsgen, plus other past projects).

    In particular, I believe ODF is allowing not just ease of support, auto-discoverability, but also expandability. There are other scenarios, but I’ll stick to those because they are very strong and put the developers in control rather than the opposite.

    When a file format is easy to support, the parsers and accompanying stacks don’t put the burden on developers. See the US ISO CHAIR COMMITTEE comments. Here is an interesting one: http://www.ibiblio.org/bosak/v1mail/200706/2007Jun05-132133.eml

    When a file format is auto-discoverable, and you implement a read routine (let’s say : you are reading a spreadsheet that was sent to you, and you have no control on what’s in that spreadsheet), you can read and process only what you want. But with OOXML it is the opposite. I give one example of that, let me know what you think : assuming you read an Excel 2007 XLSX spreadsheet that was sent to you, I say good luck to find the color of a cell without handling themes (among other things), which itself involves complicated and undocumented overriding rules, complex mathematics such as colors expressed in undocumented coordinate spaces, or using interesting attributes (tint, stuff like that).

    It’s clear the burden OOXML puts on developers. At least to me, your mileage may vary.
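
As a rough illustration of the cell color example above, here is a minimal Python sketch of the last step only: applying a `tint` value to a color that has already been resolved to RGB. It assumes the luminance-scaling rule as it is commonly described for the `tint` attribute (negative values darken, positive values lighten, computed in an HLS color space); the function name `apply_tint` is hypothetical, and resolving a theme color index to its base RGB value, which the comment identifies as the harder part, is deliberately left out.

```python
import colorsys


def apply_tint(rgb_hex, tint):
    """Apply a tint in [-1.0, 1.0] to an 'RRGGBB' color string.

    Assumed rule (luminance normalized to 0..1):
      tint < 0:  L' = L * (1 + tint)           (darken)
      tint >= 0: L' = L * (1 - tint) + tint    (lighten)
    """
    r, g, b = (int(rgb_hex[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    if tint < 0:
        l = l * (1.0 + tint)
    else:
        l = l * (1.0 - tint) + tint
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return "".join(f"{round(c * 255):02X}" for c in (r, g, b))


if __name__ == "__main__":
    # Hypothetical base color, lightened by 40%.
    print(apply_tint("4F81BD", 0.4))
```

Even under these assumptions, a reader still has to find the cell's style record, follow it to a fill, and map any theme reference to a base color before this function applies, which is the chain of lookups the comment is complaining about.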

    Now, if you don’t understand the spreadsheet, in other words if you don’t do anything MEANINGFUL with the spreadsheet, then you can read as many angle brackets you want, write it back, and claim you support Excel 2007 spreadsheets ! But is it really the case?

    That’s one of the marketing tricks (the XML joke).

    See how Microsoft never talks about features such as instantiating documents (which includes  rendering, calculating, …). Have an idea why?

    Could it be because all you are being given is a specs that forces you to suck your work into actual MS Office application instances. Thus you are helping Microsoft sell MS Office licenses (ie one of their two cash cows).

    See how this hasn’t changed a bit the agenda of old binary formats. Want to take a bet on how this does not get changed at all in the future? (Let’s see how many years it takes OpenOffice to accurately support 20% of the features of OOXML.)

    In fact, Microsoft themselves gave earlier this month the perfect case for me, in the form of their new .NET 3.0 based Open XML SDK. Have you visited forums with developers trying to get something done with it, and ALL OF THEM eventually giving up ?

    Interesting eh. No big surprise though, reading angle brackets has nothing to do with INSTANTIATING documents.

    Microsoft has no incentive to provide a powerful SDK (file format) in the hands of someone out there. If they did, that would hurt their revenue growth related to Office server licenses, and with future integrations.

    Microsoft can only distribute an extremely limited SDK. Anything else is suicide.

    Please don’t take my words though, just read ISO comments : http://www.ibiblio.org/bosak/v1mail/

  15. hAl says:

    I do not exactly see how your example makes OOXML differ from ODF or why that should be simpler. Also, I would think that in a spreadsheet, if anything, you would look for data in cells, and I do not think that is a lot easier in ODF. I still do not know what is supposed to happen when in ODF you make the data in a table cell differ from the data in the cell property.

    I find a lot of things easy in OOXML because of its structured approach, by which you have a treelike structure which is also defined in the spec. So implementing a part of the spec only requires supporting a particular branch of the tree, whereas it seems ODF has less of such a structure and needs most of the spec supported even for only a particular document type, for instance.

    In OOXML I like the OPC but I find the VML inclusion ridiculous but as a whole the specs is fairly acceptable.

    ODF has a nice clean setup with little burden of legacy stuff, but it is complex to implement, and if MS were to implement it, it would probably produce a similar-size spec of just extensions to ODF, which would not help anybody.

    And on those comments:

    You call them ISO comments but it seems you could also say that they are OpenDocument Foundation comments. It is like calling Rob Weir’s comments on OOXML ANSI comments, while I and just about anybody else would see them in the context of Rob Weir portraying the IBM point of view. I am not calling Brian’s comments ECMA comments but Microsoft comments, and so on.

    Actually the amount of people with an agenda in these kind of public committees startles me.

  16. Stephane Rodriguez says:

    hAL,

    I think you understood very little of what I said. I talk semantics, instantiating documents, and you respond by saying you like the tree structure. Don’t you think the tree structure comes with the XML territory anyway? This gigantic all-in-one crap that Word2003 ML is has a nice tree structure too…

    "In OOXML I like the OPC but I find the VML inclusion ridiculous but as a whole the specs is fairly acceptable."

    That’s not what you should be saying. What you should be saying is : I refuse to allow a file format including a markup (VML) whose semantics is not documented and whose rendering is tied to a specific platform (Windows). If you don’t see what I mean, just read BillG memo to the Office team back in the good days : http://antitrust.slated.org/www.iowaconsumercase.org/011607/2000/PX02991.pdf

    "You call them ISO comments but it seems you could also say that they are OpenDocument Foundation comments."

    You are in denial and have obviously not read them. Again, do yourself a service and read them.

    Rob Weir et al. are not contributing much other than what he has already posted on his blog (and you know that too, since someone going by the same nickname as yours has been posting plenty of times there).

    Read other comments. If you don’t understand, then I am curious to know what is your interest  in this after all. Are you a troll?

    "Actually the amount of people with an agenda in these kind of public committees startles me."

    If you are talking about me, I call BS. I am a single-man vendor, so give me a break. What I say on this blog is based on my own experience implementing the specs. I am pretty sure the owner of this blog is willing to accept comments and things like that too. But if all you have to come up with is a rant with no argument, then please go read a post like this one: http://blogs.zdnet.com/microsoft/?p=511. Apparently it was written for you.

  17. jones206@hotmail.com says:

    Stephane, is your issue with the documentation, or the format itself?

    The format does not convey semantics any more than the application itself (through the object model or UI) conveys semantics. Word has the concept of paragraphs, tables, formatting, etc. Excel has the concept of rows, columns, equations, pivot tables, etc. All of that information is represented via the XML, and all of the XML is fully documented. If you want semantics beyond what the applications themselves support, there are some options though including custom XML and content controls.

    If your issue is with the documentation, what more are you looking for? Are you referring to the things that aren’t part of the standard like encryption and macros? Or is there something else you’re trying to get at?

    Lastly, while I know you don’t want to talk about ODF and OpenXML together, that’s what the current issue is. IBM and others are pushing for legislation that would require ODF and block OpenXML. What do you see in the ODF format that you like better than the OpenXML format? Or do you think both formats should not be standards?

    -Brian

  18. hAl says:

    Maybe it is not so easy to understand what you meant because English isn’t my native language. I took your words as meaning that you lacked a structure to implement stuff in the format specs.

    However, I have read the particular mail you referred to earlier, which compares WordPerfect and Word structures, and I also read the blog post and reaction Brian is linking to.

    From that blog, here is a reaction from Rick Jelliffe, who has a pretty good understanding of XML document formats and OOXML, and who says something about the way an OOXML document is defined:

    "Open XML’s syntax is indeed odd at first, easily enough material for a year’s worth of blogs :-), but I have found that there are usually reasons: Open XML has been made using completely different tradeoffs than, say, DOCBOOK has, and consequently looks different.

    First, on the superficial syntax. Remember a few years ago when Michael McQueen was saying that the trouble with attributes was that you couldn’t have structured attributes, and the SML people were saying that there was no difference between an attribute and an element and that we should reduce our use of attributes to a minimum? That seems to have influenced MS’ design choice behind their properties. They have systematically adopted a "head-body" approach to have properties in elements (this is hardly a new thing: I wrote about it in my 1998 book): there is a consistent naming convention of a "Pr" suffix used throughout.

    However, they also have the HTML-inspired approach that element content should only have searchable content in mind, so that searching doesn’t need to be schema-aware. (With the slight complication that you raise, that deleted text sections and fields still use data content, and the use of numeric indexes to shared string tables in SpreadsheetML.) Then they have decided against using mixed content, again influenced by the SML propaganda but also because it resolves one issue for documents loading into relational DBMS.

    Now I never cared for the SML ideas much: but the combination of allowing structured attributes, schema-less searches, and easy loading to DBMS are entirely respectable choices it seems to me. Which is not to say that DOCBOOK or ODF should adopt the same goals."

    "Fourth, there a difference in the design level too: as far as I can see, what MS were trying to do is to take a *completely* linear format and allow arbitrary interleaving of custom XML as the mechanism for *all* structuring. Office 2007 doesn’t do any structural implication that I know of (though I am not an expert in it.)

    So saying Open XML is like RTF-in-XML is not unfair, though to say that Open XML is *only* RTF-in-XML would be unfair. Nor would a comparison with HTML (a linear format where structures can be made by the user with DIV and SPAN.)

    Open XML is an "open" format in the sense that the zipper on a flasher’s pants is open: you may not like what you see, it may be less or more than you were expecting, but the functionality is exposed unadorned for all the world’s education: whether you are repelled or see opportunities is your business 🙂 The aim of Open XML is to expose everything that goes on inside Office 2007 not to mediate it according to some abstract/ideological view of the perfect document.

    So, in Word, a document is a list of blocks, and a block is either a list of runs or a table. Consequently, in WordprocessingML, a document contains a sequence of <p> or <tbl> elements, and a <p> contains a sequence of <r> run elements, which may contain a sequence of <t> text runs and diagrams etc.

    The radical thing MS did was to take an interleaving approach to structure: you can open any schema, and use this with a context sensitive editor (in Word) to wrap blocks, runs, rows and cells with "custom" elements from that schema. The schema is used to provide syntax direction, but not for subsequent validation; the created WordprocessingML document can still be validated against its usual schemas because the custom elements are marked up with one level of indirection, as values of customXml elements in the word-processing space. Now at the moment, this is not fully baked: you cannot key styles to customXml elements as far as I know: but the aim is to expose what Office 2007 does not what it *may* or *should* do!

    In this way they are trying to turn the linear format from a flaw into a strength: if they had structures in place already (sections, lists, headings) they would have to figure out how not to clash with custom XML structures (which is a problem I expect ODF would have.) "
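
The flat block/run structure described in the quote above can be sketched in a few lines of Python. The sample document is hand-written for illustration and the helper names `q` and `outline` are hypothetical; the sketch assumes the ECMA-376 WordprocessingML namespace and only looks at top-level `p` and `tbl` children of the body.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (assumed here to be the ECMA-376 one).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written sample: a "heading" is just a paragraph with a style property.
SAMPLE = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:t>Results</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t xml:space="preserve">Plain and </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>
    </w:p>
    <w:tbl>
      <w:tr><w:tc><w:p><w:r><w:t>cell</w:t></w:r></w:p></w:tc></w:tr>
    </w:tbl>
  </w:body>
</w:document>"""


def q(name):
    """Qualify a local element name with the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


def outline(document_xml):
    """List the top-level blocks: ('p', text) or ('tbl', row_count)."""
    body = ET.fromstring(document_xml).find(q("body"))
    blocks = []
    for block in body:
        kind = block.tag.split("}")[1]  # strip the namespace part
        if kind == "p":
            # A paragraph's text is the concatenation of its <t> elements.
            text = "".join(t.text or "" for t in block.iter(q("t")))
            blocks.append(("p", text))
        elif kind == "tbl":
            blocks.append(("tbl", len(block.findall(q("tr")))))
    return blocks


if __name__ == "__main__":
    for block in outline(SAMPLE):
        print(block)
```

Note how the heading and the body paragraph are siblings at the same level: nothing in the markup nests the second paragraph under the first, and formatting lives in the `pPr` and `rPr` property elements that carry the "Pr" suffix mentioned in the quote.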

  19. Ian Easson says:

    Stephane,

    Please understand that no one reading this blog has a clue what points you are trying to make.  Maybe it’s just me, but all that I can get out of what you have written is that the OOXML standard has no "semantics", and in your opinion makes it too difficult for developers to write code to render the document.  Pardon the pun, but no one knows what *you* mean by "semantics"!

  20. Stephane Rodriguez says:

    Brian,

    "is your issue with the documentation, or the format itself?"

    Both. The documentation is so bad (sorry to the guys who worked on it) that I had to develop diffopc to do any actual work. There are many typos by the way, which means 1) ECMA hasn’t done their homework 2) Nobody implemented it.

    The format. As I said, the format itself I don’t oppose. It’s been roughly ten years I am smelling the crap off binary formats, so a couple more is not changing the game. Pushing it to international standards is a whole different story, however.

    What would you object to in a principle like this: several specs of ten or more pages which describe the infrastructure, then more specs describing the objects, showing that there was thought put into it, not just a rehash of the 15 years of legacy. An example: there are at least 6 different and incompatible ways to describe text formatting (overall). Don’t you think a proper standard would stick to only one? Why did you hurry up the shipping of Office 2007 instead of fixing this legacy stuff once and for all?

    Then you have all what I mention above (disclaimer : I don’t claim to know it all, whatever ; pretty much what I say is already said out there, just my voice is of a real implementer, not a theorist) : ease of support, auto-discoverability, expandability, …

    "The format does not convey semantics any more than the application itself (through the object model or UI) conveys semantics."

    Oh it does. That is exactly the difference between programmatic access (i.e. reading kilometers of angle brackets without understanding it), and instantiating it. Take the CleverAge plugin. With programmatic access alone, you get what they have today, something so utterly destructive that everyone including Microsoft should be embarrassed to even mention it. On the other hand, with instantiation, they can pretty much repurpose objects from the Word 2007 land into OpenOffice land (or some other program), and vice versa. It’s hard work, and a proper spec (usually short and conclusive) makes hard work less hard, lowering the barrier to entry.

    "If your issue is with the documentation, what more are you looking for? Are you referring to the things that aren’t part of the standard like encryption and macros? Or is there something else you’re trying to get at?"

    First of all, I would have liked admission of truth. I have been posting the exact same thing  for months and all I got was 1) denial 2) continuous marketing lies such as "fully documented", "backwards compatible".

    Second, if instead of linking to people who keep from starting new projects (reading angle brackets at envy without understanding much of it is an increasing trend, if your blog posts are to be believed), you would point to people doing real implementations. That’s where it would show strength, weight and even perhaps the beginning of a leg for becoming a standard.

    The new .NET 3.0 Open XML SDK goes exactly the opposite way. Tied to Windows, it does pretty much nothing worthwhile.

    "Lastly, while I know you don’t want to talk about ODF and OpenXML together"

    Not entirely true. The other guy was steering a technical debate on OOXML with words about ODF. I thought it was just steer to avoid talking about OOXML and I called BS on that.

    "ODF format that you like better than the OpenXML format? Or do you think both formats should not be standards?"

    Based on my observations, ODF seems to be designed for the future, while OOXML is from the past (ie binary formats). It’s like a free-flowing document layout algorithm versus a fixed one. I’ve talked above about something like avoiding to put the burden on developers when it comes to how parsers are implemented. Have you read this comment? http://www.ibiblio.org/bosak/v1mail/200706/2007Jun05-132133.eml

    You keep coming back to custom XML very often, but there is a constant confusion between XML as a serialization format and XML as a generic (or vendor-specific) data source. And, if you take Office 2003, you already had custom XML, and you could already get a pointer to an IStream interface of your OLE document and do replacements. What’s so magical with Office 2007?

  21. Stephane Rodriguez says:

    Ian,

    "Pardon the pun, but no one knows what *you* mean by "semantics"!"

    If you try to implement the specs, you’ll soon realize what the semantics is about.

    In a nutshell, it’s programmatic access versus document instances.

    Someone at ISO said it better than me here : http://www.ibiblio.org/bosak/v1mail/200706/2007Jun05-132133.eml

  22. Ian Easson says:

    Stephane,

    Sorry, I don’t have the time to try and implement the specs in order to understand what you are talking about. To call it "programmatic access versus document instances" is no help either; that’s another totally opaque phrase that just does not mean anything to me.

    However, I did carefully read the document you referred to. What I can make of it is that the people writing it have a background in theoretical models of what an ideal document should be; e.g., it’s a hierarchy with rules such as "Heading 2 is subordinate to Heading 1". By their own admission, they have a great deal of trouble understanding or appreciating OOXML because it does not have such a fixed ideal model of what a document should be. That is because the application that ultimately spawned it (MS Office) has no such ideal model built-in, and the overriding design goal of OOXML is to be able to faithfully reflect the contents of those existing documents.

    If that’s what’s bothering you too, then there is no way of possibly satisfying you. (The billions of existing documents are not going to go away, and the market need for OOXML, i.e., for an open standardized XML format that can faithfully reflect their contents, is a real one.) Am I correct in my understanding?

  23. Dave S says:

    Ian, et al,

    What prevents many of those billions of documents from going away?

    I think they are going away – partial or total loss via incompatible formats is only the tip of the problem. Poor back-up and forwarding procedures will destroy many more, as will simple deletion due to lack of relevance or company policies. More photographs have been taken than all the documents made in Office and yet most photos are lost to simple physical processes or outright deletion. I may have some wonderful documents on 5.25 inch floppies, but no drive to read them with. I had files on 9 track tape. Now, no reader – no files.

    The question of what faithful representation means is also curious. Since XML is not a binary format, any software that depends on the prior format will see any XML format as not faithful.

    If it means the on-screen or on-paper depiction is unchanged, then PDF or bitmaps offer a much greater chance of faithful depiction.

    If it means the application that uses the format is the same, then Office 2007 is certainly not faithful – significantly designed to be different from prior efforts.

    There is a lot of syntactic documentation out there that is practically unusable. I’ve not read the OOXML spec, but it comes from a source similar to that of specs I have read.

    The problem is this – if the documentation does not describe what a feature is intended to do and why it is intended to do so, then any details about the feature are almost meaningless.

    Pseudo-Example:

    Red – Red is the color key for the red menu selection.

    Color Key – this is the key color for the menu.

    Menu – menus have color keys.

    If one can’t find out the need for a color key, then one can’t understand what it means in the context of the red menu; one certainly won’t appreciate the meaning of Red in that context.

    Now onto a tangent:

    By-the-by, one and all, on the slippery slope of words, when did the opposite of create become consume?

    I have looked at many definitions for consume and see none that really apply to opening/examining a file for its contents. Ordinary consumption results in the incorporation of the consumed item into the consumer (or the consumer’s possession), usually with drastic effects on the consumed.

    Where, in software-ese, did the idea originate to take a word that practically means ‘eat’ and turn it into a homograph for ‘read and understand’?

    Given that Word, e.g., does not incorporate any document file into itself (virus laden aside) and the file is not destroyed and the file is not necessarily altered by reading it, how can it be ‘consumed?’ One might argue that an idea can be ‘consumed’ by a person, but that is covered by the common, somewhat inaccurate idea that the idea alters the consumer; ‘Consumed by rage’ for example, is really where a person is the consumed and rage is the consumer.

    Its noun-buddy, consumption, was Pulmonary Tuberculosis or "Consumption is the using up of a resource," which computer programs generally do not do to files.

    That written, I’ve consumed all the time I have for this trip down wordsmith lane.

  24. Francis says:

    Stephane,

    If you have been having problems with Open XML, why don’t you help document the errors and workarounds/solutions you have found? That would be a great help for other implementers as well as for Microsoft in fine-tuning the standard and preparing future versions.

    Take a look at this web site to see what I mean: http://www.xmlopen.org/ooxml-wiki/index.php/DIS_29500_Comments

    Along with constructive and pointed criticism, it is a wealth of information on the typos that unfortunately made it into the spec and may [understandably] frustrate developers like you.

  25. Stephane Rodriguez says:

    Ian,

    "Sorry, I don’t have the time to try and implement the specs"

    So then, how legitimate are you in your criticism ?

  26. Stephane Rodriguez says:

    Francis,

    "If you have been having problems with Open XML, why don’t you help document the errors and workarounds/solutions you have found? That would be a great help for other implementers as well as for Microsoft in fine-tuning the standard and preparing future versions."

    You want me to work for free? Really?

    And you really think Microsoft is willing to update their file format to make it better for consumption? I wonder what fantasy land you are living in.

    I oppose the international standard, not the file format. For freaking sake, this is the third time in this thread that I say it.

    "it is a wealth of information on the typos that unfortunately made it into the spec and may [understandably] frustrate developers like you."

    My fix was to develop diffopc.

    You don’t get my point. That so many typos were no problem is evidence that 1) ECMA did not do their homework and 2) nobody implemented it.

    Should a file format that only one vendor implemented be pushed as an international standard? Don’t you think this goes to the detriment of the legitimacy of ECMA and ISO, who should not be the bitches of private interests?

  27. Wesley Parish says:

    Brian (sorry to interrupt, Stephane, I’m afraid I have to agree with you 😉), if you want an idea of the respect Microsoft’s formats are generally held in, you might like to wander over to this web site:

    http://www.jsware.net/jsware/msicode.php3

    I was looking for an implementation of the MSI installer for Linux and Solaris, and Google pointed me in that direction.  So I found this:

    "Windows Installer (WI) refers to using MSI database files as the "housing" for a software installation. An MSI file used to install software through WI contains the software install settings and usually contains the software itself, packed inside the MSI. Unfortunately, the Windows Installer system is extremely – even bizarrely – complex. It uses an MSI database that contains approximately 80 tables, with extensive cross-referencing between the various columns of those tables.

    "The structure of MSI databases, when they are used as Windows Installer installation files, is so complex, convoluted and poorly designed, with data so heavily cross-referenced – and the available tools are so limited – that few software developers using WI actually create their own installation files. The basic software installer tasks of creating simple dialogue windows, copying files to the system, etc. are daunting challenges under the WI system. Windows Installer is so difficult to use that most software developers who ship MSI installation packages build them with some kind of 3rd-party software built on top of the WI system. And some of those 3rd-party systems, such as InstallShield, just cram their own installer EXEs into the MSI so that they can provide whatever functionality they want to while still honoring Microsoft’s MSI "standard"."

    Might it be that there is a genuine problem with Microsoft’s various formats?  I mean, I had to go to Port 25 to suggest a way to "defang" the early ActiveX dependency in what is now ECMA 376.  If I had to go to a totally different part of Microsoft from Office to get a worrying malware vulnerability in the original MS OOXML specification worked out, then there is definitely something wrong with the specs and I’d suggest Microsoft slow down and do rather more work on it, and allow independent implementers to work out all the bugs.

    That’s not politics, it’s just good sense.

  28. jones206@hotmail.com says:

    Hey guys, the discussions here seem to be all over the place. 🙂

    They are very good discussions, but it’s hard to keep up.

    I think what I’ll do is pull together a post that goes into the actual content model of the wordprocessingML format. That should help clear things up. The short of it though is that the formats were absolutely designed with XML in mind. They just weren’t designed to fit into the SGML/DocBook model of what a document is. They were designed to match Word’s existing content model, which is why we stress how important backward compatibility was for us. We weren’t trying to create the ultimate general purpose XML formats. We were creating an XML representation of Office documents.

    Wesley, I think in any spec you’ll see bugs. It wouldn’t have made sense to delay the standardization of the Ecma spec for misspellings, etc. All specs have bugs. Heck, look how many versions of ODF there are out there already (and it still has huge holes that they are working to fill).

    I’m not sure what you mean about the Port 25 thing. The ActiveX issue was brought up by one of the TC members, and we all decided to change it. There was a public alias that everyone was free to use though, and we were pretty vocal about that. We released public updates to the spec every couple months during the standardization process and had the public feedback alias going as well. That will continue to be available as we move forward with future versions of the spec too.

    -Brian

  29. Stephane Rodriguez says:

    Brian,

    I beg to differ. It’s been mostly a one-way discussion. Surrounded by a couple of shills who apparently don’t even have an interest in implementing the specs.

    I think it’s a bit disappointing you chose to take the innocent child party line. You don’t need to "clear things up". Everything is clear as water. Microsoft is trying to push a format for which they and only they have the secret sauce. The ECMA 376 specs is a JOKE for anyone who tries to implement it. And no clearing up will go against that argument. Of course, if you did not implement it yourself, all the wrong conclusions follow.

    I also note that you chose to ignore the examples I gave, for instance the actual algorithm for calculating the color of a cell of an arbitrary spreadsheet being sent to you, and the ECMA 376 specs in hand.

    Why don’t you answer?

    You don’t answer general principles (only 5% of what’s needed to implement an Office alternative is documented). You don’t answer examples either. So, in short, what do you answer?

    What is the purpose of this blog?

  30. jones206@hotmail.com says:

    Stephane, if you like the design decisions of ODF better, then feel free to use that format.

    You keep talking about the Ecma specs being a joke, and that they only contain 5% of what’s needed for implementation. Could you point me to a file format specification that is closer to what you’re looking for? The ODF spec has even less information than the OpenXML spec, so there must be something else you’re expecting…

    I gotta tell you man, I don’t really know what you’re looking for. Maybe it’s just me, but I have no clue what you want. Do you want Ecma to stop the ISO submission and maintain the ownership of the spec? I would think that having ISO own the spec would be a good thing. Do you just want more documentation?

    I feel like I’ve been very open and honest with you for the past couple years I’ve been blogging. Your constantly negative responses to everything I say though are really getting old. Do you ever have a polite discussion, or is this just the way you are?

    -Brian

  31. Ian Easson says:

    Brian,

    Yes, this dialog with Stephane is pretty useless.  He never makes it clear what his objections are or what he is looking for.  But here are my guesses, reading the tea leaves:

    – He wants Microsoft to build a new version of Microsoft Office (both the application and the file format) that is organized around the SGML/DocBook paradigm

    – He then wants Microsoft to release not just the new file format spec for this new version, but also complete *application* specifications so he can then create a clone of it. (I deduce this from his latest comment in which he says "only 5% of what’s needed to implement an Office *alternative* is documented"; emphasis mine.)

    As I said in an earlier post, there is thus no practical way he can be satisfied.

  32. Stephane Rodriguez says:

    Brian,

    I am not entirely sure what you mean by negative. Am I, like the shills in the comment area, supposed to comment on how great you guys are? Or is the comment area a way for me to provide feedback?

    If that’s the former, then why not say it explicitly in your blog title: "SHILLS WANTED".

    If that’s not the case, it means you accept criticism.

    Well, in regards to criticism, it seems to me I have been willing to explain in many ways, either with general views, or with explicit examples, why OOXML simply CANNOT become an international standard until it gets reworked.

    In particular, I have shown evidence that, given the many typos and the many missing parts in the documentation (contrary to what a number like 6,000 might suggest), ECMA did not do their homework, and as a result the specs should go back to where they came from until the quality is improved. We are talking about an open process, right?

    As I have said before, make no mistake, I don’t oppose the file format itself. Microsoft has every right to create as many file formats as they want. But it so happens that you are trying to push it to international organizations, and it becomes a whole different story then. It is simply not possible that respectable institutions allow a vastly undocumented, proprietary format to proceed.

    One example : those 3 formats use VML all over the place. The VML library is proprietary and the semantics is not documented. In BillG’s memo I linked to above, the proprietary Internet Explorer extensions he’s talking about is VML.

    I hope that clarifies a bit.

    Last but not least, I don’t know, out of those who have commented on your blog since you’ve opened it, how many did so while doing actual work, actual implementation.

    Do you at least accept the fact that I implement this stuff, and as a result, I have many reasons to criticize what’s going on?

  33. Stephane Rodriguez says:

    Ian,

    Get a life.

  34. hAl says:

    I think you should tone down your argument style. The fact that I am interested in Office formats has everything to do with implementing them; however, I probably have very different goals than you have. I am looking for implementations built from applications that combine data into template word-processing documents and deliver large amounts of application data into complex spreadsheets. I am interested in converting our millions of old, automatically created and electronically archived documents and retrieving data from them.

    I am not looking to render office documents using the spec, as you seem to be. Rendering complex documents like office documents requires heavy, complex applications, especially if you want to build everything yourself without using provided APIs.

    Anybody who thinks the office document specs are enabling people to build office functionality should be well aware that office suite applications require many thousands of man hours to build. These formats are not building blocks for that and never will be. But if you have built Office functionality, then it is fairly easy to pick up the format specs and see which parts you can support and which not. You can build converters and import filters, and even decide to change your native format (the latter being a big change).

    [quote]One example : those 3 formats use VML all over the place.[/quote] I am not sure what you mean. The format spec identifies VML as deprecated and thus only needed for converted Office 2003 XML format files. Why would anyone but Microsoft even attempt to support that part of the spec?

  35. Stephane Rodriguez says:

    hAL,

    "I think you should tone down your argument style. "

    What started this thread is Jones’s own admission of Word’s mediocre XML. He had been denying it until now. My comments certainly reflect that there’s something wrong.

    "I am not looking for rendering office documents using the spec as you seem to be. "

    Some people don’t want to render this stuff, it depends on what they want to do. But there is a difference between what a consuming application intentionally limits itself to do, and the inherent limits of the specs. But Microsoft is not making it explicit. That’s among other things what I want them to do because, again, it would be another admission of truth. The admission that, when it comes to rendering this stuff, there is no way the 6,000 pages provide what’s needed. Reverse engineering is mandated and, as a result, this is why this thing cannot become an international standard.

    In short, if you are only reading angle brackets, and your application does not require you understand the stuff you read, then you can probably ignore stuff in what you read. For instance, you can ignore VML and everything is fine. But admit that this is more an edge case and that YOU decided to limit yourself to this scenario.

    If you write stuff, you only need to know how to write the stuff in a way that Word/Excel/Powerpoint can open it. For instance, if you write red cells in a spreadsheet, you don’t need to know how to write green cells. Note that this is also true when targeting binary formats.

    But if you are creating a more advanced application that actually renders and calculates this stuff, in other words instantiates documents, then everything must be documented, not only the syntax of attributes. The 6,000 pages of the specs is only the syntax. So by definition, the specs is targeted to dumb consuming applications that can’t generally instantiate documents.

    And this is perfectly inline with Microsoft’s agenda, i.e. make sure that all the data is sucked into actual Word/Excel/Powerpoint instances (cash cow).

    Jones plays innocent child by intentionally trying to confuse people with XML as a serialization format versus XML as a custom schema. Note that these are two entirely different things, that custom XML is not a tool to render stuff, and that it’s been available in Office 2003 (OLE streams).

    We have a fairly simple case of a single vendor pushing their proprietary stuff to an international standard. There is nothing so hard to understand here, and it does not deserve such a lengthy thread. Of course, if you don’t implement this stuff, you may not see what the thing is about.

    Note that Rob Weir just weighed in on the subject here : http://www.robweir.com/blog/2007/06/no-representation-without-specification.html

  36. Stephane Rodriguez says:

    "I am not sure what you mean. The format spec identifies VML to be deprecated and thus only needed for converted office 2003 XML format files. Why would anyone but Microsoft even attempt to support that part of the spec."

    This "deprecate thing" is a marketing trick. It isn’t. VML is all over the place in the new file formats. The Excel team has even ADDED VML dependencies in objects such as spreadsheet sticky notes.

    As to whether you have to or not, again, if you intend to render a document to a screen or a printer, how can you avoid implementing VML too?

    Another marketing trick about VML is that there is a chapter about VML in the 6,000 pages. But the semantics is not there, only the syntax.

  37. Stephane Rodriguez says:

    If you are willing to take a minute or two, follow these steps and see for yourself whether VML is "deprecated" or not: create a new Excel 2007 spreadsheet; right-click on a cell and choose "Insert comment"; type some comment; save the spreadsheet; close it.

    Now unzip it and, surprise, surprise, a VML part is waiting for you. The VML describes the layout of the sticky note.
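    For anyone who would rather script that check than unzip by hand, here is a minimal Python sketch. It only lists the parts of a package whose names end in ".vml". The helper name `find_vml_parts` and the part names in the demo are assumptions made for illustration, not output captured from a real Excel file; to try it on the spreadsheet saved in the steps above, pass that file's path instead of the in-memory stand-in.

```python
import io
import zipfile


def find_vml_parts(package):
    """Return the names of parts in a zip package that end in .vml.

    `package` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        return [name for name in zf.namelist() if name.lower().endswith(".vml")]


# Stand-in for a saved workbook, built in memory so the sketch runs anywhere.
# The part names below are assumed for illustration only.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")
    zf.writestr("xl/comments1.xml", "<comments/>")
    zf.writestr("xl/drawings/vmlDrawing1.vml", "<xml/>")
demo.seek(0)

print(find_vml_parts(demo))
```

    An empty list would mean the package contains no part with a .vml extension; it says nothing about VML markup embedded inside other parts.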

    There is no way a "deprecated" thing, if it were really deprecated, would be created with a NEW instance of an Excel 2007 file.

    Lookup "deprecate" in whatever dictionary, and I’m sure you’ll come up with a sound definition.

    What "deprecate" means in theory is that VML may appear if you are opening an OLD file.

    But it’s not true, as my simple example shows.

    Not convinced yet?

    If those sticky notes were expressed using DrawingML, and the corresponding markup were entirely documented (syntax and semantics), then that’d be perfectly fine. Regardless of the documentation, if those sticky notes were expressed using DrawingML, that would be a first step. It would not just make sense, it would spare Jones and others from lying in public.

    VML is pervasive in all 3 file formats. And VML is only one example, that I deliberately chose because BillG is also involved.

    Truth be told, there is an untold story here. It’s been going on for more than a decade, and it’s probable that the Office team is losing at their own game. In short, each MS Office release is never so comprehensive as to advance a new layer (such as DrawingML) in the file formats so broadly that it has the same level of support in all 3 applications. Before DrawingML (E2O internally), there was VML (MSO internally). And before that, there was Powerpoint’s vector graphics layer. This is really where it originates from. With every new release since Office 95, they have tried to use "shared libraries", but never quite got to make releases that made the "shared libraries" equally supported in all 3 applications. In Office 12, it so happens that VML is still there, although the longer term is to go with DrawingML.

    What’s just bad here is that Microsoft is deliberately avoiding speaking about it, and still pushing the file format so it becomes an international standard. But technically speaking, they are not there yet. The move to DrawingML is not complete in Office 12.

    Perhaps it will in Office 14 (i.e. MS Office’s next release). Then, and only then, it will become fair to push it to international standards. Well, that is, if the semantics is documented too. But at least that would be a sign of progress.

  38. Stephane Rodriguez says:

    (sorry for commenting so much)

    Office 12 versus Office 14

    What the above says is that Office 12 is not a reference implementation of the OOXML specs. The OOXML specs say VML is deprecated. The actual Office 12 implementation does not treat it as deprecated at all, as the example shows.

    That’s the case of a serpent biting itself.

    Isn’t it ironic? A sane person should find it hard to accept that a file format becomes an international standard when there is only one REFERENCE implementation. Which is the reality: there is no non-Microsoft implementation of the specs out there, and by implementation I mean something intended to do the same thing (not something playing it on TV). But what is this person supposed to think when the so-called REFERENCE implementation is not even a REFERENCE implementation?

    Office 14 may be better at it. But from an international standard standpoint : there is not even ONE STRICT REFERENCE implementation out there.

  39. Angus says:

    Stephane,

    By your tortured logic, ODF should have not become an international standard because there is not one implementation yet that supports the full spec.

    Not to nitpick, but you also incorrectly attributed the statement of Word XML being mediocre to Brian. Read closely. Brian quoted the headline of a post in Bob DuCharme’s blog. Brian’s actual comment concerned the design goals of Word XML.

  40. Stephane Rodriguez says:

    Angus,

    Your rhetoric does not get you anywhere. We are talking OOXML here.

  41. hAl says:

    I can agree with you that Office 2007 might possibly not fully adhere to the current OOXML specs. But since you consider a reference implementation important, where were you two years ago with the ODF implementation? It did not have a full reference implementation then and still does not have one today.

    You consider the format to be the thing you can build office functionality on. That is only fine if you duplicate MS Office. If you have designed your own office suite or your own application, you do not use the format as a template for your application; you use your application as the basis and use whatever from OOXML fits into that. It is unlikely more than 10 companies/organisations are capable of producing a full implementation on such a scale that they can fully render all MS Office functionality and edit all of it as well. If you consider that the most important factor in these Office formats you will be disappointed.

    Why would you ?

  42. Stephane Rodriguez says:

    hAL,

    First of all I’m glad you agree at some point. It takes a lot of work though to get you guys to agree on something, that certainly amazes me. Maybe it has something to do with Microsoft’s pro-OOXML campaign trying to hammer whatever nonsense, every single day.

    I don’t talk about ODF, as I think it’s uncalled for. If I had not disclosed that I am an independent vendor, everybody would just call me an IBMer. Since they can’t do that, they are steering to ODF. I think commenters like that are just trolls (someone even said he was not interested in implementing this thing, yet insisted on criticizing). Obviously, I would prefer sound examples of why I might be wrong.

    Of course, if I were wrong, there might be a problem. As a vendor (for a number of years), I deal with very real issues like that all day long every day. I think it’s very hard for someone to prove me wrong. Note that Jones is not even trying. I wish the other commenters would be a little more clever and understand that the easy road for me would be to just shut up, and that it actually goes against my business to put under scrutiny (ends up being very negative) what Microsoft is doing.

    It so happens that I want the OOXML international standard process to be really open, not playing open on TV.

    More generally, perhaps that’s not extremely clear because this is now a lengthy thread, but mostly I have been debunking some of the lies being written about OOXML on this blog and the small ring of blogs linking to each other in the hope that gives them weight (see Doug Mahugh’s blog, it’s hilarious). I have debunked statements such as "fully documented", "backwards compatible", "100% XML", "VML is deprecate". And I can go on. It so happens I implement this stuff, so I see through the smoking wall.

    "It is unlikely more than 10 companies/organisations are capable of producing a full implementation on such a scale that they can fully render all MS Office functionality and edit all of it as well."

    By saying that, the mistake I think you are making is that you internalize that a 6,000 page specs is acceptable as an international standard. (And that fast-tracking it is, of course, the best way to accommodate the length of the specs.)

    I don’t think so.

    I also don’t think there are more or less 10 companies who can afford implementing this stuff. No one can implement this stuff in full-fidelity because it implies such a coupling not only with Windows but also with what Office applications do internally (i.e. the semantics), that it’s not achievable. For instance, VBA macros have been available since the early 90s, and no non-Microsoft implementation supports this stuff. And by then, Microsoft would have sued you since you would have reverse engineered their IP (the covenant not to sue is a joke since their IP is in non-XML parts of OOXML, and OOXML only documents XML parts). For more information, please read my article : http://www.codeproject.com/cs/library/office2007bin.asp

    "That is only fine if you duplicate MS Office."

    This is what the international standard is for. That it is so hard in the case of MS formats just means it cannot become an international standard.

  43. hAl says:

    [quote]This is what the international standard is for[/quote]

    No it isn’t. A standard Office format is not just for creating clone Office applications with all the same features. Also there is no need for 100% interoperability between documents created by such applications. If that were the case then creating standards would only stifle innovation.

    An open office format needs to create the amount of interoperability that is necessary for the required task. The more complex the task, the more complex the documents that are exchanged will be and the more features you need to support on both ends. Governments that want interoperability should not use extremely complex office documents with custom extension features or macros, even if they are in an open format. But internal company documents could have really complex custom extensions to integrate them into specific Office applications. (For instance, adding organisation-specific digital archive info to your documents using a custom schema, making all documents automatically archivable.)

    Interoperability with such difficult office document formats will require the use of simple documents and the ability for implementing applications to ignore certain unimplemented, more complex parts of the format without compromising those documents.

    It is fairly easy in the current OOXML and ODF format specs to create documents that cannot be correctly read by any implementation. However, there are also tons of applications that manage to be fairly interoperable with the non-open binary format of MS Office even though they implement far fewer features than that Office suite. This can drastically improve with the OOXML format, as those applications can now much more easily extend their interoperability with such formats and still not feel the need to support every feature that a company like Microsoft has decided to support.

  44. Stephane Rodriguez says:

    hAL,

    "Not it isn’t. A standard Office format is not just to create clone Office applications with all the same features."

    I am sorry if my point did not get across. This is not what I exactly meant, and I chose to say "cloning Office" instead of "instantiating Office documents" only because commenters above felt they did not understand what I was talking about.

    But to be accurate, it’s about instantiating documents. The file format provides keys for programmatic access, but instantiating documents is a different thing.

    If the specs is complete, you can create a program that will instantiate ANY kind of Office document without ever the need to reverse engineer undocumented encodings.

    It’s not exactly what the specs contain. 95% of what is needed is not there (I said 5% is documented) if you want to provide rendering capability in full-fidelity.

    "The more complex the task the more complex the documents will be that are exchanged and the more features you need to support on both end."

    What you keep forgetting, I think, is that each of the 3 file formats is nothing more than 15 years of legacy. And if it is so hard, it’s because of that, not because the 3 file formats are inherently complex. Why should a format that does word processing, spreadsheets, or presentations be so hard? There is nothing inherently complex in any of them.

    If you start implementing this stuff now, you are in it for at least ten years. Is it a good thing? Obviously not.

    Let me give you one example: there are at least 6 ways to describe text formatting in the specs. Some of them are totally coupled to GDI internals or printer DEVMODE internals. You call this thing interoperable across platforms and a good thing for the long term? I am curious to know how the Office team created their mobile versions. If you want to read interesting bits about the Mac BU at Microsoft, just read Rick Schaut’s blog. Recommended. (http://blogs.msdn.com/rick_schaut/archive/2006/12/07/open-xml-converters-for-mac-office.aspx)

    Those 15 years of legacy (this includes VML for instance) are the keys to the next stage of vendor lock-in.

    If you think this is a replay of what we have known already, it’s worse. Just read this MS blog about OBA scenarios: "In this scenario, a manufacturing sales representative uses a solution based on Office Word 2007, Microsoft Visual Studio 2005 Tools for Office Second Edition, and the business data catalog feature of Microsoft Office SharePoint Server 2007 Enterprise Edition to access product information and create a new quote sheet.(…)" (http://blogs.msdn.com/dazwilkin/archive/2007/06/21/learn-how-to-build-obas-combining-office-word-2007-the-bdc-and-a-back-end-sql-server-database.aspx) Customers need all this crap to create a quote sheet? Isn’t custom XML supposed to free you from actual Office application instances? How come you end up with $20,000+ worth of product licenses?

    Last but not least, a rant about RTF: http://blogs.msdn.com/michkap/archive/2007/06/21/3431070.aspx

    If you don’t know why I am inserting it, it’s because the Office 2008 downlevel converter from the Mac BU converts Word 2007 files to RTF files. Just saying…

  45. I have to agree with hAL on this. Irrespective of the fact that this will become a de facto standard anyway, which makes a lot of this esoteric, detail-based argument moot, how is having OpenXML as an internationally recognized standard worse than it not being a standard?

    In these days of de facto standards, how many ISO and other standards are unused and forgotten anyway? If OpenXML does get recognized as a standard, this doesn’t necessarily mean that everyone is now duty bound to adopt it, does it?

    Coming out with a statement like <paraphrase>the spec becoming an ISO spec is there so you can duplicate MS Office</paraphrase> is unhinged.

    Surely, the benefit has to be seen in a less emotional, more detached context. If it becomes a standard, then it becomes, for all intents and purposes, impossible for Microsoft to ever take this knowledge away, or deny the usage of this knowledge to anyone who wishes to use it.

    If it is not a standard, then this is less clear cut.

    -Organizations COULD take the view that they ARE adopting the OpenXML ISO standard, as it now exists, and then stick to it for ever, meaning that Office 2007 is the last version that will ever be truly relevant.

    -A large organization with far greater talents and resources than ours or Stephane’s COULD create an Office suite that would have more functionality than Office, but would be able to open and save documents in this ISO format. (Note I mention talents, as although Stephane and others complain that it is impossible to create this full fidelity rendering etc, enough extremely talented programmers, skilled in reverse engineering, COULD replicate the small areas of undocumented functionality that COULD exist in OpenXML instance documents, and MS would be unlikely to shoot themselves in the foot by pursuing IP claims on the consumption and production of an ISO STANDARD that they helped to create, now would they?)

    -Many smaller developers are now in the position of being able to either render or extract content from Office documents to a very high degree of fidelity, which was previously prohibitively difficult. Backing this up with a no-way-back standard means that these guys can be safe in the knowledge that their products will work with any documents that conform to the standard (irrespective of how hard anyone claims that to be). This is once again backed up by the fact that Microsoft will be nailed to the wall on ALWAYS allowing a Save As to a specific revision of the ISO Standard within Office for years to come.

    -It is now easier to create documents which can be read by Microsoft Office (and any other apps that implement "sensible" parts of the spec) from other applications and non-Microsoft platforms. Surely the non-MS crew would be happier to hold up a copy of the ISO-ratified spec and beat Microsoft with it if their implementations of it don’t work as advertised, rather than their previous position?

    The weird thing is, I could have understood this type of vitriol BEFORE OpenXML, talking about obfuscated file formats, vendor lock-in etc etc, but no-one was really all that bothered, but as soon as Microsoft make a step in the right direction, they get slated.

    Gareth  

  46. Stephane Rodriguez says:

    "but no-one was really all that bothered, but as soon as Microsoft make a step in the right direction, they get slated."

    If you think Microsoft is getting there for a technical reason, you certainly live in a fantasy land. Microsoft is getting there because governments are mandating it. It’s very defensive. Their covenant not to sue is of the same kind.

    But Microsoft are not going to kill one of their two cash cows that easily. Hence the new formats that are nothing more than angle brackets around the 15 years of legacy.

    "It is now easier to create documents which can be read by Microsoft Office (and any other apps that implement "sensible" parts of the spec) from other applications and non-Microsoft platforms."

    Not true. See my example with VML.

    "Stephane and other complain that it is impossible to create this full fidelity rendering"

    I haven’t said that at all. I have said it is impossible to create this full fidelity rendering WITH THOSE SPECS ALONE. I am adept at reverse engineering. I am not complaining about the file format (4th time I say it); I am complaining about the international standard.

    "Organizations COULD take the view that they ARE adopting the OpenXML ISO standard, as it now exists, and then stick to it for ever"

    That’s where you don’t get it. Imagine Office 14 with an updated .DOCX file format that changes the semantics of a couple of attributes. Sure, your existing app can still read the angle brackets, but to assume that it can still render it, extract it, or do whatever significant thing with it is a bet you are making. What’s Microsoft’s record so far when it comes to file versioning? Aren’t they world-renowned for making sure that you have to upgrade over and over again?

    You are somebody from datawatch. Do you generate Excel files? (sorry I usually do research, but this time I did not bother) If yes, have you noticed that a number of things that opened and rendered well in Excel 97-2003 don’t open and render well in Excel 2007 only because the compatibility mode in Excel 2007 has plenty of bugs? Sorry to rain on your parade.

    "In these days of de facto standards"

    If you intentionally confuse de facto standards and international standards, then there is no point discussing. Of course Microsoft binary formats are de facto standards. And, should Office 2007 formats grow in market share, they will become de facto standards too. But is that what we are talking about?

  47. hAl says:

    Stephane, you are now going into Office 2007, which is fine but frankly not very relevant. I think an ISO adoption of OOXML would make it necessary for MS to adapt Office 2007 to the ISO standardized format rather than the other way around. If Microsoft wants to claim support for the ISO format, then I think they will have to smooth out any flaws in Office 2007 relative to that format. This is what makes the standardization powerful. If Microsoft wants changes, they will need to process them through Ecma, making them more open, and if ISO standardisation is involved they will need to be aware that national bodies need to be convinced those changes are needed and that they keep the current document base compatible.

    Furthermore, I must say that Gareth also put in some great points, probably explaining them better than I am able to in my limited English. (I often have trouble reading my own posts a day or more later.)

    Stephane, I am actually happy with you giving critique on the standard. By no means do I find OOXML perfect, but I find it a big step towards what I and a lot of developers that work with Office documents want and need: an open and stable format. But I would rather have you critique the proposed standard in order to get an improved version, instead of only objecting.

    If anything, I see this blog as a place where you could state some things you would like to see changed, knowing that it is read by Microsoft and would most likely even be seen as constructive for the future.

  48. Stephane Rodriguez says:

    hAl said "you are now going into Office 2007 which is fine but frankly not very relevant. I think an ISO adoption for OOXML would make it nescesary for MS to adopt Office 2007 to the ISO standardized format rather than the other way around."

    I have no idea what you are talking about.

    Office 2007 is supposed to be a reference implementation of OOXML. I have pointed out a couple of Office 2007 flaws to explain that, among other things, Office 2007 is actually 1) not a strict reference implementation (VML is not deprecated at all) and 2) quite a buggy one (Excel 2007 compatibility mode has plenty of bugs).

    It means one vendor who is pushing a proprietary format so that it becomes an international standard is actually unable, right now, to show a reference implementation for it.

    So not only is there no non-Microsoft implementation, Microsoft’s own implementation falls apart.

    If Office 2007 itself isn’t appropriate to back their specs, what will?

    Or should we, as Gareth suggests, internalize Microsoft culture, keep our mouth shut and let Microsoft get their way regardless?

    Perhaps it’s typical American culture, I have no idea. But ISO national bodies don’t have to comply with this crap, right?

    I mean, ECMA time stamped it without doing their homework (many typos, many missing parts). But ISO isn’t that bad, I think, there is hope for a real review.

    "By no means I find OOXML perfect but I find it a big step to what me and a lot of developers that work with Office documents want and need in having an open and stable format."

    The problem is that you regard the issue in the light of a special case, rather than the general case. Microsoft is very happy with people like you. They are betting the whole farm on it (that’s why Jones talks about custom XML at every turn). That’s exactly the difference between someone’s file format and an international standard.

  49. Dave S. says:

    hAl, et al,

    Surely you know that governments have pacts in place to respect and/or follow one another’s policies when it comes to free trade.

    As an ISO standard, ODF gets first place when deciding on document formats for trade exchanges. Such placement provides an opening for competing vendors. This has been somewhat true for WordPerfect in the legal and government areas for some time, but making ODF an official requirement gives developers the certainty needed to get funding and create competing products.

    Sometimes organizations agree to disagree, e.g. ISO has standards on dimensioning and tolerancing and so does the American National Standards Institute. Even so, ISO and ANSI work to harmonize their standards so that goods and services can be transferred between member nations. The differences are few enough to be easily recognized and dealt with.

    The sole purpose of pushing for ISO adoption of MS Office Open XML is to get on the government approved specifications list and displace ODF.

    ISO and ANSI don’t validate implementations; they only provide standards management so that lawsuits over not following the required standards have a firm basis for what constitutes the standard.

  50. Stephane Rodriguez says:

    In closing, as the owner of this blog can’t even be bothered to provide a sound argument, I’ll first of all recommend that anyone read the comments from the US ISO committee, which appear to be public.

    Such as this one: http://www.ibiblio.org/bosak/v1mail/200706/2007Jun21-121337.eml

    And, to others, who really think what they think, I shall simply remind them that file format experts have seen this story before, have learned what Microsoft does, and the best they can do in this "David versus Goliath" combat is to warn adopters. To warn adopters that it is very hard to reconcile "MS’s Office Open XML gives freedom from Office applications" and "OBA scenarios will work with $20,000+ worth of MS Office product licenses", both of which are coming from the horse’s mouth.

    Up to you guys. And at least this thread gave me the opportunity to learn where the hearts of the commenters above lie.

  51. Andrew Hilton says:

    Stephane,

    I imagine Brian doesn’t bother to reply because it’s pointless.  Tirades generally win in comment forums.

    Continuous linking to material by persons who have an obvious conflict of interest does not buy my vote.

    Andrew Hilton