There can be only one?


I had a few people point me at a couple of IBM blogs today (Bob Sutor and Rob Weir), and I have to admit I was a little disappointed to see that they are really working hard to continue to push negative views of the Office Open XML formats. Basically they want to position it in such a way that there is a winner and a loser, and it’s no surprise that they think the winner should be the one they’ve put all their resources behind (ODF). It’s definitely a strong “us vs. them” mentality that you also see a lot in politics these days. I admit I’ve pushed back in the other direction at times and had some criticisms of the OpenDocument format, but those have always been in response to folks who ask why we couldn’t use ODF as the default format for Microsoft Office. I had always stated that we needed a format that could fully support all of the features our customers used, and when the ODF folks snapped back saying that I wasn’t providing enough concrete examples, I decided to start providing specific problems. I’ve never said the world can’t use ODF; I’ve just said that the Office Open XML formats are also necessary. I feel like some of these folks have watched Highlander one too many times (hence the title of this post). I would never make the claim that the HTML format means that ODF isn’t necessary, and I certainly don’t believe that ODF means that Office Open XML isn’t necessary.

The latest criticism from Bob and Rob is that the Open XML formats don’t use MathML, and instead define a separate XML syntax for a Math Presentation format. Rob even displayed a bit of a flair for the dramatic, titling his post “Math you can’t use”, and Bob followed up with “Making bad choices, over and over again.” Well thankfully this isn’t really true, and to be honest, if posts like that aren’t considered ‘FUD’ I don’t know what is. Every piece of the Office Open XML format is being fully defined in Ecma, and we’ve even built XSLTs that will transform to MathML and back. In addition to that, we’ve worked closely with different companies out there that already support MathML to make sure we are compatible with their solutions. We support MathML on the clipboard, so you can paste a MathML equation into Word. Here is the latest version of the XSLT that takes the Office Open XML format for Math and transforms it into MathML (http://jonesxml.com/resources/omml2mml.xsl), and here is the XSLT that goes in the opposite direction (http://jonesxml.com/resources/mml2omml.xsl). Anyone who has Beta 2 of Office 2007 should already have these on their machine under “Program Files\Microsoft Office\Office12”.

Just in case folks aren’t sure what I’m talking about, this is all about the presentation form of MathML. The math is never actually calculated, only displayed. Also note that this is different from the discussions around functions, which are a large part of the SpreadsheetML specification. Unlike the spreadsheet functions, the Math support is all around scenarios like academic papers that need to use formulas as part of the information they are presenting.

I remember a few years ago having a discussion with Murray Sargent, who was one of the key folks behind the new math support in Office 2007. He also had worked on the MathML 2.0 standards body before it was dissolved, and we talked about whether or not we could use MathML for the formats. He obviously was very familiar with the MathML format, and the conclusion was that we unfortunately couldn’t use MathML in our new default XML formats. We found that while MathML works great for isolated math islands, it didn’t give us everything we needed at the document-level. Although MathML does have space for annotations so we could have extended it, that would not have worked well with document-level features like comments, track changes, Word styles, etc. The equation support in Word 2007 is actually very impressive, and if you haven’t taken a look yet I strongly suggest you give it a try.

We did agree though that we should fully support MathML as an interoperability language between apps, which is why we can read and write Presentation MathML on the clipboard (leveraging those XSLTs).

This is just another example of the difficult decisions we had to make when building these new formats. Of course we would have loved to have just used MathML, as it was already fully designed and documented. It would have been much easier, but it would have also meant we would have to either cut back the functionality, or extend it in such ways that it was no longer as usable. If you ever used the HTML formats from prior versions of Office, you’ve seen that when you try to take a format that was designed for other purposes and add extensions so that it can represent your files, you often end up with a rather complex and unmanageable result. So instead, we used MathML as a guide, and tried to leverage as much of the design as we could. We had to make sure we could support our features though and not let the format put the end user in a bad state. Most of our users don’t care the least bit about XML and XML formats, and if moving to the new file formats meant things like tracked changes wouldn’t work on the equations, then folks would have chosen to stick with the binary formats instead. So we instead have an XML format that supports all of the features, and that format is fully documented and free for anyone to use. Not a bad deal in my view. I can’t say enough how proud those of us who worked on the formats are. It’s such an important change in the world of Office documents.

-Brian

Comments (62)

  1. John says:

    "I was a little disappointed to see that they are really working hard to continue to push negative views of the Office Open XML formats".

    This is a little rich when you continue to push negative views of ODF.

    "This is just another example of the difficult decisions we had to make when building these new formats".

    Then maybe you will allow ODF the same leeway.

    And then maybe we can get back to the technical aspects of Office XML.

    John.

  2. Fernando says:

    Well, one thing is disagreeing with design choices that are made when creating a standard, and I bet IBM would be very welcome to voice their concerns about the non-use of MathML in the ECMA TC45 subcommittee.

    Another very different thing is to rush a semi-complete, draft-level standard through ISO, falsely claim that "billions of existing office documents will be able to be converted … with no loss of data, formatting, properties, or capabilities", and then use it as a selling point to governments worldwide.

    Nice try from Rob and Bob though.

  3. James says:

    You don’t state *why* you don’t use MathML in your spec though, only saying ‘we can interop with it fine, look’. Why wasn’t MathML used? If you can convert to and fro with XSLTs, I see no reason to re-invent the wheel and define a new standard for your app.

  4. BrianJones says:

    John,

    I think you missed the point of my post. I have no problem with ODF, and people can use it as much as they want.

    My only criticisms of ODF have been in response to people that have tried to push it as the one and only standard for Office documents. I’ve tried to cite examples of why the ODF format was not something we could use as a default format for Office.

    Fernando,

    You are correct that anyone is free to voice disagreements with design choices. IBM is a member of Ecma international, but has decided not to participate in the TC45 work. I think that one of the reasons that the IBM folks don’t understand the design decisions of Open XML is that they don’t see how the goals of the two formats are quite different. ODF was never intended to fully represent all the features that Microsoft Office customers use. Open XML absolutely had to meet that requirement.

    James,

    Actually I did state above why we couldn’t use it in the default format:

    "Although MathML does have space for annotations so we could have extended it, that would not have worked well with document-level features like comments, track changes, Word styles, etc."

    -Brian

  5. I have to say that the case you make is not particularly compelling.  There is absolutely a place for multiple standards, and I don’t hope ODF wins and Open XML loses, but there is also room for sharing common standards within the larger formats when possible.  While I obviously cannot tell how much work it would be to extend MathML to handle the Word styles and change control, etc., I think this is an area where Microsoft is falling into a trap of its own making.  If Open XML is supposed to be a standard beyond Microsoft, some extra effort needs to go into creating a format that allows use elsewhere.  Even if Microsoft Word supports reading and writing MathML to the clipboard, that doesn’t help any third party products that want to work with Open XML data as well as ODF data.  Each product must now maintain more code to work with the two formats, and fairly unnecessary code.  Microsoft again manages to give the impression that for the sake of a short term advantage with Office 2007, it is willing to avoid standards that don’t have to diverge.

    This is a disappointing decision that weakens the argument for Open XML being intended as a real standard and not just a "Microsoft Office standard".  Microsoft is acting quite a bit like AT&T when it was so used to being a monopoly that it couldn’t compete when competition came along.  Microsoft is just shooting itself in the foot when it avoids existing standards such as MathML and SVG.  It should make the effort to support those standards, thus both reducing the ammunition against it as a standards supporter, and, more importantly, focusing its differentiation on areas where it can bring actual value, such as the spreadsheet standards.

  6. Sinleeh says:

    Dear Brian,

    In your reply to Fernando, you kinda complain that IBM did not participate in TC45 even though they are Ecma members. Isn’t it the same as complaining that Microsoft does not participate in the ODF committee at OASIS despite being a member of OASIS?

    Another problem I see is that negative comments about ODF and OpenXML come at such a late stage for both formats (ODF out of OASIS into ISO, Office 07 close to shipping), making it impossible to have a proper discussion about the design decisions.

    I’m going to be critical here. Rob presented his case about MathML. His argument is effectively that MathML can be used in different apps as it is. You skirted this question by saying an XSLT transform is available and direct cut-and-paste from MathML is supported. Both are great, but why do other developers have to go through the humiliation of a transformation? Unfortunately, your answer to this is less than persuasive here. It sounds like "trust me coz I know better". You mentioned MathML is not cooperating well with document-level objects; it would be great if you had XML snippets to demonstrate your case.

  7. Sinleeh says:

    Dear Brian,

    I forgot another thing in my original comment. I think the implicit line of attack from the ODF front with respect to VML and MathML is that Microsoft is arrogant, does what it wants, and is simply not going to listen to others’ collective wisdom.

    While Microsoft is definitely free to do so, do you think I have a point when I say "At some point, one has to compromise a bit from the pure viewpoint of focusing on document creation / maintenance / management for the sake of simplifying interoperability"? I must say that all the defence from Microsoft seems to be focusing strongly on the document viewpoint mentioned. It sometimes sounds unnecessarily overpowering.

  8. BrianJones says:

    I think folks may have forgotten that the Office applications have supported standards for years now. We’ve supported opening and saving Word documents, Excel spreadsheets, and PowerPoint presentations as HTML files. We’ve done this since Office 2000, and we also did a lot of work to add our own attributes to the HTML so that we could preserve all the features our customers used. These extensions are what a few of you are now asking us to do with things like ODF and MathML. If anyone has used the HTML that we output, you’ll see it’s pretty complex because of those extensions. In fact, we’ve had a lot of people complain quite loudly that the extensions were not the right thing to do, and that we should instead have just blocked people from using features that couldn’t be represented in the HTML standard (without extensions).

    The Office Open XML format was not intended to be a generic document format standard. It is a standard that was intended to be compatible with Microsoft Office documents. This is really an important point. We were in a world where all of the documents our customers saved were in a binary format that was extremely difficult to build solutions around for third parties. We wanted to move out of that binary world and create a new XML format that would be open, free, and fully documented. This new XML format though had to still do everything that the old binary formats could do, otherwise our customers wouldn’t use it. That is a really important piece to understand when looking at the design of Open XML.

    In terms of ODF, I’ve never said it’s not a good generic format. It just doesn’t work as the default for Office. We’ve said that if there is significant customer demand, we would build in support for ODF as well. There was actually a good amount of demand from the government sector, and while it’s too late to add new functionality directly into Office 2007, we created an open source add-in project that will give people ODF support in Office. So people that want to use the more generic format (with some feature loss) are now free to do so. From my point of view, this isn’t an "us vs. them" issue. It’s about choice, and we’ve actually historically provided a good amount of choice with file formats in Office (RTF, Text, HTML, XML, binary, other app binaries, PDF, etc.).

    -Brian

  9. Bryan says:

    I can certainly appreciate that MathML may not be easy to integrate with all the document features provided by Open XML, and if the gentleman you mentioned worked on the MathML spec, I’m going to presume he knows what he’s talking about. :-)

    That said, would it be possible to treat MathML as a special custom XML schema? Meaning you could store equations as MathML chunks in the package, which could be kept in sync with the document. This would make it easy to extract equations, process them, or insert them back in–and would allow us to leverage the large number of MathML tools available. I know it’s getting late in the dev cycle, and I admit haven’t had the time to fully explore custom schemas to understand their limitations, but maybe it’s something that could be considered for a future revision to the standard?

    On a related note, are you aware of any groups at Microsoft–within Office or otherwise–that are actively working with the MathML standards people? Ultimately, I’d love to see MathML get platform-level support so that it could be leveraged by other applications (e.g., IE).

    Thanks.

  10. Sam Hiser says:

    Brian-

    The term ‘standard’ means one thing. To suggest there can be two standards is bad grammar (oxymoron) as well as bad technology.

  11. Brutus says:

    It’s all very well to respond to ODF FUD in blogs, but the fact is, ODF advocates like IBM are feeding these sorts of lies and distortions to *governments* in an attempt to persuade (read "trick") governments into *exclusively* mandating use of ODF.  Microsoft needs to really step up its lobbying efforts or else the world will be stuck with ODF based on lies rather than technical merit.

  12. AC says:

    Wow, it takes some guts from someone from Microsoft to bring up the Highlander syndrome. Has there been a mea culpa from Microsoft when they chopped off the heads of Netscape, Lotus, BeOS, etc., that I missed? Otherwise, I think it’s just a tad hypocritical.

  13. orcmid says:

    I don’t understand Sam Hiser’s comment.  

    For there to be multiple standards in a field of application is quite common.  

    At the ISO level, there were *already* both SGML and ODA (the, ahem, Open Document Architecture).   Look at graphics standards and standards for image formats.  

    Sometimes the multiple standards are promulgated by the same organization, sometimes by others.  The U.S. is still not on the full set of metric standards, and yet the non-metric standards used in the U.S. are standards nonetheless, and the metric standards are also usable in many U.S. settings.

  14. Patrick Schmid says:

    Dennis, the metric/US measurement argument has one problem. The US non-metric units were defined by Congress as multiples of metric units.

  15. orcmid says:

    So where is the metric bar for the meter that is used in those standards?  I bet it’s not in Paris.  Or do we use some kind of atomic scheme for the dimension now?

  16. Patrick Schmid says:

    Dennis,

    all seven SI base units (metric units) are defined by measurements of natural phenomena except the kilogram. That still is defined according to a mass that is actually kept in Paris. Scientists haven’t been able to come up with a measurable natural phenomenon to replace it yet. A replacement is needed, as a physical prototype can actually lose mass over time (it is speculated that the kilogram prototype has actually lost mass since it was created in the 1880s). For practical reasons, each nation tends to keep a copy of the international kilogram prototype that is calibrated against the international one from time to time.

    By the way, the grounds of the International Bureau of Weights and Measures that keeps the kilogram are considered international territory (similar to the UN in New York), not French territory.

    The meter, for example, is defined in terms of the distance light travels in a certain amount of time. For all definitions, see http://en.wikipedia.org/wiki/SI_base_unit

    For the definitions of the US customary units in SI units, see http://en.wikipedia.org/wiki/U.S._customary_unit

    For an interesting history of SI units, see http://www.aticourses.com/international_system_units.htm

  17. orcmid says:

    Based on Brian’s remarks about the difficulty of directly integrating MathML as a format within Office Open XML, I became curious how products that support ODF as a native format smooth things over.  My only handy benchmark for this is OO.o.

    So, I opened OpenOffice Writer and looked around for a way to create a formula (not for evaluation but for presentation).  The only way I found to do that was to do an Insert | Object | Formula.  I could type the text of my simple formula, (a+b)/(a-b) and a somewhat nicer version showed up in the frame of the object in the text.  

    I saved the document and then I went looking inside.  It turns out there is a separate Object1 directory in the Zip, and the content.xml file uses the MathML namespace. It also uses a private DTD which I am not sure how to find:

    <!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">

    Finally, I looked to see how this was incorporated into the main content.xml file for my document, and I found a draw object that mentions ./Object 1 for inclusion on load of the content document, and that’s it.  So the formula is in a little island of its own.  Interestingly enough, there is also a draw:image element that links to ./ObjectReplacements/Object 1, which is a file with no extension that holds, ahem, binary data of unknown characteristics.  (This is a little startling since I occasionally hear claims that sainthood is achieved by ODF never allowing a binary data unit to cross its lips and that any breach is to be met by banishment from civil society.)

    Section 9.3.3 of the ODF Specification allows binary data to be linked, but ODF is a bit coy about how an application is to know the actual format (PNG is recommended, but I don’t quite know how one is supposed to know when that is the actual case).  In the inline case, the dreaded binary is to be encoded in the dreaded Base64, taken as blatant evidence of proprietary non-standardness.

    The draw:object element of my little example (and defined in Section 9.3.4 of the ODF Specification) could be binary too but in this case there is a link to an XML object with a mysterious DTD, as already mentioned.  It is recommended that a draw:image also be present as an alternative and there is one as already discussed above.

    To be fair, I created a Word 2003 document with the same equation, and saved it as an Office Open XML .docx file.  This ended up using some vml elements and also linked to a binary file for the Equation.3 OLE Object that Word 2003 used to make the equation.  The vml was in-line in the Word document but the OLE Object was linked via a relationship.  Information about the nature and the format of the binary was explicitly provided in conjunction with the link.   I did not try Word 2007 to see what might be different in that case.
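
For anyone who wants to repeat this kind of package inspection without unzipping by hand, here is a minimal Python sketch using only the standard library. It assumes the package has a top-level content.xml and that embedded objects are referenced through xlink:href attributes on draw:object and draw:image elements, as described above. The function names and the tiny in-memory sample package (including its href values) are hypothetical stand-ins, not a real OpenOffice.org file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument 1.0 specification.
DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK = "http://www.w3.org/1999/xlink"


def list_embedded_links(odf_bytes):
    """Return (element name, href) pairs for draw:object and draw:image
    elements found in the package's top-level content.xml."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    links = []
    for element in root.iter():
        if element.tag in (f"{{{DRAW}}}object", f"{{{DRAW}}}image"):
            local_name = element.tag.split("}", 1)[1]
            links.append((local_name, element.get(f"{{{XLINK}}}href")))
    return links


def build_sample_package():
    """Build a tiny stand-in package in memory (hypothetical content, not a
    complete or valid ODF document) so the sketch runs without OpenOffice."""
    content = (
        f'<doc xmlns:draw="{DRAW}" xmlns:xlink="{XLINK}">'
        '<draw:frame>'
        '<draw:object xlink:href="./Object 1"/>'
        '<draw:image xlink:href="./ObjectReplacements/Object 1"/>'
        '</draw:frame>'
        '</doc>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", content)
        package.writestr("Object 1/content.xml", "<math/>")
        package.writestr("ObjectReplacements/Object 1", b"\x00\x01")
    return buffer.getvalue()


if __name__ == "__main__":
    for name, href in list_embedded_links(build_sample_package()):
        print(name, href)
```

To try it on a real file, read the .odt bytes from disk and pass them to list_embedded_links in place of the sample package.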

  18. orcmid says:

    Patrick, that’s interesting.  It  doesn’t alter the existence of multiple standards, but I think it is cool that the SI units are maintained that way, and that U.S. units are referenced to those.

  19. Patrick Schmid says:

    Dennis,

    Word 2007 has a completely different equation feature (as Brian said, it’s really cool). It ships with a list of equations users can choose from. I picked the Binomial Theorem and inserted the equation into a DOCX file. Then I took a look at it.

    The equation was completely represented in XML and it uses this namespace: http://schemas.microsoft.com/office/omml/2004/12/core

  20. orcmid says:

    I agree with Ben Langhinrichs in principle, but I am not sure that the W3C would have ever considered office document requirements for MathML, it being the W3C, after all.  The page at http://www.w3.org/Math/whatIsMathML.html provides this interesting comment:

    "MathML is a low-level format for describing mathematics as a basis for machine to machine communication. MathML is not intended for editing by hand, but is for handling by specialized authoring tools such as equation editors, or for export to and from other math packages."

    I’m not sure what to make of this, but I can see how Microsoft might have been more comfortable handling the export to and from case in Office 12.

    I notice that the OO.o form of the MathML used for my example is of the presentation type, not the semantic type illustrated lower down on the W3C page.

  21. orcmid says:

    Thanks Patrick.  It is useful to learn that the Word 2007 equation system produces a pure XML element structure for it.

    I didn’t know what to expect when I saved the Office 2003 document as a .docx, and it is interesting to see that the OLE object used by 2003 is preserved in the beta 2 compatibility conversion to 2007 format.

    Not being quite sure where else to go with this, I also imported the Office 2003 .doc file into OO.o (since it doesn’t recognize .docx as currently shipped).   Now that I see it, I guess it should be no surprise that when that is saved to ODF both the draw:object and the draw:image are now in binary.

  22. Ian Easson says:

    Dennis, here’s the actual xml for your (a+b)/(a-b) equation in Office 2007.  There was a choice of presentation formats, so I chose "professional", which put it like a math equation in a textbook:

    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
    <w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:o12="http://schemas.microsoft.com/office/2004/7/core" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.microsoft.com/office/omml/2004/12/core" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/3/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/3/main">
      <w:body>
        <w:p>
          <m:oMathPara>
            <m:oMathParaPr>
              <m:jc m:val="centerGroup" />
            </m:oMathParaPr>
            <m:oMath>
              <m:f>
                <m:num>
                  <m:r>
                    <w:rPr>
                      <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" />
                    </w:rPr>
                    <m:t>a+b</m:t>
                  </m:r>
                </m:num>
                <m:den>
                  <m:r>
                    <w:rPr>
                      <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" />
                    </w:rPr>
                    <m:t>a-b</m:t>
                  </m:r>
                </m:den>
              </m:f>
            </m:oMath>
          </m:oMathPara>
        </w:p>
        <w:sectPr w:rsidR="00A12ECC" w:rsidSect="00407C86">
          <w:pgSz w:w="12240" w:h="15840" />
          <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0" />
          <w:cols w:space="720" />
          <w:docGrid w:linePitch="360" />
        </w:sectPr>
      </w:body>
    </w:document>
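
To make the relationship between the two vocabularies concrete for this one equation, here is a minimal Python sketch that maps the m:f fraction quoted above onto a Presentation MathML mfrac. It is hand-written for illustration only and is not the logic of omml2mml.xsl. The function names are hypothetical, the sample drops the w:rPr formatting, the namespace is the Beta 2 one shown in this thread, and the one-token-per-character split is a simplifying assumption that would mishandle multi-digit numbers.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Beta 2 markup in this thread, and the W3C MathML namespace.
OMML = "http://schemas.microsoft.com/office/omml/2004/12/core"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Trimmed copy of the fraction above (run formatting omitted).
SAMPLE = f"""
<m:oMath xmlns:m="{OMML}">
  <m:f>
    <m:num><m:r><m:t>a+b</m:t></m:r></m:num>
    <m:den><m:r><m:t>a-b</m:t></m:r></m:den>
  </m:f>
</m:oMath>
"""


def text_to_mrow(text):
    """Split run text into one MathML token per character:
    letters become mi, digits become mn, anything else becomes mo."""
    row = ET.Element(f"{{{MATHML}}}mrow")
    for char in text:
        if char.isspace():
            continue
        if char.isalpha():
            kind = "mi"
        elif char.isdigit():
            kind = "mn"
        else:
            kind = "mo"
        ET.SubElement(row, f"{{{MATHML}}}{kind}").text = char
    return row


def fraction_to_mathml(omath):
    """Convert the first m:f (fraction) under an m:oMath element to mfrac."""
    ns = {"m": OMML}
    fraction = omath.find(".//m:f", ns)
    math = ET.Element(f"{{{MATHML}}}math")
    mfrac = ET.SubElement(math, f"{{{MATHML}}}mfrac")
    for part in ("num", "den"):
        runs = fraction.findall(f"m:{part}//m:t", ns)
        mfrac.append(text_to_mrow("".join(t.text or "" for t in runs)))
    return math


if __name__ == "__main__":
    ET.register_namespace("", MATHML)  # serialize MathML without a prefix
    result = fraction_to_mathml(ET.fromstring(SAMPLE))
    print(ET.tostring(result, encoding="unicode"))
```

The sketch also shows why a pure structural mapping is the easy part: the run-level formatting (w:rPr) that it throws away is exactly the document-level information the post says plain MathML had no natural home for.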

  23. Ian Easson says:

    I wonder if the use of the word "modified" in the DTD from ODF that Dennis found,

    "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN"

    indicates that ODF 1.0 uses a *modified* version of MathML.  If so, then the IBM blogs about "Math you can’t use" and "Making bad choices, over and over again" are pure hypocrisy, since they do exactly what they criticize Microsoft for doing!

    Can anyone clarify what "modified" means in this context?

  24. Paul Topping says:

    To the commenter Bryan,

    IE has good support for MathML (better than Mozilla/Firefox) via our free MathPlayer plugin. It supports both Content and Presentation MathML as well as working with screen readers for access by blind users (it speaks the math).

    You can see some comparisons here: http://meta.wikimedia.org/wiki/Blahtex/Bugs_in_browser_MathML_support

    Paul

  25. Francis says:

    Ben L: AT&T is a poor choice of an analogy in this context. AT&T/Bell Labs invented the standards used by the telephone system to this day. These were then, later, turned over to a private industry group (the Administrative Council for Terminal Attachment).

    If Microsoft and ECMA represent these players, then all will be well. AT&T’s relinquishment of control over the very system it created, i.e. opening of standards, led to a profusion of telecommunications devices, providers, and services.

  26. orcmid says:

    Regarding Ian Easson’s question, the weird DOCTYPE in the XML prefix is courtesy of OpenOffice.org, so we should be careful not to lay it at the feet of the OpenDocument specification.  In some sense, the presence of a DOCTYPE at all is problematic and it is probably a mistake of the OO.o implementation that it is there.  (This does show how big a hill there is to climb before there are tested interchange profiles that can be used to qualify these implementations.)

    Now, the file does use the correct XML namespace for MathML (and some UTF-8 character codes for parentheses that I can’t get to render properly), so pretty much all the DOCTYPE can do is regulate some element and attribute (and namespace) usages to restrict the file to some selection that OpenOffice.org will accept.  

    Because there is no URL provided for access to the referenced DTD, the use of this material in interchange among applications is a bit murky.  I suspect that will have to be sorted out.

    PS: I did not check the ODF specification for anything it might say about restrictions on MathML.  It was late and I was so thrown by the occurrence of binary material (Base64 encoded or not) that I had to stop before my head exploded.

    PPS: What seems pretty clear is that, in practice, OO.o use of ODF is somewhere around the level of technique that we find in Word 2003’s use of the XML-carried material, at least for the equation case.  The specimen of Word 2007’s approach is heartening, although the use of a Microsoft-specific namespace for part of the (beta2 ?) content indicates that there is more work yet to be done.  Unless there’s been a new public link to a new ECMA TC45 draft, it is difficult to foretell how that is going.

    PPPS (pent up thoughts here).  I think the beta2 converter’s preserving an Office 2003 Equation.3 OLE entity in a .docx is striking and the kind of thing that will invite heat.  It is certainly a practical solution at this stage (and I shall finally install 2007 beta 2 just to see what happens next when that document is imported and saved as .docx).  I can see that it will create difficulties where people want to stamp out the binary cruft that comes with OLE embeddings (or linkings), at least in documents that must be preserved and interchanged as part of eGovernment and other cases too.

    I am not crying out about this.  I understand that we are seeing the early stages of a complex and lengthy journey, and I have great admiration for the effort that Microsoft is investing, no matter what the past situation was and the degree to which competitive offerings have triggered a self-interested response.  (The competition is a good thing, because it causes everyone to improve their game and to seek excellence.)  This is hard work and it is being seriously undertaken.  The team has my admiration for that and the way they have struggled (especially Brian) to be open and accountable as they move ahead.  There’s more that I can say about this, but I should consume some of my own blog space for that.

  27. orcmid says:

    I did find the math.dtd.  

    It is stored in the install location of Open Office 2.0 in subdirectory share\dtd\math\1_01 and it is a significant object (35,000 bytes).  It also has a revision history and other comments.  Note that, for interchange purposes, having a location for a relative URL tied to the location of installed software is not exactly workable.  I am sure this will be sorted out over time, but meanwhile many OO.o ODF-format documents are being created with this cruft.

    One thing the DTD does is force use of particular QNames for the namespaces (technically, not a requirement in an ODF document or in XML documents generally, and a problem for the OO.o implementation to work its way out of some day).  

    [I have not checked to see if Word 2007 is brittle in this same way, but I would be extremely surprised to find such a case, even though Office saves XML files using very compact QNames for performance and file-footprint reasons.]

    The comment that seems relevant here is this one:

    — Modifications are intended to ease validation

    — of MathML files written by StarMath 6.0
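
The point about QNames can be illustrated with a short Python sketch. DTD validation matches the literal prefix:name strings, so a DTD effectively pins the prefix, whereas a namespace-aware parser compares the namespace URI plus the local name. The three sample strings and the function name below are made up for illustration; only the MathML namespace URI is real.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"

# The same one-element MathML fragment written with three different prefixes.
WITH_MATH_PREFIX = f'<math:math xmlns:math="{MATHML}"><math:mi>a</math:mi></math:math>'
WITH_SHORT_PREFIX = f'<m:math xmlns:m="{MATHML}"><m:mi>a</m:mi></m:math>'
WITH_DEFAULT_NS = f'<math xmlns="{MATHML}"><mi>a</mi></math>'


def expanded_names(xml_text):
    """Return every element name in {namespace-URI}local-name form."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    for sample in (WITH_MATH_PREFIX, WITH_SHORT_PREFIX, WITH_DEFAULT_NS):
        print(expanded_names(sample))
```

All three spellings yield the same expanded names, which is why a consumer that relies on namespaces rather than a prefix-fixing DTD should not care which prefix the producer chose.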

  28. Karthick C says:

    Hi Brian Jones,

    Can you help me in making a presentation on Office 12? I happened to find your blog while hunting for it; yours is quite interesting.

    1. What’s new in Office 12? (I am new to it; I have been using Office XP.)

    2. I have a doubt: is there any version of XML other than 1.0? If so, where do we use it? If not, why is there a version attribute on the first line of the XML file?

    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

    and this is my mail id karthickcbca@yahoo.co.in.

    I am sorry to take up your time.

  29. Francis – Sorry, I wasn’t clear.  It is not AT&T’s use of standards for the telephone to which I point, since that is what made AT&T a monopoly.  It was their attitude after being a monopoly for a while when they tried to enter the PC business and couldn’t adjust to the competition in that market.  You are quite correct about the durability of the AT&T standards, but I think you will agree that AT&T PCs are not prevalent today.

  30. orcmid says:

    OK, one more update on equations in ODF (er, OO.o) and in Word 2003 – Word 2007 and .docx files.

    I finally installed the Office 2007 beta 2 and the first thing I did was open the .docx that I made with Word 2003 and an Equation.3 embedded OLE object.  

    Word 2007 beta 2 did three very cool things.  First, when I opened the file, it included the fact that this was a Compatibility Mode situation in the title bar.  Secondly, I got an alert about improvements to the equation system (from a third party) and advice on how to find out more.  Finally, I was advised that I could convert the .docx to standard mode and it told me what button to click to find out more.

    On upgrading to "standard" .docx, I couldn’t see any difference.  So I saved the new version (once I found Save As … ) under a new name.  

    I then opened the cleaned-up .docx in Word 2003 and, lo, I still get the Equation.3 OLE embedding that I started with.  

    On inspecting the .docx with WinZip and an XML editor, I found the following interesting thing.  The OLE embedding is still being carried (though it has a slightly different size), there is also a .wmf file being carried, apparently for the same image, and there is the VML description of the equation presentation in the document.xml file.  
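The same kind of inspection can be scripted, since a .docx is an ordinary ZIP package. The sketch below builds a tiny stand-in package in memory so that it runs anywhere; the part names are hypothetical, and a real Word 2007 file will contain more parts and may name them differently.

```python
import io
import zipfile

# Build a small stand-in package in memory. These part names are assumptions
# for illustration only, not a description of what Word actually writes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as pkg:
    pkg.writestr("[Content_Types].xml", "<Types/>")
    pkg.writestr("word/document.xml", "<document/>")
    pkg.writestr("word/embeddings/oleObject1.bin", b"\x00" * 16)
    pkg.writestr("word/media/image1.wmf", b"\x00" * 8)


def list_parts(zip_bytes):
    """Return (part name, uncompressed size) for every part in the package."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as pkg:
        return [(info.filename, info.file_size) for info in pkg.infolist()]


parts = list_parts(buf.getvalue())
for name, size in parts:
    print(f"{size:>6}  {name}")
```

For a file on disk, `zipfile.ZipFile("some-file.docx")` can be opened directly in place of the in-memory buffer.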

    Although I haven’t given this a meticulous examination, it appears that someone has put in a lot of thought about how to get the maximum chance at preserving round-trip use across Office versions while also getting a flavor that works in more-or-less "pure" Office Open XML, all packaged together.  

    In a way, OO.o is accomplishing something similar with its use of ODF MathML, although there seems to be insufficient information to accomplish interchange successfully in the actual OO.o document and the ODF specification (and I may well have overlooked something in the ODF spec.)

    This little exploration reinforces my admiration for the Open Packaging Conventions though, as well as the maturity of the Microsoft Office team’s use of XML as an interchange carrier.

  31. Patrick Schmid says:

    Karthick C: I am not Brian Jones, but you are probably well advised to look at the Office preview site: http://www.microsoft.com/office/preview/default.mspx

    There are two versions of XML: 1.0 and 1.1: http://www.w3.org/XML/Core/#Publications
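On the second question, the version in the XML declaration tells a parser which edition of the rules the document was written against. A minimal Python sketch that reads the declaration's pseudo-attributes with a regular expression follows; the pattern is a simplification written for this example, not a complete implementation of the XML grammar.

```python
import re

# Simplified pattern for the XML declaration: version is required,
# encoding and standalone are optional and must appear in this order.
XML_DECL = re.compile(
    r"""^<\?xml\s+version\s*=\s*["']([^"']+)["']"""
    r"""(?:\s+encoding\s*=\s*["']([^"']+)["'])?"""
    r"""(?:\s+standalone\s*=\s*["'](yes|no)["'])?\s*\?>"""
)


def read_declaration(text):
    """Return the declaration's fields as a dict, or None if there is none."""
    match = XML_DECL.match(text)
    if match is None:
        return None
    version, encoding, standalone = match.groups()
    return {"version": version, "encoding": encoding, "standalone": standalone}


decl = read_declaration(
    '<?xml version="1.0" encoding="UTF-8" standalone="yes" ?><root/>'
)
decl_11 = read_declaration('<?xml version="1.1"?><root/>')
print(decl)
print(decl_11)
```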

    Dennis: Compatibility mode in Word actually drives me crazy. Let’s say you create a docx file in Word 2007 that includes some new object, e.g. a SmartArt. You then save this file as doc and open it in Word 2003. The SmartArt will be rendered as an image (so would an equation created in 2007, btw). You can make changes to the file in Word 2003. When you then open the doc file in 2007, the SmartArt will still be an image. However, once you convert it to a docx file in 2007, it will be a fully editable SmartArt again. I am very impressed that this works even when you copy & paste the image into a new doc file. Now repeat the same thing with a SmartArt in Excel or PPT, saved as either xls or ppt and opened in 2003. When you open the xls or ppt again in 2007, the SmartArt will be automatically editable without first having to convert the file to the Open XML format. The difference between Word and Excel/PPT is a design decision, as the Word Compatibility Mode stresses what the same document would look like in 2003, while Excel and PPT worry more about ease of use than layout similarity. So far so good.

    Now, install the Compatibility Pack for Word 2003. Open the docx with the SmartArt in 2003 using the Compatibility Pack (it will appear as image in 2003) and save the file as a docx in 2003. Then reopen the file in 2007 and hit Convert. What happens? The SmartArt stays as an image and it cannot be edited anymore. Opening the docx in 2003 using the Compatibility Pack destroyed the round-tripping that you can do with the doc format. If you do the same with xlsx and pptx, the SmartArt remains editable after saving in 2003 and opening in 2007 again.

  32. hAl says:

    I respect this blog a lot more than, for instance, the one by Rob Weir, who censors all comments on his and removes any remotely negative messages on ODF.

    In fact, not just on this blog but on several blogs by Microsoft personnel and teams, I have found that they are much more open towards the internet communities than I had expected.

    It is also not like IBM’s Rob Weir, who has to smuggle his blog postings onto Groklaw (never known for its IBM bias) to get his message to the crowds.

    Carry on, Brian, and give us the much-appreciated info on OOXML.

    Btw, do you guys take bets on the size of the documentation that makes it to Ecma standards 😉

    My guess will be 6453 pages.

  33. orcmid says:

    Patrick made me smile.  I fear that we may be seeing evidence of the intersection of Conway’s first law (the structure of a software system reflects the structure of the organization that builds it) and Steele’s law (there are only two sizes of software development teams: less than 12 and more than 100 or some numbers like that).  

    Architectural incoherence is a great challenge for Microsoft and Office (or Vista) is its existence proof.  What can I say.  

    I think the struggle to create a coherent definition of the format is a necessary step to reining in complexity, but it is clearly not sufficient.  I am at a meeting where it has been suggested by some well-known computer scientists that the inconsistencies among the current code base are insurmountable and that it will require inconsistency-tolerant software verification techniques to ever conquer.  I have no idea.

    PS: I bet the specification shrinks once there are recognized ways to reduce redundancy, rely on diagrams and tables, and otherwise make the treatment of details more compact.  But time pressure might not allow that to occur in the first version.

  34. Juan R. says:

    It is quite understandable that Microsoft decided not to support MathML for technical reasons. We supported MathML technology in an experimental way for close to a year and recently decided to abandon it for several technical reasons.

    James said:

    "Why wasn’t MathML used? If you can convert to and fro with XSLTs, I see no reason to re-invent the wheel and define a new standard for your app."

    I find it curious that the MathML WG was criticized in exactly this way: "reinventing the wheel…" and, as you can find on the Internet, "… and doing it square."

    The MathML people did not copy the very good material in previous standards (de facto ones such as TeX, or international ones such as ISO-12083). In fact, the WG even ignored the earlier W3C CSS spec, inventing its own elements and attributes instead (some of which now have to be deprecated in each new version of the spec). If W3C’s own people are known to ignore other popular W3C specs (look at the SVG criticism, or the trouble with XLink and HLink…), why should Microsoft be forced to support ‘external’ specs if they are _not_ working in well-defined fields? A standard is to be used when it *solves* your problems.

    Sinleeh said:

    "I’m going to be critical here. Rob presented his case about MathML. His arguement is effectively MathML can be used in different apps as-it-is."

    Rob’s posting is very debatable, and I will be talking about it a bit on my personal blog (Canonical Science Today) next week. As for the theoretical usage of MathML as a data format between apps, I showed this year on the official W3C MathML list how p-MathML code for something as simple as \dot{q}, generated by several MathML tools, produced either incorrect rendering or processing errors when submitted to the latest Mathematica. Now I am repeating the experiment with different pieces of c-MathML code, and one again obtains a lot of errors. Mathematica is supposed to be one of the better (more complete) MathML tools. The situation with other tools (especially free ones) is still poor.

    Sam Hiser said:

    "The term ‘standard’ means one thing. To suggest there can be two standards is bad grammar (oxymoron) as well as bad technology."

    If your words were correct (I do not think so), then MathML is a bad technology, because an international standard existed before it: ISO-12083.

    Take the case of the Office Math sample provided by Ian Easson. I can see that numerators are tagged as <m:num> and denominators as <m:den> in Microsoft’s ‘new’ format. This is precisely the approach taken by ISO-12083, which MathML rejected in favor of a more presentational markup that is DOM- and CSS-unfriendly (making implementation in browsers a nightmare). XML-MAIDEN also uses <num> and <den>, as in ISO-12083, and renders them via CSS or XSL-FO.
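The difference between the two styles can be sketched in a few lines of Python. In presentation MathML, `<mfrac>` identifies its numerator and denominator purely by child position; in the named style, each part has its own element. The `<frac>`, `<num>` and `<den>` markup below is a simplified, hypothetical vocabulary in the spirit of the comment above, not the actual ISO-12083, XML-MAIDEN or Office Math schema.

```python
import xml.etree.ElementTree as ET

# Presentation MathML: numerator is the 1st child, denominator the 2nd.
mathml_frac = "<mfrac><mi>a</mi><mi>b</mi></mfrac>"

# Hypothetical named-part markup (illustrative only).
named_frac = "<frac><num><mi>a</mi></num><den><mi>b</mi></den></frac>"


def parts_by_position(xml_text):
    """Numerator and denominator are whatever sits in slots 0 and 1."""
    numerator, denominator = list(ET.fromstring(xml_text))
    return numerator.text, denominator.text


def parts_by_name(xml_text):
    """Numerator and denominator are looked up by element name."""
    root = ET.fromstring(xml_text)
    return root.find("num/mi").text, root.find("den/mi").text


positional = parts_by_position(mathml_frac)
named = parts_by_name(named_frac)
print(positional, named)
```

With named parts, a query or style rule can address the denominator by element name alone; with positional parts, it has to count children.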

    Ian Easson said:

    "I wonder if the use of the word "modified" in the DTD from ODF that Dennis found,

    ‘-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN’

    indicates that ODF 1.0 uses a *modified* version of MathML."

    It is natural to see adaptations and/or modifications of the MathML standard. The publisher Elsevier is also using an in-house modification because of technical limitations of the W3C standard.

    Moreover, the OO.o DTD uses the old 1.x family of MathML, with several things deprecated in version 2; the same will probably happen in the future MathML 3, as Paul Topping could confirm for us here.

    Juan R.

    Center for CANONICAL |SCIENCE)

  35. Fernando says:

    "If your words were correct (I do not think so) then MathML is a bad technology because an international standard existed before: ISO-12083."

    Great. Seems that poor Rob Weir really can’t get anything right. Let’s see what is his excuse this time – will he come up with a new one, or just rehash his two favorites:

    – The magical ODF plug-in now in trials in Massachusetts will solve all problems.

    – Everything will be solved in ODF 1.2.

  36. Paul Topping says:

    In response to Juan R’s comment and at his request (sort of):

    MathML 3.0 may well deprecate some things in earlier versions. As with any standard, it isn’t perfect. MathML has really just started gathering steam. I’m amazed at the growing number of websites that support it. Once math-based search and math accessibility, both enabled by MathML, take hold we will really see wide adoption.

    Paul

  37. Patrick Schmid says:

    MA has postponed the roll-out of open source applications that default to ODF. The reason is that the current non-MS Office applications do not provide sufficient accessibility. The state is now looking into plugins (presumably for MS Office) to provide ODF capability as a near-term strategy to still use ODF as default:

    http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9002594

  38. Wesley Parish says:

    Concerning "There can be only one", I’m reminded of the standards hoohaa over the internetworking protocols.

    Several big networking players had offerings of their own, and the darling of the standardization set was the Open Systems Interconnect.  Meanwhile the TCP/IP stack was establishing itself, although it was far from being the most elaborate and "complete" internetworking stack available.

    How many of you use DECNet, IPX/SPX or SNA on a regular basis today?  How many use OSI?  How many use TCP/IP?

  39. hanishkvc says:

    As far as I can tell, if some open standard is not up to the mark for some related purpose, then the best approach is to work with that standardization body and with other groups working on similar issues, and come up with a consensus solution that can be adopted into the standard as well as used by all concerned parties.

    This way the world is saved from one more standard, and everyone can coexist happily and ethically without getting bogged down by business or commercial motives.

  40. Juan R. says:

    Paul Topping, I wish you good luck with your idyllic vision of MathML.

    I have updated a Canonical Science Today entry with comments on MathML and the new Microsoft format. It can be accessed at

    [http://canonicalscience.blogspot.com/2006/08/microsoft-avoids-mathml-in-office-xml.html]

    I do not know the details of the new format for mathematics, but I am glad to see that the solid points in the above piece of markup are also present in other markup languages such as ISO-12083 and XML-MAIDEN, and were adopted in the proposal for HTML5-Math discussed at WHATWG; these points are not present in MathML.

    I also review both Rob Weir’s and Bob Sutor’s postings about the MathML format, and add some basic thoughts about the failure of OO to correctly render a simple piece of math, recently reported by Rob Weir:

    [http://www.robweir.com/blog/2006/08/demo-mathematica-mathml-and-odf.html]

    Juan R.

    Center for CANONICAL |SCIENCE)

  41. Juan R. says:

    Sorry, but the previous link to the blog archive is not working due to an internal bug in the system. The archive can currently be accessed from the root,

    [http://canonicalscience.blogspot.com/]

    The comments link does not work either, but you can send comments to my personal e-mail.

    Juan R.

    Center for CANONICAL |SCIENCE)

  42. Juan R. says:

    Sorry for writing again, Brian Jones!

    The problem with root access to the article on MathML and Office MML is that the article will be automatically removed by the blog software in the near future, when the root changes due to new additions. This means your readers have a limited time to access that archive from the blog root.

    Fortunately, I have been able to get the August archive working now, and a permanent link is the following:

    [http://canonicalscience.blogspot.com/2006_08_01_canonicalscience_archive.html]

    Sorry for the inconvenience!

    Juan R.

    Center for CANONICAL |SCIENCE)

  43. BrianJones says:

    No problem Juan, thanks for all the information.

    Thanks everyone for all the great comments and feedback. This is a really good discussion.

    My wife and I just celebrated our 2nd anniversary yesterday, and as you can imagine I haven’t been quite as active on the blog. I’ll try to read through everything and see if there is need for another post on this topic.

    We also had the Ecma TC45 face-to-face meetings last week, so I also want to talk a bit about that. We’re getting really close!

    -Brian

  44. Jemm says:

    Congratulations, Brian! :)

  45. David Carlisle says:

    Thanks for making the stylesheets available. I have the beta, but don’t see them (although I assume they are somewhere, as MathML output works via the clipboard, wahoo!). Not wanting to get into the ODF/OOXML debate, I think Bob Sutor was a bit hard on you, really. Given that Word will have MathML input/output, I think that’s a ringing endorsement of this standard, not ignoring it. It’s often the case that a "MathML application" uses MathML on I/O and not in its internal structures (the same is true of Mathematica, Maple, etc.). It’s just that, having used XML to expose the internal structures, it’s a bit more obvious in your case.

    It’s not surprising that some of the comments made about math in Word 2007 didn’t appear to notice that the mathml support was there, I’d looked for it but didn’t find it: I saw one check box about mathml in a menu item but no other information, and when I tried cutting and pasting I didn’t see any mathml.

    (actually because I pasted into wordpad and it showed the formatted image. If I’d have pasted it as text or used notepad or something then I’d have seen the mathml, but it took a while to find that).

    You said

    > He also had worked on the MathML 2.0 standards body before it was dissolved,

    We weren’t dissolved, just changed status from Working Group to Interest Group (as we didn’t plan to issue a MathML 3 immediately, so as to have a period of stability in order for implementations to catch up). However, we are now back to WG status to work on MathML 3. Microsoft would of course be most welcome to re-join the group; see

    http://www.w3.org/Math/Documents/Charter2006.html

    One of your commenters said

    > but I am not sure that the W3C would have ever considered office document requirements for MathML, it being the W3C,

    But office requirements (and other non-browser uses of MathML) are explicitly in our charter:

    Other XML Standards Groups

    MathML is frequently incorporated into other XML languages being standardized. Recent examples include S1000D or OMDoc, or the Oasis Open Document Format for Office applications.

    That mentions ODF explicitly as it was announced at the time, but helping to ensure a smooth transition from Word to MathML would fall under the same Working Group remit.

    One of the major planned items for MathML3 is rtl language support, especially for Arabic, and it would be good to make sure any extensions in that area fit well with the extensive language support in Word.

    David

    (W3C Math WG member, and co-editor of MathML2, but speaking for myself)

  46. orcmid says:

    David,

    That was me.  I dug deeper into the MathML material on the W3C web site after that, and I agree that the objective would appear to be accommodation of document embedding and I would hope that the recently-initiated MathML 3.0 effort will get into that.  

    I notice that ODF includes math:math as an element and references the MathML 2.0 specification, but there is nothing but a couple of sentences and a (prose) reference to the specification.  The ODF schema does not specify or qualify how the MathML schema is blended into that of the host document.  

    My sense of MathML 2.0 is that it is strongly focused on the specific goals of support in web pages and browser functionality, and its examples are predominantly oriented to XHTML.  

    Do you have some sense for what would be done to allow hosting of MathML in other documents and document models that may require injection and commingling of host elements and attributes in the MathML material, or is it to remain an island with transformation in and out of document models by linking to isolated MathML content?

  47. David Carlisle says:

    > Do you have some sense for what would be done

    Er, yes. However, first a point of protocol for Brian Jones: I just wandered in here via Google and am not sure of the house rules about how much discussion should take place in the comments, or whether it should be moved elsewhere.

    MathML is designed to be directly embedded in other formats. DocBook explicitly has a module for this, people do it with TEI I know, and in XSL-FO via its instream-foreign-object.

    The "interesting" part is co-mingling, which is easy to define at a DTD/schema level. In DocBook, for example, you just make mml:math allowed in docbook:equation. The question is, do you then want to allow DocBook markup inside mml:mtext (which is just PCDATA by default)? It’s easy to extend the schema to allow that, but much harder to implement if the math is being rendered by some plugin architecture, as the math renderer needs to call back to the host text renderer.

    The same is true of XHTML. You can easily specify an XHTML+SVG+MathML schema in which all elements may be mixed in more or less natural ways. Mozilla can even handle that, as essentially it implements an XHTML+MathML+SVG engine natively. However, the distributed XHTML+SVG+MathML DTD follows the published MathML spec in not allowing XHTML inside mml:mtext. If you have XHTML markup in MathML in XHTML in SVG in XHTML, and IE is rendering the XHTML, Adobe is rendering the SVG and MathPlayer is rendering the MathML, then that would require a degree of cooperation between the components that goes beyond what’s currently available. We had hoped that the W3C CDF activity would spec out how such compound documents were supposed to work, but they have restricted themselves to rather simpler use cases so far.

    So basically, allowing MathML inline in any other XML vocabulary is fairly easy; whether or not to allow that host vocabulary (including perhaps further nested MathML) to be nested inside the MathML is a difficult question.
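The easy direction and the hard direction can both be sketched in Python. The host vocabulary below (`article`, `para`, `equation`, `emphasis`) is hypothetical and only loosely modeled on DocBook; the MathML namespace URI is the real one. The first function pulls out the MathML islands that a host could hand to a separate math renderer; the second detects host markup nested back inside the math, which is the case that would need the renderers to cooperate.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical host document with one embedded MathML island.
plain_host = f"""
<article xmlns:mml="{MML}">
  <para>The area is given by</para>
  <equation>
    <mml:math><mml:mi>A</mml:mi><mml:mo>=</mml:mo><mml:mi>x</mml:mi></mml:math>
  </equation>
</article>
""".strip()

# Same idea, but with host markup nested inside mml:mtext.
mixed_host = f"""
<article xmlns:mml="{MML}">
  <equation>
    <mml:math><mml:mtext>for <emphasis>all</emphasis> x</mml:mtext></mml:math>
  </equation>
</article>
""".strip()


def math_islands(xml_text):
    """Return every mml:math element found anywhere in the host document."""
    return list(ET.fromstring(xml_text).iter(f"{{{MML}}}math"))


def host_elements_inside_math(xml_text):
    """Return tags of non-MathML elements nested inside any mml:math."""
    return [
        el.tag
        for island in math_islands(xml_text)
        for el in island.iter()
        if not el.tag.startswith(f"{{{MML}}}")
    ]


print(len(math_islands(plain_host)), host_elements_inside_math(plain_host))
print(len(math_islands(mixed_host)), host_elements_inside_math(mixed_host))
```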

    It’s a lot easier to specify than implement. Even in a closely integrated environment like Word, I note that you can’t, for example, paste a textual table into a formula. TeX users are used to being able to go \vbox{….} inside a math expression and then having full access to all textual elements (section headings, whatever), although it’s not immediately clear that having section headings in a fraction is something that any standard really ought to be supporting.

    I noticed that OO.org writes out math fragments as separate documents in the zip file, but I assume that’s more to do with how it wants to internally interface to its equation editor rather than anything to do with the details of the MathML format.

    > My sense of MathML 2.0 is that it is strongly focused on the specific goals of support in web pages and browser functionality,

    Not really, making sure it worked in computer algebra systems was always part of the plan,  in the day job I generate C code from it, etc.

    David

  48. Juan R. says:

    Well, the problems were finally solved, and my basic thoughts about MathML and Office math are available in a copy at

    [http://canonicalscience.blogspot.com/2006/08/microsoft-avoids-mathml-in-office-xml_22.html]

    I continue to have Internet problems this month and still have not downloaded the ECMA draft. However, a colleague of mine (an expert in scientific and mathematical markup) did, and gave me the following comment:

    "From the first glance markup looks quite solid."

    So congrats, Brian (double, because of the anniversary!), since "solid" is not the word I heard about MathML from this colleague.

    My colleague has not reviewed the draft in depth, but has told me that the markup for scripts is not as good as it could be and suffers from what I call the base redundancy problem. This is a well-known problem in MathML, which forces the number of script elements to grow in a funny way.

    In fact, MathML 2 today is able to encode fewer script structures than ISO-12083-like mathematical markup, even though ISO-12083 used _fewer_ elements than MathML.

    In fact, due to the unusual complexity of the markup for scripts, one finds MathML tools (e.g. IteX, ASCIIMathML…) encoding scripts via hints and tricks. For example, the tensors of general relativity are being encoded as tricky combinations of msub and mrow elements instead of using the correct mmultiscripts tag.
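For reference, a tensor with staggered indices can be written with mmultiscripts as a base followed by (subscript, superscript) pairs, using `<none/>` for the empty slot in each pair. The Python sketch below generates that structure; the `tensor` helper and its argument format are inventions for this example, not part of any MathML tool.

```python
import xml.etree.ElementTree as ET


def tensor(base, indices):
    """Build an <mmultiscripts> element for `base` with staggered indices.

    `indices` is a list of (position, symbol) pairs, where position is
    "sub" or "sup". Each index gets its own column, so the other slot of
    the pair is filled with <none/>.
    """
    element = ET.Element("mmultiscripts")
    ET.SubElement(element, "mi").text = base
    for position, symbol in indices:
        if position not in ("sub", "sup"):
            raise ValueError(f"unknown index position: {position}")
        # Within each pair, the subscript comes first, then the superscript.
        for slot in ("sub", "sup"):
            if slot == position:
                ET.SubElement(element, "mi").text = symbol
            else:
                ET.SubElement(element, "none")
    return element


# A Riemann-style tensor: one upper index followed by three lower indices.
riemann = tensor("R", [("sup", "a"), ("sub", "b"), ("sub", "c"), ("sub", "d")])
child_tags = [child.tag for child in riemann]
print(ET.tostring(riemann, encoding="unicode"))
```

The point of the single element is that the base appears once and the index columns stay aligned, which nested msub and mrow combinations only approximate.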

    I say this because standardization is being claimed as an advantage of ODF over Microsoft’s ‘own’ format, but anyone who has worked on mathematical markup in a bit of detail knows that the advantages of being a standard claimed for MathML are lost thanks to the ‘infinite’ variability in the code generated by MathML tools.

    Take the 15 different examples I introduced in

    [http://lists.w3.org/Archives/Public/www-math/2006Jul/0120.html]

    we simply do not know how they will be encoded by different MathML tools.

    I will download the ECMA draft and provide feedback when I find some time. I also hope to provide some ‘stylesheets’ to/from the CanonML format, which is under active research.

    As for the original goals of MathML, well… One of the ironies of the "math for the web" (MathML) is its web-unfriendly design, which makes implementation in browsers difficult and/or impossible (parsing, CSS, DOM queries…). That is one reason why, after many years, only Mozilla (and W3C Amaya) have obtained native support for (part of) MathML.

    In fact, one of the plans of the MathML 3 WG is to study further changes to the current spec to favour implementation in future browsers.

    As for how MathML ‘works’ in computation, it is time to recall some clarifications from Neil Soiffer (one of the MathML authors and a recognized expert in computer algebra systems):

    <blockquote>

    Content MathML is <em>not</em> really designed for computation.

    MathML purposely does not contain an "evaluate" token.

    What a MathML application should do when it receives the following is not defined

    <apply> <sin/>

    <apply><divide/>

    <cn type="constant"> &pi; </cn>

    <cn>4</cn>

    </apply>

    </apply>

    </blockquote>

    The emphasis is in the original.
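To make the quoted point concrete: the markup only describes the structure "sin applied to (π divided by 4)", and what to do with it is left to the application. The Python sketch below shows one possible choice, numeric evaluation. The operator and constant tables are assumptions of this example, and the `&pi;` entity is replaced by the literal character π because a bare entity reference cannot be parsed without a DTD.

```python
import math
import xml.etree.ElementTree as ET

# The quoted expression, with the &pi; entity written as a literal character.
expression = """
<apply><sin/>
  <apply><divide/>
    <cn type="constant">π</cn>
    <cn>4</cn>
  </apply>
</apply>
""".strip()

# This example's own choices; MathML itself does not prescribe them.
OPERATORS = {"sin": math.sin, "divide": lambda a, b: a / b}
CONSTANTS = {"π": math.pi}


def evaluate(node):
    """Numerically evaluate a tiny subset of content MathML."""
    if node.tag == "cn":
        text = node.text.strip()
        if node.get("type") == "constant":
            return CONSTANTS[text]
        return float(text)
    if node.tag == "apply":
        # The first child names the operator; the rest are its operands.
        operator, *operands = list(node)
        return OPERATORS[operator.tag](*(evaluate(arg) for arg in operands))
    raise ValueError(f"unsupported element: {node.tag}")


value = evaluate(ET.fromstring(expression))
print(value)
```

Another application could just as legitimately keep the expression symbolic or simply render it, which is why two tools can both accept the same content markup and still behave differently.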

    Juan R.

    Center for CANONICAL |SCIENCE)

  49. BrianJones says:

    David, I don’t really have any rules as far as the comments go. It’s great when we get some good discussions going like this, and I really appreciate you providing all the background information.

    Thanks Juan! We’re actually going to head up to Vancouver for the weekend to celebrate a bit more (that’s where I proposed).

    -Brian

  50. eucap says:

    The Open Document movement, from the OASIS industry consortium, is slowly but surely wresting Microsoft’s market dominance in word and spreadsheet applications. The Oasis consortium is formed by government and public institutions around the world, as

  51. Enclick Blog says:

    The Open Document movement, from the OASIS industry consortium, is slowly but surely wresting Microsoft’s market dominance in word and spreadsheet applications. The Oasis consortium is formed by government and public institutions around the world, as

  52. Some really interesting things to note for the week:

    New blog on Math in Office – Murray Sargent who…