Creating Math Web Documents using Word 2007

If you use Word 2007 to create a document containing mathematical equations and expressions and save it as a web page, it looks just as good in Internet Explorer as it does in Word 2007! The equations look as though they had been typeset by TeX, or in some ways even better. How did this come to pass? Well, Word 2007 saves mathematical formulas in an HTML document in two ways: 1) the original OMML contained inside comments, and 2) PNGs for programs that don’t know how to distill the OMML out of the comments, or don’t know how to display OMML even if they could retrieve it.

Currently, at least, Internet Explorer knows nothing about OMML and only knows how to display MathML if a math behavior facility such as Design Science’s MathPlayer is installed. So Internet Explorer bypasses the comments in favor of the PNGs, which look really fine unless you zoom them (which makes them fuzzy) or change the background color (which may make them unreadable).
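To make the "OMML inside comments" idea concrete, here is a minimal Python sketch of how a program might distill the OMML out of such a page. The sample markup is an assumption about roughly what the saved HTML looks like (a conditional comment wrapping an `m:oMath` element, followed by a PNG fallback); the function name and the sample file name are hypothetical, and real pages may differ in detail.

```python
import re

# Any HTML comment, including conditional comments that span several lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
# One OMML math element, assuming the usual "m:" prefix is used in the page.
OMATH_RE = re.compile(r"<m:oMath\b.*?</m:oMath>", re.DOTALL)


def extract_omml_fragments(html):
    """Return the OMML fragments found inside HTML comments, in page order."""
    fragments = []
    for comment in COMMENT_RE.findall(html):
        fragments.extend(OMATH_RE.findall(comment))
    return fragments


# Hypothetical sample: OMML hidden in a comment, with a PNG fallback after it.
SAMPLE_HTML = """<p>Area:
<!--[if gte msEquation 12]><m:oMath><m:r><m:t>A=bh</m:t></m:r></m:oMath><![endif]-->
<![if !msEquation]><img src="page_files/image001.png"><![endif]>
</p>
<!-- an ordinary comment with no math in it -->"""

if __name__ == "__main__":
    for fragment in extract_omml_fragments(SAMPLE_HTML):
        print(fragment)
```

A browser that ignores the comment sees only the image, which is the behavior described above. A regular expression is used here only because the OMML sits inside comments, where an ordinary HTML parser would not treat it as markup.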

Needless to say, David Carlisle, web/XML/TeX/LaTeX/MathML/etc. whiz, was intrigued by this situation. He figured that with a little help Word 2007 could do “the right thing” and create HTML documents with embedded MathML instead of OMML. He next concluded that the right thing for him to do was to prove it. In the process, he discovered and fixed some bugs in Microsoft’s omml2mml.xsl file, which Word uses to export MathML. You can download David’s handiwork from his post on the subject. Qualifies as exceedingly cool!

Comments (15)

  1. davidacoder says:

    It would actually be an excellent idea if Word 2007 saved a MathML representation of the equations it contains in docx files. This would just be in addition to the OMML representation, nothing else would need to change (i.e. all load etc would go via OMML).

    Why? Because that might actually allow us to use Word’s equation editor for scientific work. A huge number of major publishing houses with the leading journals do not accept documents that contain equations created with Word 2007 (have a look here at Science Magazine, for example). Quite frankly, Word 2007 just misses that market right now; it can’t be used. The situation is worse than it was with Word 2003. I believe you guys need to accept that they use MathML for their workflow. That does not mean you have to use it fully as well; I can very much understand your reasoning on that point. But just include a MathML representation of every equation in the docx file, so that we can use Word 2007 for scientific work.

  2. MurrayS3 says:

    David Carlisle shows how to get HTML with embedded MathML from Word 2007. From a typographical point of view, the embedded OMML in Word’s docx or HTML formats is actually better than either MathML or TeX. Since it is XML, it can be converted to MathML using the shipped omml2mml.xsl, as David shows (he fixes a couple of bugs in that xsl). So with aftermarket tools, anyhow, publishers can use Word 2007’s math. Remember also that this is only version 1. As I mentioned in a comment replying to your comment on my Find/Replace post, Rome wasn’t built in a day 🙂

  3. davidacoder says:

    Yes, but I cannot tell publishers to do that. They simply won’t accept a paper if I write it with Word 2007’s equation editor. Quite a number of them do not accept PDFs; they want the original format that was used to write the paper (like TeX or doc).

  4. David Carlisle says:

    > but I cannot tell publishers to do that.

    Actually, I don’t see why not (or at least why you can’t ask them to do that). Your reference to PDF is misleading: PDF is an end format and the last thing publishers want as input from authors, as they need to typeset the document to their own requirements.

    But this is different. A docx file from Office is just a zip file containing a pile of XML. If that pile of XML contained some MathML that the publisher could use in further processing, they’d have to tool up somehow to extract it, probably using XSLT or XPath. As it is, the zip file doesn’t contain MathML; it contains OMML, but Microsoft supplies an xsl file to go from one to the other, so the tool the publishers need to extract MathML from the docx is essentially identical: just add a line saying xsl:include href=omml2mml…

    There were a couple of bugs in that stylesheet that I fixed in the stylesheet mentioned in my blog, but they weren’t hard to find or fix. Note that it’s much easier to fix the transformer than to fix bad MathML, so from that point of view it’s better to have the original OMML and a bad (but easily fixed) transformer than to have the docx file storing badly converted formulae.

    It’s not surprising that publishers aren’t yet set up to accept Office 2007 (it’s still 2007!), but the page that you referenced showed, I think, that the publisher there hadn’t spotted that Office does have quite good MathML support. To be honest, it’s not surprising that they didn’t spot it: it’s fairly well hidden.


  5. davidacoder says:

    Just for the record, I did not want to indicate that PDF was a solution to this. Quite the contrary: I wrote that publishers do not accept PDF submissions.

    There is zero point in me asking Science to change their publication process. I also feel that MS should do that: they put a product on the market that does not play well with existing infrastructure at publishers, so they should work with them to enable their customers to use Word for that scenario.

    I don’t think pointing out that publishers could just use the MS stylesheet to do the conversion themselves is a good point. Is that method officially supported or documented anywhere by MS? Will this method be supported in new versions? Not to my knowledge. This would essentially mean that publishers would base their workflow on a hack that is not supported or documented by MS. You cannot seriously ask them to do that. The way you do this is essentially using an unsupported and undocumented API within Word, with literally no way of knowing whether that will be serviced or supported in the future.

  6. David Carlisle says:

    Maybe the discussion should be on my blog rather than filling up Murray’s, but anyway while we are here…

    Firstly I should probably stress that I have no connection with Microsoft, just an interest in making sure that mathematics in general and mathml in particular get to work in an interoperable way.

    > There is zero point in me asking Science to change their publication process.

    On the contrary, one of the most important things that a publisher would use to assess the importance of supporting a new author submission format is user pressure from authors.

    As for how much support the stylesheet has, I can’t answer that, but MathML input/output from the clipboard is a supported, documented feature of Word and, according to reliable sources, the stylesheet is how that is implemented.

    As I said in my comment originally, documentation of MathML support in Word is virtually nonexistent, which is an issue, but not directly related to the technical issue of whether the file save format should change. Given that the MathML output in Word (to the clipboard) is obtained by running the omml2mml stylesheet over the OMML expression, I can’t see any technical advantage in storing both the input of that stylesheet (the OMML) and the result (the MathML) in the docx file. It just makes the docx file larger with no extra information stored and practically no usability increase, as any method of extracting the MathML from the docx could just as easily do a combined extraction/conversion of the OMML.

    > Is that method officially supported or documented anywhere from MS?

    The whole point (presumably; I know some would offer different, more political motives) of switching Office to use an XML-based output format is to allow people to use XML tools on that output, and this seems to me just a completely natural use of that technology. Microsoft obviously can’t support every possible XML workflow that starts with a docx file, but they surely must support the general principle that such workflows be built.

  7. Chris Rowley says:

    Reverting to what should go into a Web page to ‘represent some non-textual character-based notation’.

    We are coming to the conclusion that what is needed is a universally recognised tag called something general such as:


    Which is a container for as many different representations as an application may want to put there.    

    Browsers can then ignore it all (useful if they will do strange things when exposed to non-standard HTML) or look inside for the stuff they can  use (including the primitive, but unfortunately necessary, bitmap representation).

    Note that using something from an XML vocabulary, such as the MathML namespace, for this container tag does not work.

  8. Thank you for taking the time to comment on such an interesting area of wood processing

  9. I am not a mathematician, but the indication that Greek letters are not properly rendered in Office 2007 raised alarm bells. I work with Japanese, and also need macron vowels to render Japanese in romanization for English publications. These characters are in Unicode in the Latin Extended-A character set, and I have hot keys to put them into documents easily. What I am wondering is whether the Greek letter problem alluded to is only for Greek letters that appear in equations, or whether Office 2007 is somehow defeating the purpose of the shift to Unicode that made life so much simpler for many of us who work with different character sets.

  10. MurrayS3 says:

    Patricia, I don’t know of any problems with Greek letters in math zones. Standard Unicode code points are used. It’s true that Word 2007 doesn’t handle the recently added bold digammas (U+1D7CA and U+1D7CB), but the other Greek letters should be fine. Could you give me an example of a problem?

  11. When I open documents in Word 2007 written using earlier versions of Word, and also when I save them as docx files, the symbols are not correctly displayed in many cases. Sometimes they do appear correctly, however. The documents I have produced include contributions from other people, and their embedded equations are never correctly displayed in Word 2007, whereas they were in the earlier versions of Word. Beta appears as a bicycle and an integral sign as a cocktail glass complete with stick. When I double-click on the embedded object it displays correctly, but it doesn’t stay that way. If I save as PDF using the Word 2007 PDF export facility I get the bicycles and cocktail glasses, but if I print to PDF using CutePDF Writer the equations are written to PDF correctly.

    Is the problem a consequence of a font being missing in the main text part of Word?

  12. John Rowlands says:

    I can add something to what is written above. By highlighting the embedded object and right-clicking, one has – Equation Object – Convert. (Convert to Microsoft Equation 3.0) – OK. This works in many cases, but the equation is about a mile wide. However, it can be condensed sideways as one would an inserted object. Some characters, though, just disappear completely, including the minus sign, which one would have thought would be present in all fonts.

  13. John Rowlands says:

    Further to the above two messages, the problem is now solved. The problem was that the Symbol font was missing from Word 2007, and I don’t know why. After I opened the Font folder in Control Panel, the Symbol font simply became available in Word 2007. However, to get the converted equations it was also necessary to open the original doc format documents and allow the equations in these to be converted before saving as docx.

  14. Rajkumar says:

    I am a software developer. May I know how to bind a mathematical equation within a line?

  15. wonder says:

    Recently, the minus sign can’t be displayed using Equation 3.0 in Word 2007. Why?
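As a footnote to David Carlisle's point in comment 4 that a docx file is just a zip file containing a pile of XML, the extraction half of such a workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sample "docx" is a stripped-down archive built in memory rather than a real Word file, and the main text is assumed to live in the part named `word/document.xml`. The conversion half (running omml2mml.xsl over each fragment) needs an XSLT processor, which the Python standard library does not provide, so it is not shown.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by Office Open XML for math and word-processing markup.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Keep the familiar "m:" prefix when fragments are serialized again.
ET.register_namespace("m", M_NS)


def extract_omml(docx_bytes, part="word/document.xml"):
    """Return every m:oMath element in the given part as an XML string."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read(part))
    return [
        ET.tostring(element, encoding="unicode")
        for element in root.iter("{%s}oMath" % M_NS)
    ]


def make_sample_docx():
    """Build a tiny stand-in archive (hypothetical, not a complete docx)."""
    document_xml = (
        '<w:document xmlns:w="%s" xmlns:m="%s">' % (W_NS, M_NS)
        + "<w:body><w:p><m:oMath><m:r><m:t>x+1</m:t></m:r></m:oMath></w:p></w:body>"
        + "</w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


if __name__ == "__main__":
    for fragment in extract_omml(make_sample_docx()):
        print(fragment)
```

Each returned string is a standalone OMML fragment that could then be handed to whatever XSLT tool a publisher already uses, together with the omml2mml.xsl stylesheet discussed above.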