Science and Nature have difficulties with Word 2007 mathematics

Science and Nature, two premier science publications, are having difficulties with Word 2007’s elegant new mathematics facility. Part of the reason is a misunderstanding about Word’s MathML support, which I hope this post will help to rectify. The other part is that the new facility represents mathematical text in a way that Word itself understands. Such mathematical text can differ dramatically from text entered using the Equation Editor and MathType, which use embedded OLE objects that are opaque to Word. Since this second area is primarily responsible for the choices made in Word 2007, I discuss it first.

As soon as mathematical text is represented in a way that Word itself understands, things are both simpler and more complicated. Things are simpler because Word’s user interfaces, formatting commands, object model, etc., can be used directly with mathematical text. Things are more complicated because this convergence in user interfaces allows users to insert Word-oriented features into math zones, such as

· Images

· Revision markings

· Footnotes and comments

· Elaborate formatting and styles, …

 

The file format needs to be general enough to express such material faithfully. Unfortunately, MathML 2.0 isn’t able to handle embedded XML namespaces and so simply isn’t general enough to represent Word 2007 technical documents. Accordingly, we had to develop an XML approach that is general enough, and we created OMML (Office MathML), which can be embedded in Word’s primary XML, WordprocessingML, and vice versa.
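
To make the namespace mixing concrete, here is a minimal Python sketch. The sample paragraph is a hand-written fragment, not output captured from Word, and the helper names math_zones and word_elements_in are hypothetical. It shows a math zone (m:oMath) sitting inside a WordprocessingML paragraph, with a Word run-properties element (w:rPr) nested inside the math run.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for WordprocessingML and OMML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"

# Hand-written illustrative fragment (an assumption, not real Word output):
# a paragraph holding ordinary text followed by a math zone whose run
# carries Word formatting (bold) alongside the math text "x".
SAMPLE = f"""<w:p xmlns:w="{W_NS}" xmlns:m="{M_NS}">
  <w:r><w:t xml:space="preserve">Let </w:t></w:r>
  <m:oMath>
    <m:r>
      <w:rPr><w:b/></w:rPr>
      <m:t>x</m:t>
    </m:r>
  </m:oMath>
</w:p>"""


def math_zones(paragraph_xml):
    """Return every m:oMath element found in the given XML string."""
    root = ET.fromstring(paragraph_xml)
    return list(root.iter(f"{{{M_NS}}}oMath"))


def word_elements_in(zone):
    """List local names of WordprocessingML elements nested in a math zone."""
    prefix = f"{{{W_NS}}}"
    return [el.tag[len(prefix):] for el in zone.iter() if el.tag.startswith(prefix)]


zones = math_zones(SAMPLE)
word_tags = word_elements_in(zones[0])
math_text = "".join(t.text or "" for t in zones[0].iter(f"{{{M_NS}}}t"))
print(len(zones), word_tags, math_text)
```

The w: elements found inside the math zone are the kind of Word-oriented content that a pure MathML representation has no place for.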

 

Office 2007 also ships XSLTs to convert OMML to MathML (omml2mml.xsl) and MathML to OMML (mml2omml.xsl). These XSLTs are used, for example, by Word for MathML clipboard support. They are stored in the subdirectory C:\Program Files\Microsoft Office\Office12. Naturally the MathML resulting from OMML in this way is missing content like images, revision markings, footnotes, etc., but for many purposes that’s acceptable. It just isn’t acceptable in the Word docx format, since this format has to reproduce exactly what the user created. The docx format and OMML are international standards and are thoroughly documented as noted in previous blog posts.

 

One of the very nice features of XML is that it can be translated relatively easily from one kind of XML to another. David Carlisle has used this flexibility to advantage in converting Word’s HTML to HTML with embedded MathML. Word’s HTML contains the math zones in two formats: as OMML inside comments and as images. David’s program extracts the OMML, uses omml2mml.xsl to convert it to MathML, and puts it all back together. Admittedly David is a magician, but he proves it can be done :-)
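
The extract, convert, and reassemble steps can be sketched roughly as follows in Python. This is not David’s program. The conditional-comment markers in the two regular expressions are my assumption about how Word’s HTML wraps the OMML and the fallback image, and should be checked against real Word output. The function name embed_mathml is hypothetical, and the XSLT step is passed in as a callable, since Python’s standard library has no XSLT processor. A real converter would apply omml2mml.xsl there.

```python
import re

# ASSUMED markers: OMML inside a conditional comment, followed by an image
# fallback in a downlevel-revealed conditional. Verify against real output.
OMML_COMMENT = re.compile(
    r"<!--\[if gte msEquation 12\]>(.*?)<!\[endif\]-->", re.DOTALL
)
IMAGE_FALLBACK = re.compile(
    r"<!\[if !msEquation\]>.*?<!\[endif\]>", re.DOTALL
)


def embed_mathml(html, convert):
    """Replace each OMML comment with convert(omml) and drop image fallbacks.

    `convert` is any callable mapping an OMML string to a MathML string,
    for example a wrapper around an XSLT processor running omml2mml.xsl.
    """
    without_images = IMAGE_FALLBACK.sub("", html)
    return OMML_COMMENT.sub(lambda m: convert(m.group(1)), without_images)


# Stand-in converter for demonstration only: it records what it was given
# and returns a placeholder instead of running the real stylesheet.
seen = []


def stub_convert(omml):
    seen.append(omml)
    return "<math>PLACEHOLDER</math>"


html = (
    "<p>Area: "
    "<!--[if gte msEquation 12]><m:oMath><m:r><m:t>A</m:t></m:r></m:oMath><![endif]-->"
    '<![if !msEquation]><img src="image001.png"><![endif]>'
    "</p>"
)
result = embed_mathml(html, stub_convert)
print(result)
```

Keeping the conversion behind a callable means the splicing logic can be tried out without the stylesheet at hand.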

 

The bottom line is that Word 2007’s new math facility is a huge improvement over past approaches. But anytime such big improvements occur, there can be, and evidently are, problems with upgrading. I think the trouble is well worth it in both user convenience and the marvelous typographic quality. I’ve been doing technical word processing since the late 1960s and Word 2007’s mathematical capabilities still amaze me. Not that it’s finished; we do have a number of features to add…