Getting Word 2007 Technical Files into Publisher Pipelines


Nature, Science and other publishers have robust ways of converting Word 2003 documents with embedded Equation Editor and MathType objects into the XML representation they use for publication. Notably, MathType can export equations as MathML, and this capability is part of the methodology. In principle, a similar approach can be used with Word 2007 .docx files, whose math zones are stored in OMML (Office Math Markup Language). Since these files already consist of XML, they should fit the general approach, but implementing the conversion takes time. For example, transformations must be performed to ensure validity according to the NLM DTD (see http://dtd.nlm.nih.gov/). Publishers add a lot of value, both by improving the writing itself and by including the information needed to render correctly and to satisfy archiving requirements, and this work has to be integrated into the process. I’m kicking myself for not having contacted the publishers back when we offered Word 2007 beta versions over a year ago. Then we might have been able to work out a solution closer to the time that Office 2007 shipped.
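Because a .docx file is a ZIP package of XML parts, the first step of such a pipeline, pulling the OMML math zones out and turning them into MathML, can be sketched in a few lines. The Python below is a minimal sketch, not any publisher's actual method. It assumes the main document part is at word/document.xml (the usual location, though formally it is found through the package relationships), and it converts only a tiny subset of OMML: runs (m:r), fractions (m:f), superscripts (m:sSup) and subscripts (m:sSub). The helper names are hypothetical. A real pipeline would use a complete transform, such as the OMML2MML.XSL stylesheet included with Office 2007, followed by the further transformations needed for NLM DTD validity.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs for OMML and MathML.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
MML_NS = "http://www.w3.org/1998/Math/MathML"

# A number, a single letter, or any other non-space character.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[^\W\d_]|\S")


def m(tag):
    """Qualify an OMML tag name with the OMML namespace."""
    return f"{{{M_NS}}}{tag}"


def mml(name, *children):
    """Create a MathML element with the given children."""
    el = ET.Element(f"{{{MML_NS}}}{name}")
    el.extend(children)
    return el


def tokens(text):
    """Split run text into MathML tokens: mn for numbers, mi for letters, mo otherwise."""
    out = []
    for tok in TOKEN_RE.findall(text):
        if tok[0].isdigit():
            name = "mn"
        elif tok.isalpha():
            name = "mi"
        else:
            name = "mo"
        el = mml(name)
        el.text = tok
        out.append(el)
    return out


def row(node):
    """Wrap the converted children of an OMML argument (m:num, m:e, ...) in an mrow."""
    r = mml("mrow")
    if node is not None:
        for child in node:
            r.extend(convert(child))
    return r


def convert(node):
    """Convert one OMML element to a list of MathML elements (small subset only)."""
    if node.tag == m("r"):
        return [t for mt in node.iter(m("t")) for t in tokens(mt.text or "")]
    if node.tag == m("f"):
        return [mml("mfrac", row(node.find(m("num"))), row(node.find(m("den"))))]
    if node.tag == m("sSup"):
        return [mml("msup", row(node.find(m("e"))), row(node.find(m("sup"))))]
    if node.tag == m("sSub"):
        return [mml("msub", row(node.find(m("e"))), row(node.find(m("sub"))))]
    # Anything else (m:oMath, property elements, unsupported constructs):
    # convert the children in order, flattening the structure.
    out = []
    for child in node:
        out.extend(convert(child))
    return out


def docx_math_to_mathml(docx_bytes):
    """Return one MathML <math> element per m:oMath found in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        # Assumption: the main part lives at word/document.xml (the usual name).
        root = ET.fromstring(z.read("word/document.xml"))
    return [mml("math", *convert(omath)) for omath in root.iter(m("oMath"))]
```

Each result can be serialized with `ET.tostring(math, encoding="unicode")`. Unsupported constructs such as radicals, n-ary operators and matrices are simply flattened to their contents here, which hints at why a faithful, DTD-valid conversion takes real work.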


I’m very impressed with what the publishers are doing and how forward-looking they are. Back in the late 1980s and early 1990s I was chairman of the Optical Society of America’s Publication Technology Committee, and I served on a similar committee for the American Institute of Physics. Both groups were very advanced electronically for the time, but it’s amazing to see how much progress has been made since then.


The publisher infrastructure isn’t the only area that has trouble with Word 2007’s new equation capability. Perhaps you’ve noticed that PowerPoint 2007 doesn’t understand math zones either and uses images for them instead. Although the images generally look very good in Internet Explorer, they don’t work well in PowerPoint because the sizes and backgrounds used on slides are typically very different from those in Word. We’re working on this problem too…


Comments (8)

  1. Here are a few interesting links I came across this week: Open XML in Science and Nature – Murray Sargent

  2. davidacoder says:

    Excellent to see you getting engaged with the publishers! Keep us updated on progress :)

  3. I saw this article today, and wanted to make sure that folks weren’t confused about the latest with some

  4. As I noted on the Nature blog post on this, MS has also gotten the new citation and bibliographic support wrong. While nice from a UI standpoint and fine for students, it won’t work for many real-world scholarly uses because the fields only remain live within the Word 2007 universe. So if I use Word 2004 and my colleague uses Word 2007, we cannot collaborate. Even worse, the citations my colleague adds to their document show up as plain text when I open it in Word 2004.

    I’ve heard the technical explanations from MS for why this is the case, but I don’t find them compelling; it seems to me to have been a (bad) business decision.

  5. Zhang Jianfa says:

    Hello Sargent, I have a problem with equations in Word 2007, and I don’t know if it’s proper to post here.

    When I paste equations from Word 2007 into Mindmapper, some characters, such as hbar, show up as "?", even though I changed the font to Cambria Math. Pasting equations into other OLE-enabled software has a similar problem. Why?

  6. Zhang Jianfa: Sounds like a Unicode issue to me. While OLE should be able to transfer Unicode, the program you paste into must also be able to handle it; if it can’t, characters are converted to a near equivalent if one exists in the current code page, or to ? if not.

    Murray Sargent: I’d love to see math support in all Office applications as well. I used Word 2007 last term to write my notes for a math lecture, and it worked remarkably well (except for numerous crashes when using Backspace in math zones). But charts, for example, cannot contain math in their text areas, so when I needed them I had to create a Word text area on top of the chart and hope it didn’t get thrown around by layout, as frequently happened to images in previous versions. I haven’t checked whether that still happens in 2007 :)

  7. Mohammed Soliman says:

    Does anyone know how I can open an XML file containing mml:math tags in Word 2007 so that I can view and edit the equations?

  8. Mohamed Gad says:

    Dear Sir/Madam:

    I want to open an XML file containing MathML, based on the NLM DTD, in Word 2007.

    Thanks
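The code-page fallback described in comment 6 can be modeled with a short Python sketch. This is an illustration under stated assumptions, not how OLE or Windows actually converts text: the receiving program is assumed to handle only Windows code page 1252 (cp1252), and the BEST_FIT table of near equivalents is hypothetical. It shows why ℏ (U+210F), which has no cp1252 equivalent, would arrive as ? no matter which font is selected afterward, since the character is lost before any font is applied.

```python
# Hypothetical best-fit table: a stand-in for the near-equivalent
# mappings a real Unicode-to-code-page converter might apply.
BEST_FIT = {
    "\u2212": "-",   # MINUS SIGN -> HYPHEN-MINUS
    "\u2032": "'",   # PRIME -> APOSTROPHE
}


def paste_to_codepage(text: str, codepage: str = "cp1252") -> str:
    """Simulate pasting Unicode text into a program limited to one code page.

    Characters the code page can encode pass through unchanged; others are
    replaced by a best-fit substitute from BEST_FIT if one exists, else '?'.
    """
    out = []
    for ch in text:
        try:
            ch.encode(codepage)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(BEST_FIT.get(ch, "?"))
    return "".join(out)


if __name__ == "__main__":
    # U+210F (hbar) and U+03C9 (omega) are not in cp1252 and have no table entry.
    print(paste_to_codepage("E = \u210F\u03C9"))  # -> E = ??
```

By contrast, `paste_to_codepage("x \u2212 y")` yields `"x - y"` through the hypothetical best-fit entry, and characters such as é that cp1252 does contain pass through untouched.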