Mathematical RTF

This post discusses the Word 2007 math RTF control words. A good way to understand these control words is to note that they are actually OMML tag names written with RTF syntax. Hence you can refer to the very thorough OMML documentation for more detailed information. For example, in OMML the built-up skewed fraction for a/b is represented by

<m:f>
  <m:fPr>
    <m:type m:val="skw"/>
  </m:fPr>
  <m:num>
    <m:r>a</m:r>
  </m:num>
  <m:den>
    <m:r>b</m:r>
  </m:den>
</m:f>

In RTF, it can be represented by

{\mf{\mfPr{\mctrlPr}{\mtype skw}}
{\mnum\u-10187?\u-9138?}
{\mden\u-10187?\u-9137?}}

If you want the text to inherit character formatting from the surrounding text, you need to include the math object's property group, here {\mfPr…}, together with its {\mctrlPr} group, even if the latter is empty.
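One way to see how closely the RTF nesting follows the OMML nesting is to list the math control words together with their group depth. The following is only a minimal sketch: the function name math_outline is made up for illustration, it assumes the input is math RTF only (so every control word starting with "m" is a math control word), and it is not a general RTF reader.

```python
import re

# Control words, control symbols (such as \* or \{), and group braces.
TOKEN = re.compile(r"\\([A-Za-z]+)|\\.|[{}]", re.DOTALL)


def math_outline(rtf):
    """Return (depth, control word) pairs for control words starting with 'm'."""
    depth = 0
    outline = []
    for match in TOKEN.finditer(rtf):
        token = match.group(0)
        word = match.group(1)  # None for braces and control symbols
        if token == "{":
            depth += 1
        elif token == "}":
            depth -= 1
        elif word and word.startswith("m"):
            outline.append((depth, word))
    return outline


# The skewed fraction from the text, written on one line.
fraction = (r"{\mf{\mfPr{\mctrlPr}{\mtype skw}}"
            r"{\mnum\u-10187?\u-9138?}"
            r"{\mden\u-10187?\u-9137?}}")

for depth, word in math_outline(fraction):
    print("  " * (depth - 1) + word)
```

The indented listing puts mfPr, mnum and mden one level below mf, which is the same shape as the OMML, with the extra mctrlPr group sitting beside mtype inside the property group.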

Word generally doesn't write surrogate pairs for the math alphabetics, but they work, and they're simpler to use since they're what Word uses internally for most math variables. Instead, Word writes {\mr\mscr0\msty2 a} for the math italic a (U+1D44E) in the numerator of the fraction above and {\mr\mscr0\msty2 b} for the math italic b (U+1D44F) in the denominator, probably because that form is easier for human beings to understand, especially since U+1D44E is represented in RTF as the decimal surrogate pair \u-10187?\u-9138?. But the extra translation isn't really that important, since RTF is usually handled only by computers. If you really need to know the UTF-32 value, you can convert the RTF pair to the hexadecimal form D835 DC4E by pasting -10187 -9138 into the "Decimal code points" box of the Unicode Code Converter, and then convert that to 1D44E.

Surrogate pairs must appear inside math object groups, as in this example, or inside a math text-run group {\mr…} if they are not inside a math object. Technically the latter shouldn't be necessary for RTF, but it is needed because Word's RTF reader shares code with the OMML reader, and OMML requires the <m:r>.
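The surrogate-pair arithmetic can also be done in a few lines of code instead of an online converter. This is a small sketch; the function names are invented for illustration. It relies on the standard UTF-16 rules and on the fact that RTF writes each 16-bit code unit after \u as a signed decimal, which is why the values come out negative.

```python
def rtf_surrogate_pair(code_point):
    """Return the RTF \\uN?\\uN? text for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # lead surrogate
    low = 0xDC00 + (offset & 0x3FF)   # trail surrogate
    # Surrogates are all above 0x7FFF, so the signed 16-bit value is
    # always the unsigned value minus 0x10000.
    return "".join("\\u%d?" % (unit - 0x10000) for unit in (high, low))


def code_point_from_pair(high, low):
    """Recover the UTF-32 value from the two decimals written in the RTF."""
    high &= 0xFFFF  # back to unsigned 16-bit
    low &= 0xFFFF
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(rtf_surrogate_pair(0x1D44E))                  # \u-10187?\u-9138?
print("%X" % code_point_from_pair(-10187, -9138))   # 1D44E
```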

Math information is collected into two areas:

  • math document properties in the {\mmathPr…} group
  • math zones in {\mmath…} groups

Math zones can be inline or "display mode", corresponding to TeX's $ and $$ toggles. In Office math, a math zone is identified internally by a character-format effect bit, much as bold is. If a math zone fills an entire paragraph, it is a display-mode math zone. If it shares a paragraph with nonmath text, it is inline. The math RTF for an inline math zone replaces the first ellipsis of the nested group structure

{\mmath {\*\moMath…}{\mmathPict…}}

Readers that don't understand the ignorable {\*\moMath…} group can use one of the pictures in the {\mmathPict…} group. An RTF display-mode math zone replaces the second ellipsis in the nested group structure

{\mmath{\*\moMathPara{\moMathParaPr…}{\*\moMath…}}{\mmathPict…}}
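For a writer, the two nested group structures above can be sketched as a pair of small wrappers. The function and parameter names here are hypothetical. The contents of the {\moMathParaPr…} and {\mmathPict…} groups are elided in the text, so they are simply passed through as strings, and leaving them empty is an assumption that I haven't tested against Word.

```python
def group(keyword, content=""):
    """Wrap content in an RTF group that starts with the given keyword(s)."""
    # keyword includes its backslash(es), for example r"\*\moMath".
    # A space ends the control word unless the content is empty or starts
    # with a backslash or brace, which already delimits it.
    separator = "" if content[:1] in ("", "\\", "{") else " "
    return "{" + keyword + separator + content + "}"


def inline_math_zone(math_rtf, picture_rtf=""):
    """Inline zone: the math goes into the ignorable moMath group."""
    return group(r"\mmath",
                 group(r"\*\moMath", math_rtf) + group(r"\mmathPict", picture_rtf))


def display_math_zone(math_rtf, para_props_rtf="", picture_rtf=""):
    """Display zone: the moMath group is nested inside moMathPara."""
    para = group(r"\*\moMathPara",
                 group(r"\moMathParaPr", para_props_rtf)
                 + group(r"\*\moMath", math_rtf))
    return group(r"\mmath", para + group(r"\mmathPict", picture_rtf))


print(inline_math_zone(r"{\mr x}"))
print(display_math_zone(r"{\mr x}"))
```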

The {\mmathPict…} group is a great backward compatibility feature, but it sure bloats Word's math RTF files. One way to alleviate the bloat is to zip the RTF file, just as the docx format is zipped.
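Zipping takes only a few lines with a standard library. The sketch below uses Python's zipfile module; the helper name zip_rtf, the entry name document.rtf, and the sample bytes are placeholders rather than real Word output, so the size reduction it shows says nothing about what you'd get on an actual file.

```python
import io
import zipfile


def zip_rtf(rtf_bytes, name="document.rtf"):
    """Return a zip archive (as bytes) holding the RTF under the given name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, rtf_bytes)
    return buffer.getvalue()


# Placeholder bytes standing in for repetitive hex-encoded picture data.
sample = rb"{\mmathPict{\pict 0102030405060708}}" * 1000
packed = zip_rtf(sample)
print(len(sample), "bytes before,", len(packed), "bytes after")
```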

   

Math Objects

Built-up objects like fractions and integrals can appear inside the {\*\moMath…} group and are defined in the following table:

Control word  Meaning
\macc         Accent object, consisting of a base and a combining diacritical mark
\mbar         Bar object, consisting of a base argument and an overbar or underbar
\mborderBox   Border Box object, consisting of a border drawn around an equation
\mbox         Box object, which is used to group components of an equation
\md           Delimiter object, consisting of opening and closing delimiters (such as parentheses, braces, brackets, and vertical bars) and an element contained inside
\meqArr       Equation-Array object, consisting of one or more equations that can be vertically justified as a unit with respect to surrounding text on the line. Alignment of multiple points within each equation can occur within the equation array
\mf           Fraction object, consisting of a numerator and denominator separated by a fraction bar
\mfunc        Function-Apply object, used for math functions like sin x
\mgroupChr    Group Character object, used for stretching a character above or below other characters
\mlimLow      Lower limit object
\mlimUpp      Upper limit object
\mm           Matrix object, consisting of one or more elements laid out in one or more rows and one or more columns
\mnary        n-ary object
\mphant       Phantom object, used to introduce or suppress spacing
\mrad         Radical object
\msPre        Pre-Sub-Superscript object, consisting of a base and a subscript and superscript placed to the left of the base
\msSub        Subscript object
\msSubSup     Subscript-superscript object
\msSup        Superscript object

   

Math Object Arguments

Each math object group contains a property group and one or more arguments. The arguments are contained in the special groups defined in the following argument table:

Control word  Meaning
\mdeg         Degree argument in radical object
\mden         Denominator argument in fraction object
\me           Base argument of a mathematical object
\mlim         Limit argument of a limLow or limUpp object
\mfName       Function name argument of the Function-Apply object
\mnum         Numerator argument of fraction object
\msub         Subscript argument of n-ary, sPre, sSub, sSubSup objects
\msup         Superscript argument of n-ary, sPre, sSup, sSubSup objects
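Since every math object group has the same shape (a property group followed by argument groups), a writer can get by with one generic helper. This is a sketch under some assumptions: the helper names are invented, the property group is assumed to be named after the object with "Pr" appended (as in OMML, and as \mfPr is for \mf), an empty {\mctrlPr} is always written, and the argument text is assumed to be already escaped for RTF.

```python
def group(word, content=""):
    """Build {\\word content}, adding a delimiter space only when needed."""
    # A space ends the control word unless the content is empty or starts
    # with a backslash or brace, which already delimits it.
    separator = "" if content[:1] in ("", "\\", "{") else " "
    return "{\\" + word + separator + content + "}"


def math_object(name, arguments, extra_props=""):
    """Build a math object group.

    name        -- OMML-style object name without the prefix, e.g. "sSup"
    arguments   -- ordered (argument name, RTF content) pairs, e.g. ("e", "x")
    extra_props -- RTF for further properties, placed after {\\mctrlPr}
    """
    props = group("m" + name + "Pr", group("mctrlPr") + extra_props)
    body = "".join(group("m" + arg, content) for arg, content in arguments)
    return group("m" + name, props + body)


# x squared as a superscript object
print(math_object("sSup", [("e", "x"), ("sup", "2")]))

# The skewed fraction from the start of the post
print(math_object("f",
                  [("num", r"\u-10187?\u-9138?"), ("den", r"\u-10187?\u-9137?")],
                  extra_props=group("mtype", "skw")))
```

Keeping the arguments as an ordered list rather than a dictionary makes the caller responsible for the argument order, which is the order used by the corresponding OMML element.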

   

Math RTF Control Words

To see as many examples of math RTF as you desire, type the relevant math into a Word 2007 document and save it as RTF. Then you can use Notepad to see what Word has written. You'll find a huge amount of stuff, but the math RTF will be embedded where it needs to be. That's the way I learned how it works. Okay, I did look a little at the Word source code ☺ A complete alphabetic listing of all RTF math control words will be part of a new version of the RTF specification, which will appear sometime soon on the web. If you want to start generalizing your RTF reader or writer right now to handle math RTF, you can get the list from the corresponding OMML tags by prefixing them with "\m".
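The naming rule is mechanical enough to express directly. Here is a minimal sketch with made-up function names. It assumes OMML tags are written with the usual m: prefix, and it converts names only; it doesn't check that the result is a control word Word actually recognizes.

```python
def omml_tag_to_rtf(tag):
    """Turn an OMML tag name such as 'm:sSubSup' into its RTF control word."""
    prefix, _, local_name = tag.partition(":")
    if prefix != "m" or not local_name:
        raise ValueError("expected a tag of the form m:name")
    return "\\m" + local_name


def rtf_to_omml_tag(control_word):
    """Turn a math control word such as '\\mfPr' back into an OMML tag name."""
    if not control_word.startswith("\\m") or len(control_word) < 3:
        raise ValueError("expected a control word of the form \\mname")
    return "m:" + control_word[2:]


for tag in ("m:f", "m:fPr", "m:num", "m:den", "m:sSubSup"):
    print(tag, "->", omml_tag_to_rtf(tag))
```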