Representation of Math Accents

The post Math Accents discusses how accent usage in math zones differs from that in ordinary text, notably in the occurrence of multicharacter bases. Even with single-character bases, the accents may vary in width, while in ordinary text the accent widths are the same for all letters. The present post continues the discussion by describing the large number of accents available for math in Unicode and in Microsoft Office math zones and how they are represented in MathML, RTF, OMML, LaTeX, and UnicodeMath.

Unicode math accents

As noted in Section 3.10 Accent Operators of the UnicodeMath specification, the most common math accents are (along with their TeX names)

These and more accents are described in Section 2.6 Accented Characters and 3.2.7 Combining Marks in Unicode Technical Report #25, Unicode Support For Mathematics. More generally, the Unicode ranges U+0300..U+036F and U+20D0..U+20EF have these and other accents that can be used for math.
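As a quick way to explore these two ranges, here is a minimal sketch that uses Python's standard unicodedata module to list the assigned combining marks in them. The helper name list_combining_marks is made up for illustration, and the exact set reported depends on the Unicode version bundled with your Python.

```python
import unicodedata

# The two ranges mentioned above (inclusive).
ACCENT_RANGES = [(0x0300, 0x036F), (0x20D0, 0x20EF)]

def list_combining_marks(ranges=ACCENT_RANGES):
    """Return (code point, name) pairs for assigned combining marks in ranges."""
    marks = []
    for first, last in ranges:
        for cp in range(first, last + 1):
            ch = chr(cp)
            # General category M* (Mn, Me, Mc) identifies combining marks;
            # unassigned code points have category Cn and are skipped.
            if unicodedata.category(ch).startswith("M"):
                marks.append((cp, unicodedata.name(ch)))
    return marks

for cp, name in list_combining_marks():
    print(f"U+{cp:04X}  {name}")
```

This only tells you what Unicode defines. Whether a given mark works as a math accent still depends on the font and the application.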

The Windows Character Map program shows that the Cambria Math font has all combining marks in the range 0300..036F as well as 20D0..20DF, 20E1, 20E5, 20E6, 20E8..20EA. The range 0300..036F used as math accents in Word looks like

Except for the horizontal overstrikes and the double-character accents shown in red, all these work as math accents in Microsoft Office apps, although many aren’t used in math. In keeping with the Unicode Standard, UnicodeMath represents an accent by its Unicode character, placing the accent immediately after the base character. There’s no need for double-character accents in Microsoft Office math since the corresponding “single” character accents expand to fit their bases as in

In UnicodeMath, this is given by (a+b)~, where ~ can be entered using the TeX control word \tilde. This is simpler than TeX, which uses \widetilde{a+b} for automatically sized tildes rather than \tilde{a+b}.

The combining marks in the range 20D0..20EF that work as accent objects in Office math zones are

You can test accents that don’t have TeX control words by inserting a math zone (type alt+=), then typing a non-hex letter followed by the Unicode value, alt+x, space. For example, alt+=, z, 36F, alt+x, space gives

Accents in MathML

MathML 1 was released as a W3C recommendation in April 1998 as the first XML language to be recommended by the W3C. At that time, Unicode was just starting to take hold, as Microsoft Word 97 and Excel 97 had switched to Unicode. [La]TeX was developed before Unicode 1.0, so it relied on control words. Accordingly, it was common practice in 1998 to use control words or common spacing accents to represent accents instead of the Unicode combining marks, even though many accents didn’t have a single standardized representation. Unicode standardized virtually all math accents by using combining marks.

One problem with using the combining marks in file formats is that they, well, combine! So, it may be difficult to see them as separate entities unless you insert a no-break space (U+00A0) or space (U+0020) in front of them. UnicodeMath allows a no-break space to appear between the base and accent since UnicodeMath is used as an input format as well as in files. Only programmers need to look at most file formats (HTML, MathML, OMML, RTF), so a reliable standard is more important for file formats than user-friendly presentation.

MathML 3’s operator dictionary defines most horizontal arrows with the “accent” property. In addition, it defines the following accents

02C6    ˆ    modifier letter circumflex accent
02C7    ˇ    caron
02C9    ˉ    modifier letter macron
02CA    ˊ    modifier letter acute accent
02CB    ˋ    modifier letter grave accent
02CD    ˍ    modifier letter low macron
02D8    ˘    breve
02D9    ˙    dot above
02DA    ˚    ring above
02DC    ˜    small tilde
02DD    ˝    double acute accent
02F7    ˷    modifier letter low tilde
0302     ̂    combining circumflex accent
0311     ̑    combining inverted breve

Presumably the operator dictionary should be extended to include more math combining marks and their equivalents, if they exist, with the spacing diacritics in the range U+02C6..U+02DD.
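One way to look for such equivalents is sketched below. It relies on the fact that some spacing diacritics (for example U+02D8..U+02DD) carry a compatibility decomposition of the form "space + combining mark" in the Unicode Character Database. The function name combining_equivalent is hypothetical. Modifier letters such as U+02C6 have no such decomposition, so their pairings would have to be supplied by hand.

```python
import unicodedata

def combining_equivalent(spacing):
    """Return the combining mark for a spacing diacritic, or None.

    Relies on a compatibility decomposition of the form
    '<compat> 0020 XXXX' (space followed by one combining mark).
    """
    parts = unicodedata.decomposition(spacing).split()
    if len(parts) == 3 and parts[0] == "<compat>" and parts[1] == "0020":
        return chr(int(parts[2], 16))
    return None

for cp in (0x02C6, 0x02D8, 0x02D9, 0x02DA, 0x02DC, 0x02DD):
    mark = combining_equivalent(chr(cp))
    target = f"U+{ord(mark):04X}" if mark else "no decomposition"
    print(f"U+{cp:04X} -> {target}")
```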

Here’s the MathML for the math object 𝑎̂.

<mml:mover accent="true">
  <mml:mi>a</mml:mi>
  <mml:mo>^</mml:mo>
</mml:mover>
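For readers who generate MathML programmatically, here is a small sketch that builds the same structure with Python's standard xml.etree.ElementTree. The helper name accent_mover is an invention for this example. The mml prefix is bound to the standard MathML namespace, which the fragment above leaves implicit.

```python
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("mml", MML_NS)  # serialize with the mml: prefix

def accent_mover(base, accent):
    """Build <mml:mover accent="true"> with an <mi> base and an <mo> accent."""
    mover = ET.Element(f"{{{MML_NS}}}mover", {"accent": "true"})
    ET.SubElement(mover, f"{{{MML_NS}}}mi").text = base
    ET.SubElement(mover, f"{{{MML_NS}}}mo").text = accent
    return mover

print(ET.tostring(accent_mover("a", "^"), encoding="unicode"))
```

The accent argument can just as well be a combining mark such as "\u0302" in place of the ASCII "^".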

 

Accents in OMML

“Office MathML” (OMML) is the XML used in Microsoft Office file formats to represent most math. It’s an XML version of the in-memory math object model, which differs from MathML. The math accent object 𝑎̂ has the following OMML

<m:acc>
  <m:accPr>
    <m:chr m:val=" ̂"/>
    <m:ctrlPr/>
  </m:accPr>
  <m:e>
    <m:r>
      <m:t>𝑎</m:t>
    </m:r>
  </m:e>
</m:acc>

The Rich Text Format (RTF) represents math zones essentially as OMML written in RTF syntax. Regular RTF uses the \uN notation for Unicode characters not in the current code page. The math accent object 𝑎̂ has the RTF

{\macc{\maccPr{\mctrlPr\i\f0\fs20 }{\mchr \u770? }}{\me\i\u-10187?\u-9138?}}
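The numbers in the \uN escapes above are signed 16-bit decimal values of UTF-16 code units, which is why the math italic 𝑎 (U+1D44E, a surrogate pair) shows up as two negative numbers. The following sketch reproduces them. The function name rtf_unicode_escape is hypothetical, and unlike a real RTF writer it escapes every character, ASCII included.

```python
import struct

def rtf_unicode_escape(text, fallback="?"):
    r"""Encode text as RTF \uN escapes, one per UTF-16 code unit."""
    data = text.encode("utf-16-le")
    # '<h' unpacks each 16-bit code unit as a signed value, so units
    # above 0x7FFF (such as surrogates) come out negative.
    units = struct.unpack(f"<{len(data) // 2}h", data)
    return "".join(f"\\u{n}{fallback}" for n in units)

print(rtf_unicode_escape("\u0302"))      # \u770?
print(rtf_unicode_escape("\U0001D44E"))  # \u-10187?\u-9138?
```

The trailing "?" is the fallback character shown by readers that don't understand \uN.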

Unicode RTF is easier to read since characters are written in Unicode

{\macc{\maccPr{\mctrlPr\i\f0\fs20 }{\mchr  ̂}}{\me\i 𝑎}}

But none of these is as simple as the UnicodeMath 𝑎 ̂ ☺.