A fairly common inquiry is how a program can use and access the many special glyph variants of a math font. It’s clearly a much more intricate interaction than encountered in most text applications. This post outlines how the Office math layout software interacts with the Cambria Math font and, in principle, with any other math font that has similar capabilities. More specifically, this post describes the functionality of the special library, mathfont.dll, which is shipped with Office 2007/2010. This library, in turn, interacts with the OpenType and OpenType-like tables in a math font.
Cambria Math and the math tables were developed together with the Office 2007 math software, each influencing the other to obtain high quality results. Some history is given in the post High-Quality Editing and Display of Mathematical Text in Office 2007. The font contains extensive math tables, glyph variants and glyphs for much of the Unicode math character set. It was designed with ClearType and excellent screen readability in mind and enables the best screen-resolution display of math text available today.
The specialized math tables include values that control glyph placements in math zones. Many math constants are defined to handle displacements such as axis height, fraction rule thickness, etc. The math tables are formalized as OpenType tables, although they are not yet part of the OpenType standard. Refinements include entries for positioning subscripts/superscripts horizontally using cut-ins and italic corrections. The cut-in tables allow automatic positioning of subscripts and superscripts horizontally better than un-tweaked TeX. Math characters have four cut-in values, one for each corner, allowing sub/superscripts to be kerned with their bases. Other table entries give larger glyph variants for operators like the integral sign, square root, and stretchy characters such as brackets and arrows.
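The four-corner cut-in idea can be pictured with a small sketch. Everything here is hypothetical (the type names, the field layout, and the staircase lookup); it is not the mathfont.dll interface. It only assumes that each corner's cut-in depends on the height at which the script meets the base, and that the base and script are measured at the same height.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not the mathfont.dll interface. */
enum Corner { TOP_RIGHT, TOP_LEFT, BOTTOM_RIGHT, BOTTOM_LEFT, CORNER_COUNT };

/* One corner's cut-in, stored as a staircase: kern[i] applies below
   height[i], and kern[count] applies at or above the last height. */
struct CornerKern {
    size_t count;        /* number of height thresholds */
    const int *height;   /* ascending thresholds, in design units */
    const int *kern;     /* count + 1 kern values, in design units */
};

/* Four cut-ins per glyph, one for each corner. */
struct GlyphCutIns {
    struct CornerKern corner[CORNER_COUNT];
};

/* Look up the cut-in for one corner at height h. A corner with no data
   (kern == NULL) contributes nothing. */
int corner_kern_at(const struct CornerKern *ck, int h)
{
    size_t i = 0;
    if (ck->kern == NULL)
        return 0;
    while (i < ck->count && h >= ck->height[i])
        i++;
    return ck->kern[i];
}

/* Horizontal adjustment for a superscript: the base's top-right cut-in
   plus the script's bottom-left cut-in, both taken at height h. */
int superscript_kern(const struct GlyphCutIns *base,
                     const struct GlyphCutIns *script, int h)
{
    return corner_kern_at(&base->corner[TOP_RIGHT], h)
         + corner_kern_at(&script->corner[BOTTOM_LEFT], h);
}
```

A subscript would be handled the same way with the base's bottom-right corner and the script's top-left corner. A negative result moves the script toward the base.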
The math tables are organized as a hierarchy accessed via the OpenType ID “MATH”. The names of the tables in the hierarchy are MathConstants, MathGlyphInfo, MathItalicsCorrectionInfo, MathTopAccentAttachment, ExtendedShapeCoverage, MathKernInfo, MathKern, MathVariants, MathGlyphConstruction, and GlyphAssembly. The MathConstants table includes parameters such as the following em-size-dependent sub/superscript values:
LONG lSubscriptShiftDown;
LONG lSubscriptTopMax;
LONG lSubscriptBottomDropMin;
LONG lSuperscriptShiftUp;
LONG lSuperscriptShiftUpCramped;
LONG lSuperscriptBottomMin;
LONG lSuperscriptTopRiseMin;
LONG lSubSuperscriptMinGap;
LONG lSuperscriptBottomMaxWithSubscript;
LONG lSpaceAfterScript;
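Because these values depend on the em size, a layout engine has to scale them to the font size in use. The sketch below assumes the raw values are expressed in font design units, as is conventional for OpenType tables; the function name and parameters are hypothetical and say nothing about whether mathfont.dll returns raw or already-scaled values.

```c
/* Scale a value from font design units to pixels, rounding to nearest.
   value:         table entry in design units (may be negative)
   ppem:          pixels per em at the current font size
   units_per_em:  design units per em (for example 2048; an assumption) */
long scale_design_units(long value, long ppem, long units_per_em)
{
    long scaled = value * ppem;
    long half = units_per_em / 2;
    /* Add or subtract half a unit so that integer division, which
       truncates toward zero, rounds to the nearest pixel instead. */
    return (scaled >= 0 ? scaled + half : scaled - half) / units_per_em;
}
```

For instance, with a 2048-unit em at 16 pixels per em, a shift of 300 design units comes to about 2.3 pixels and rounds to 2.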
Cambria Math contains full sets of glyph variants that have heavier weights so that when scaled down to the script and scriptscript levels the stem widths match those of the text-level glyphs. The prime (U+2032) and multiple prime characters need to be superscripted and scaled down accordingly. The dotless i and j glyph variants are used in the bases of accent objects. Accents over larger bases are given by special flattened and/or widened glyph variants.
Brackets, braces, parentheses and other stretchy characters have a number of larger glyph variants, as well as arbitrarily large sizes created using glyph assemblies. When the assemblies are displayed, the pieces are clipped to prevent overlap, since overlaps create ClearType artifacts.
One choice not handled by the math font tables is that for the italic open-face characters U+2145–U+2149 (differential D, d, and e, i, j). According to a document setting, software can display these characters as themselves (useful for patent applications), as the corresponding math italic letters, or as the corresponding ASCII letters. Serif italic glyphs are used for these in most math publications, but serif upright glyphs are used in some European math publications and math calculation engines. The use of the differential d (U+2146) automatically introduces a small space between it and the preceding character if that character is alphabetic.
An OpenType table or feature is identified by a 32-bit constant whose four bytes, read in memory order on a little-endian machine, spell a four-character ASCII string. For example, the “MATH” table is identified by the value 0x4854414D. In C/C++, you can use the macros
#define MakeTag(a, b, c, d) (((d)<<24) | ((c)<<16) | ((b)<<8) | (a))
#define tagMATH MakeTag('M','A','T','H')
to create such IDs if you don’t want to type the ASCII values of the letters directly. Note that these IDs are case sensitive. In particular, “MATH” identifies the overall math table hierarchy, and “math” identifies the math script, which is used for math glyph-variant features such as subscripts, superscripts, and dotless i’s.
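As a quick check of the byte order and of case sensitivity, here is a minimal sketch. The names MAKE_TAG, tag_from_string, and tag_to_string are hypothetical helpers for illustration; they are not part of mathfont.dll or of any OpenType API.

```c
#include <stdint.h>

/* Pack four characters so that the first one lands in the low byte,
   matching the little-endian convention described above. */
#define MAKE_TAG(a, b, c, d)                                   \
    (((uint32_t)(d) << 24) | ((uint32_t)(c) << 16) |           \
     ((uint32_t)(b) << 8)  |  (uint32_t)(a))

/* Build a tag from the first four characters of s. */
uint32_t tag_from_string(const char *s)
{
    return MAKE_TAG((unsigned char)s[0], (unsigned char)s[1],
                    (unsigned char)s[2], (unsigned char)s[3]);
}

/* Unpack a tag into a NUL-terminated string; out must hold 5 chars. */
void tag_to_string(uint32_t tag, char out[5])
{
    out[0] = (char)(tag & 0xFF);
    out[1] = (char)((tag >> 8) & 0xFF);
    out[2] = (char)((tag >> 16) & 0xFF);
    out[3] = (char)((tag >> 24) & 0xFF);
    out[4] = '\0';
}
```

Here MAKE_TAG('M','A','T','H') evaluates to 0x4854414D, while MAKE_TAG('m','a','t','h') evaluates to 0x6874616D, so the table tag and the script tag are distinct values.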
mathfont.dll functions
The following table describes the functions exported by mathfont.dll. All functions return an HRESULT. Some entries in the table refer to the “current font metrics”. These metrics depend on the font height (point size), the script level (0 for text size, 1 for script size, and 2 for scriptscript size or higher-level nestings), and the device mode (reference or presentation).
| mathfont.dll function | Purpose | OpenType table used |
| --- | --- | --- |
| GetMathConstants | Get pointer to math constants | MATH |
| GetMathGlyphItalicsCorrection | Get italic correction for a glyph at current font metrics | MATH |
| GetMathGlyphTopAccentAttachment | Get top accent attachment displacement for a glyph at current font metrics | MATH |
| GetMathGlyphIsExtendedShape | In [left]sub/sup math objects, determine whether adjacent base glyph is extended, i.e., stretched vertically | MATH |
| GetMathGlyphKerning | Get kerning for a given corner and height of a glyph at current font metrics | MATH |
| GetMathGlyphVariant | Get possibly stretched glyph variant or set of glyphs for a glyph of desired size at current font metrics | MATH |
| GetMathGlyphVariantItalicsCorrection | Get italic correction for a vertically stretched glyph (or set of glyphs) at current font metrics | MATH |
| GetMathGlyphScriptShape | Get glyph variant for script or scriptscript size (use “ssty” feature for “math” script and “dflt” language) | GDEF, GSUB |
| GetMathGlyphDotlessForm | Get dotless glyph variant (for i or j like glyphs) (use “dtls” feature for “math” script and “dflt” language) | GDEF, GSUB |
| GetMathGlyphAccentFlattenedShape | Get flattened accent glyph variant if base height exceeds x-height (use “flac” feature for “math” script and “dflt” language) | GDEF, GSUB |
| GetMathFontTextMetrics | If font is a math font, get math font text ascent, descent, and linegap at current font metrics | OS/2 |
Right to Left Math Zone Considerations
Right-to-left math requires mirroring the images of parentheses, integrals, square roots, arrows, etc. Many such mirror images can be obtained by using corresponding Unicode characters. For example, the mirror image of a left parenthesis is a right parenthesis and vice versa. Such glyph variants are automatically returned by the Uniscribe function ShapeString() if SCRIPT_ANALYSIS::fRtl = TRUE. But for many characters, such as integral signs and square roots, Unicode has no character that serves as the mirror image. Furthermore, it seems that using glyph variants for these characters makes more sense than adding characters to serve as the mirror images. Other approaches include using world transforms and mirrored bitmaps. But these approaches don’t solve the problem that the right-to-left character desired sometimes isn’t a perfect mirror image, e.g., the contour integral.
In principle (and in a prototype I’m working on), the glyph variant approach works by following the ShapeString() call with a call to Uniscribe’s ScriptSubstituteSingleGlyph() specifying tagScript as “math”, tagLangSys as “dflt”, and tagFeature as “rtlm”. Here “math” identifies the script as math, “dflt” specifies the default language, and “rtlm” requests right-to-left mirroring. If no such special mirrored glyph exists, the call does nothing. In particular, if the appropriate mirrored glyph is given by a Unicode character, the call does nothing, so the ShapeString() call can be followed by the ScriptSubstituteSingleGlyph() call and never result in “double mirroring”.
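The reason the two calls never double-mirror can be seen in a toy model. The sketch below does not call Uniscribe; the character-to-glyph mapping, the glyph IDs, and the contents of the “rtlm” table are all made up for illustration. Stage 1 stands in for the character-level mirroring done during shaping, and stage 2 stands in for the single-glyph “rtlm” substitution, which leaves a glyph unchanged when the table has no entry for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Stage 1: character-level mirroring for characters whose mirror image
   is another Unicode character. Only a few pairs are listed. */
uint32_t mirror_char(uint32_t ch)
{
    switch (ch) {
    case '(': return ')';
    case ')': return '(';
    case '[': return ']';
    case ']': return '[';
    default:  return ch;
    }
}

/* Hypothetical character-to-glyph mapping; the glyph IDs are invented. */
uint16_t glyph_for_char(uint32_t ch)
{
    switch (ch) {
    case '(':    return 10;
    case ')':    return 11;
    case 0x222B: return 1200;  /* integral sign */
    case 0x221A: return 1300;  /* square root */
    default:     return 0;     /* missing glyph */
    }
}

/* Stage 2: hypothetical "rtlm" table. It lists only glyphs that have
   no Unicode mirror character, so parentheses do not appear in it. */
struct GlyphPair { uint16_t from, to; };
static const struct GlyphPair rtlm_pairs[] = {
    {1200, 3200},  /* integral -> mirrored integral variant */
    {1300, 3300},  /* radical  -> mirrored radical variant  */
};

uint16_t apply_rtlm(uint16_t glyph)
{
    size_t n = sizeof rtlm_pairs / sizeof rtlm_pairs[0];
    for (size_t i = 0; i < n; i++)
        if (rtlm_pairs[i].from == glyph)
            return rtlm_pairs[i].to;
    return glyph;  /* no mirrored variant: the call does nothing */
}

/* Both stages in sequence for one character in a right-to-left zone. */
uint16_t shape_rtl_math_char(uint32_t ch)
{
    return apply_rtlm(glyph_for_char(mirror_char(ch)));
}
```

In this model a left parenthesis is mirrored once, by stage 1, and passes through stage 2 untouched, while an integral sign passes through stage 1 untouched and is mirrored once, by stage 2.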
If you want a complete specification of the math tables, please email me. Hopefully someday the specification will be available as part of the official OpenType standard. The mathfont.dll code was written by Sergey Malkin.
Math in Office should be promoted.
That brought back (strange) memories of using Equation Editor in my youth.
Just when you thought Unicode couldn’t get more complicated. This is actually a very fascinating post. Like rsanow, this brings back memories of Equation Editor from my university days. I thought it was very cool then and seeing some of the internals makes it seem even cooler. Thanks for the great post.
I am designing a small equation editor using the Win32 API. So, I'm really interested in the mathfont.dll functions. Would it be possible to have additional documentation regarding these functions?