Acetate selection is discussed in RichEdit Colors. The principle is that the background color of the selected text is blended with the selection background color and the text is then painted on top with the regular text color. This differs from other selection methods, such as inverting the colors of the selected text, using different selection text and background colors, or enclosing selected text in rectangles. Office applications typically use acetate selection, whereas Windows apps such as Notepad use selection text and background colors. One advantage of acetate selection for RichEdit is that partial ligature selection doesn’t get clipped by the rectangular selection background. This is illustrated in the following image where the *f* of an *fi* ligature is selected

Since the *f*’s text color doesn’t change when selected, the *f*’s underhang and overhang aren’t clipped. In contrast, without acetate selection, RichEdit appears to select the whole *fi* ligature and clips the *f*’s underhang, since RichEdit doesn’t have the code to render the ligature glyph three times with appropriate text colors. You can try out acetate selection with an Arabic ligature too, e.g., type a lam aleph (لا – gh with an Arabic keyboard) and then shift+→ to select the lam alone. You’ll see the acetate highlight go halfway through the لا. Acetate selection in RichEdit works the same way as in Word.

Without acetate selection as in Notepad, RichEdit can clip overhangs and underhangs. For example, with selection text and background colors and no ligatures, selecting the f alone renders as

Notepad displays this text without clipping. Acetate selection is used by default in RichEdit. To disable acetate selection, send EM_SETEDITSTYLEEX with wparam = lparam = SES_EX_NOACETATESELECTION.

Nevertheless, even with acetate selection, RichEdit will clip in some scenarios. If the character format of adjacent space differs from that of a character with an underhang, the selection can clip it. For example, selecting “*f*” in RichEdit where the *f* is in Times New Roman and the leading space is in Segoe UI looks like

The *f* overhang is also clipped if the character following the *f* is selected but not the *f*. For some scripts, RichEdit automatically formats spaces with the same font as the character that precedes them. But ideally the code should paint such glyphs enough times to give all parts of a glyph the appropriate background. This problem doesn’t occur in Word or Notepad.

The source of the problem is that RichEdit handles one character-format run at a time, first painting the background and then the text. When the format background changes, the new background gets painted over any overhang from the preceding run. A fix would be to paint all the background colors for a line first and then paint the text on top. Alternatively, one could paint a glyph as many times as necessary to display the various parts of the glyph unclipped (as in Notepad).
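To make the difference between the two painting orders concrete, here is a minimal sketch that models a line as a row of character cells. The `Run` class, the one-character "colors" and the column numbers are made up for illustration; this is not RichEdit's actual rendering code.

```python
from dataclasses import dataclass

@dataclass
class Run:
    start: int        # first column of the run's advance box
    width: int        # advance width of the run in columns
    background: str   # one-character stand-in for the background color
    ink: list         # columns the glyphs mark; may overhang the box

def _fill_background(canvas, run):
    for x in range(run.start, run.start + run.width):
        canvas[x] = run.background

def _draw_text(canvas, run):
    for x in run.ink:
        canvas[x] = "#"

def paint_run_by_run(runs, line_width):
    """Background then text for each run in turn (the order that clips)."""
    canvas = [" "] * line_width
    for run in runs:
        _fill_background(canvas, run)
        _draw_text(canvas, run)
    return "".join(canvas)

def paint_backgrounds_first(runs, line_width):
    """All backgrounds for the line first, then all text on top."""
    canvas = [" "] * line_width
    for run in runs:
        _fill_background(canvas, run)
    for run in runs:
        _draw_text(canvas, run)
    return "".join(canvas)

runs = [
    # An unselected glyph whose ink overhangs one column into the next run
    Run(start=0, width=4, background=".", ink=[1, 2, 3, 4]),
    # A selected neighbor with a different background
    Run(start=4, width=4, background="s", ink=[5, 6]),
]

print(paint_run_by_run(runs, 8))         # .###s##s  (overhang in column 4 is lost)
print(paint_backgrounds_first(runs, 8))  # .######s  (overhang survives)
```

In the run-by-run order, the second run's background overwrites column 4, which is where the first run's overhang landed. Painting every background before any text avoids that without having to render a glyph more than once.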

In well-formed typography, the baselines of different fonts coincide. This increases the line height when fonts with different ascents and descents appear on the same line. This is particularly true when Latin and Japanese scripts appear together. No one font can cover all of Unicode; Unicode Version 10 has 136,755 characters and TrueType fonts are limited to a maximum of 65,535 glyphs (16-bit glyph indices). So multiple fonts must be used in general. Notepad, too, uses multiple fonts to display such text. As such, virtually any editor has to have some degree of rich-text capability, as discussed more in RichEdit Plain-Text Controls.
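The arithmetic behind "multiple fonts must be used" is short enough to spell out. This sketch assumes one glyph per character, which makes the result a lower bound; real fonts need additional glyphs for ligatures, contextual forms and variants.

```python
import math

UNICODE_10_CHARACTERS = 136_755   # character count quoted above
MAX_GLYPHS_PER_FONT = 65_535      # limit imposed by 16-bit glyph indices

# Lower bound on the number of fonts, assuming one glyph per character
min_fonts = math.ceil(UNICODE_10_CHARACTERS / MAX_GLYPHS_PER_FONT)
print(min_fonts)  # 3
```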

To illustrate how the line height increases when an East Asian font is used on the same line as a Latin font, we select text with a single font and then with a combination. The selection background color reveals the resulting line height

A way to avoid baseline alignment is to send the message EM_SETEDITSTYLEEX with wparam = lparam = SES_EX_DONTALIGNBASELINE (0x00000800). This flag isn’t currently defined in MSDN, but should be. For the example above, this choice produces

which fits in the same vertical space as the Latin text alone.
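Here is a rough model of why baseline alignment increases the line height. The font metrics are invented numbers, and the "unaligned" formula is only my assumption about what it means for each font to fit in its own box; it is not taken from the RichEdit source.

```python
from dataclasses import dataclass

@dataclass
class FontMetrics:
    name: str
    ascent: int    # distance from baseline to top of the font box
    descent: int   # distance from baseline to bottom of the font box

def aligned_line_height(fonts):
    """Shared baseline: the line needs the tallest ascent plus the deepest descent."""
    return max(f.ascent for f in fonts) + max(f.descent for f in fonts)

def unaligned_line_height(fonts):
    """Assumed model with no shared baseline: the tallest single font box wins."""
    return max(f.ascent + f.descent for f in fonts)

# Hypothetical metrics chosen only to illustrate the effect
latin = FontMetrics("hypothetical Latin font", ascent=11, descent=3)
east_asian = FontMetrics("hypothetical East Asian font", ascent=9, descent=5)

print(aligned_line_height([latin]))                # 14
print(aligned_line_height([latin, east_asian]))    # 16
print(unaligned_line_height([latin, east_asian]))  # 14
```

With these made-up numbers both fonts have the same box height, yet aligning their baselines makes the line two units taller, while the unaligned line is no taller than the Latin text alone.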

Some Southeast Asian scripts can have glyph size or shaping that results in a large ascent and/or descent. This can make it hard to mix with other scripts and maintain the line height. Accordingly, font designers may leave little or no vertical padding for glyph clusters that push the limits. In single-line controls or when paragraph line-spacing-exactly is active, this may result in clipping. For example, consider the Thai character SARA AI MAIMUAN ใ (U+0E43). Enlarging it and showing the top and bottom glyph boundaries compared to the Latin letter A, we see that there’s no room for roundoff on the top

RichEdit clips the top of this letter unless there’s some paragraph space-before or line-spacing exactly with enough extra vertical space. While space before/after can eliminate such clipping for a single line, inside a multiline paragraph, one has to use a large enough line-spacing-exactly value. One could argue that the font designers made a mistake by eliminating all vertical padding for these large glyphs, but text renderers now need to deal with it. This font no-padding “feature” differs from the high fonts used in mathematics and in fancy fonts, which can have ascents and descents appreciably larger than those for standard glyphs.

While RichEdit provides a way to disable baseline alignment and to specify a larger line height to avoid Southeast Asian large glyph clipping, it doesn’t handle all kinds of selection clipping. It would be desirable to fix the remaining selection clipping scenarios and to offer a mode where the line height is automatically increased to avoid clipping of large Southeast Asian glyphs in fonts that provide inadequate vertical padding.


The EQ field was added in an early version of Word and still works, although the user is responsible for math spacing and inserting the desired math symbols. Any Unicode characters can be used including the complete Unicode math set of 2310 characters. To enter an equation, click on Insert/Quick Parts/Field… and select Eq from the list. Click on Field Codes and the Options… button to see the possible math constructs, such as \F(,) for fraction. With alt+x’ing in the Unicode math symbols, adding spaces around the operators and using the Cambria Math font, one can insert the solution to the quadratic equation as

The EQ field for this is {EQ 𝑥 = \F(−𝑏 ± \R(,𝑏\S(2) − 4𝑎𝑐),2𝑎)} formatted with Cambria Math and Unicode math italic alphabetics for a, b, c, and x. You can toggle the EQ field between the math display and the field codes by right-clicking on the field and choosing “Toggle Field Codes”. Note that the math axis doesn't line up between the equal sign and the fraction bar. The display isn’t as typographically pretty as that of the Office math facility (type alt+= \quadratic<space><space>)

and it’s appreciably harder to use. Also, the superscript size for the square must be formatted explicitly. If you change the size of the formula, you have to reformat the superscript with an appropriate smaller size. The Office math facility automatically chooses superscript and subscript sizes appropriate for the base text. The EQ field doesn't support *N*-ary expressions other than simple integrals, summations and products. Word represents ruby objects using EQ fields, although it uses LineServices in rendering them. Some documentation on EQ fields is given here.

MathType and its limited edition, the Equation Editor, made math entry easier and considerably more general than the EQ field by adding intuitive tool bars giving access to mathematical symbols and function templates. All common math constructs are supported, and the typography includes niceties such as automatic math spacing around operators and trig functions. You can download the current version of MathType and use it for free for 30 days. After that it goes into a “lite” version unless you pay $99. The lite version has advantages over the Equation Editor not the least of which is that it’s currently maintained.

Both MathType and the Equation Editor display the solution to the quadratic equation as

This is easy to enter using hot keys or the templates on the template tool bar. You can also enter it by handwriting using the Math Input Panel, which communicates with client apps via MathML.

On Windows and the Mac, MathType and the Equation Editor use embedded OLE objects for displaying and editing math text. As such they can be used by any programs that support OLE such as WordPad. For WordPad, click on Insert Object and choose Microsoft Equation 3.0. But OLE is only available on Win32 Windows and the Mac. Also, a program’s search and formatting commands don’t work with embedded OLE objects, while they do work with native Office math zones.

MathType incorporates the Unicode 2.0 math characters and adds many others in the Unicode Private Use Area (see the MTCode Encoding Tables). 364 characters have the math property in Unicode 2.0. This is considerably less than the 2310 math characters in the current Unicode Standard. Most of the characters in the MTCode Encoding Tables are in the current Unicode Standard, but there might be backward compatibility issues if MathType were to upgrade to that version of the standard. There are many math characters in the Unicode Standard that are not in MathType. I haven’t been able to enter such characters into MathType or the Equation Editor. Since these characters are not common, it’s not a major limitation. Chapter 6 of Creating Research and Scientific Documents with Microsoft Word describes both the native Office math facility discussed in many of this blog’s posts as well as MathType.

Now let’s consider how various math entities are represented in these math facilities and the file formats they support. The formats are Presentation MathML, OMML (Office MathML), RTF, [La]TeX, UnicodeMath, braille, and MathType. Math entities such as fractions and integrals have arguments. The number of arguments and the order in which they occur may differ between the representations. The MathType equation format is a binary format and is documented in detail in the MathType SDK. The discussion here is limited to what the templates reveal when one runs MathType and/or the Equation Editor.

All math formats have a numerator and a denominator for fractions and they occur in that order. The EQ field and math braille only have stacked fractions (numerator above denominator), but the other formats also have slashed, linear, and small fractions.

Math functions, such as trigonometric and log, have an explicit two-argument entity in OMML and RTF, but are handled by character formatting alone in the other formats. Having an object with two explicit arguments reveals the content better in the user interface and facilitates typographic refinements, such as precise spacing.

OMML, RTF and MathType have three arguments for *N*-ary entities such as integral and summation, namely the upper and lower limits and the integrand or summand, which can be called *N*-aryands. [La]TeX and braille only have the two limits. Presentation MathML doesn’t have an explicit *N*-ary entity; instead it overloads other entities (msub, msup, msubsup, mover, munder, munderover) which don't have an *N*-aryand. This is a shortcoming of Presentation MathML. The EQ field has integrals, summations and products with all three arguments. Limits can be placed as subscript/superscript or under/over. UnicodeMath defines the *N*-aryand to be the first operand following the *N*-ary operator and its limits (if present). MathType puts the *N*-aryand before the limits, while the EQ field, OMML and RTF put it afterward since it follows the limits visually. Having it afterward makes navigating with arrow keys more natural. MathType traverses integrals and summations with the Tab key in visual order, but the left/right arrow keys bypass the limit arguments.

Subscript and superscript bases are not arguments in MathType, [La]TeX, braille or the EQ field. They are arguments in MathML, OMML and RTF. In UnicodeMath, the base is the variable, function name, number, or object immediately preceding the sub/sup operators. Knowing the base allows proper kerning of the base onto the script (superscript or subscript) as well as providing more exact semantics in interoperating with mathematical calculation engines. While the base can be determined algorithmically as in MathType, [La]TeX and UnicodeMath, having an explicit argument for it automatically gives it helpful shading in Office math zones and lends precision to the fine-grained math speech needed for unambiguous editing of math using speech.

The EQ field has the \B() option which puts appropriately sized parentheses around its argument. It doesn’t allow for other kinds of delimiters, such as [] and {}. Also, it doesn’t have multiple argument expressions that size the argument dividers appropriately, such as in Dirac notation. The other formats have many delimiter options and can handle multiple arguments correctly.


Several letters have variant glyphs and both the regular and variant characters can occur in the same document with different meanings. In my papers on laser theory, I’ve used θ for the polar angle in spherical coordinates and ϑ for a complex mode coupling coefficient. The following table shows the Greek letters that have variants

Letter | Unicode | TeX | Variant | Unicode | TeX |
ϵ | 03F5 | \epsilon | ε | 03B5 | \varepsilon |
θ | 03B8 | \theta | ϑ | 03D1 | \vartheta |
Θ | 0398 | \Theta | ϴ | 03F4 | \varTheta |
π | 03C0 | \pi | ϖ | 03D6 | \varpi |
ρ | 03C1 | \rho | ϱ | 03F1 | \varrho |
σ | 03C3 | \sigma | ς | 03C2 | \varsigma |
ϕ | 03D5 | \phi | φ | 03C6 | \varphi |

ε and φ are standard letters in the modern Greek alphabet, but they are variants in math text as shown in the table. This is the choice made by TeX back in the early 1980s and the math community depends on it. Before Version 3.0, the Unicode Standard displayed ϕ for 03C6, but in 3.0 the glyph was changed to conform with modern Greek usage as documented in Section 2.3.1 Representative Glyphs for Greek Phi of Unicode Technical Report #25 Unicode Support for Mathematics. Unicode 3.1 added ϵ (03F5).

In The TeXbook Appendix F, the upper-case upsilon \Upsilon appears with curvy arms as in the Cambria Math Υ for 03A5. But in the STIX math 2 font, 03A5 appears as an upper-case Latin Y, which isn’t useful for math. Recognizing that fact, Unicode 1.1 encoded 03D2 for the curvy arm glyph ϒ (which unfortunately looks like a Y here; the WordPress blog infrastructure doesn't allow font changes). It has the odd name GREEK UPSILON WITH HOOK SYMBOL. One should not take Unicode names too literally except as being unique identifiers. Until recently, Cambria Math had no glyph at 03D2, but now it’s defined and preferred for math. Admittedly upper-case and lower-case upsilons aren’t commonly used in math, but I anyhow used υ extensively in laser theory to represent complex frequencies such as γ + iω. Here γ is a decay rate and ω is a (real) frequency.

TeX only defines upper-case Greek letters that have glyphs distinct from the Latin alphabet, that is, ΓΔΘΛΞΠΣΥΦΨΩ with the notation \Gamma for Γ. Unicode and Office apps support all upper-case Greek letters using the TeX convention that the first letter is upper case, as in \Alpha for Α (upper-case alpha). For math, Greek letters that look identical to Latin letters aren’t useful, since they’d be interpreted as Latin letters. Lower-case Greek letters are italicized by default, while upper-case Greek letters are upright by default. This convention also applies to ∂ (\partial, italicized) and ∇ (\nabla, not italicized).

[La]TeX has control words to change the math style of math letters, such as \mathbf{…}, which changes the letters … to bold face. Ironically that construct doesn’t affect Greek letters, probably an early omission, but now unchangeable for archival reasons. Unicode chose to encode the various math alphabets explicitly instead of using variation selectors. Section 2.2 of Unicode Technical Report #25 describes the mathematical alphabets in the Unicode Standard, which are contained mostly in the plane-1 range U+1D400..U+1D7FF along with some characters in the Letterlike Symbols (U+2100..U+214F). Here is Table 2.1 listing the various math styles

Math Style | Characters from Basic Set | Location |
plain (upright, serifed) | Latin, Greek and digits | BMP |
bold | Latin, Greek and digits | Plane 1 |
italic | Latin and Greek | Plane 1* |
bold italic | Latin and Greek | Plane 1 |
script (calligraphic) | Latin | Plane 1* |
bold script (calligraphic) | Latin | Plane 1 |
Fraktur | Latin | Plane 1* |
bold Fraktur | Latin | Plane 1 |
double-struck | Latin and digits | Plane 1* |
sans-serif | Latin and digits | Plane 1 |
sans-serif bold | Latin, Greek and digits | Plane 1 |
sans-serif italic | Latin | Plane 1 |
sans-serif bold italic | Latin and Greek | Plane 1 |
monospace | Latin and digits | Plane 1 |

Here the * notes that some characters of the corresponding alphabet are located in the Unicode Letterlike Symbols block.
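As a small illustration of how these alphabets are laid out, the following sketch maps ASCII letters and digits to the math bold and math italic alphabets. The function names are mine; the code points are from the Unicode Standard. Only the one letterlike exception for italic is handled (math italic h lives at U+210E), so this is a sketch rather than a complete converter.

```python
import unicodedata

# Start of each run in the Mathematical Alphanumeric Symbols block
MATH_BOLD_CAPITAL_A = 0x1D400
MATH_BOLD_SMALL_A = 0x1D41A
MATH_BOLD_DIGIT_ZERO = 0x1D7CE
MATH_ITALIC_CAPITAL_A = 0x1D434
MATH_ITALIC_SMALL_A = 0x1D44E

# Math italic h is in the Letterlike Symbols block (PLANCK CONSTANT);
# the slot it would occupy in plane 1, U+1D455, is left unassigned.
ITALIC_LETTERLIKE = {"h": "\u210E"}

def to_math_bold(ch):
    """Return the math bold counterpart of an ASCII letter or digit."""
    if "A" <= ch <= "Z":
        return chr(MATH_BOLD_CAPITAL_A + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(MATH_BOLD_SMALL_A + ord(ch) - ord("a"))
    if "0" <= ch <= "9":
        return chr(MATH_BOLD_DIGIT_ZERO + ord(ch) - ord("0"))
    return ch

def to_math_italic(ch):
    """Return the math italic counterpart of an ASCII letter."""
    if ch in ITALIC_LETTERLIKE:
        return ITALIC_LETTERLIKE[ch]
    if "A" <= ch <= "Z":
        return chr(MATH_ITALIC_CAPITAL_A + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(MATH_ITALIC_SMALL_A + ord(ch) - ord("a"))
    return ch

print(unicodedata.name(to_math_bold("A")))    # MATHEMATICAL BOLD CAPITAL A
print(unicodedata.name(to_math_italic("x")))  # MATHEMATICAL ITALIC SMALL X
print(unicodedata.name(to_math_italic("h")))  # PLANCK CONSTANT
```

The script, Fraktur and double-struck alphabets have more such exceptions, which is what the * in the table refers to.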

There’s no sans serif upright Greek alphabet, but LaTeX can display this math style. LaTeX can also display two script variations as described in the post Unicode Math Calligraphic Alphabets, while Unicode only has one. Accordingly, more Unicode math alphabets may be added in the future.

Nemeth braille can represent all the math Greek alphabets and variants as described in the post Nemeth Braille Alphanumerics and Unicode Math Alphanumerics.


Option | Value | s/g | Meaning |
tomUnicodeBiDi | 0x00000001 | s | Use Unicode BiDi algorithm for inserted text |
tomAdjustCRLF | 0x00000001 | g | If range start is inside multicode unit like CRLF, surrogate pair, etc., move to start of unit |
tomUseCRLF | 0x00000002 | g | Paragraph ends use CRLF (U+000D U+000A) |
tomTextize | 0x00000004 | g | Embedded objects export alt text; else U+FFFC |
tomAllowFinalEOP | 0x00000008 | g | If range includes final EOP, export it; else don’t |
tomUnlink | 0x00000008 | s | Disables link attributes if present |
tomUnhide | 0x00000010 | s | Disables hidden attribute if present |
tomFoldMathAlpha | 0x00000010 | g | Replace math alphanumerics with ASCII/Greek |
tomIncludeNumbering | 0x00000040 | g | Lists include bullets/numbering |
tomCheckTextLimit | 0x00000020 | s | Only insert up to text limit |
tomDontSelectText | 0x00000040 | s | After insertion, call Collapse(tomEnd) |
tomTranslateTableCell | 0x00000080 | g | Export spaces for table delimiters |
tomNoMathZoneBrackets | 0x00000100 | g | Used with tomConvertUnicodeMath and tomConvertTeX. Set discards math zone brackets |
tomLanguageTag | 0x00001000 | s/g | Sets BCP-47 language tag for range; gets tag |
tomConvertRTF | 0x00002000 | s/g | Set or get RTF |
tomGetTextForSpell | 0x00008000 | g | Export spaces for hidden/math text, table delims |
tomConvertMathML | 0x00010000 | s/g | Set or get MathML |
tomGetUtf16 | 0x00020000 | g | Causes tomConvertRTF, etc. to get UTF-16. SetText2 accepts 8-bit or 16-bit RTF |
tomConvertLinearFormat | 0x00040000 | s/g | Alias for tomConvertUnicodeMath |
tomConvertUnicodeMath | 0x00040000 | s/g | UnicodeMath |
tomConvertOMML | 0x00080000 | s/g | Office MathML |
tomConvertMask | 0x00F00000 | s/g | Mask for mutually exclusive modes |
tomConvertRuby | 0x00100000 | s | See Inserting and Getting Math Text… |
tomConvertTeX | 0x00200000 | s/g | See LaTeX Math in Office |
tomConvertMathSpeech | 0x00300000 | g | Math speech (English only here) |
tomConvertSpeechTokens | 0x00400000 | g | Simple Unicode and speech tokens |
tomConvertNemeth | 0x00500000 | s/g | Nemeth math braille in U+2800 block |
tomConvertNemethAscii | 0x00600000 | g | Corresponding ASCII braille |
tomConvertNemethNoItalic | 0x00700000 | g | Nemeth braille in U+2800 block w/o math italic |
tomConvertNemethDefinition | 0x00800000 | g | Fine-grained speech in braille |
tomConvertCRtoLF | 0x01000000 | g | Plain-text paragraphs end with LF, not CRLF |
tomLaTeXDelim | 0x02000000 | g | Use LaTeX math-zone delimiters \(...\) inline, \[...\] display; else $...$, $$...$$. Set handles all |

Nonzero values within the mask defined by tomConvertMask (0x00F00000) are mutually exclusive, that is, they cannot be combined (OR’d) with one another. These options include setting text as UnicodeMath, [La]TeX (tomConvertTeX), and Nemeth math braille (tomConvertNemeth). You can set only one at a time. But other options can be OR’d in if desired.
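A quick way to see why the masked values can't be combined is to treat them as what they are: small integers stored in a four-bit field rather than independent bit flags. The sketch below uses the values from the table; the Python names and the helper `conversion_mode` are mine and aren't part of the TOM interfaces.

```python
TOM_GET_UTF16 = 0x00020000
TOM_CONVERT_MASK = 0x00F00000

# Values inside the mask, copied from the table above
CONVERT_MODES = {
    0x00100000: "tomConvertRuby",
    0x00200000: "tomConvertTeX",
    0x00300000: "tomConvertMathSpeech",
    0x00400000: "tomConvertSpeechTokens",
    0x00500000: "tomConvertNemeth",
    0x00600000: "tomConvertNemethAscii",
    0x00700000: "tomConvertNemethNoItalic",
    0x00800000: "tomConvertNemethDefinition",
}

def conversion_mode(flags):
    """Name the mutually exclusive mode selected by flags, or None if unset or unknown."""
    return CONVERT_MODES.get(flags & TOM_CONVERT_MASK)

# OR-ing two modes doesn't request both; it silently selects a third one.
print(conversion_mode(0x00100000 | 0x00200000))     # tomConvertMathSpeech

# Options outside the mask can be OR'd in without disturbing the mode.
print(conversion_mode(0x00500000 | TOM_GET_UTF16))  # tomConvertNemeth
```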

A string bstr of Nemeth math braille coded in the Unicode range U+2800..U+283F can be inserted and built up by calling ITextRange2::SetText2(tomConvertNemeth, bstr). If the string is valid, you can get it back in any of the math formats including Nemeth math braille. For example, if you insert the string

⠹⠂⠌⠆⠨⠏⠼⠮⠰⠴⠘⠆⠨⠏⠐⠹⠨⠈⠈⠙⠨⠹⠌⠁⠬⠃⠀⠎⠊⠝⠀⠨⠹⠼⠀⠨⠅⠀⠹⠂⠌⠜⠁⠘⠆⠐⠤⠃⠘⠆⠐⠻⠼

you see

You can also input braille with a standard keyboard by typing a control word \braille assigned to the Unicode character U+24B7 (Ⓑ). (See LaTeX Math in Office for how to add commands to math autocorrect). The \braille command causes math input to accept braille input via a regular keyboard using the braille ASCII codes sometimes referred to as North American Braille Computer Codes. The character ~ (U+007E) disables this input mode. These braille codes are described in the post Nemeth Braille—the first math linear format and can be input using refreshable braille displays. Alternatively, such input can be automated by calling ITextSelection::TypeText(bstr). Just as in entering UnicodeMath, the equations build up on screen as soon as the math braille input becomes unambiguous. The implementation includes the math braille UI that cues the user where the insertion point is for unambiguous editing of math zones using braille. Note that as of this posting, the math braille facility isn’t hooked up to Narrator or other screen readers.

The tomConvertMathSpeech option currently only gets math speech in English. Microsoft Office apps like Word, PowerPoint and OneNote deliver math speech in over 18 languages to the assistive technology (AT) program Narrator via the UIA ITextRangeProvider::GetText() function. Other ATs could also get math speech this way. Dictating (setting) math speech would be nice for both blind and sighted folks. Imagine, you can say *a*² + *b*² = *c*² faster than you can type it or write it! The call SetText2(tomConvertMathSpeech, bstr) is reserved for such input, but it isn’t implemented yet.


Control words for symbols work in either input mode, for example, \alpha inserts *α* in both modes. So, there’s no need to change the input mode for symbol control words. Similarly, Unicode symbols like ∬ work in both input modes. The build-up engine supports Unicode LaTeX since the Office math facility was based on Unicode from the start. Note that UnicodeMath is defined in terms of Unicode symbols, not ASCII-letter control words, but the latter are supported by the input engine for ease of entry on standard keyboards. On-screen keyboards may offer more direct ways of entering Unicode symbols.

If a math zone begins with a $, the input must be TeX or LaTeX, since $ has no special significance in UnicodeMath and Office apps use the math-zone character format effect to define math zones. But the user might not start with a $, so it’s worth handling other ways that distinguish the formats. The LaTeX math-zone start delimiters \[ and \( have useful meanings in UnicodeMath, namely to treat the [ and ( literally instead of treating them as autosizing build-up delimiters.

Some structure control words such as \frac and \binom are only defined in LaTeX and others like \matrix and \pmatrix are defined in both modes. The user pain enters when typing something like \frac{a}{b} in UnicodeMath mode. The {…} get built up as curly braced expressions and the \frac remains unchanged. No fraction results and the user may wonder what went wrong.

When the user types LaTeX-only structure control words like \frac or \binom in UnicodeMath input mode, it’s clear that LaTeX is intended and the user can be asked whether the input mode should switch to LaTeX. Similarly, structure control words valid in both input modes become unambiguous when the user types the argument start delimiter. For LaTeX the start delimiter is {, while for UnicodeMath it’s (. So, \matrix( must be UnicodeMath, while \matrix{ must be LaTeX. Note that LaTeX by design supports the original TeX control-word sequences like \matrix{…} as well as the LaTeX environments like \begin{matrix}…\end{matrix}. In UnicodeMath autobuildup mode, no build up occurs when the user types \matrix{, so it’s possible at that point to switch to LaTeX input without need for retyping.

Both input modes have \begin and \end, but in LaTeX these are environment control words followed by {, whereas in UnicodeMath they represent generic start/end delimiters for which curly braces would be superfluous. So as soon as the user types { following \begin or \end, a cue recommending a switch to LaTeX input mode can be displayed.

Math functions are also treated differently in LaTeX and in UnicodeMath. To enter the sine function in LaTeX, one types \sin, whereas in UnicodeMath, one just types sin. So, if a math function name is entered preceded by \, a cue recommending a switch to LaTeX input mode can be displayed. The Office math display engine needs to know the argument of a math function as well as the function name in order to insert the correct math spacing. LaTeX doesn’t have a formal way of defining the argument, although enclosing it in curly braces is a good idea. UnicodeMath has precise ways of defining the argument. This is also true for integrands of integrals and n-aryands of n-ary operators in general.
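The cues described so far can be collected into a small heuristic. This is only a sketch of the idea, not the logic in the Office build-up engine; the function name `guess_input_mode` and the short lists of control words and function names are assumptions chosen for illustration.

```python
import re

def guess_input_mode(text):
    """Return "LaTeX", "UnicodeMath", or None if the text is ambiguous."""
    # A leading $ has no special meaning in UnicodeMath
    if text.startswith("$"):
        return "LaTeX"
    # Structure control words defined only in LaTeX
    if re.search(r"\\(frac|binom)(?![A-Za-z])", text):
        return "LaTeX"
    # \begin and \end followed by { are LaTeX environments
    if re.search(r"\\(begin|end)\{", text):
        return "LaTeX"
    # Control words valid in both modes: the argument delimiter decides
    shared = re.search(r"\\(matrix|pmatrix)([({])", text)
    if shared:
        return "LaTeX" if shared.group(2) == "{" else "UnicodeMath"
    # Math function names preceded by a backslash
    if re.search(r"\\(sin|cos|tan|log)(?![A-Za-z])", text):
        return "LaTeX"
    return None

print(guess_input_mode(r"\frac{a}{b}"))       # LaTeX
print(guess_input_mode(r"\matrix(a&b@c&d)"))  # UnicodeMath
print(guess_input_mode(r"\matrix{a&b}"))      # LaTeX
print(guess_input_mode(r"\alpha+\beta"))      # None
```

A result of None means no switch should be suggested, which is the right answer for symbol control words like \alpha since they work in both modes.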

The input a^2+b^2=c^2 represents the same equation in either input mode, but a^10+b^10=c^10 represents *a*¹⁰ + *b*¹⁰ = *c*¹⁰ in UnicodeMath and *a*¹0 + *b*¹0 = *c*¹0 in LaTeX. It doesn’t seem possible to distinguish the user intent for such cases, but it’d be worth asking the user who types a^{ or a_{ whether to switch to LaTeX, since superscripts and subscripts enclosed in curly braces aren’t common in mathematical expressions. Expressions involving e^{…} do occur, but it’s better typography to use exp{…} instead of raising *e* to a braced power.
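The a^10 difference comes down to how much text each mode takes as the superscript. The toy splitter below covers only the cases mentioned here (a digit run, a single character, or a braced group in LaTeX); the function name is mine and real parsers handle far more.

```python
def split_superscript(expr, mode):
    """Split "base^rest" into (base, superscript, remainder) for the given mode."""
    base, _, rest = expr.partition("^")
    if mode == "LaTeX":
        if rest.startswith("{"):
            close = rest.index("}")
            return base, rest[1:close], rest[close + 1:]
        # Without braces, LaTeX takes a single token
        return base, rest[:1], rest[1:]
    # UnicodeMath: a run of digits is one number, so it all goes in the superscript
    end = 0
    while end < len(rest) and rest[end].isdigit():
        end += 1
    end = max(end, 1)  # otherwise take a single character
    return base, rest[:end], rest[end:]

print(split_superscript("a^10", "UnicodeMath"))  # ('a', '10', '')
print(split_superscript("a^10", "LaTeX"))        # ('a', '1', '0')
print(split_superscript("a^{10}", "LaTeX"))      # ('a', '10', '')
```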

Font control words like \mathbf{ are distinctly LaTeX. The TeX binomial-coefficient construct {n\choose k} doesn’t make sense in UnicodeMath (one would type n\choose k without the curly braces). But {n\atop might be used in UnicodeMath since {n\atop k} would build up as n over k (without a fraction bar) enclosed in {}. Admittedly this construct is unlikely since binomial coefficients appear in parentheses, not in curly braces.

We see that there are quite a few [La]TeX constructs that don’t make sense in UnicodeMath and can be used to query the user about switching from UnicodeMath input mode to LaTeX input mode. In addition, such LaTeX-oriented control sequences could be handled directly in UnicodeMath mode. The math build-up engine in Microsoft Office uses the same operator and string stacks for both modes, so it’s fairly straightforward to treat constructs like \frac{…}{…}, \matrix{…}, \begin{matrix}…\end{matrix} directly in UnicodeMath mode. This might make math input more user friendly for people familiar with LaTeX. And it might facilitate migrating to the speedier, more mathematical UnicodeMath input mode. But it does compromise using the build-up engine as a UnicodeMath validator. To that end, if the build-up engine is modified to handle these LaTeX control sequences in UnicodeMath mode, it might be worth having a “strict” mode that would fail input with invalid UnicodeMath. In any event, build-down results are all in one format or the other, not in a mixture of the two.

Formula autobuildup is disabled for LaTeX in Word since it needs some more work. Word adds two new hot keys 1) to build up (ctrl+=) a math zone and 2) to build down (shift+ctrl+=). The hot key alt+= inserts a math zone (or removes one).

Currently, it’s trickier to enable LaTeX in OneNote and PowerPoint since you need to define a new math autocorrect control word to change the input mode. Type alt+= to enter a math zone and on the math ribbon click on the lower right corner of the Tools section. This brings up the Equation Options dialog. Click on Math Autocorrect and in the Replace text box type \TeX and in the With text box type 24C9 alt+x to enter Ⓣ. From then on you can type \TeX<space> in a math zone to switch the input format from UnicodeMath to LaTeX in OneNote and PowerPoint. If you want, you can define \LF as 24C1 alt+x to enter Ⓛ. This control word switches back to UnicodeMath, which we used to call the Linear Format. Note that formula autobuildup is not disabled by default in PowerPoint and OneNote, so you can see how you like it.

Note that the math autocorrect control word \integral enters the mode-locking equation

by building up the UnicodeMath text

1/2π ∫_0^2π▒ⅆθ/(a+b sin θ)=1/√(a^2-b^2)

This won’t build up correctly if LaTeX input is enabled. For PowerPoint and OneNote, you can define your own math-autocorrect control words using LaTeX notation which will build up correctly when LaTeX is enabled. In LaTeX the mode-locking equation is given by

\frac{1}{2\pi}\int_{0}^{2\pi}\frac{d\theta}{a+b\sin{\theta}}=\frac{1}{\sqrt{a^2-b^2}}.

It might be tempting to auto-switch to UnicodeMath build-up if a control-word inserts text with Unicode characters above the ASCII range. But people will probably switch to XeTeX or some other Unicode enabled TeX dialect for which this simple heuristic wouldn’t apply. In fact, Unicode TeX builds up correctly in Office math apps. For example, you can build up

\frac1{2π}∫_0^{2π}\frac{dθ}{a+b\sin{θ}}=\frac1{√{a^2-b^2}}

which is easier to read than the pure ASCII version above. You can insert many Unicode characters from the galleries in the math ribbon.

The LaTeX option supports all TeX control words appearing in Appendix B of the UnicodeMath spec. That includes many math operators, Greek letters, and various other symbols. The verbose LaTeX notations like \begin{equation} and \begin{matrix} aren’t supported, but the more concise TeX notations are supported, such as \matrix{…} and \pmatrix{…}. Fractions can be entered in the LaTeX form \frac{…}{…} or in the TeX form {…\over…}. \displaymath is implied if the math zone fills the hard/soft paragraph and currently it can’t be turned on in inline math zones. Unicode math alphanumerics can be entered using control words like \mathbf{}. As for UnicodeMath, you can toggle math bold and math italic using the bold and italic buttons on the Home ribbon. You can enter script, open-face (double struck), and fraktur letters with control words like \scriptX, \doubleX, and \frakturX, respectively. For example, \scriptS enters 𝒮 as does \mathcal{S}. The article Linear format equations using UnicodeMath and LaTeX in Word contains a list of many supported control words.

More enhancements are likely to be offered in the future. For example, it would be nice to have a Unicode LaTeX build-down option since it’s much easier to read than pure ASCII LaTeX. Also, it’d be nice to offer a formula autobuildup option in Word since it’s easier to see what you’re typing when you don’t have to wade through myriad control words. But the current facility is a big step forward for folks who know LaTeX well.


First here’s how these features are revealed to sighted users. In all math-enabled Office apps, the innermost math argument containing the IP is lightly shaded and selected text has the same selection background color as text not in math zones. In PowerPoint and OneNote, the math object containing the IP is shaded a bit more lightly than the argument and if the IP isn’t in a math object, the whole math zone has this lighter shading. In Word, the math zone is enclosed in a boundary and the object containing the IP doesn’t have the lighter shading. The user always knows what kind of an argument is involved just by looking at the built-up (Professional) display. This information is also conveyed in math fine-grained speech.

A refreshable braille display typically has a row of 40 or 80 8-dot cells with the dots represented by small rounded pins that are raised by solenoids. The dots are arranged in two columns of four dots. The left column is numbered starting at the top 1, 2, 3, 7 and the right column is numbered starting from the top 4, 5, 6, 8. Like most braille codes, the Nemeth math code uses the dots 1 through 6. This leaves dots 7 and 8 for UI purposes, although dot 7 is occasionally used to indicate upper case. The Nemeth code precedes a letter with the capitalization indicator “⠠” (lone dot 6) to get upper-case letters, e.g., “⠠⠁” for “A” since “⠁” is the braille code for the letter a. So, we don’t use dot 7 to indicate upper case, at least in math zones.
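The dot numbering above maps directly onto Unicode, which encodes 8-dot cells in the Braille Patterns block starting at U+2800, with dot n stored as bit n-1. The following Python sketch illustrates that mapping. The helper names `braille_cell` and `dots_of` are made up for this illustration and aren’t part of any Office or screen-reader API.

```python
BRAILLE_BASE = 0x2800  # U+2800 BRAILLE PATTERN BLANK


def braille_cell(dots):
    """Return the Unicode braille character with the given dots (1-8) raised."""
    code = BRAILLE_BASE
    for dot in dots:
        if not 1 <= dot <= 8:
            raise ValueError(f"dot number out of range: {dot}")
        code |= 1 << (dot - 1)  # dot n corresponds to bit n-1
    return chr(code)


def dots_of(cell):
    """Return the sorted list of raised dot numbers for a braille character."""
    bits = ord(cell) - BRAILLE_BASE
    if not 0 <= bits <= 0xFF:
        raise ValueError("not a braille pattern character")
    return [n + 1 for n in range(8) if bits & (1 << n)]


print(braille_cell([1]))                    # the letter a
print(braille_cell([6]))                    # Nemeth capitalization indicator
print(braille_cell([6]) + braille_cell([1]))  # upper-case A in Nemeth
print(dots_of(braille_cell([3, 4, 5, 6])))  # dots of the numeric indicator
```

Because dots 7 and 8 occupy the two highest bits, they can be set or cleared without disturbing the 6-dot Nemeth code in the lower bits.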

The regular math braille display shows the whole math zone in braille, limited only by the number of display cells. This gives a lot of context to math braille, significantly more than math speech provides, but not as much as screen or paper.

Typically, selected text appears with both dots 7 and 8. So if “a” is selected, it appears as “⣁”. This approach seems well suited to math expressions as well.

We’re left with needing ways to identify a math zone and the insertion point and to highlight the innermost argument containing the IP if any. Braille displays don’t have multiple shading levels, only two extra dots! They also have hot keys.

The IP needs a cell by itself to stand out. As described in the post Text Insertion Point, the IP is *in between* two characters in rich text, although for plain text one can get away with thinking of the IP as being on top of the character that actually follows the IP. Built-up (Professional) math text is rich text notably because it has special display constructs, such as stacked fractions, multilevel subscripts and superscripts, integrals, matrices, etc. For this purpose (and perhaps others), dots 7-8 “⣀” comprise a simple, effective IP. Admittedly this is the same as a lone selected space, but it seems to be readily distinguishable since the user usually knows when something is selected versus having an IP and s/he can easily move the IP (or hit the IP-identification hot key coming up) to check if in doubt.

To reveal the innermost argument containing the IP, one can turn on dot 8 for the characters in that argument. This is similar to the argument shading used in regular displays. To illustrate this approach, consider the fraction 1/2π, which in built-up form is given by the Nemeth braille string “⠹⠂⠌⠆⠨⠏⠼”. If the IP precedes the 2 in the denominator, the braille display would have “⠹⠂⠌⣀⢆⢨⢏⠼”.
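As a rough sketch of this display logic, the Python below inserts the dots 7-8 IP cell and turns on dot 8 for the cells of the innermost argument. It assumes the caller already knows where the argument starts and ends in the braille string; the function names and the index-based interface are hypothetical and only meant to make the idea concrete.

```python
DOT7, DOT8 = 0x40, 0x80
IP_CELL = chr(0x2800 | DOT7 | DOT8)  # dots 7-8: the insertion-point cell


def add_dots(cells, mask):
    """Turn on the dots in mask for every braille cell in cells."""
    return "".join(chr(ord(c) | mask) for c in cells)


def show_ip(cells, arg_start, arg_end, ip):
    """Return the display string for an IP located at index ip.

    cells[arg_start:arg_end] is the innermost argument containing the IP
    (an assumption: the caller supplies these indexes from the math layout).
    """
    if not arg_start <= ip <= arg_end:
        raise ValueError("the IP must lie inside the argument")
    return (cells[:arg_start]
            + add_dots(cells[arg_start:ip], DOT8)
            + IP_CELL
            + add_dots(cells[ip:arg_end], DOT8)
            + cells[arg_end:])


fraction = "⠹⠂⠌⠆⠨⠏⠼"               # 1/2π in Nemeth braille
print(show_ip(fraction, 3, 6, 3))   # IP in front of the 2 in the denominator
print(add_dots("⠁", DOT7 | DOT8))   # a selected "a" shows dots 7 and 8
```

The same `add_dots` helper covers selection (dots 7 and 8) and argument highlighting (dot 8 only), since both are just extra bits on top of the 6-dot cell.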

At first the dot 8 in the denominator cells here might be confusing, but it resolves ambiguities as to whether the IP “⣀” is inside or outside of a math object. This isn’t a serious problem with fractions since the fraction start, fraction bar, and fraction end appear as the explicit braille codes ⠹, ⠌, ⠼, respectively, although it’s always helpful to know when the IP is in a math argument. But consider the quantity *a*², which is given in Nemeth braille by “⠁⠘⠆”. In Office apps and MathML, superscripts are represented by two arguments, the base and the superscript. If the IP precedes the base, is the IP at the start of the base or at the start of the superscript object? That position is ambiguous without the dot 8 option. With dot 8, you can tell the difference: in “⣀⠁⠘⠆” the IP precedes the superscript object, while in “⣀⢁⠘⠆” the IP is inside the base in front of the “a”. Distinguishing these positions is essential for unambiguous editing of mathematical text.

Dot-8 highlighting reveals when the IP is at the start or end of an argument or somewhere in between. But it doesn’t define what kind of argument. To get this kind of information on a braille display, it’s handy to have an IP-identification hot key that flashes the name of the argument containing the IP (or “math zone” if the IP isn’t inside an argument) onto the braille display. This name needs to be localized to the current user language, while the regular braille for the math zone is globalized by nature. For example in English, depending on where the IP is in a denominator, the hot key displays “start of denominator” (⠎⠞⠁⠗⠞⠀⠕⠋⠀⠙⠑⠝⠕⠍⠊⠝⠁⠞⠕⠗), “end of denominator” or just “denominator”. This is more informative than the corresponding math speech, which only announces the kind of argument when the IP is at the end of an argument, or the kind of math object when the IP is at the start of an object. This difference occurs because fine-grained speech needs to say the character at the IP, whereas the math braille display continuously shows the characters around the IP, limited only by the number of display cells.
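To make the hot-key behavior concrete, here is a small Python sketch that builds the English name for an IP position and converts it to uncontracted braille letters. The function names, the position arguments, and the English-only strings are assumptions for illustration; a real implementation would localize the name and use the user’s preferred braille code.

```python
# Dot numbers for the braille letters a through z.
LETTER_DOTS = ("1 12 14 145 15 124 1245 125 24 245 "
               "13 123 134 1345 135 1234 12345 1235 234 2345 "
               "136 1236 2456 1346 13456 1356").split()


def cell(dots):
    """Braille character for a string of dot digits such as '145'."""
    return chr(0x2800 + sum(1 << (int(d) - 1) for d in dots))


TO_BRAILLE = {chr(ord("a") + i): cell(d) for i, d in enumerate(LETTER_DOTS)}
TO_BRAILLE[" "] = "\u2800"  # blank braille cell


def ip_location_name(arg_name, ip, length):
    """Name the IP position inside an argument holding `length` characters."""
    if length > 0 and ip == 0:
        return f"start of {arg_name}"
    if length > 0 and ip == length:
        return f"end of {arg_name}"
    return arg_name


def to_braille(text):
    """Convert lower-case letters and spaces to uncontracted braille."""
    return "".join(TO_BRAILLE[ch] for ch in text.lower())


name = ip_location_name("denominator", 0, 2)
print(name)
print(to_braille(name))
```

Passing `None`-like cases (no enclosing argument) is left out here; per the text above, the hot key would then display “math zone”.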

It might be worth having options to enable/disable dot-8 highlighting according to user preference. Even without the dot-8 highlighting, the user can resolve ambiguities by hitting the IP-identification hot key so some users might prefer to work with the simpler braille display.

Lastly, how do you reveal a math zone? If the IP is inside a math-object argument, the presence of dot 8 is a good indicator. As described in the post Braille for Math Zones, math zones start with “⠸⠩” and end with “⠸⠱”. So, the start and end of a math zone are not ambiguous in math braille. In the Microsoft Office math representation, whether the IP at the start of a math zone is inside the math zone or outside is revealed by shading or the Word math-zone border, since the character position is the same for both cases. Ditto for the end of a math zone. I tried setting dot 8 for all cells in a math zone when the IP is in a math zone, but not inside an argument, but it seems too messy. So hopefully the math zone start and end delimiters will suffice; the user can hit the IP-identification hot key to find out whether the IP is in a math zone.

With these uses of dots 7 and 8 and the IP-identification hot key, you can edit virtually all levels of mathematics using a refreshable braille display in an interoperable way with sighted users. Pretty cool, eh?!

We compare STIX 2 Math (STIX2Math.otf) and Cambria Math in an equation and in an expression with nested parentheses. To make the equation accessible, here’s the Nemeth braille and UnicodeMath for it with math-italic variables mapped to upright for simplicity:

⠹⠂⠌⠆⠨⠏⠼ | ⠮⠰⠴⠘⠆⠨⠏⠐⠹⠙⠨⠹⠌⠁⠬⠃⠀⠎⠊⠝⠀⠨⠹⠼ | ⠀⠨⠅⠀ | ⠹⠂⠌⠜⠁⠘⠆⠐⠤⠃⠘⠆⠐⠻⠼

1/2π | ∫_0^2π dθ/(a+b sin θ) | = | 1/√(a^2-b^2)
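Since the equation images may not be available to every reader, a quick numerical check can confirm the formula as written in UnicodeMath above. The Python sketch below evaluates the left-hand side with a midpoint rule and compares it with the closed form. It assumes a > |b|, so that the integrand is finite and the square root is real; the function names and the sample values a = 2, b = 1 are chosen only for illustration.

```python
import math


def lhs(a, b, steps=10_000):
    """(1/2π) ∫_0^2π dθ/(a + b sin θ), approximated by the midpoint rule."""
    h = 2 * math.pi / steps
    total = sum(1.0 / (a + b * math.sin((k + 0.5) * h)) for k in range(steps))
    return total * h / (2 * math.pi)


def rhs(a, b):
    """Closed form 1/√(a² − b²), valid for a > |b|."""
    return 1.0 / math.sqrt(a * a - b * b)


a, b = 2.0, 1.0
print(lhs(a, b), rhs(a, b))  # the two values should agree closely
```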

The equation occurs, for example, in mode-locking phenomena in lasers, clocks, tuning forks, and other sustained oscillators. Hence the name “mode-locking equation”. More on that at the end. Here’s the equation in STIX Math 2.0

And here it is in Cambria Math

Many symbols look very similar, but note that the STIX *θ* has wider stems on top and bottom while Cambria Math has wider stems on left and right. The images are snipped from a Word document.

Similarly, here’s the nested parenthesized expression ((((((((a+b)))))))) in STIX Math 2.0

And in Cambria Math

It’s interesting to see how much larger the parentheses become in Cambria Math. The parameters guiding the layout of both are given in their respective OpenType math tables.

Re mode locking, Huygens is probably best known for his wave theory of light, but he was also the one who first put the pendulum into a clock. One day he observed that two pendulum clocks which ticked at slightly different rates when apart, ticked at the same rate when hung close together on the same wall. This mode-locking phenomenon can be described, in part, by the equation above (see p. 52 of *Laser Physics*, Sargent, Scully and Lamb, Addison-Wesley, 1974).

This latest version of UTR #25 includes a discussion of glyph variants for the digit 0 and for the empty-set symbol ∅. Specifically, it notes in Section 2.7 that sometimes, specific glyph forms are chosen by notational style or are needed for contrast with other notation in the same document. For example, the symbol U+2205 ∅ EMPTY SET can be found in its slashed zero-shaped glyph form in documents typeset in TeX that use the command \emptyset, or in contexts where it is contrasted with the semantically distinct slashed digit zero.

For this and certain other well-established glyph variants of mathematical symbols, standardized variation sequences were added to the Unicode Standard. Thus, for example, the standardized variation sequence <U+2205, U+FE00> represents the oval variant of the empty-set symbol. To avoid the misuse of that sequence for the glyph variant of the digit zero with a short diagonal stroke “0”, the standardized variation sequence <U+0030, U+FE00> represents the digit zero glyph variant.
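In text, a standardized variation sequence is simply the base character followed by the variation selector. The Python sketch below builds the two sequences mentioned above and prints their code points with names from the standard `unicodedata` module. The helper names are made up for this example, and whether the variant glyph actually appears depends on font and renderer support.

```python
import unicodedata

VS1 = "\uFE00"  # VARIATION SELECTOR-1


def variation_sequence(base, selector=VS1):
    """Return base followed by a variation selector."""
    return base + selector


def describe(sequence):
    """Format a sequence as <U+XXXX NAME, ...> for inspection."""
    parts = (f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sequence)
    return "<" + ", ".join(parts) + ">"


empty_set_variant = variation_sequence("\u2205")  # <U+2205, U+FE00>
zero_variant = variation_sequence("0")            # <U+0030, U+FE00>

print(describe(empty_set_variant))
print(describe(zero_variant))
```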

The report also uses the name UnicodeMath to refer to the math “linear format” in Unicode Technical Note #28. Since there are several math linear formats, it’s helpful to use a more precise name for that linear format.

The most obvious simplification of relegating math to math zones is that it nearly eliminates the need for the Nemeth braille numeric indicator ⠼ in math zones since contractions aren’t used in math zones. In fact, the only need for the numeric indicator in a math zone is in the very rare case that the math style changes inside a number as in **12**34. The code ⠼ is also used to end fractions, but that use never was ambiguous since contractions aren’t allowed in fractions and Nemeth digit codes represent digits inside fractions.

One common scenario for which ⠼ can be omitted is when a digit follows a space in a math zone. Relational operators like = and → are always surrounded by spaces in the Nemeth standard. For example, the limit lim_(x→0) (sin x)/x = 1

is officially written in Nemeth braille as

⠐⠇⠊⠍⠩⠭⠀⠫⠒⠒⠕⠀⠼⠴⠻⠀⠹⠎⠊⠝⠀⠭⠌⠭⠼⠀⠨⠅⠀⠼⠂

Here, the Nemeth standard specifies that since the 0 (⠴) and 1 (⠂) follow a space, they should be preceded by a ⠼. But in a math zone no ambiguity exists if the ⠼ is omitted, resulting in more efficient input and simplified braille display.
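One way to picture this simplification is as a filter over the official Nemeth string. The Python sketch below removes a ⠼ only when it follows a space (or starts the string) and is directly followed by a Nemeth digit, which leaves the ⠼ that ends a fraction alone. The function name and the regular-expression approach are assumptions for illustration, not a description of how any product does it.

```python
import re

BLANK = "\u2800"              # blank braille cell used as the space
NUMERIC_INDICATOR = "\u283c"  # ⠼
NEMETH_DIGITS = "\u2802\u2806\u2812\u2832\u2822\u2816\u2836\u2826\u2814\u2834"  # 1-9, 0

# A ⠼ at the start or after a space, looking ahead to a Nemeth digit.
_PATTERN = re.compile(
    rf"(^|[ {BLANK}]){NUMERIC_INDICATOR}(?=[{NEMETH_DIGITS}])")


def drop_numeric_indicators(math_zone):
    """Remove numeric indicators that precede digits following a space."""
    return _PATTERN.sub(r"\1", math_zone)


# The limit above, assembled from its space-separated pieces.
official = BLANK.join(["⠐⠇⠊⠍⠩⠭", "⠫⠒⠒⠕", "⠼⠴⠻", "⠹⠎⠊⠝", "⠭⠌⠭⠼", "⠨⠅", "⠼⠂"])
print(official)
print(drop_numeric_indicators(official))
```

The ⠼ that closes the fraction survives because it is preceded by ⠭ rather than a space and is followed by a space rather than a digit.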

The numeric indicator must precede all numbers for UEB math braille in math zones and elsewhere, since UEB uses the same codes (⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚) for 1-9, 0 as for the letters a through j, respectively. The Nemeth digits use those codes shifted down a row (⠂⠆⠒⠲⠢⠖⠶⠦⠔⠴) which don’t overlap letters or other symbols. The Nemeth digit codes appear as parts of some mathematical symbols, such as ⠫⠲ for □. This choice is mnemonic since a square has 4 (⠲) sides. No ambiguity there either. Some digit codes are used in common punctuation, such as 4 (⠲) in ⠸⠲ for period (as distinguished from decimal point). This use isn’t ambiguous since the ⠲ is preceded by the punctuation indicator ⠸.
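The “shifted down a row” relationship is easy to verify in code. In the Unicode braille encoding, dots 1, 2, 4, 5 are bits 0, 1, 3, 4, and moving each of them down one row (to dots 2, 3, 5, 6) is a left shift by one bit. The following Python sketch, with a made-up function name, derives the Nemeth digits from the UEB letter codes that way.

```python
UPPER_ROWS = 0x01 | 0x02 | 0x08 | 0x10  # dots 1, 2, 4, 5


def shift_down_one_row(cell):
    """Move every dot of a braille cell down one row (1→2, 2→3, 4→5, 5→6)."""
    bits = ord(cell) - 0x2800
    if bits & ~UPPER_ROWS:
        raise ValueError("cell uses dots outside the top two rows")
    return chr(0x2800 + (bits << 1))


ueb_digit_letters = "⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚"  # a-j, read as 1-9, 0 after ⠼ in UEB
nemeth_digits = "".join(shift_down_one_row(c) for c in ueb_digit_letters)
print(nemeth_digits)
```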

In conclusion, it seems that the most efficient braille for math zones is Nemeth math braille minus the use of the numeric indicator except in the rare switching of numeric styles as in **12**34. Such math braille is international except where natural language words are inserted such as “if”. In contrast, the text surrounding math zones is in a specific natural language. UEB is the obvious choice for English text. When Nemeth math zones are embedded in UEB, they start with ⠸⠩ and end with ⠸⠱ in accord with Using the Nemeth Code within UEB contexts. Other languages may be able to embed Nemeth math zones with these codes too. Math braille isn’t simple, but in modern documents it’s simpler than you may have thought ☺.

The present post describes how MathML could be used in generating fine-grained speech. The trick is to reveal where the insertion point (IP) is so that the user knows where the next character input will go.

To see how this works, consider the fraction 1/2π displayed in built-up form as

The coarse-grained speech for this (in English) is “1 over 2 pi”. The fine-grained speech resulting from moving right one character at a time is

“start fraction”

“1”

“end numerator”

“2”

“pi”

“end denominator”

With character navigation, Narrator speaks these strings for the fraction in Word, PowerPoint, and OneNote documents. Hearing this speech, the user knows where the IP is and hence where the next character typed is entered. To enable editing, MathML content needs to offer the same functionality.

The MathML for the fraction is

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block">

<mml:mfrac>

<mml:mn>1</mml:mn>

<mml:mrow>

<mml:mn>2</mml:mn>

<mml:mi>𝜋</mml:mi>

</mml:mrow>

</mml:mfrac>

</mml:math>

This doesn’t name the numerator and denominator explicitly. Instead, the numerator is defined to be the first child of the <mml:mfrac> element and the denominator is defined to be the second. The MathML can be used in generating the speech “1 over 2 pi” in a natural language, that is, the coarse-grained speech. Fine-grained speech needs MathML that identifies what’s at the insertion point, which can be a character, the start of the fraction, the end of the numerator, or the end of the denominator. The MathML above doesn’t offer such information.

There are at least two ways to produce such per-character-position speech using MathML by including an <maction> element. For the first way, when the IP moves by character in front of the fraction, the MathML would be

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">

<mml:maction actiontype="input">start fraction</mml:maction>

</mml:math>

Dropping the <mml:math> element for brevity, the MathML output for subsequent move-by-character navigation actions would be

<mml:maction actiontype="input">1</mml:maction>

<mml:maction actiontype="input">end numerator</mml:maction>

<mml:maction actiontype="input">2</mml:maction>

<mml:maction actiontype="input">pi</mml:maction>

<mml:maction actiontype="input">end denominator</mml:maction>

The text in the <maction> element can be localized into various languages. If this approach becomes popular, it’d be worth standardizing on text strings like “end numerator” to help users as well as localization. The Microsoft Office math-speech engine produces strings with 16-bit speech tokens that index sets of language strings in over 18 languages. But that process occurs internally. For general implementation by ATs, it seems better to use a set of standardized English strings that an AT can associate with other language string sets. A set of such English strings can be obtained by running Narrator over a Word document with equations on an English operating system.
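To illustrate the first approach, the Python sketch below walks the fraction’s MathML with the standard `xml.etree.ElementTree` module and emits one <mml:maction> string per character position. Only fractions are handled, the `SPOKEN` table holds a single entry, and the function names are invented for this example; a real producer would cover every math object and use a full speech-string set.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SPOKEN = {"\U0001D70B": "pi"}  # MATHEMATICAL ITALIC SMALL PI

FRACTION = """<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block">
  <mml:mfrac>
    <mml:mn>1</mml:mn>
    <mml:mrow><mml:mn>2</mml:mn><mml:mi>\U0001D70B</mml:mi></mml:mrow>
  </mml:mfrac>
</mml:math>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def fine_grained(node):
    """Yield one speech string per character position under node."""
    if local(node.tag) == "mfrac":
        numerator, denominator = list(node)
        yield "start fraction"
        yield from fine_grained(numerator)
        yield "end numerator"
        yield from fine_grained(denominator)
        yield "end denominator"
    elif len(node) == 0:  # token element such as <mn> or <mi>
        for ch in (node.text or "").strip():
            yield SPOKEN.get(ch, ch)
    else:                 # <math>, <mrow>, ...: just descend
        for child in node:
            yield from fine_grained(child)


def maction(text):
    """Wrap a speech string in the <maction> form shown above."""
    return f'<mml:maction actiontype="input">{escape(text)}</mml:maction>'


for speech in fine_grained(ET.fromstring(FRACTION)):
    print(maction(speech))
```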

A second way to produce such per-character-position speech using MathML is to generate the MathML for the math object that has the insertion point and include an <maction> revealing where the IP is. For example, if the IP is at the end of the numerator in the fraction above, the MathML would be

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block">

<mml:mfrac>

<mml:mrow>

<mml:mn>1</mml:mn>

<mml:maction actiontype="insertion point"/>

</mml:mrow>

<mml:mrow>

<mml:mn>2</mml:mn>

<mml:mi>𝜋</mml:mi>

</mml:mrow>

</mml:mfrac>

</mml:math>

Since this MathML has the full context of the insertion point, the AT can create suitable speech. It requires more analysis by the AT than the first <maction> approach, but is more flexible. Such approaches using <maction> are quite general and don’t need specialized methods to decode the math in memory. They could work for all operating systems and applications that support MathML.
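On the AT side, the second approach might be handled along the following lines. This Python sketch finds the insertion-point <maction>, looks at its enclosing argument, and returns a position string. It assumes the marker is a direct child of the <mrow> that holds the argument, as in the example above, and it only knows about fractions; the function names and the “start …” strings are illustrative guesses rather than Narrator’s actual output.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
ARGUMENT_NAMES = {"mfrac": ("numerator", "denominator")}

EXAMPLE = f"""<mml:math xmlns:mml="{MML}" display="block">
  <mml:mfrac>
    <mml:mrow><mml:mn>1</mml:mn><mml:maction actiontype="insertion point"/></mml:mrow>
    <mml:mrow><mml:mn>2</mml:mn><mml:mi>\U0001D70B</mml:mi></mml:mrow>
  </mml:mfrac>
</mml:math>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def describe_ip(root):
    """Return a speech string for the IP marker, or None if there is none."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    marker = next((m for m in root.iter(f"{{{MML}}}maction")
                   if m.get("actiontype") == "insertion point"), None)
    if marker is None:
        return None
    argument = parent_of.get(marker)  # assumed: the <mrow> for the argument
    obj = parent_of.get(argument) if argument is not None else None
    names = ARGUMENT_NAMES.get(local(obj.tag)) if obj is not None else None
    if names is None:
        return "math zone"            # IP isn't inside a known argument
    name = names[list(obj).index(argument)]
    siblings = list(argument)
    index = siblings.index(marker)
    if len(siblings) > 1 and index == len(siblings) - 1:
        return f"end {name}"
    if len(siblings) > 1 and index == 0:
        return f"start {name}"
    return name


print(describe_ip(ET.fromstring(EXAMPLE)))
```

Because the AT sees the whole math object, it could just as well produce braille or a different verbosity level from the same MathML, which is the flexibility the second approach buys.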

(Thanks to Sue-Ann Ma, Neil Soiffer, James Teh, Volker Sorge, Peter Frem and Ziad Khalidi for encouraging me to come up with a way to use MathML for editing).
