The present post describes how MathML could be used in generating fine-grained speech. The trick is to reveal where the insertion point (IP) is so that the user knows where the next character input will go.

To see how this works, consider the fraction 1/2π displayed in built-up form as

The coarse-grained speech for this (in English) is “1 over 2 pi”. The fine-grained speech resulting from moving right one character at a time is

“start fraction”

“1”

“end numerator”

“2”

“pi”

“end denominator”

With character navigation, Narrator speaks these strings for the fraction in Word, PowerPoint, and OneNote documents. Hearing this speech, the user knows where the IP is and hence where the next character typed is entered. To enable editing, MathML content needs to offer the same functionality.

The MathML for the fraction is

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block">

<mml:mfrac>

<mml:mn>1</mml:mn>

<mml:mrow>

<mml:mn>2</mml:mn>

<mml:mi>𝜋</mml:mi>

</mml:mrow>

</mml:mfrac>

</mml:math>

This doesn’t name the numerator and denominator explicitly. Instead, the numerator is defined to be the first child of the <mml:mfrac> element and the denominator is defined to be the second. The MathML can be used in generating the speech “1 over 2 pi” in a natural language, that is, the coarse-grained speech. Fine-grained speech needs MathML that identifies what’s at the insertion point, which can be a character, the start of the fraction, the end of the numerator, or the end of the denominator. The MathML above doesn’t offer such information.
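To make the target concrete, here is a minimal Python sketch that walks the fraction’s MathML and produces the fine-grained strings listed earlier. It is only an illustration, not how Narrator or Office does it; the function names and the tiny `SYMBOL_NAMES` table are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical spoken names; a real engine would have a much larger table.
SYMBOL_NAMES = {"𝜋": "pi", "π": "pi"}

def leaf_speech(elem):
    """Speech for a token element such as <mn> or <mi>."""
    text = (elem.text or "").strip()
    return SYMBOL_NAMES.get(text, text)

def leaves(elem):
    """Yield the childless (token) elements under elem in document order."""
    if len(elem) == 0:
        yield elem
    else:
        for child in elem:
            yield from leaves(child)

def fine_grained_fraction(mfrac):
    """One string per insertion-point position when moving right through a fraction."""
    numerator, denominator = list(mfrac)  # first child, second child
    speech = ["start fraction"]
    speech += [leaf_speech(e) for e in leaves(numerator)]
    speech.append("end numerator")
    speech += [leaf_speech(e) for e in leaves(denominator)]
    speech.append("end denominator")
    return speech

mathml = f"""<mml:math xmlns:mml="{MML}" display="block">
  <mml:mfrac>
    <mml:mn>1</mml:mn>
    <mml:mrow><mml:mn>2</mml:mn><mml:mi>𝜋</mml:mi></mml:mrow>
  </mml:mfrac>
</mml:math>"""

root = ET.fromstring(mathml)
speech = fine_grained_fraction(root.find(f"{{{MML}}}mfrac"))
print(speech)
```

The sketch has to invent the “start fraction” and “end numerator” positions from the tree structure, which is exactly the information the plain MathML doesn’t carry.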

There are at least two ways to produce such per-character-position speech using MathML by including an <maction> element. For the first way, when the IP moves by character in front of the fraction, the MathML would be

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">

<mml:maction actiontype="input">start fraction</mml:maction>

</mml:math>

Dropping the <mml:math> element for brevity, the MathML output for subsequent move-by-character navigation actions would be

<mml:maction actiontype="input">1</mml:maction>

<mml:maction actiontype="input">end numerator</mml:maction>

<mml:maction actiontype="input">2</mml:maction>

<mml:maction actiontype="input">pi</mml:maction>

<mml:maction actiontype="input">end denominator</mml:maction>

The text in the <maction> element can be localized into various languages. If this approach becomes popular, it’d be worth standardizing on text strings like “end numerator” to help users as well as localization. The Microsoft Office math-speech engine produces strings with 16-bit speech tokens that index sets of language strings in over 18 languages. But that process occurs internally. For general implementation by ATs, it seems better to use a set of standardized English strings that an AT can associate with other language string sets. A set of such English strings can be obtained by running Narrator over a Word document with equations on an English operating system.
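On the producing side, the first approach amounts to wrapping each speech string in an <maction>. The following Python sketch shows one way an application might do that; the helper name `maction_for` and its parameter are hypothetical, and the output format simply copies the examples above.

```python
from xml.sax.saxutils import escape

MML_NS = "http://www.w3.org/1998/Math/MathML"

def maction_for(speech, include_math_wrapper=False):
    """Wrap one fine-grained speech string in an maction element (illustrative only)."""
    body = f'<mml:maction actiontype="input">{escape(speech)}</mml:maction>'
    if include_math_wrapper:
        return f'<mml:math xmlns:mml="{MML_NS}">{body}</mml:math>'
    return body

# One element per move-by-character step through the fraction.
for s in ["start fraction", "1", "end numerator", "2", "pi", "end denominator"]:
    print(maction_for(s))
```

An AT receiving these elements only needs to read the text content, and it can use the English string as a lookup key for its own localized string set.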

A second way to produce such per-character-position speech using MathML is to generate the MathML for the math object that has the insertion point and include an <maction> revealing where the IP is. For example, if the IP is at the end of the numerator in the fraction above, the MathML would be

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block">

<mml:mfrac>

<mml:mrow>

<mml:mn>1</mml:mn>

<mml:maction actiontype="insertion point"/>

</mml:mrow>

<mml:mrow>

<mml:mn>2</mml:mn>

<mml:mi>𝜋</mml:mi>

</mml:mrow>

</mml:mfrac>

</mml:math>

Since this MathML has the full context of the insertion point, the AT can create suitable speech. It requires more analysis by the AT than the first <maction> approach, but is more flexible. Such approaches using <maction> are quite general and don’t need specialized methods to decode the math in memory. They could work for all operating systems and applications that support MathML.
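As a rough idea of the extra analysis the second approach asks of an AT, here is a Python sketch that finds the insertion-point marker and turns its position into a speech string. It assumes the marker is a direct child of the <mml:mrow> that forms the numerator or denominator, as in the example above; the function name and the fallback string are made up for illustration.

```python
import xml.etree.ElementTree as ET

MML = "{http://www.w3.org/1998/Math/MathML}"

def describe_insertion_point(mathml):
    """Return a speech string for the IP marker, or None if there is no marker."""
    root = ET.fromstring(mathml)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    marker = next((e for e in root.iter(MML + "maction")
                   if e.get("actiontype") == "insertion point"), None)
    if marker is None:
        return None
    arg = parent_of.get(marker)                      # mrow holding the marker
    mfrac = parent_of.get(arg) if arg is not None else None
    if mfrac is None or mfrac.tag != MML + "mfrac":
        return "insertion point"                     # cases not covered by this sketch
    siblings = list(arg)
    position = siblings.index(marker)
    is_numerator = list(mfrac).index(arg) == 0
    if position == len(siblings) - 1:                # marker is last: end of the argument
        return "end numerator" if is_numerator else "end denominator"
    if position == 0 and is_numerator:               # marker precedes the whole fraction content
        return "start fraction"
    return (siblings[position + 1].text or "").strip()  # speak the character after the IP

example = """<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block">
  <mml:mfrac>
    <mml:mrow><mml:mn>1</mml:mn><mml:maction actiontype="insertion point"/></mml:mrow>
    <mml:mrow><mml:mn>2</mml:mn><mml:mi>𝜋</mml:mi></mml:mrow>
  </mml:mfrac>
</mml:math>"""

print(describe_insertion_point(example))
```

A real AT would need to handle every math object (subscripts, radicals, matrices, and so on), which is the cost that buys the extra flexibility.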

(Thanks to Sue-Ann Ma, Neil Soiffer, James Teh, Volker Sorge, Peter Frem and Ziad Khalidi for encouraging me to come up with a way to use MathML for editing).

Coarse-grained speech isn’t tightly synchronized with the characters in memory and cannot be used directly for editing. It’s relatively independent of the memory math model. In contrast, fine-grained speech is tightly synchronized with the characters in memory and is ideal for editing. It depends on the built-up math model (“Presentation Math”), which is the same for all Microsoft math-aware products but may differ from the models of other math products. Coarse-grained navigation between siblings for a given math nesting level can be done with Ctrl+→ and Ctrl+← or Braille equivalents, while fine-grained navigation is done with → and ← or equivalents. The latter allows the user to traverse every character in a math zone. Two special cases are 1) when the IP is directly before the math zone being queried by UIA and 2) when the IP is still in the range’s math zone, but at the end. For 1), the user needs to know that anything typed won’t go into the math zone. Typing → then puts the IP into the math zone, and subsequent typing enters characters inside the math zone. And for 2), the user needs to know that the IP is at the end of the math zone and still in the math zone. Case 1) returns “equation” followed by the speech for the math zone. Case 2) returns “end equation”. (Since many math zones aren’t equations, this choice of words might be a little misleading sometimes, but hopefully not too much so.)

The languages with math speech support include Danish (da-DK), German (de-DE), English (en-US), Spanish (es-ES), Finnish (fi-FI), French (fr-FR), Italian (it-IT), Japanese (ja-JP), Korean (ko-KR), Norwegian (nb-NO), Dutch (nl-NL), Polish (pl-PL), Brazil Portuguese (pt-BR), Portugal Portuguese (pt-PT), Russian (ru-RU), Swedish (sv-SE), Turkish (tr-TR), PRC Chinese (zh-CN), Taiwan Chinese (zh-TW).

Math speech is produced by “building down to speech”, sharing the code and concepts of building down “Presentation Math” to UnicodeMath. This approach creates math speech just as fast as it creates UnicodeMath and is faster than representing math zones in other math formats like MathML. A string of language tokens is created and then converted to the active natural language.

On a technical level, math speech is implemented in the RichEdit dll (Office’s riched20.dll) by the GetMathSpeechText function, which has the prototype

HRESULT GetMathSpeechText (ITextRange2 *prg, BSTR *pbstr, LONG Flags)

Coarse-grained math speech is returned in *pbstr if the range prg selects more than one character while fine-grained speech is returned if prg references an insertion point or selects only one character. GetMathSpeechText() uses the same subset of ITextRange2 methods used by MathBuildDown() and hence can be used by all Microsoft Office math-aware applications on all major platforms (Windows, iOS, Mac, and Android). Key methods include ITextRange2::GetChar2() to fetch individual characters from memory and ITextRange2::GetInlineObject() to find out what kinds of math objects are in memory.

Math speech is exposed to UI Automation clients via methods of the UIA interface ITextRangeProvider. So, in principle any AT that uses these methods automatically gets math speech for math zones. Nevertheless, it’s desirable for ATs to know if math zones are involved. One approach is to identify math zones by a new, explicit UIA math-zone object or by a custom object with a localized name like “math zone”. But a more efficient approach that mirrors what’s in memory is to have a math-zone format attribute. Specifically, TextUnit_Format is one of the units supported by ITextRangeProvider::ExpandToEnclosingUnit and ITextRangeProvider::MoveEndpointByUnit. To find out an attribute, such as UIA_IsItalicAttributeId, of a TextUnit_Format instance, a client calls ITextRangeProvider::GetAttributeValue. ATs could know if a math zone is active if a new attribute ID, UIA_IsMathZoneAttributeId, is added to identify math zones.

The focus here is on math zones, which are text ranges that have math typography, rather than normal typography. Natural language contractions are not used in math zones, so hopefully we can get general globalized math-symbol mappings. When math zones are embedded in UEB, a math-zone start delimiter would be ⠸⠩ and the math-zone end delimiter would be ⠸⠱ in accord with Using the Nemeth Code within UEB contexts. Math zones are key to working with technical documents since math-zone typography and conventions differ from those for normal text. So, a user needs to know when a math zone starts and ends.
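A small Python sketch of how software might add and find these delimiters follows. The two delimiter sequences come from the text above; the single space placed between each delimiter and the math content is an assumption of this sketch, as are the function names.

```python
MATH_ZONE_START = "⠸⠩"  # opening Nemeth Code indicator
MATH_ZONE_END = "⠸⠱"    # Nemeth Code terminator

def wrap_math_zone(nemeth, space=" "):
    """Surround Nemeth braille with the UEB math-zone delimiters."""
    return f"{MATH_ZONE_START}{space}{nemeth}{space}{MATH_ZONE_END}"

def find_math_zones(braille):
    """Return the contents of each delimited math zone in a braille string."""
    zones, pos = [], 0
    while True:
        start = braille.find(MATH_ZONE_START, pos)
        if start < 0:
            break
        content_start = start + len(MATH_ZONE_START)
        end = braille.find(MATH_ZONE_END, content_start)
        if end < 0:
            break  # unterminated zone: ignore it in this sketch
        zones.append(braille[content_start:end].strip())
        pos = end + len(MATH_ZONE_END)
    return zones

line = "⠞⠓⠑ " + wrap_math_zone("⠭") + " ⠊⠎"
print(find_math_zones(line))
```

A production scanner would also have to guard against the terminator cells occurring inside the math content, which this sketch does not attempt.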

The Nemeth specification describes several symbol construction techniques. Some very productive ones are illustrated in the following table, which also includes the section number in the Nemeth specification. The structure codes such as the termination indicator ⠻ are displayed in red.

The mappings given in the table below resulted from scouring the Nemeth specification, which is a pdf file of scanned images. As such it offers no search or link capabilities, and a paper version is more useful than the electronic version. It’s the first document I’ve printed in years, other than occasional tickets, boarding passes, and legal authorizations. There’s also a nicely formatted Nemeth specification in French complete with a navigation pane with links to all the rules. You can search the text including finding braille sequences, since the sequences are encoded in Nemeth ASCII braille. This is valuable in learning about sequences in general and whether a potentially new sequence is already defined. The combination ⠠⠱ doesn’t appear as math in the French specification except as part of the extended tilde ⠈⠠⠱, so that seems like a good candidate for encoding the missing reversed tilde ∽ (∼ is given by ⠈⠱). The French version’s content has differences from the original English version, so I checked both in creating the table entries. It would be nice if someone would enter the 1972 Nemeth specification into Word so that a more accessible pdf could be created in English. A partial version with MathSpeak functionality is given in The Nemeth Braille Code for Mathematics and Science. An ASCII braille version (brf) can be downloaded from here.

Unicode 9.0 has a total of 2310 characters that have the math property (see Math property in DerivedCoreProperties.txt). Of these, many of the more advanced symbols don’t have unambiguous Nemeth representations. In particular, Nemeth doesn’t distinguish between slanted and vertical bar overlays, e.g., 219A ↚ and 21F7 ⇷ are both given by ⠳⠈⠫⠪⠒⠒⠻ . Nemeth doesn’t have white arrows like ⇦, and white/black arrow heads like ⭠. Unicode has symbols like the bowtie ⧑ that don’t have apparent Nemeth representations. One possibility for ⋈ is as a shape with a suggestive name like “bt” as in ⠫⠃⠞, but one still needs to encode whether the sides are black or white since Unicode encodes all four possibilities.

Conversely, Nemeth has quite a few symbols not in Unicode. Often these can be constructed with a combination of Unicode symbols or with math layout objects in applications like Microsoft Word. We can submit proposals to add characters given in the Nemeth specification but not yet in Unicode provided the characters occur in journals or books. Nemeth’s extended tilde is one such character to research. Since Nemeth braille is based on a productive syntax, many symbol combinations can be created. Eventually I hope to collect all Unicode math characters that have reasonable Nemeth braille sequences and add their mapping data to the information associated with Unicode Technical Report #25, Unicode Support for Mathematics.

The table below lists some representative Unicode characters used in mathematical text along with their Unicode names and the corresponding Nemeth math braille sequences. The table doesn’t include any mappings of the Unicode math alphanumerics, since they are defined in the post Nemeth Braille Alphanumerics and Unicode Math Alphanumerics. Relational operators (Nemeth calls them “signs and symbols of comparison”) need to be surrounded by spaces. The spaces are not included in the table since the relational operator property is defined in the MathClass data file and software can insert spaces programmatically. It’d be easy to add the math behavior column in MathClass.txt for quick reference. The full mapping table isn’t included since I don’t know how to convince the MSDN blogging facility to use a braille font that displays nondot place holders and braille is hard to read without them.
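Here is a Python sketch of inserting those spaces programmatically. The `RELATIONAL` set is a small hand-picked stand-in for the relational class in the MathClass data file, and the braille cells in the example (x, equals sign, y) are included only for illustration.

```python
# Hand-picked subset of characters with the relational class; a real
# implementation would load the class of each character from MathClass data.
RELATIONAL = {"=", "<", ">", "≤", "≥", "≠"}

def join_nemeth(tokens, space=" "):
    """tokens: (unicode_char, nemeth_cells) pairs in reading order.

    Relational operators are surrounded by spaces; everything else is
    concatenated unchanged.
    """
    out = []
    for char, cells in tokens:
        out.append(f"{space}{cells}{space}" if char in RELATIONAL else cells)
    return "".join(out)

# x = y, with the cells for each character supplied by a mapping table
print(join_nemeth([("x", "⠭"), ("=", "⠨⠅"), ("y", "⠽")]))
```

Keeping the spaces out of the mapping table means each table entry stays a pure character-to-cells mapping, and the spacing rule lives in one place.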

Furthermore, in text processing, it’s important to think of the IP as in between characters, since a character you type could end up in the preceding text run, in the following text run or in a text run of its own. Consider an IP that immediately follows the word bold in “**bold**text”. Will the next character you type be bold or not? The answer depends on whether the IP has the formatting of the preceding text run, the formatting of the following text run or some other formatting chosen in part using toolbar format commands or hot keys like Ctrl+B.

Quantitatively, we describe the insertion point by its character position (cp). In the following figure, character positions are represented by the lines separating the letters. The corresponding cp values are given beneath the lines. The text run starting at cp 5 and ending at cp 7 contains the two-letter word “is”. The difference between these cp’s is 2, the count of characters in the text run.

The cp 5 also marks the end of the text run starting at cp 0. We can call such a cp an “ambiguous cp”, since it delimits two adjacent text runs. This ambiguity is illustrated for an IP in between “**bold**” and “text” above.
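The cp arithmetic can be sketched in a few lines of Python. The sample string “This is text” and its run boundaries are assumptions chosen to match the cp values quoted above (a run from 0 to 5 and a run from 5 to 7 containing “is”); the `TextRun` class is hypothetical and not part of any RichEdit API.

```python
from dataclasses import dataclass

@dataclass
class TextRun:
    start: int  # cp in front of the run's first character
    end: int    # cp just past the run's last character

    @property
    def length(self):
        return self.end - self.start

def is_ambiguous(runs, cp):
    """True if cp ends one run and starts the next (adjacent runs, in order)."""
    return any(a.end == cp == b.start for a, b in zip(runs, runs[1:]))

text = "This is text"  # assumed sample text
runs = [TextRun(0, 5), TextRun(5, 7), TextRun(7, 12)]

print(text[runs[1].start:runs[1].end], runs[1].length)  # the run between cp 5 and cp 7
print(is_ambiguous(runs, 5), is_ambiguous(runs, 6))
```

Python slices use the same between-the-characters convention as cp values, which is why `text[5:7]` picks out exactly the two-character run.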

In BiDi text, such as a mixture of Arabic and English text, the directional ambiguity of an IP between text runs of opposite directionality is usually resolved by the directionality of the current keyboard language. This choice is made because the purpose of the IP is to reveal where the next character typed will be entered. This works unambiguously in BiDi text except when the IP follows a digit and the keyboard is right-to-left. This is because consecutive digits are invariably displayed left-to-right, rather than right-to-left (except for N’Ko). If you then type a digit with a right-to-left keyboard, the IP will be displayed to the right of the digit, but if you instead type a right-to-left letter, that letter will be displayed to the left of the digit(s). There’s no way to know what the user will type next, so in this scenario the IP may not be displayed where the next character typed will be inserted. The bottom line is that in BiDi text you can’t intuit desired behavior 100% of the time, partly because people have conflicting needs. The choices were made because they work the clear majority of the time.

If the top of the caret (text cursor) has a tick mark that points left, an RTL (right-to-left) keyboard is active, while a tick mark that points right indicates an active LTR keyboard. Office apps use a tick-mark-free caret for LTR keyboards. If a document doesn’t have any BiDi text, such a caret is unambiguous. If a document does have BiDi content, some users might prefer to see the right-pointing tick mark on the caret when the keyboard is LTR. Office used to have such an LTR caret, but abandoned it back in the last century. Having the RTL caret was considered sufficiently different from a tickless caret to resolve the ambiguities and then there’s less of a hiccup going between pure LTR docs and BiDi documents. I don’t think we’ve gotten negative feedback on this choice. Conceivably, we should have a user option that’s enabled by default in BiDi locales and disabled by default elsewhere. Also, we might want an option to display a shadow caret where a character would be displayed if typed with a keyboard of the opposite directionality.

In the text object model TOM, insertion points consist of degenerate ITextRange or ITextRange2 objects. The insertion point controlled directly by a user editing text is the ITextSelection[2], which is an ITextRange[2] with additional user-interface functionality. If these ranges select one or more characters, they are called nondegenerate. ITextRange methods refer to the cp at the start of a range as Start and to the cp at the end of the range as End. They are retrieved and set by methods like ITextRange::GetStart() and ITextRange::SetStart(). If ITextSelection2::GetCch() returns 0, the user selection is an insertion point. Alternatively the condition End = Start implies an insertion point. If you’re using a degenerate ITextSelection2 or ITextRange2, you can change the character format inheritance by changing the Gravity property. That is typically a bit faster since figuring out which character format properties the IP should use when moved may incur considerable calculation.
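The degenerate-range test is simple enough to model directly. The Python class below is a toy model that borrows the Start and End vocabulary; it is not the TOM interface, and its method names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Range:
    """Toy model of a text range delimited by two character positions."""
    start: int
    end: int

    def cch(self):
        """Count of characters selected by the range."""
        return self.end - self.start

    def is_insertion_point(self):
        """A degenerate range (End equal to Start) is an insertion point."""
        return self.start == self.end

    def collapse(self, to_start=True):
        """Turn the range into an insertion point at its start or its end."""
        if to_start:
            self.end = self.start
        else:
            self.start = self.end

r = Range(5, 7)
print(r.cch(), r.is_insertion_point())   # nondegenerate: selects two characters
r.collapse(to_start=False)
print(r.cch(), r.is_insertion_point())   # degenerate: insertion point at cp 7
```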

The ITextSelection object has its own character formatting properties to allow the user to change the formatting of the insertion point from the formatting deduced from the backing store and selection activity. In RichEdit, all degenerate ITextRange objects have their own character formatting to give clients such flexibility.

RichEdit uses an n/m algorithm to move the insertion point through Latin and Arabic ligatures and to select part way through such ligatures. The rationale for doing this is given in Ligatures, Clusters, Combining Marks and Variation Sequences. It would confuse most users if → moved past a whole ffi ligature in the word “difficult”. So, caret motion cannot be dictated solely by the font.

People need to know where the IP is so that they know where they enter text. This applies to all editing mechanisms including speech and braille displays. As described in Speaking of math…, the insertion point is identified by speaking the character or object at the insertion point. Most braille displays have 8 buttons for inputting 8-dot braille. However, the popular braille systems only use 6 dots. That leaves dots 7 and 8 available for innovative purposes, such as marking the IP.

VoiceOver indicates the position of the text cursor (IP) on braille displays by flashing dot 8 of the braille cell preceding the IP and dot 7 of the braille cell following the IP. This locates the IP in between two braille cells, much as the text cursor on screen displays the IP in between two characters. VoiceOver also raises dots 7 and 8 to show the position of the VoiceOver cursor, to help you find it within the line of braille.
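The dot 7 and dot 8 marking is easy to model with Unicode braille patterns, in which each dot is one bit of the offset from U+2800. The Python sketch below shows the raised state only (not the flashing) and is not VoiceOver’s code; the function name is an assumption.

```python
DOT7 = 0x40  # bit for dot 7 in the Unicode braille patterns block (U+2800..U+28FF)
DOT8 = 0x80  # bit for dot 8

def mark_insertion_point(cells, ip):
    """Raise dot 8 on the cell before the IP and dot 7 on the cell after it.

    cells: string of Unicode braille patterns; ip: position between cells,
    from 0 (before the first cell) to len(cells) (after the last cell).
    """
    if not 0 <= ip <= len(cells):
        raise ValueError("ip must lie between 0 and len(cells)")
    codes = [ord(c) for c in cells]
    if any(not 0x2800 <= c <= 0x28FF for c in codes):
        raise ValueError("cells must be Unicode braille patterns")
    if ip > 0:
        codes[ip - 1] |= DOT8
    if ip < len(codes):
        codes[ip] |= DOT7
    return "".join(chr(c) for c in codes)

# IP between the cells for "a" (U+2801) and "b" (U+2803)
print(mark_insertion_point("\u2801\u2803", 1))
```

Because 6-dot systems never use dots 7 and 8, OR-ing in these bits cannot collide with the text being displayed.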

Finally, I can’t resist mentioning a cool editing feature of the Visual Studio editor, namely the multiline IP. Create this with a mouse by Alt+dragging the IP over multiple lines. I use it all the time to delete and insert text in whole columns. You can also use Alt+dragging to select and operate on blocks of text, e.g., to delete or copy them.

A new version of Unicode Technical Note #28, *UnicodeMath, a Nearly Plain-Text Encoding of Mathematics* is now available. It updates several topics and references and uses the name UnicodeMath instead of Unicode linear format. Since there are several math linear formats, such as Nemeth braille, [La]TeX, and AsciiMath, having the name UnicodeMath clarifies the discussion nicely. The text has been polished in other ways too and some errors have been corrected. No notational constructs have been added, so the version number is only incremented to 3.1.

Here’s a UnicodeMath example in case you don’t want to read the whole spec ☺ The formula

sin θ=(e^iθ-e^-iθ)/2i

displays as

Operators and operator precedence are used to delimit arguments. A binary minus has lower precedence than the superscript operator ^ and the fraction operator /, but a unary minus has higher precedence than ^. This approach contrasts with LaTeX and AsciiMath which require that arguments consisting of more than one element be enclosed in {} or (), respectively. In LaTeX, the formula above is given by

\sin\theta=\frac{e^{i\theta}-e^{-i\theta}}{2i}

In AsciiMath, the formula is given by

sin theta=(e^(i theta)-e^(-i theta))/(2i)

In Microsoft Office apps, you can enter Unicode symbols in UnicodeMath using the corresponding [La]TeX control words such as \theta, using names that you choose, or using symbol galleries.

Let’s start with the Nemeth braille approach used by MathSpeak, though it doesn’t use braille codes. Nemeth braille uses subscript/superscript level shifters. For example,

A sub/sup level stays active until another level is met. The level shifter to go back to the baseline is “baseline”. This is the way the blind mathematician Abraham Nemeth liked to have people speak subscripts and superscripts to him. Back in his day, computer math speech wasn’t available and people read math to him. His sub/sup speech is efficient and unambiguous. He didn’t even say *x*^{2} as “x squared”, but as “x sup 2”.

This has the advantage that if *x*^{2} is the second component of the vector **x**, it isn’t misidentified as “x squared”. Superscripts don’t always mean powers. For example, the triple scalar product **a**⋅(**b**×**c**) of the vectors **a**, **b**, and **c** is given by *ε*_{ijk} *a*^{i} *b*^{j} *c*^{k}, where

Furthermore, the level of nested subscripts/superscripts is always clear with level shifters. On the other hand, saying “e to the minus x squared” gives the meaning of that expression without any parsing. A more verbose version of the Nemeth approach is to say “superscript” and “subscript” instead of “sup” and “sub”. Saying the complete words is helpful at first. But as you get familiar with it, the three-letter abbreviations are faster and easier to follow. Too much verbiage gets in the way of comprehending math.

Except for Nemeth himself, the references linked to at the start of this post all say *x*^{2} as “x squared” and *x*^{3} as “x cubed”. Superscripts as indices aren’t common and a little AI could recognize them. ClearSpeak says *x*^{n} as “x to the nth power”, while I prefer “x to the n”. “nth” requires localization, whereas ‘n’ alone does not. In my lectures on physics over the years, I don’t think I ever added the word “power”. Although grammatically correct, it wastes time, and being grammatically correct isn’t necessarily a goal for math speech. Math speech wants to be efficient and unambiguous, but some degree of abbreviation helps convey the semantics more efficiently. In fact, mathematics owes a significant part of its success to its concise notations. A side benefit of using abbreviated speech is that localization is simplified: you don’t have to worry much about word order differences and declensions.

If you don’t use the sub/sup/base level shifters, how do you handle compound subscripts and superscripts unambiguously? The various math linear formats except for Nemeth braille all handle compound scripts using tree structures such as TeX’s a^{b_2} or UnicodeMath’s a^(b_2) for *a*^{b₂}. One could speak these characters, but it’s better to speak what they represent since {} and () are used for a variety of syntactic purposes and may be nested. Accordingly, one can say “a to the b sub 2 end sup”. Here UnicodeMath’s ‘(’ is replaced by “to the” and the ‘)’ is replaced by “end sup”. For

Numeric fractions like ¼ are spoken as “one fourth” and simple fractions like a/b are spoken as “a over b”. A fraction is compound if it contains one or more operators with lower precedence than division, such as (a+b)/c. For compound fractions, the beginning and end of the fraction need to be spoken to differentiate between expressions like a/(b+c) and a/b + c. If you say “a over b plus c” it means a/b + c, since we adopt the usual convention that division has higher precedence than addition. It also helps to pause a bit before saying “plus c”.

In the spirit of announcing the start and end of compound entities, one might want to speak a compound numerator as “numerator…end numerator” and a compound denominator as “denominator…end denominator”. But both ClearSpeak and MathSpeak prefer to speak a compound fraction as

“start fraction <numerator> over <denominator> end fraction”.

This is similar to TeX’s notation “{<numerator>\over<denominator>}” and to the Nemeth braille fraction ⠹ <numerator> ⠌ <denominator> ⠼ . This choice is more efficient when both numerator and denominator are compound. Both approaches allow nesting of fractions. Briefer choices include “frac…over…end frac” and “b frac…over…e frac”.
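The simple-versus-compound rule can be sketched as a recursive walk over an expression tree. In the Python below, the tuple tree format and the function name are assumptions made for the example, and any numerator or denominator that is itself an operator node is treated as compound, which is a simplification of the precedence rule described above. Numeric fractions such as “one fourth” are left out.

```python
OPERATOR_NAMES = {"+": "plus", "-": "minus"}

def speak(node):
    """Speak a tree whose nodes are strings (leaves) or (op, left, right) tuples."""
    if isinstance(node, str):
        return node
    op, left, right = node
    if op == "/":
        words = f"{speak(left)} over {speak(right)}"
        # Announce start and end only when an argument is itself an expression.
        if not isinstance(left, str) or not isinstance(right, str):
            return f"start fraction {words} end fraction"
        return words
    return f"{speak(left)} {OPERATOR_NAMES[op]} {speak(right)}"

print(speak(("/", "a", ("+", "b", "c"))))   # a/(b+c)
print(speak(("+", ("/", "a", "b"), "c")))   # a/b + c
```

The two example calls give different speech for a/(b+c) and a/b + c, which is the ambiguity the start and end markers are there to remove.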

The last of these is how Abraham Nemeth liked fractions to be spoken. Furthermore, if a fraction contains another fraction, he’d say “b b frac … o over … e e frac” for the outer fraction and “b frac…over…e frac” for the inner fraction. He’d repeat the ‘b’, ‘o’, and ‘e’ as many times as the deepest fraction’s nesting level, like stuttering. MathSpeak has a similar option that uses “start” for ‘b’, “over” for ‘o’ and “end” for ‘e’. Revealing the nesting levels is similar to the way we speak nested parentheses as “open paren”, “open second paren”, “open third paren”, and so forth as in ClearSpeak, but in opposite nesting order. MathSpeak and Nemeth Braille indicate the nesting level of square roots and other roots, but don’t give a way to indicate the nesting level of parentheses.

One ends up with a plethora of choices. Since different folks like different choices, both MathSpeak and ClearSpeak offer several speech options. Some choices can be handled by a verbosity level. But qualitatively different choices might best be handled with settings in a dialog box. Nemeth sub/sup level shifters versus tree speech of compound scripts is an example of the latter. See also Larry’s Speakeasy, which gives English speech for a wide variety of mathematics.

In addition to being the most readable linear format, UnicodeMath is the most concise. It represents the simple fraction, one half, by the 3 characters “1/2”, whereas typical MathML takes 62 characters (consisting of the <mml:mfrac> element). This conciseness makes UnicodeMath an attractive format for storing mathematical expressions and equations, as well as for ease of keyboard entry. Another comparison is in the math structures for the Equation Tools tab in the Office ribbon. In Word, the structures are defined in OMML (Office MathML) and built up by Word, while for the other apps, the structures are defined in UnicodeMath and built up by RichEdit. The latter are much faster and the equation data much smaller. A dramatic example is the stacked fraction template (empty numerator over empty denominator). In UnicodeMath, this is given by the single character ‘/’. In OMML, it’s 109 characters! LaTeX is considerably shorter at 9 characters “\frac{}{}”, but is still 9 times longer than UnicodeMath. AsciiMath represents fractions the same way as UnicodeMath, so simple cases are identical. If Greek letters or other characters that require names in AsciiMath are used, UnicodeMath is shorter and more readable.

Another advantage of UnicodeMath over MathML and OMML is that UnicodeMath can be stored anywhere Unicode text is stored. When adding math capabilities to a program, XML formats require redefining the program’s file format and potentially destabilizing backward compatibility, while UnicodeMath does not. If a program is aware of UnicodeMath math zones (see Section 3.20 of UnicodeMath), it can recover the built-up mathematics by passing those zones through the RichEdit UnicodeMath MathBuildUp function. In fact, you can roundtrip RichEdit documents containing math zones through the plain-text editor Notepad: the math zones are preserved!

As its name implies, AsciiMath uses only ASCII characters, although it converts to MathML with access to a much larger character set. AsciiMath is relatively simple to parse and can handle many mathematical constructs. AsciiMath shares some methodology with UnicodeMath, such as eliminating the outer parentheses in fractions like (a+b)/c when converting to built-up format. AsciiMath is designed to work with a MathML renderer, such as MathJax. In Microsoft Office apps, UnicodeMath builds up to the LineServices math internal format, which is represented externally by OMML.

By default, the Office math autocorrect facility contains most [La]TeX math symbol control word definitions such as \beta for β. AsciiMath has a subset of such control words but omits the leading backslash. The user can modify such control words in the Office math autocorrect list or add them explicitly, but it’d probably be worth adding an option to make the leading backslash optional. That would speed up keyboard entry of UnicodeMath via math autocorrect. The RichEdit dll includes the UnicodeMath build up/down facility as well as converters for other math formats, such as MathML and OMML. It would be straightforward to add an option to the RichEdit UnicodeMath facility to accept AsciiMath input in general. Such an option would be handy for people that know AsciiMath.
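To show what an optional leading backslash might look like, here is a Python sketch of control-word replacement. It is not the Office autocorrect implementation: the four-entry table, the function name, and the `backslash_optional` flag are all assumptions for illustration.

```python
import re

# Tiny illustrative table; the real autocorrect list is much larger and user-editable.
CONTROL_WORDS = {"alpha": "α", "beta": "β", "theta": "θ", "pi": "π"}

def autocorrect(text, backslash_optional=False):
    """Replace control words such as \\theta with the corresponding symbol."""
    names = "|".join(sorted(CONTROL_WORDS, key=len, reverse=True))
    if backslash_optional:
        # Accept either a backslash or the start of a word in front of the name.
        prefix = r"(?:\\|(?<![A-Za-z]))"
    else:
        prefix = r"\\"
    pattern = re.compile(prefix + "(" + names + ")(?![A-Za-z])")
    return pattern.sub(lambda m: CONTROL_WORDS[m.group(1)], text)

print(autocorrect(r"\sin \theta"))
print(autocorrect("sin theta", backslash_optional=True))
```

The look-behind and look-ahead keep the optional-backslash mode from rewriting letters inside longer words, so “epic” is not turned into “eπc”.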

One C++ oriented autocorrect choice in AsciiMath is that typing != enters ≠. Although I program in C++ almost every day, I think /= is a better choice for entering ≠. For one thing, using != for ≠ complicates typing in an equation like n! = n(n-1)(n-2)…1, which is the main reason we didn’t implement it. But in Office apps this equation can also be entered by typing ! = instead of !=, since math spacing rules insert space between ! and = and the RichEdit UnicodeMath facility automatically deletes a user’s space if typed there (see User Spaces in Math Zones). So, that’s an easy workaround for entering an n! equation if one wants to support != for ≠. The RichEdit UnicodeMath facility supports most Unicode negated operators by sequences of / followed by the corresponding unnegated operator as described in the post Negated Operators.
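One way to prototype the / plus operator rule outside of RichEdit is to lean on Unicode normalization: most negated operators are canonically equivalent to the base operator followed by U+0338 COMBINING LONG SOLIDUS OVERLAY, so NFC composition finds the negated character when one exists. The Python sketch below uses that trick; it is an assumption-laden illustration with made-up function names, not how RichEdit does it.

```python
import unicodedata

def negate(op):
    """Return the precomposed negated form of op, or None if Unicode has none."""
    composed = unicodedata.normalize("NFC", op + "\u0338")
    return composed if len(composed) == 1 else None

def apply_negations(text):
    """Replace '/' followed by a negatable operator with the negated operator."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "/" and i + 1 < len(text):
            negated = negate(text[i + 1])
            if negated is not None:
                out.append(negated)
                i += 2
                continue
        out.append(text[i])
        i += 1
    return "".join(out)

print(apply_negations("a/=b"))   # '/' followed by '=' becomes the not-equal sign
print(apply_negations("a/b"))    # an ordinary fraction is left alone
```

Since letters and digits have no precomposed form with the overlay, ordinary fractions such as a/b pass through unchanged.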

<gripe> Meanwhile the C++ language should recognize ≠, ≤, ≥, and ≡ as aliases for !=, <=, >=, and ==. It seems primitive that C++ doesn’t do so in this Unicode age of computing. At least the C++ editing/debugging environments should have an option to display !=, <=, >=, and == as ≠, ≤, ≥, and ≡. </gripe>

Here’s a table with various formats for the integral

| Format | Representation |
| --- | --- |
| UnicodeMath | 1/2𝜋 ∫_0^2𝜋▒ⅆ𝜃/(𝑎+𝑏 sin𝜃 )=1/√(𝑎^2−𝑏^2 ) |
| AsciiMath | 1/(2pi) int_0^(2pi) (d theta)/(a+b sin theta)=1/sqrt(a^2-b^2) |
| LaTeX | \frac{1}{2\pi}\int_{0}^{2\pi}\frac{d\theta}{a+b\sin {\theta}}=\frac{1}{\sqrt{a^2-b^2}} |

Note that UnicodeMath binds the integrand to the integral, whereas AsciiMath and LaTeX don’t define the limits of the integrand. The Presentation MathML and OMML for this integral are too long to put into this post.

There is a unicode-math conversion package for Unicode enabled XeTeX and LuaLaTeX. The name UnicodeMath seems sufficiently different from unicode-math that there shouldn’t be any confusion between the two. The unicode-math package supports a variety of math fonts including Cambria Math, Minion Math, Latin Modern Math, TeX Gyre Pagella Math, Asana-Math, Neo-Euler, STIX, and XITS Math. Did you know there are so many math fonts?

Enjoy the new name UnicodeMath. I am, and it already appears near the end of my previous blog post, Nemeth Braille Alphanumerics and Unicode Math Alphanumerics. If you’re interested in the origin of UnicodeMath, read the post How I got into technical WP. The forerunner of UnicodeMath originated back in the early microcomputer days and had only 512 characters consisting of upright ASCII, italics, script, Greek and various mathematical symbols used in theoretical physics. Unicode 1.0 didn’t arrive until 10 years later.

For the most part, the mappings are straightforward as illustrated in the table below. But due to its generative use of type-form and alphabetic indicators, Nemeth braille encodes some math alphabets not in Unicode, e.g., Greek Script and Russian Script. Meanwhile, Unicode has math double-struck and monospace English alphanumerics, which don’t exist in Nemeth braille. Unicode also has six alphabets that aren’t mentioned in the Nemeth specification but that can be defined unambiguously with Nemeth indicators, namely bold Fraktur (Nemeth calls Fraktur “German”), bold Script, and Sans Serif bold and/or italic. The table below includes unambiguous prefixes for these alphabets chosen such that the Nemeth bold indicator precedes the italic or script indicators, and the Sans Serif indicator precedes the bold indicator. These choices correspond to the orders in which the Unicode math alphabets are named. Changes in this ordering result in alternative prefixes that are also unambiguous, but it seems simpler for implementations and users to standardize on the Unicode name ordering.

The Nemeth specification has Script Greek (in §22) as well as “alternative” Greek letters (in §23). Some of the latter may be referred to as “script”. Specifically, the Unicode math Greek italic letters 𝜃𝜙𝜖𝜌𝜋𝜅 have the alternative counterparts 𝜗𝜑𝜀𝜚𝜛𝜘, respectively. The symbol 𝜗 can be called “script theta”. Since Unicode doesn’t have a math script Greek alphabet, it makes sense to map Nemeth math script Greek letters to the alternative Greek letters, if they exist, on input and to use the Nemeth alternative notation on output. In addition, in Unicode the upper-case Θ has the alternative ϴ. In TeX and Office math, the alternative letters are identified by control words with a “var” prefix, as in \varepsilon for 𝜀 as contrasted with \epsilon for ϵ. Interestingly, modern Greek uses 𝜑 and 𝜀 instead of 𝜙 and 𝜖, but math considers the script versions to be the alternatives.

Nemeth braille has several Russian alphabets (see §22 of the Nemeth spec). These alphabets map to characters in the Cyrillic range U+0410..U+044F. Unicode has no math Russian alphabets, but italic and bold Russian alphabets can be emulated using the appropriate Cyrillic characters along with the desired italic and bold formatting. The Unicode Technical Committee, which is responsible for the Unicode Standard, has not received any proposals for adding Russian math alphabets. At least in my experience, technical papers in Russian use English and Greek letters in math zones. In Russian technical documents, this has the nice advantage of easily distinguishing mathematical variables from normal text.

Unicode has four predefined Hebrew characters in the Letterlike Symbols range U+2135..U+2138: ℵ, ℶ, ℷ, ℸ, respectively. In math contexts, it makes sense to map those Hebrew letters in Nemeth braille to the Letterlike Symbols and to map the other Nemeth Hebrew letters to characters in the Unicode Hebrew range U+05D0..U+05EA. The Unicode Technical Committee has not received any proposals for adding more Hebrew math letters so they probably won’t appear in math zones, except, perhaps, as embedded normal text.

The majority of Unicode math digits can be represented by the appropriate type-form indicator sequences in the table above followed by the numeric indicator ⠼ (if necessary) and the corresponding ASCII digits. For example, a math bold 2 (𝟐—U+1D7D0) can be represented by ⠸ ⠼ ⠆ or “_#2”. This works for the bold and/or sans-serif digits, but not for the double-struck and monospace digits, which have no Nemeth counterparts. Meanwhile Nemeth notation supports italic and bold italic digits, which aren’t in Unicode.
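As a rough illustration of the digit mapping, here is a small Python sketch. The starting code points are those of the Unicode math digit alphabets; the function names are invented for this example, and only the bold case (“_#” followed by the ASCII digit) is taken from the text above.

```python
# Code point of digit 0 in each Unicode math digit alphabet.
MATH_DIGIT_ZERO = {
    "bold": 0x1D7CE,
    "double-struck": 0x1D7D8,
    "sans-serif": 0x1D7E2,
    "sans-serif bold": 0x1D7EC,
    "monospace": 0x1D7F6,
}


def math_digit(style, digit):
    """Return the Unicode math digit for an ASCII digit in the given style."""
    return chr(MATH_DIGIT_ZERO[style] + int(digit))


def bold_digit_to_ascii_braille(ch):
    """Return Nemeth ASCII braille for one math bold digit, e.g. '_#2'.

    '_' is the bold type-form indicator and '#' is the numeric indicator.
    """
    value = ord(ch) - MATH_DIGIT_ZERO["bold"]
    if not 0 <= value <= 9:
        raise ValueError("not a math bold digit")
    return "_#" + str(value)


print(hex(ord(math_digit("bold", "2"))))                 # 0x1d7d0
print(bold_digit_to_ascii_braille(math_digit("bold", "2")))  # _#2
```

Double-struck and monospace digits are listed in the table only to show where they sit in Unicode; as noted above, they have no Nemeth counterparts.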

Digits in some math contexts don’t need a numeric indicator, e.g., most digits in fractions, subscripts or superscripts. To optimize common numeric subscript expressions like a_{1}, the numeric indicator *and* the subscript indicator are omitted. In Nemeth ASCII braille, a_{1} is “A1” and in Nemeth braille it’s ⠁ ⠂ . The ASCII braille representation is tantalizing since variables like A1, B2, etc., are used to index spreadsheets and it would be more natural if spreadsheet indices were a_{1}, b_{2}, etc., at least for people with a mathematical background.

In general, Unicode’s math characters are simpler to work with since they can be assigned separate character codes instead of being composed as combinations of 64 braille codes. Unicode has about 2310 math characters (see Math property in DerivedCoreProperties.txt) and to distinguish all of those without indicators would require 12-dot braille! Such a system would be really hard to learn. LaTeX describes characters using control words consisting of a backslash followed by combinations of the 52 ASCII letters. That approach has mnemonic value, but it’s not as concise as the Nemeth braille character code sequences. When you get a feel for the Nemeth approach, a character’s Nemeth sequence gives a good idea of what a character is even if you haven’t encountered it before. UnicodeMath and Nemeth braille are intended to be read by human beings, whereas LaTeX and MathML are intended to be read by computer programs, notwithstanding that some TeXies can read LaTeX pretty fluently! Considering that Unicode math alphabets like double-struck and monospace aren’t yet defined in Nemeth braille, it would be worthwhile to choose appropriate type-form indicators for them. Nemeth math alphabets not in Unicode probably don’t have to be considered unless they show up in published documents.
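The 12-dot figure follows from counting dot patterns: n dots give 2^n distinct cells. A quick check in Python, using the approximate character count quoted above:

```python
MATH_CHAR_COUNT = 2310  # approximate count quoted in the text

# Smallest number of dots whose 2**dots patterns cover every math character.
dots_needed = (MATH_CHAR_COUNT - 1).bit_length()

print(dots_needed)       # 12
print(2 ** 11, 2 ** 12)  # 2048 4096
```

Eleven dots give only 2048 patterns, which falls short of 2310, so twelve are needed.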


First note that Nemeth Braille can be displayed in 6-dot ASCII Braille as shown in this table

The dots are numbered 1..6 starting from the upper left, going down to 3 and continuing with 4..6 in the second column. The letters and numbers look like themselves as do the / and (). The braille cells for 1..9 are the same as those for the letters A..I, but shifted down one row. The cells for the letters K..T are the same as those for A..J but with a lower-left dot (dot 3). Letters are lowercase unless prefixed by a cap prefix code (solo dot 6) or pair of cap prefixes for a span of uppercase letters.

A simple table lookup converts Nemeth braille codes to 8-dot Unicode Braille in the U+2800 block. The braille cells for 6-dot braille are the first 64 characters of the Unicode braille block. With a little practice you can enter braille codes into Word, OneNote, and WordPad by typing 28xx <alt+x>, where xx is the hex code given by the braille dots. To do this, read dots as binary 1’s and missing dots as 0’s, sideways from right to left, top to bottom. So ⠮ is 101110_{2} = 2E_{16} and the corresponding Unicode character is U+282E.
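The dot-to-code-point rule can be sketched in a few lines of Python. Dot n corresponds to bit n−1 of the offset from U+2800; the function names here are made up for illustration.

```python
BRAILLE_BASE = 0x2800  # first character of the Unicode braille block


def braille_from_dots(dots):
    """Return the Unicode braille character with the given raised dots (1..8)."""
    offset = 0
    for dot in dots:
        if not 1 <= dot <= 8:
            raise ValueError("braille dots are numbered 1..8")
        offset |= 1 << (dot - 1)  # dot n sets bit n-1
    return chr(BRAILLE_BASE + offset)


def dots_from_braille(ch):
    """Return the list of raised dots for a Unicode braille character."""
    offset = ord(ch) - BRAILLE_BASE
    return [dot for dot in range(1, 9) if offset & (1 << (dot - 1))]


cell = braille_from_dots([2, 3, 4, 6])
print(f"U+{ord(cell):04X}")     # U+282E
print(dots_from_braille(cell))  # [2, 3, 4, 6]
```

The cell ⠮ has dots 2, 3, 4 and 6 raised, which gives the binary 101110 worked out above.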

To get a feel for simple Nemeth braille math, consider the expression 12x^{2}+7xy-10y^{2}. In ASCII Braille it displays as

#12x^2"+7xy-10y^2_4

In Nemeth Braille it displays as

In UnicodeMath and TeX, it displays as 12x^2+7xy-10y^2.
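To go from the ASCII braille line to braille cells, a lookup string indexed by cell offset is enough. The sketch below assumes the usual North American Braille ASCII assignment, written out in Unicode braille order (U+2800 through U+283F); the table is my transcription rather than something taken from the Nemeth specification, so check it against a reference before relying on it.

```python
# Braille ASCII characters in Unicode braille order: the character at
# index i stands for the cell U+2800 + i (index 0 is the blank cell).
BRAILLE_ASCII = (
    " A1B'K2L@CIF/MSP"
    "\"E3H9O6R^DJG>NTQ"
    ",*5<-U8V.%[$+X!&"
    ";:4\\0Z7(_?W]#Y)="
)


def ascii_to_unicode_braille(text):
    """Convert ASCII braille to Unicode braille cells.

    Letters are case-insensitive; a character outside the 64-character
    table raises ValueError.
    """
    cells = []
    for ch in text.upper():
        cells.append(chr(0x2800 + BRAILLE_ASCII.index(ch)))
    return "".join(cells)


print(ascii_to_unicode_braille('#12x^2"+7xy-10y^2_4'))
```

The same function works in the other direction with a one-line change: index into BRAILLE_ASCII with the cell’s offset from U+2800.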

It’s tantalizing that the superscript code ⠘ has the ASCII braille code ‘^’ used by UnicodeMath and [La]TeX. But the subscript code is ⠰, which has the ASCII braille code ‘;’ instead of the ‘_’ used by UnicodeMath and TeX. These braille codes also work differently from the UnicodeMath and TeX superscript/subscript operators in that they are script level shifters that must be “cancelled” instead of being ended. So in the formula above, the Nemeth ‘^’ for the first square is cancelled by the ‘"’. In UnicodeMath, the ‘+’ terminates the superscript, and in TeX a superscript consists of a single character or an expression of the form {…}. The following table compares how the three formats handle some nested superscripts and subscripts.

Here to keep the Nemeth braille code sequences simple, I’ve omitted the Nemeth math italic, English-letter prefix pair ⠨ ⠰ before each math variable. Hopefully there’s a way to make math italic the default, as it is in UnicodeMath, MathML, and TeX, but I didn’t find such a mode in the full specification. A space before literary text terminates the current script level shift, that is, it initiates base level. This is also true for a space that indicates the next column in a matrix, but it’s not true for a function-argument separator as illustrated in the table below. Spaces can also be used for equation-array alignment (you need to think in terms of a fixed-width font).

Simple fractions are written in a fashion similar to TeX’s {<numerator>\over <denominator>}. For example,

or in ASCII braille as ?1/2#. The ⠹ and ⠼ work as the curly braces do in TeX fractions as in {1\over 2}. In UnicodeMath, the fraction is given by 1/2. Fractions can be laid out in a two-dimensional format emulating built-up fractions but using Nemeth braille. Nested fractions require additional prefix codes (solo dot 6). For single-line braille devices it seems worthwhile to use the linear display since the fraction delimiters can be nested to any depth. Stacked, slashed, and linear fractions can be encoded and correspond to those structures in UnicodeMath and in TeX.

The Nemeth alphabets are similar to the Unicode math alphanumerics discussed in Sections 2.1 and 2.2 of Unicode Technical Report #25. One difference is that math script and math italic variants exist for English, Greek, Cyrillic, and German (Fraktur) alphabets, whereas in Unicode math script variants are only available for the English alphabet. We may need to generalize Unicode’s coverage in this area, since TeX also has the ability to represent more math alphabets (see, for example, Unicode Math Calligraphic Alphabets).

At some point, I hope to give a listing of correspondences between UnicodeMath and Nemeth Braille. It’s a long topic, so as a start the following table gives some more examples. Note the spaces needed around the equals sign (and other relational operators), but the lack of a space between the ‘*a*’ and “sin” in “*a* sin *x*”. The Nemeth notation is ambiguous with respect to using asin for arc sine.

The Unified English Braille code can handle quite general mathematics as well. See the UEB Guidelines for Technical Material. UEB math braille tends to be less compact than Nemeth math braille, but that disadvantage is offset somewhat by having fewer rules to learn. Nemeth math zones can be embedded into UEB documents as discussed in Guidance for Transcription Using the Nemeth Code within UEB Contexts.

One possible way to reduce the large number of rules governing Nemeth braille would be to use an 8-dot standard in which math operators could be encoded with the aid of bottom row dots. This would work with current technology since Braille displays let you read and enter all possible 8-dot Braille codes. In fact, dot 7 is sometimes used to change lower case into upper case, thereby not needing an upper-case prefix code (solo dot 6) for upper-case letters.

Here’s a Braille ASCII table that’s in the original braille order. Compare the 00 line with the lines 10-40 to see how the braille codes are related. Each code is assigned a number. For example, the $ cell has number 36. The numbers index the symbols tables in Appendix B of the full specification. This indexing is very useful for studying how the codes are used to represent mathematical symbols.


Understand at the outset that two granularities of math speech are needed: coarse-grained, which speaks math expressions fluently in a natural language, and fine-grained, which speaks the content at the insertion point. The coarse-grained granularity is great for scanning through math zones. It doesn’t pretend to be tightly synchronized with the characters in memory and cannot be used directly for editing. It’s relatively independent of the memory math model used in applications.

In contrast, the fine-grained granularity is tightly synchronized with the characters in memory and is ideal for editing. By its very nature, it depends on the built-up memory math model (described below), which is the same for all Microsoft math-aware products, but may differ from the models of other math products. Coarse-grained navigation between siblings for a given math nesting level can be done with Ctrl+→ and Ctrl+← or Braille equivalents, while fine-grained navigation is done with → and ← or equivalents. The latter allows the user to traverse every character in the display math tree used for a math zone. The coarse- and fine-grained granularities are discussed further in the post Math Accessibility Trees. In addition to granularity, it’s useful to have levels of verbosity. Especially when new to a system, it’s helpful to have more verbiage describing an equation. But with greater familiarity, one can comprehend an equation more quickly with less verbiage.

To represent mathematics linearly and unambiguously, UnicodeMath may introduce parentheses that are removed in built-up form. Speaking the introduced parentheses can get confusing since it may be hard for the listener to track which parentheses go with which part of the expression. In the simple example above of (a+b)/2, it’s more meaningful to say “start numerator a plus b end numerator over 2” than to speak the parentheses. Or to be less verbose, leave out the “start”. This idea applies to expressions that include square roots, boxed formulas and other “envelopes” that use parentheses to define their arguments unambiguously. For the UnicodeMath square-root √(a^2-b^2), it’s clearer to say “square root of a squared minus b squared, end square root” instead of “square root of open paren a squared minus b squared close paren”. This is particularly true if the square root is nested inside a denominator as in

which has the UnicodeMath 1/(2+√(a^2-b^2)). By saying “end square root” instead of “close paren”, it’s immediately clear where the square root ends. Simple fractions like 2/3 are spoken using ordinals as in “two thirds”. Also when speaking the UnicodeMath text ∑_(n=0)^∞, rather than say “sum from open paren n equal 0 close paren to infinity”, one should say “sum from n equal 0 to infinity”, which is unambiguous without the parentheses since the “from” and “to” act as a pair of open and close delimiters. This and similar enhancements are discussed in the ClearSpeak specification and in Significance of Paralinguistic Cues in the Synthesis of Mathematical Equations. Such clearer start-of-unit, end-of-unit vocabulary mirrors what’s in memory. The parentheses introduced by UnicodeMath are not in memory since the memory version uses special delimiters as explained below. Parentheses inserted by the user are spoken as “open paren” and “close paren” provided they are the outermost parentheses. Nested parentheses are spoken together with their parenthesis nesting level as in “open second paren”, “open third paren”, etc.

Such refinements can be made by processing the UnicodeMath, but some parsing is needed. It’s easier to examine the built-up version of expressions, since that version is already largely parsed. The built-up format is a *display tree* as described in the post Math Accessibility Trees. For example, to know that an exponent in the UnicodeMath equation a^2+b^2=c^2 is, in fact, a 2 and not part of a larger argument, one must check the character following the 2 to make sure that it’s an operator and not part of the exponent. If the letter z follows the 2 as in a^2z, the z is part of the superscript and the expression should be spoken as “a to the power 2z”. In memory one just checks for a single code, here the end-of-object code U+FDEF. If that code follows the 2, the exponent is 2 alone and “squared” is appropriate, unless exponents are indices as in tensor notation.

The built-up memory format represents mathematical objects like fraction, matrix and superscript by a start delimiter, the first argument, an argument separator if the object has more than one argument, the second argument, etc., with the final argument terminated by the object end delimiter. For example, the UnicodeMath fraction a/2 is represented in the built-up format by {_{frac} *a*|2} where {_{frac} is the start delimiter, | is the argument separator, and } is the end delimiter. Similarly a^2 is represented in the built-up format by {_{sup} *a*|2 }. Here the start delimiter is the same character for all math objects and is the Unicode character U+FDD0 in RichEdit (Word uses a different character). The type of math object is given by a rich-text object-type property associated with the start delimiter as described in ITextRange2::GetInlineObject(). The RichEdit argument separator is U+FDEE and the object end delimiter is U+FDEF. These Unicode codes are in the U+FDD0..U+FDEF “noncharacters” block reserved for internal use only.
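To make the delimiter layout concrete, here is a minimal Python sketch that builds such a string with the RichEdit codes given above and performs the “is the exponent exactly 2?” check by looking for the end delimiter. The helper names are invented for this example. The object type lives in a rich-text property rather than in the text, so it’s left out here.

```python
START = "\uFDD0"  # object start delimiter (RichEdit)
SEP = "\uFDEE"    # argument separator
END = "\uFDEF"    # object end delimiter


def build_object(*args):
    """Return START arg1 SEP arg2 ... END for the given argument strings."""
    return START + SEP.join(args) + END


def argument_is_exactly(text, sep_pos, expected):
    """True if the argument after the separator at sep_pos is exactly `expected`.

    Only one check is needed: the character after the expected text must
    be the object end delimiter.
    """
    start = sep_pos + 1
    end = start + len(expected)
    return text[start:end] == expected and text[end:end + 1] == END


a_squared = build_object("a", "2")   # a^2
a_to_2z = build_object("a", "2z")    # a^2z

print(argument_is_exactly(a_squared, a_squared.index(SEP), "2"))  # True
print(argument_is_exactly(a_to_2z, a_to_2z.index(SEP), "2"))      # False
```

In the first case “squared” is appropriate; in the second the exponent is 2z, so it isn’t.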

Another scenario where the built-up format is very useful for speech is in traversing a math zone character by character, allowing editing along the way. Consider the integral

When the insertion point is at the start of the math zone, “math zone” is spoken followed by the speech for the entire math zone. But at any time the user can enter → (or Braille equivalent), which halts the math-zone speech, enters the numerator of the leading fraction, and speaks “1”. Another → and “end of numerator” is spoken. Another → and “2 pi” is spoken. Another → and “end of denominator” is spoken and so forth. In this way, the user knows exactly where the insertion point is and can edit using the usual input methods.
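The character-by-character traversal can be sketched as a walk over the built-up string. This is only an illustration under some assumptions: the `object_types` dictionary is a stand-in for the rich-text object-type property, the argument names and the “start fraction” wording are chosen for the example, and each character is spoken separately rather than grouped.

```python
START, SEP, END = "\uFDD0", "\uFDEE", "\uFDEF"

# Hypothetical argument names per object type.
ARG_NAMES = {
    "fraction": ["numerator", "denominator"],
    "superscript": ["base", "superscript"],
}
CHAR_NAMES = {"π": "pi", "𝜋": "pi"}


def fine_grained_speech(text, object_types):
    """Return what is spoken at each position while moving right by character.

    object_types maps the index of each start delimiter to its object type.
    """
    stack = []   # entries are [object type, current argument index]
    spoken = []
    for pos, ch in enumerate(text):
        if ch == START:
            kind = object_types[pos]
            stack.append([kind, 0])
            spoken.append(f"start {kind}")
        elif ch in (SEP, END):
            kind, arg = stack[-1]
            spoken.append(f"end of {ARG_NAMES[kind][arg]}")
            if ch == SEP:
                stack[-1][1] += 1   # move on to the next argument
            else:
                stack.pop()         # the object is finished
        else:
            spoken.append(CHAR_NAMES.get(ch, ch))
    return spoken


fraction = START + "1" + SEP + "2π" + END
for phrase in fine_grained_speech(fraction, {0: "fraction"}):
    print(phrase)
```

For the leading fraction 1/2π this prints “start fraction”, “1”, “end of numerator”, “2”, “pi”, “end of denominator”, one phrase per →.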

This approach is quite general. Consider matrices. At the start of a matrix, “*n* × *m* matrix” is spoken, where *n* is the number of rows and *m* is the number of columns. Using →, the user moves into the matrix with one character spoken for each → up until the end of the first element. At that end, “end of element 1 1” is spoken, etc. Up and down arrows can be used to move vertically inside a matrix as elsewhere, in all cases with the target character or end of element being spoken so that the user knows which element the insertion point is in.

Math variables are represented by math alphabetics (see Section 2.2 of Unicode Technical Report #25). This allows variables to be distinguished easily from ordinary text. When inserted into the speech text, such variables are surrounded by spaces. This causes text-to-speech engines to say the individual letters instead of speaking a span of consecutive letters as a word. In contrast, an equation like rate = distance/time would be spoken as “rate equals distance over time”. Math italic letters are spoken simply as the corresponding ASCII or Greek letters since in math zones math italic is enabled by default. Other math alphabets need extra words to reveal their differences. For example, ℋ is spoken as “script cap h”. Alternatively, the “cap” can be implied by raising the voice pitch.
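One way to get such speech strings, sketched here in Python, is to derive them from the Unicode character names. This is an assumption-laden shortcut, not how any product does it: it relies on names of the form “[MATHEMATICAL] style CAPITAL|SMALL letter” and returns anything else unchanged, so exceptions such as ℎ (named PLANCK CONSTANT) would need a separate table.

```python
import unicodedata


def speak_math_letter(ch):
    """Return speech text for a math alphabetic, e.g. 'script cap h'."""
    words = unicodedata.name(ch, "").split()
    if words[:1] == ["MATHEMATICAL"]:
        words = words[1:]
    if len(words) < 3 or words[-2] not in ("CAPITAL", "SMALL"):
        return ch  # not a styled math letter; leave it alone
    style = [w.lower() for w in words[:-2]]
    if style == ["italic"]:
        style = []  # math italic is the default, so it isn't spoken
    cap = ["cap"] if words[-2] == "CAPITAL" else []
    return " ".join(style + cap + [words[-1].lower()])


def to_speech_text(text):
    """Separate the characters with spaces so each one is spoken individually."""
    return " ".join(speak_math_letter(ch) for ch in text)


print(speak_math_letter("ℋ"))  # script cap h
print(to_speech_text("𝑎𝑏"))    # a b
```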

Some special cues may be needed to convince text-to-speech engines to say math characters correctly. For example, ‘+’ may need to be given as “plus”, since otherwise it might be spoken as “and”. The letter ‘a’ may need to be enclosed in single quotes, since otherwise it may be spoken as the ‘a’ in “zebra” instead of the ‘a’ in “base”.

Another example of how the two speech granularities differ is in how math text tweaking is revealed. First, let’s define some ways to tweak math text. You can insert extra spaces as described in Sec. 3.15 of the UnicodeMath paper. Coarse-grained speech doesn’t mention such space but fine-grained speech does. More special kinds of tweaking are done by inserting phantom objects. Five Boolean flags characterize a phantom object: 1) zero ascent, 2) zero descent, 3) zero width, 4) show, and 5) transparent. Phantom objects insert or remove precise amounts of space. You can read about them in the post on MathML and Ecma Math (OMML) and in Sec. 3.17 of the UnicodeMath paper. The π in the upper limit of the integral above is inside an “h smash” phantom, which sets the π’s width to 0 (smashes the horizontal dimension). Notice how the integrand starts at the start of the π. Coarse-grained speech doesn’t mention this and other phantom objects and only includes their contents if the “show” flag is set. Fine-grained speech includes the start and end entities as well as the contents. This allows a user to edit phantom objects just like the 22 other math objects in the LineServices math model.

The approaches described here produce automated math speech; the content creator doesn’t need to do anything to enable math speech. But it’s desirable to have override capability, since the heuristics used may not apply or the content author may prefer an alternate phrasing.
