Microsoft Office Math Speech

Microsoft Office math-aware applications can now speak math in over 18 different languages! Try it out with native math zones in Word by enabling Narrator (type CapsLock + Enter) and navigate a math zone as described in the post Speaking of math… There are two math-speech granularities: coarse-grained (navigate by words), which speaks math expressions fluently in a natural language, and fine-grained (navigate by characters), which explains the content at the insertion point (IP) in sufficient detail to enable editing. I can turn off the computer screen and use a keyboard to edit complicated equations accurately by listening to the math speech. Math speech works for all math zones and doesn’t need extra editing by the document author(s). As of this post, Office math speech has been shipping for over a month on Windows, and Word’s math speech, in particular, has already gotten a lot of use. Note that this math facility is built into Office applications (type Alt+= to insert a math zone) and differs from MathType, which can also be used with Office applications.

Coarse-grained speech isn’t tightly synchronized with the characters in memory and cannot be used directly for editing. It’s relatively independent of the memory math model. In contrast, fine-grained speech is tightly synchronized with the characters in memory and is ideal for editing. It depends on the built-up math model (“Presentation Math”), which is the same for all Microsoft math-aware products but may differ from the models of other math products. Coarse-grained navigation between siblings for a given math nesting level can be done with Ctrl+→ and Ctrl+← or Braille equivalents, while fine-grained navigation is done with → and ← or equivalents. The latter allows the user to traverse every character in a math zone.

Two special cases are 1) when the IP is directly before the math zone being queried by UIA and 2) when the IP is still in the range’s math zone, but at the end. For 1), the user needs to know that anything typed won’t go into the math zone. Typing → then puts the IP into the math zone, and subsequent typing enters characters inside the math zone. For 2), the user needs to know that the IP is at the end of the math zone but still inside it. Case 1) returns “equation” followed by the speech for the math zone. Case 2) returns “end equation”. (Since many math zones aren’t equations, this choice of words might be a little misleading sometimes, but hopefully not too much so.)
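The two boundary cases can be modeled with a small sketch. This is only an illustration of the behavior described above, not the RichEdit code: the names IpPlace and boundarySpeech are hypothetical, and joining “equation” to the zone’s speech with a single space is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of where the insertion point (IP) sits
// relative to a math zone. Not part of any Office or UIA API.
enum class IpPlace {
    DirectlyBefore,  // case 1: IP is just before the math zone
    Inside,          // IP is somewhere inside the zone, not at its end
    AtEndInside      // case 2: IP is at the end but still in the zone
};

// Returns the text spoken for the two boundary cases.
// zoneSpeech is the coarse-grained speech for the whole math zone.
std::string boundarySpeech(IpPlace place, const std::string& zoneSpeech) {
    switch (place) {
    case IpPlace::DirectlyBefore:
        // Assumption: "equation" and the zone speech are joined by a space.
        return "equation " + zoneSpeech;
    case IpPlace::AtEndInside:
        return "end equation";
    case IpPlace::Inside:
        break;  // ordinary fine-grained speech applies; not modeled here
    }
    return "";
}
```

For example, boundarySpeech(IpPlace::DirectlyBefore, "a squared plus b squared") yields "equation a squared plus b squared", which tells the user that typing at this position lands outside the math zone.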

The languages with math speech support include Danish (da-DK), German (de-DE), English (en-US), Spanish (es-ES), Finnish (fi-FI), French (fr-FR), Italian (it-IT), Japanese (ja-JP), Korean (ko-KR), Norwegian (nb-NO), Dutch (nl-NL), Polish (pl-PL), Brazil Portuguese (pt-BR), Portugal Portuguese (pt-PT), Russian (ru-RU), Swedish (sv-SE), Turkish (tr-TR), PRC Chinese (zh-CN), and Taiwan Chinese (zh-TW).

Producing Math Speech

Math speech is produced by “building down to speech”, sharing the code and concepts of building down “Presentation Math” to UnicodeMath. This approach creates math speech just as fast as it creates UnicodeMath and is faster than representing math zones in other math formats like MathML. A string of language tokens is created and then converted to the active natural language.
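The last step, turning a string of language tokens into the active natural language, might look roughly like the following sketch. Everything here is hypothetical: the post doesn’t describe the real token set or tables, so the Tok values, the Piece type, the render function, and the two tiny word tables (including the German wording) are illustrative assumptions only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical language-neutral tokens (the real token set is not public).
enum class Tok { Plus, Equals, Over };

// One element of the token string: either a token to localize or
// literal text (such as a variable name) that is spoken as-is.
struct Piece {
    bool isToken;
    Tok token;
    std::string text;
};

static Piece lit(const std::string& s) { return Piece{false, Tok::Plus, s}; }
static Piece tok(Tok t) { return Piece{true, t, ""}; }

// Converts the token string to words for the given locale.
// Returns an empty string if the locale has no table in this sketch.
std::string render(const std::vector<Piece>& pieces, const std::string& locale) {
    // Illustrative word tables; assumed wording, not Office's actual strings.
    static const std::map<std::string, std::map<Tok, std::string>> words = {
        {"en-US", {{Tok::Plus, "plus"}, {Tok::Equals, "equals"}, {Tok::Over, "over"}}},
        {"de-DE", {{Tok::Plus, "plus"}, {Tok::Equals, "gleich"}, {Tok::Over, "durch"}}},
    };
    const auto table = words.find(locale);
    if (table == words.end()) return "";
    std::string out;
    for (const Piece& p : pieces) {
        if (!out.empty()) out += ' ';
        out += p.isToken ? table->second.at(p.token) : p.text;
    }
    return out;
}
```

The point of the design is that the build-down pass only has to run once per math zone: the same token string can be rendered for any supported language by swapping the word table.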

On a technical level, math speech is implemented in the RichEdit dll (Office’s riched20.dll) by the GetMathSpeechText function, which has the prototype

HRESULT GetMathSpeechText (ITextRange2 *prg, BSTR *pbstr, LONG Flags);

Coarse-grained math speech is returned in *pbstr if the range prg selects more than one character, while fine-grained speech is returned if prg references an insertion point or selects only one character. GetMathSpeechText() uses the same subset of ITextRange2 methods used by MathBuildDown() and hence can be used by all Microsoft Office math-aware applications on all major platforms (Windows, iOS, Mac, and Android). Key methods include ITextRange2::GetChar2() to fetch individual characters from memory and ITextRange2::GetInlineObject() to find out what kinds of math objects are in memory.
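The granularity rule can be stated compactly in code. The sketch below is a portable model rather than the RichEdit implementation: Granularity and chooseGranularity are hypothetical names, and the range is represented only by its start and end character positions.

```cpp
#include <cassert>

// Hypothetical names; they model the rule stated above, nothing more.
enum class Granularity { Fine, Coarse };

// cpStart and cpEnd are the character positions bounding the range.
// An insertion point has cpStart == cpEnd; a one-character selection
// has cpEnd - cpStart == 1. Both get fine-grained speech.
Granularity chooseGranularity(long cpStart, long cpEnd) {
    const long length = cpEnd - cpStart;
    return length > 1 ? Granularity::Coarse : Granularity::Fine;
}
```

With a real ITextRange2, the two positions would presumably come from the inherited ITextRange::GetStart() and ITextRange::GetEnd() methods.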

Exposing Math Speech to Assistive Technologies

Math speech is exposed to UI Automation clients via methods of the UIA interface ITextRangeProvider. So, in principle, any AT that uses these methods automatically gets math speech for math zones. Nevertheless, it’s desirable for ATs to know if math zones are involved. One approach is to identify math zones by a new, explicit UIA math-zone object or by a custom object with a localized name like “math zone”. But a more efficient approach that mirrors what’s in memory is to have a math-zone format attribute. Specifically, TextUnit_Format is one of the units supported by ITextRangeProvider::ExpandToEnclosingUnit and ITextRangeProvider::MoveEndpointByUnit. To query an attribute, such as UIA_IsItalicAttributeId, of a TextUnit_Format instance, a client calls ITextRangeProvider::GetAttributeValue. ATs could know if a math zone is active if a new attribute ID, UIA_IsMathZoneAttributeId, were added to identify math zones. In fact, there is a new UIA math annotation type, AnnotationType_Mathematics, which can be retrieved by calling ITextRangeProvider::GetAttributeValue().
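On the client side, the check comes down to looking for the math annotation type among the annotation types reported for a range. The sketch below models just that step in portable C++ and makes several assumptions: isInMathZone and kAnnotationTypeMathematics are hypothetical names, the numeric value 60023 is my recollection of AnnotationType_Mathematics in UIAutomationClient.h and should be verified against the Windows SDK, and the vector stands in for the array a real client would unpack from the VARIANT returned by GetAttributeValue (presumably queried with UIA_AnnotationTypesAttributeId).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed value of AnnotationType_Mathematics; verify against
// UIAutomationClient.h before relying on it.
const long kAnnotationTypeMathematics = 60023;

// annotationTypes models the list of annotation type IDs reported
// for a text range. Returns true if the range is marked as math.
bool isInMathZone(const std::vector<long>& annotationTypes) {
    return std::find(annotationTypes.begin(), annotationTypes.end(),
                     kAnnotationTypeMathematics) != annotationTypes.end();
}
```

An AT could run a check like this after expanding a range by TextUnit_Format, and switch to math-specific handling when it returns true.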