An AT like NVDA that handles all math accessibility (speech and braille of various verbosities and options) would use addition 1. With MathML containing the <maction> element for UI, NVDA could generate math speech and braille, enabling both speaking and editing of math. An AT like Narrator that doesn’t understand MathML would use addition 2, getting math speech with one call and math braille with another. This post describes these ways of improving UIA math support.
For the first two additions, it’s easy for an AT to recognize most math formats in a text string returned by UIA, especially since the AT chooses which format to return and can look for it. MathML math zones are XML strings that start with <mml:math> and end with </mml:math>. A LaTeX inline math zone starts with “$” or “\(” and ends with “$” or “\)”. A LaTeX display math zone starts with “$$” or “\[” and ends with “$$” or “\]”. A UnicodeMath math zone starts with “⁅” (U+2045) and ends with “⁆” (U+2046). Math braille is given by characters in the Unicode U+2800..U+28FF braille block. Non-math text uses other Unicode characters since braille engines can braille natural languages.
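The delimiter rules above can be sketched as a small classifier. This is only an illustration: the DetectedMath enum and the detectMathFormat function are hypothetical names, not part of UIA, and the sketch assumes the string holds exactly one math zone with no surrounding text. It also accepts a <mml:math start tag that carries attributes, which is an assumption beyond what is stated above.

```cpp
#include <string>

// Hypothetical classification of a string returned by UIA (not a UIA type).
enum class DetectedMath { None, MathML, LaTeXInline, LaTeXDisplay, UnicodeMath, Braille };

static bool startsWith(const std::u32string& s, const std::u32string& p) {
    return s.size() >= p.size() && s.compare(0, p.size(), p) == 0;
}

static bool endsWith(const std::u32string& s, const std::u32string& p) {
    return s.size() >= p.size() && s.compare(s.size() - p.size(), p.size(), p) == 0;
}

// Classify a string that is assumed to contain a single math zone.
DetectedMath detectMathFormat(const std::u32string& s) {
    if (s.empty()) return DetectedMath::None;

    // MathML: prefix match so that a start tag with attributes is accepted too.
    if (startsWith(s, U"<mml:math") && endsWith(s, U"</mml:math>"))
        return DetectedMath::MathML;

    // Test display delimiters first, since "$$" also begins with "$".
    if (s.size() >= 4 &&
        ((startsWith(s, U"$$") && endsWith(s, U"$$")) ||
         (startsWith(s, U"\\[") && endsWith(s, U"\\]"))))
        return DetectedMath::LaTeXDisplay;

    if (s.size() >= 2 &&
        ((s.front() == U'$' && s.back() == U'$') ||
         (startsWith(s, U"\\(") && endsWith(s, U"\\)"))))
        return DetectedMath::LaTeXInline;

    // UnicodeMath zones are bracketed by U+2045 and U+2046.
    if (s.front() == 0x2045 && s.back() == 0x2046)
        return DetectedMath::UnicodeMath;

    // Braille: every character lies in the U+2800..U+28FF block.
    for (char32_t c : s)
        if (c < 0x2800 || c > 0x28FF) return DetectedMath::None;
    return DetectedMath::Braille;
}
```

In practice the AT already knows which format it asked for, so a check like this would mostly serve as a sanity test on the returned string.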
Math speech supplied by Office apps usually doesn’t have start and end math speech text. It might be worthwhile to have a math speech format type that includes the delimiters <mathspeech> and </mathspeech>. These delimiters wouldn’t be spoken but could be cues to speak the text as is and afterward to call for math braille if brailling is active. If a <mathspeech> XML element is added, it’d be worthwhile to support the Speech Synthesis Markup Language (SSML) more generally so that math character styles could be spoken with a different pitch, for example. Another possibility is to add <mathspeech> to SSML.
The sections below define methods that provide this UIA math functionality. Note that unless Narrator wants to take advantage of Office Nemeth math-braille capabilities, Windows doesn’t need to do anything other than document the new methods and include them in UIAutomationCore.h. Math speech already works well with Narrator, although it doesn’t offer math verbosity options (which differ from natural-language verbosity options).
Typically, UIA doesn’t have UIA state properties. The properties it exposes are properties of the source content. But to define which math format ITextRangeProvider::GetText() should use by default, UIA needs to set a document property that specifies the math format. Accordingly, we define ITextProvider3 as follows:
MIDL_INTERFACE("242A2469-3CAB-403E-9DA6-FAF1327C7FC6")
ITextProvider3 : public ITextProvider2
{
public:
    virtual HRESULT STDMETHODCALLTYPE get_Property(
        /* [in] */ int Type,
        /* [retval][out] */ int *pValue) = 0;

    virtual HRESULT STDMETHODCALLTYPE set_Property(
        /* [in] */ int Type,
        /* [in] */ int Value) = 0;
};
Type specifies the property type. For now, there’s only the default math format: TextProperty_MathFormat (1). Its values are given by
enum MathFormatType
{
    MathFormatType_Default = 0,     // Same as GetText
    MathFormatType_MathML = 1,      // Math zones in MathML
    MathFormatType_Nemeth = 2,      // Math zones in Nemeth braille (U+2800 block)
    MathFormatType_LaTeX = 3,       // Math zones in LaTeX
    MathFormatType_UnicodeMath = 4, // Math zones in UnicodeMath
    MathFormatType_Speech = 5,      // Math zones with speech
};
Another property type could be TextProperty_MathVerbosity. To get an ITextProvider3 interface, the client calls
ITextProvider::QueryInterface(__uuidof(ITextProvider3), (LPVOID *)ppTextProvider3)
If this call fails, the program doesn’t have math-format support.
To enable getting more than one math format, e.g., speech and braille, we define the range-level interface
MIDL_INTERFACE("724258C8-8A0D-407D-9622-E5E75D307513")
ITextRangeProvider3 : public ITextRangeProvider2
{
public:
    virtual HRESULT STDMETHODCALLTYPE GetText2(
        /* [in] */ int maxLength,
        /* [in] */ int Flags,
        /* [retval][out] */ __RPC__deref_out_opt BSTR *pRetVal) = 0;
};
The arguments maxLength and pRetVal are the same as for ITextRangeProvider::GetText(maxLength, pRetVal). The low four bits of Flags are given by the MathFormatType enum above. The AT calls
ITextRangeProvider::QueryInterface(__uuidof(ITextRangeProvider3), (LPVOID *)ppTextRangeProvider3)
If this call fails, the program doesn’t have range-level math-format support.
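Since the low four bits of Flags carry the MathFormatType, packing and unpacking them might look like the sketch below. The enum is copied from this post. The helper names and the mask constant are hypothetical, and the meaning of the upper bits is left undefined here, so the sketch only preserves them.

```cpp
// Copied from the enum defined earlier in this post.
enum MathFormatType {
    MathFormatType_Default = 0,
    MathFormatType_MathML = 1,
    MathFormatType_Nemeth = 2,
    MathFormatType_LaTeX = 3,
    MathFormatType_UnicodeMath = 4,
    MathFormatType_Speech = 5,
};

// Hypothetical mask for the low four bits of the GetText2 Flags argument.
constexpr int kMathFormatMask = 0xF;

// Extract the math format requested by a Flags value.
inline MathFormatType mathFormatFromFlags(int flags) {
    return static_cast<MathFormatType>(flags & kMathFormatMask);
}

// Combine a math format with other (as yet unspecified) flag bits.
inline int makeFlags(MathFormatType format, int otherFlags) {
    return (otherFlags & ~kMathFormatMask) | static_cast<int>(format);
}
```

An AT that wants both speech and braille would then call GetText2 twice, once with MathFormatType_Speech and once with MathFormatType_Nemeth in the low bits.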
There are two general kinds of math-zone navigation: 1) from one math zone to another, and 2) within a math zone. The latter can be accomplished with existing functionality, typically by following the program selection changes or by moving by UIA TextUnit_Character and TextUnit_Word.
To navigate up to a math zone or skip onto the next math zone, UIA needs to have a math-zone unit. UIA annotation and attribute values are distinct from the defined UIA TextUnit’s, since attributes are in the 40000 range and annotations are in the 60000 range, while the TextUnit’s are < 10. If we enable the ITextRangeProvider ExpandToEnclosingUnit() and Move(), etc., methods to treat AnnotationType_Mathematics as another unit, then an AT could move by math zones, expand to a math zone, etc. If the AT calls for moving by TextUnit_Format, a math zone would be a format break point. If the QueryInterface for an ITextProvider3 succeeds, then a client could expect that navigation by AnnotationType_Mathematics would work. If a call returns an HRESULT error, navigation and selection by AnnotationType_Mathematics isn’t supported.
People have thought about an alternative approach that uses a UIA math control like a UIA hyperlink or table control. This approach is discussed in Math Accessibility Trees. While it makes sense theoretically, math zones can be numerous and are often very small such as the variable 𝑥, which is only a single character. So, it’s simpler and more efficient to treat math zones as text with a math-zone attribute. AnnotationType_Mathematics seems to fit the bill and it has been defined for several years.
Note that the Text Object Model (TOM) uses a math-zone attribute (tomMathZone—see tom.h) as a unit for navigation and selection. Version 2 of TOM has advanced support for math text processing some of which is described in the post Setting and Getting Math Speech, Braille, UnicodeMath, LaTeX… For UIA purposes, the relatively simple approaches given here seem to suffice.
A succinct summary of entering and editing math with a keyboard is given in this blog’s first post, Formula Autobuildup in Word 2007. Basically, type the hot key Alt+= to insert a math zone and then type math using TeX control words for symbols. For example, in UnicodeMath mode, typing a/b=c inserts the equation 𝑎/𝑏 = 𝑐 with 𝑎/𝑏 built up as a stacked fraction. The UnicodeMath syntax is like that used in programming languages. Naturally there’s much more to math than symbols and fractions, and the keyboard input methods are described in UnicodeMath for the Unicode input method and in LaTeX/TeX input method for the LaTeX/TeX input method.
In UnicodeMath mode, build up to the “Professional” format is automatic as described in When Formula Autobuildup Occurs. In Word’s LaTeX mode, you must request build up. Enter Ctrl+= to build up a math zone into “Professional” format and Shift+Ctrl+= to build the math zone down into the current linear format (UnicodeMath or LaTeX). Or you can click on the corresponding options of the math-zone acetate rectangle.
In addition to the LaTeX/TeX control words, there are operator shortcuts described in Math Keyboard Shortcuts, Negated Operators, Keyboard Operator Shortcuts, Entering Unicode Characters, and Klinke’s Streamlined Math Input Notation. For example, /= autocorrects to ≠ and <= to ≤. Subscripts and superscripts are entered using _ and ^, respectively, as discussed in Section 2.2 of UnicodeMath and in Keyboard Entry of Subscripts and Superscripts. Nice things to add include making the leading backslash optional and having an autocomplete drop-down menu of possible control words once you’ve entered the first few characters. For example, many control words start with \left and it would be nice to be able to select the desired one after typing \le rather than having to type in the whole word like \leftrightarrow for ↔.
In LaTeX mode, the subscript, superscript, numerator, and other math arguments are single entities. An entity can be a character or control word for a character like \alpha for α, or it can be an expression in curly braces like {a+b}. In UnicodeMath mode, the argument can be a sequence of alphanumeric characters. You can see such a difference by comparing what a^12 becomes: in LaTeX you get 𝑎¹2 and in UnicodeMath you get 𝑎¹². To get the latter in LaTeX input mode, enter a^{12}.
Unicode has many math characters (see Section 2 of Unicode Technical Report #25, Unicode Support for Mathematics). The post Math Symbol Hierarchy divides the math operator symbols into basic, intermediate, and full Unicode math categories. Most technical papers use the symbols in the basic and intermediate categories. The remaining characters are very specialized, e.g., ⪑, so you’ll probably never need them.
Built-up math zones convert alphabetic characters to math alphabetic characters, e.g., ‘a’ becomes ‘𝑎’, which is given by the Unicode character U+1D44E. Conversion to math alphabetic is overruled for special situations like trigonometric function names and can be overruled for arbitrary text. Also it doesn’t occur for Greek upper-case letters as noted in Math Greek Letters. Math spacing is important and User Spaces in Math Zones explains how UnicodeMath build up may remove a space that’s automatically inserted by math spacing rules. In LaTeX mode, spaces are ignored except to terminate control words.
You can navigate through a math zone Using Left/Right Arrow Keys in Mathematical Text or you can use a mouse. Math Selection is similar to selection of ordinary text, but if you select a math object start/end/separator delimiter, the whole object is selected. Up and down-arrow keys try to go to the logical target, e.g., up arrow in the denominator of a fraction goes to the numerator. In navigating and selecting text, it’s useful to understand the concept of the Text Insertion Point. The insertion point is in between characters, not on top of a character.
You can enter accented characters as discussed in Math Accents and in Representation of Math Accents. You can enter matrices as discussed in Entering Matrices. If you want to line up two or more equations just right, see Equation Arrays.
In OfficeMath, empty numerators, denominators, subscripts, superscripts, and other essential arguments, etc., display the place-holder character ⬚. If you want to hide the ⬚, insert a “zero-width space” given by the Unicode character U+200B as discussed in The Invisibles. In OneNote you can edit optional arguments. These arguments are normally not shown, but you can move inside them by using the left/right arrow keys. When the IP is inside an optional argument, the ⬚ is displayed and you can enter characters. For example, you can convert a square root into an 𝑛th root by navigating into the root’s index argument and typing n. To make such changes in Word or PowerPoint, you need to use a context-menu option.
If you become familiar with keyboard entry, you’ll probably find it the fastest way to enter math (see also the Ink section next). But admittedly, it’s not obvious how to enter many things. The math ribbon displays lots of math objects in readily clickable form. As such it provides easily discoverable ways to enter common mathematical expressions. For a comparison of keyboard and ribbon, see Math Ribbon Entry of Subscripts and Superscripts.
Math Context Menus provide context-sensitive ways to modify math objects, such as changing a stacked fraction into a slashed fraction, or aligning a set of equations at their equal signs. See also More on Math Context Menus. You can use the Office Insert Symbol Dialog to insert any Unicode character including all Unicode math symbols. The more common math symbols can be inserted using the symbol galleries on the math ribbon.
Smart phones running OfficeMath don’t sport a math ribbon, but a math on-screen keyboard could let you enter lots of math entities easily. Think of exposing math symbols instead of emoji and using surround menus. Also, smart phones can work with ink…
You can enter equations with a pen as described in OneNote Math Assistant and the links therein. Microsoft’s math ink recognition first shipped in Windows 7 with the applet called the Math Input Panel. This applet lets you enter mathematical text using a pen or a mouse. It recognizes what you enter and displays the result using a private version of RichEdit. It also lets you copy the results to Word, Mathematica, or any other application that reads Presentation MathML.
Many people may find that writing equations by hand is the easiest and fastest way to enter them into a computer. Since I’ve made similar claims for UnicodeMath entry, a colleague of mine and I decided to have a race. I chose nine equations from theoretical physics and we started entering. The colleague entering via hand writing beat me by a nose, but had two errors, whereas I had none. But really, we both won, since we demonstrated that we could enter equations into Word remarkably fast.
Math accessibility falls into two categories: speech and braille. Microsoft Office Math Speech shipped in over 18 languages in January 2017. As described in Speaking of math…, math speech has two granularities: coarse-grained for fluent speech and fine-grained for editing. Together with touch typing on a keyboard, this combination enables a blind, nondeaf person to consume and edit math, both elementary and advanced.
The OfficeMath speech capability could be extended in useful ways such as offering alternate speech as discussed in Speaking Subscripts, Superscripts, and Fractions. Also, the facility “spoon feeds” the math speech to UI Automation. Some Assistive Technologies (ATs) such as NVDA and JAWS would like to get MathML for math zones and generate the math speech (and braille) themselves. Ways to do this will be the subject of a future post. Interestingly, MathML can, in principle, be used both for generating math speech and for editing math as discussed in Editing Math using MathML for Speech.
Key infrastructure for math braille shipped in August 2017, namely the RichEdit build up/down machinery used by OfficeMath applications added support for entering and editing math using Nemeth Braille—the first math linear format. More work is needed for applications to expose math braille to end users. The main reason for using Nemeth math braille is given in Braille for Math Zones, which points out that the usual braille digit code ambiguities don’t exist in math zones, which is where the math is. Specifically, braille contractions aren’t used in math zones, so digits can be represented unambiguously using computer braille codes; no numerical indicator is needed for digits in Nemeth math zones (aside from an obscure case). Nemeth braille in math zones works with all languages (is globalized), whereas braille in ordinary text is localized to the language being used.
Other posts describing work on math braille include Unicode – Nemeth Character Mappings, which discusses extending the Nemeth specification to include many Unicode math symbols not in the current Nemeth specification and Nemeth Braille Alphanumerics and Unicode Math Alphanumerics, which relates how the Unicode math alphanumerics can be represented using Nemeth braille. The post Math Braille UI describes ways to reveal the math insertion point (IP) using a refreshable braille display. The braille IP location is complicated relative to that for ordinary text in that math structure characters described in OfficeMath aren’t always represented by a Nemeth code. For fractions, they are, but the start delimiter of a subscript object, for example, isn’t present in the Nemeth code.
Math dictation would be another math input method for blind and sighted users alike. Imagine, you can say 𝑎² + 𝑏² = 𝑐² faster than you can write or type it! Math dictation would work with all devices, computers, tablets, and phones. Hopefully someday…
A good name is OfficeMath. “Office” alludes to Microsoft Office but needn’t be exclusive. “Office” suggests a high-quality level (okay, maybe I’m biased). OfficeMath might suggest calculations rather than math text, but documentation can resolve that ambiguity, which also exists for the linear formats AsciiMath and UnicodeMath. The heart of OfficeMath is its in-memory model, named “Professional” in the OfficeMath UI. This model is mirrored in the OMML file format. It features N-ary structures such as integrals with limits and integrands, subscripts, superscripts and accents with well-defined bases, and math functions with function names and arguments. This level of detail is ordinarily reserved for content math formats such as Content MathML and OpenMath. OfficeMath incorporated these structures to support high-quality math typography, with the nice side effect of facilitating symbolic manipulations and graphing (OneNote Math Assistant). This post summarizes OfficeMath’s history, model, file format support, interoperability, math font, and math formatting, and includes links to further information in OfficeMath-oriented posts in Math in Office. OfficeMath UI will be discussed in a separate post.
A fun place to learn about the origins of OfficeMath is the post LineServices, which tells how the LineServices line-layout component came to be and how it evolved to yield TeX-quality math typography. OfficeMath depends on other technologies as well, including the creation of the math-font OpenType standard described in High-Quality Editing and Display of Mathematical Text in Office 2007 and OpenType Math Tables. For older history, the post How I got into technical WP describes the first math display program (Scroll, 1970) and predecessors of UnicodeMath.
OfficeMath was based on Unicode from the start. Unicode 3.2 (March, 2002) already had most of the current Unicode math character set. The Unicode Technical Committee is committed to including all attested math symbols in the Unicode Standard, so Unicode makes an ideal foundation on which to build math functionality. It also streamlines incorporation into Microsoft Office applications, since they are based on Unicode.
As with [La]TeX, MathML, MathType, and other math presentation programs, OfficeMath puts all math expressions and equations into math zones. Math-zone typography differs from the typography of ordinary text (see the section on Formatting below).
In the OfficeMath in-memory "Professional" format, mathematical objects like fraction and subscript are represented by a start delimiter, the first argument, an argument separator if the object has more than one argument, the second argument, etc., with the final argument terminated by an end delimiter. For example, the fraction 𝑎/𝑏 is represented in built-up format by {_{frac} 𝑎|𝑏} where {_{frac} is the start delimiter, | is the argument separator, and } is the end delimiter. Similarly, the subscript object 𝑎_{𝑏} is represented by {_{sub} 𝑎|𝑏}. The start delimiter is the same character for all math objects, as are the separator and end delimiters. In RichEdit, these delimiters are given by the Unicode characters U+FDD0, U+FDEE, and U+FDEF, respectively. In OMML, the start delimiter is represented by a container element, such as <f> for fraction, and arguments appear within argument element containers, such as <num>…</num> for a numerator.
The type of object is specified by a character-format property associated with the start delimiter. In plain text, the built-up forms of the fraction and subscript are identical if the fraction arguments are the same as their subscript counterparts. In the example here, a plain-text search for {_{frac} 𝑎|𝑏} matches {_{sub} 𝑎|𝑏} as well as {_{frac} 𝑎|𝑏}. Searching for OfficeMath equations involves plain-text searches like this together with comparison of the object types as discussed in Math Find/Replace and Rich Text Searches. The OfficeMath math objects are listed in the table in the next section along with their OMML and Presentation MathML representations. The objects are represented by prefix notation: the character formatting of the object start delimiter contains the object properties (see ITextRange2::GetInlineObject()). This differs from infix notation like 𝑎/𝑏, which needs to be parsed. The OfficeMath in-memory format is a “built-up” format as distinguished from linear formats like UnicodeMath and LaTeX.
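To make the plain-text equivalence concrete, here is a minimal sketch of the delimiter scheme. The MathObject struct and the ObjectType enum are hypothetical stand-ins for the character-format property on the start delimiter; only the three delimiter code points come from the description above.

```cpp
#include <string>

// Delimiter characters used by RichEdit, as given in this post.
const char32_t kStartDelimiter = 0xFDD0;
const char32_t kArgSeparator   = 0xFDEE;
const char32_t kEndDelimiter   = 0xFDEF;

// Hypothetical stand-in for the object type stored in the character
// formatting of the start delimiter.
enum class ObjectType { Fraction, Subscript };

// Hypothetical two-argument math object.
struct MathObject {
    ObjectType type;
    std::u32string arg1;
    std::u32string arg2;
};

// Plain text of the built-up form: start, arg1, separator, arg2, end.
std::u32string plainText(const MathObject& obj) {
    std::u32string text;
    text += kStartDelimiter;
    text += obj.arg1;
    text += kArgSeparator;
    text += obj.arg2;
    text += kEndDelimiter;
    return text;
}

// A rich-text match needs equal plain text and equal object types.
bool richTextMatch(const MathObject& a, const MathObject& b) {
    return a.type == b.type && plainText(a) == plainText(b);
}
```

With arguments a and b, the fraction and the subscript produce the same five-character plain text, so only the type comparison in richTextMatch tells them apart.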
The OMML format is the XML format that encapsulates the OfficeMath in-memory “Professional” format. When OfficeMath was designed, Presentation MathML 3.0 was nearing publication. But Presentation MathML is missing two important elements which therefore require <mrow> emulations to represent OfficeMath. Specifically, Presentation MathML doesn’t have an explicit N-ary element, nor does it have an explicit math-function element. Furthermore, OfficeMath needs to embed client (Word, PowerPoint, Excel, …) XML easily into the math XML. The MathML <semantics> element can embed such information, but it’s awkward. Accordingly, OMML was created to describe the OfficeMath in-memory format naturally. With best practices, MathML without the <semantics> element can be used to round-trip OfficeMath equations apart from non-math formatting like revision markings and embedded objects.
Here is a listing from MathML and Ecma Math (OMML) of the OMML elements and exact or approximate MathML counterparts
Built-up OfficeMath Object | OMML Tag | MathML |
Accent | acc | mover/munder |
Bar | bar | mover/munder |
Box | box | menclose (approx) |
Boxed Formula | borderBox | menclose |
Delimiters | d | mfenced |
Equation Array | eqArr | mtable (with alignment groups) |
Fraction | f | mfrac |
Math Function | func | mrow with FunctionApply (U+2061) mo |
Left SubSup | sPre | mmultiscripts (special case of) |
Lower Limit | limLow | munder |
Matrix | m | mtable |
N-ary | nary | mrow msubsup/munderover with N-ary mo |
Phantom | phant | mphantom and/or mpadded |
Radical | rad | msqrt/mroot |
Group Char | groupChr | mover/munder |
Subscript | sSub | msub |
SubSup | sSubSup | msubsup |
Superscript | sSup | msup |
Upper Limit | limUpp | mover |
Other OMML references are Extracting OMML from Word 2003 Math Zone Images and OMML Specification, Version 2.
More MathML discussion is given in MathML 3.0, Improved MathML support in Word 2007, Rendering MathML in HTML5, and MathML on the Windows Clipboard.
Mathematical RTF is essentially OMML in RTF syntax. See also Office Math RTF and OMML Documentation and Updated RTF Specification.
Linear Format Notations for Mathematics include UnicodeMath and LaTeX Math in Office. See also Recognizing LaTeX Input in UnicodeMath Input Model.
Major interoperability is afforded via Presentation MathML and [La]TeX math. In addition, the Design Science MEE and MathType equations can be converted to OfficeMath as described in Converting Microsoft Equation Editor Objects to OfficeMath. MathType can convert OfficeMath to MathType equations. These equation facilities are compared in Equation-Editor Office-Math Feature Comparison and Other Office Math Editing Facilities. The latter also compares them to the Microsoft Word EQ Field.
With a bit of effort, equations can be imported into Office applications from Wikipedia articles as described in Copying Equations from Wikipedia into Office Applications. You can create HTML documents with equations in them as described in Creating Math Web Documents using Word 2007.
A basic part of OfficeMath is the Unicode OpenType math font. The first such font, Cambria Math, and the OpenType math tables were developed together with the Office 2007 math software, each influencing the other to obtain high quality results. Some history is given in the post High-Quality Editing and Display of Mathematical Text in Office 2007. The font contains extensive math tables, glyph variants and glyphs for most of the Unicode math character set. The tables were incorporated into the OpenType standard as noted in OpenType Math Tables. Posts elaborating on the math font are Special Capabilities of a Math Font and High Fonts and Math Fonts.
Cambria Math and Cambria are serifed fonts designed to look good on digital displays. As such, the stem widths never get skinny, in contrast to Times Roman fonts. If you prefer, the STIX math font is a Times Roman font that includes the OpenType math table support and works with OfficeMath. This font is discussed further in Math STIX Fonts 2.0 and UTR #25 Updates.
This section discusses how OfficeMath handles math formatting involving math spacing, math styles, and alignments, and gives links to posts with further information. A math zone is defined by the math-zone character-format effect, an effect like bold or italic. As such, this is a non-nestable property, unlike math objects like fractions, which can be nested arbitrarily deeply. Adjacent math zones automatically merge into a single math zone.
An essential part of good math typography is math spacing. Within a math zone, OfficeMath follows the math spacing rules given in Appendix G of The TeXbook plus some enhancements that weren’t added to TeX for reasons of archivability. Section 3.16 of UnicodeMath summarizes the rules for the most common situations. Also see User Spaces in Math Zones for ways that OfficeMath autocorrects typical user input spacing errors. Two Math Typography Niceties shows how phantom objects can improve math spacing beyond the standard spacing rules.
Math bold and math italic define different math variables in math zones (𝐚 ≠ 𝑎 ≠ a ≠ 𝒂), while in ordinary text, bold and italic are used for emphasis. In math zones, math bold and math italic characters are different Unicode alphanumeric characters, while in ordinary text, bold and italic are character format attributes with no change in character codes. For example, 𝐚 is U+1D41A, 𝑎 is U+1D44E, a is U+0061, and 𝒂 is U+1D482. Even though the math and ordinary-text uses of bold/italic are unrelated semantically, the user can control these math styles using the usual bold and italic UI as described in Using Math Italic and Bold in Word 2007. There are other math styles that yield still different mathematical variables, such as open-face, script, Fraktur, and sans serif (see Section 2.2 of Unicode Technical Report #25). In general, character formatting is controlled in math zones as described in Restricted Math Zone Character Formatting. In informal documents, people may want to use sans-serif characters instead of serif characters for aesthetic reasons rather than for defining different variables. Currently OfficeMath doesn’t support this choice, but maybe it should.
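The code points quoted above follow from the layout of the Unicode Mathematical Alphanumeric Symbols block, which can be sketched as follows. The function and enum names are hypothetical, and the sketch covers only Latin letters in the bold, italic, and bold-italic styles. The one special case it handles is math italic h, which Unicode encodes as U+210E rather than in the U+1D4xx run.

```cpp
// Hypothetical subset of math styles for this illustration.
enum class MathStyle { Bold, Italic, BoldItalic };

// Map an ASCII Latin letter to its math alphanumeric code point.
// Any other character is returned unchanged.
char32_t toMathAlphanumeric(char32_t ch, MathStyle style) {
    const bool upper = ch >= U'A' && ch <= U'Z';
    const bool lower = ch >= U'a' && ch <= U'z';
    if (!upper && !lower) return ch;

    // Math italic small h is the Planck constant character U+210E.
    if (style == MathStyle::Italic && ch == U'h') return 0x210E;

    // Each style has 26 capitals followed directly by 26 small letters.
    const char32_t capitalBase =
        style == MathStyle::Bold   ? 0x1D400 :
        style == MathStyle::Italic ? 0x1D434 : 0x1D468;

    return upper ? static_cast<char32_t>(capitalBase + (ch - U'A'))
                 : static_cast<char32_t>(capitalBase + 26 + (ch - U'a'));
}
```

For the letter a this gives U+1D41A, U+1D44E, and U+1D482 for bold, italic, and bold italic, matching the values listed above.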
Occasionally one needs to embed ordinary text, such as words, into math zones. OfficeMath defines a character format attribute “ordinary text” for this purpose. Text with this attribute uses standard character formatting for italic, bold, etc. Unless the “ordinary text” attribute is active, the bold and italic settings only affect math alphanumerics; ASCII digits, punctuation, operators, and non-math characters are all rendered nonbold and upright.
In addition, OfficeMath has a “no-build-up” attribute to treat operator characters literally rather than use them in build-up translations. For example, if ‘/’ is marked with this attribute, build up in UnicodeMath mode leaves it as the character ‘/’ rather than converting it with the arguments around it into a built-up “stacked” fraction.
Since math zones are one level deep, you can embed ordinary text into a math zone, but you can’t nest a math zone within that ordinary text or elsewhere within the math zone. This hasn’t proven to be a limitation, although TeX can embed ordinary text inside math zones and nested math zones inside the ordinary text. It always seems to be possible to unwrap such nested math-zone scenarios into unnested math zones.
It’s useful to be able to define math properties for an entire document, rather than specify them for each math zone. This is described in Default Document Math Properties. A new property could be defined to use sans-serif math characters instead of serif characters.
There are two kinds of math zones: inline and display. For example, an inline math zone in TeX has the form $...$ and a display math zone has the form $$...$$. Inline math zones use reduced spacing and character sizes to make expressions fit better in line with normal text. In OfficeMath a display math zone starts at the start of a document or follows a hard or soft paragraph end (U+000D or U+000B, respectively) and ends with a hard or soft paragraph end. In some cases, it would be useful to apply display math-zone formatting to inline math zones, but this isn’t currently available.
Inter-equation alignment and line breaking involve multiple lines. To handle these cases and equation numbering, OfficeMath has the Math Paragraph, while MathML uses tables and MathType uses PILEs. A math paragraph is a sequence of one or more display math zones separated by soft paragraph ends (U+000B). Line breaking can be automatic or manual as described in Breaking Equations into Multiple Lines. Background on paragraph formatting is given in Paragraphs and Paragraph Formatting.
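As a rough illustration of the math-paragraph structure, the sketch below splits a math paragraph into its display math zones at soft paragraph ends (U+000B). The function name is hypothetical, and the input is assumed to hold only the zones and their U+000B separators, without the terminating hard paragraph end.

```cpp
#include <string>
#include <vector>

// Split a math paragraph into display math zones at each soft
// paragraph end (U+000B). Hypothetical helper, not an OfficeMath API.
std::vector<std::u32string> splitMathParagraph(const std::u32string& paragraph) {
    const char32_t kSoftParagraphEnd = 0x000B;
    std::vector<std::u32string> zones;
    std::u32string current;
    for (char32_t c : paragraph) {
        if (c == kSoftParagraphEnd) {
            zones.push_back(current);   // close the current zone
            current.clear();
        } else {
            current += c;
        }
    }
    zones.push_back(current);           // last (or only) zone
    return zones;
}
```

A paragraph with no U+000B yields a single zone, which matches the definition of a math paragraph as one or more display math zones.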
In a document with more than a few equations, it’s useful to number equations referred to from elsewhere in the document. The math paragraph has elegant equation-number support, but it hasn’t been exposed beyond prototyping. The earliest way to handle equation numbering is described in Cool Equation Number Macros for Word 2007. Later ideas are in More on Equation Numbering and equation numbering using equation arrays is described in Equation Numbering in Office 2016. This last approach isn’t quite as convenient as the ideal math-paragraph equation numbering, but it can handle virtually all cases.
Conversion to OfficeMath is only enabled for program modes that support the OMML (Office Math Markup Language) file format. If a file is opened in “Compatibility Mode”, equation objects in the file may not be directly convertible to OfficeMath. Word added OMML support in Word 2007 and PowerPoint and Excel added it in Office 2010. The old doc, ppt, and xls file formats do not support OMML. To convert equation objects in such files, first save them as the corresponding docx, pptx, and xlsx files using the Save As menu option. Then you can click on an equation object and get a menu/dialog that offers “Convert Equation to Office Math” and an option to “Apply to all equations”.
PowerPoint displays OLE objects on a slide wherever you put them. The objects are not embedded in the text of text boxes and hence don’t flow with text. For example, if you line up an equation object with text in a text box and change the text size, the text moves, but the equation object doesn’t move since it’s not part of the text. This differs from Word, for which OLE objects are embedded in the text and therefore flow with text changes.
The equation-editor converter converts OLE objects to native math (OfficeMath) text. To put the OfficeMath for a converted object onto a PowerPoint slide, the OfficeMath is stored in its own OfficeArt text box, which has the same dimensions as the original OLE object. People often position a set of equation objects to lay out equations nicely. Ideally all these objects would end up properly aligned in a single text box. But that’s a tricky recognition task and it isn’t handled by the converter. Users may want to do some cutting and pasting to get optimal results. The same approach is used for converting MEE objects in Excel.
In Word, equation objects are embedded in the text and the corresponding OfficeMath text replaces the objects in that text. So, equation conversion in Word doesn’t have object/text alignment problems, although line and page breaks may change due to the use of different fonts. OfficeMath requires a math font for characters supported by a math font, while MEE and MathType use a collection of non-math fonts that can be customized by the user.
The MEE object Equation-Native binary data (described in later sections) includes relative font sizes but doesn’t provide an overall default font size. For example, if you resize an equation object in PowerPoint, the Equation-Native binary data doesn’t change nor do the text sizes in the Windows metafile used to display the object. If the converted math text is too large for its text box, PowerPoint decreases the font size to fit. In any event, the converted math text typically has a different size from that in the original OLE object.
There are two kinds of fixups performed by the converter: 1) those handling differences in the math models as described in Integrands, Summands, and Math Function Arguments and Subscript and Superscript Bases, and 2) those dealing with equation object errors that don’t affect the object display significantly but change the display of the converted math text. For example, in OfficeMath, empty numerators, denominators, subscripts, superscripts, etc., display the place-holder character ⬚. Since the OLE objects don’t display such a character, the converter fixes up equations by removing empty subscripts and converting left subscripts with no bases into normal (right) subscripts. Similarly, if a math function name like “min” doesn’t have an argument, the converter treats the function name as ordinary text, rather than as a function-apply object with an empty base. In testing PowerPoint presentations, we found many such errors including an extreme case of an MEE subscript object with a subscript consisting of a “pile” of four empty lines. The converted math text shows a column (equation array) of four ⬚’s although the original object shows nothing. It seems reasonable to have the user delete errors that are that complicated. MEE and MathType use the deprecated codes U+2329 and U+232A for the wide-angle brackets ⟨ (U+27E8) and ⟩ (U+27E9), respectively. The converter replaces the former pair by the latter pair. It also changes the upper limit construction for ≝ into the single character (U+225D).
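As a rough illustration of the simplest of these fixups, here is a minimal sketch of the angle-bracket replacement. The function name FixAngleBrackets is hypothetical and this is not the converter's actual code; it only shows the character mapping described above.

```cpp
#include <string>

// Hypothetical helper (not the converter's code): replace the deprecated
// angle brackets U+2329/U+232A with the mathematical angle brackets
// U+27E8/U+27E9. All other characters are left unchanged.
std::u32string FixAngleBrackets(std::u32string text)
{
    for (char32_t &ch : text)
    {
        if (ch == 0x2329)
            ch = 0x27E8;        // left angle bracket
        else if (ch == 0x232A)
            ch = 0x27E9;        // right angle bracket
    }
    return text;
}
```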
Now things get more technical. The converter is implemented as part of RichEdit and uses the same TOM interfaces as the UnicodeMath/LaTeX/speech/braille build up/down facilities. The Office RichEdit dll (riched20.dll) exports three conversion functions: ConvertEquationFromStorage() converts the object given by an IStorage interface, ConvertEquationFromOleStream() converts the object given by the OLESTREAM Get() method (prototype defined in ole2.h), and ConvertEquationFromStdVector() converts the equation binary data in the “Equation Native” stream. These functions don’t call operating-system OLE functions; hence they can be used on all major platforms. The prototypes for the functions are
HRESULT ConvertEquationFromOleStream(
    ITextRange2 *       prg,
    ITextStrings2 *     pstrs,
    OLESTREAM *         poleStream,     // OLE stream to read from
    BYTE                bVersion)       // Design Science MathType version #

HRESULT ConvertEquationFromStorage(
    ITextRange2 *       prg,            // Range for inserting result
    ITextStrings2 *     pstrs,          // Rich-text string stack
    IStorage *          pstg)           // IStorage for OLE math object

HRESULT ConvertEquationFromStdVector(
    ITextRange2 *       prg,
    ITextStrings2 *     pstrs,
    std::vector<BYTE> & EquationNative, // "Equation Native" binary stream
    BYTE                bVersion)       // Design Science version # (3-EE3, 5-MathType)
The interface ITextStrings2 is defined in the Office tom.h (eventually it’ll be in the Windows tom.h) and derives from ITextStrings. It adds the method
ITextStrings2::Rotate(LONG iString)
ITextStrings2::Rotate(-2) reorders the Design Science N-ary arguments to put the naryand (integrand, summand, …) third instead of first. ITextStrings2::Rotate(-1) is the same as ITextStrings::Swap() and swaps the top two strings. If the Type argument of ITextStrings::EncodeFunction() has the tomTeXStyleIsTextColor flag set, the TeXStyle argument has the text color instead of the TeXStyle. The TeXStyle isn't used by the converter since it’s implied by context (although it is stored in the OLE object binary data).
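To make the reordering concrete, here is a toy sketch of what Rotate(-2) and Rotate(-1) do to a string stack. It assumes, beyond what is stated above, that Rotate(-n) moves the string n places below the top to the top of the stack. StringStack and RotateSketch are hypothetical names, and plain std::string stands in for the rich-text strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the rich-text string stack; the last element is the top.
using StringStack = std::vector<std::string>;

// Assumed semantics (a guess, not the TOM implementation): for
// iString = -n, move the string n places below the top to the top,
// shifting the strings above it down by one. Returns false if iString
// isn't negative or the stack has fewer than n + 1 strings.
bool RotateSketch(StringStack &stack, long iString)
{
    if (iString >= 0)
        return false;                       // only negative counts are sketched
    const std::size_t n = static_cast<std::size_t>(-iString);
    if (stack.size() < n + 1)
        return false;
    auto first = stack.end() - static_cast<std::ptrdiff_t>(n + 1);
    std::rotate(first, first + 1, stack.end());
    return true;
}
```

With the stack holding the naryand, lower limit, and upper limit in that order, RotateSketch(stack, -2) leaves lower limit, upper limit, naryand, which is the OfficeMath order.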
ConvertEquationFromStorage() calls IStorage::OpenStream(L"Equation Native", …) to retrieve the Design Science OLE object’s “Equation Native” stream and then calls the converter to create the corresponding native math zones.
ConvertEquationFromOleStream() reads a Design Science object's compound file format, defragments it, retrieves the “Equation Native” stream, and calls the converter to create the corresponding math zones.
ConvertEquationFromStdVector() converts the "Equation Native" binary stream to a built-up Office math zone. This function is handy for unit tests. Enter with the EquationNative std::vector<BYTE> starting with the byte following the two "Equation Native" stream headers. The “Equation Native” binary format is illustrated in the next section.
The Design Science OLE object "Equation Native" stream contains the MTEF binary data for a MathType or MEE object. The MTEF data consists of a 28-byte equation-OLE header and a version header (5 bytes for MEE and 12 bytes for MathType), followed by the records for the equation. Container records (rcdLINE, rcdTMPL, rcdPILE, rcdMATRIX) can contain other records including themselves and are terminated by the end record rcdEND. For full documentation, see MathType's Equation Format (MTEF) in the MathType SDK (http://www.dessci.com/en/reference/sdk/).
The following table illustrates the Equation Editor 3.0 binary records for the equation \frac{1}{2\pi}\int_0^{2\pi}\frac{d\theta}{a+b\sin\theta}=\frac{1}{\sqrt{a^2-b^2}}
The two headers in the Equation-Native stream are omitted. Putting the binary into a std::vector<BYTE> and passing it to ConvertEquationFromStdVector(), you insert this equation into the text. Be sure to convert the ASCII hex characters to binary, two per byte with no intervening spaces. Note that the integrand precedes the integral limits. In OfficeMath, the integrand follows the limits, hence the need for ITextStrings2::Rotate().
binary | meaning |
0a | <normal size/> |
01 | <line> |
030e0000 | <fraction> |
01 02883100 00 | <line> (numerator) 1 </line> |
01 02883200 0284c003 00 | <line> (denominator) 2𝜋 </line> |
00 | </fraction> |
03150200 | <integral> |
01 | <line> (integrand) |
030e0000 | <fraction> |
01 12836400 0284b803 00 | <line>𝑑𝜃</line> |
01 12836100 02862b00 12836200 12827300 12826900 12826e00 0284b803 00 | <line> 𝑎 + 𝑏 sin 𝜃 </line> |
00 | </fraction> |
00 | </line> |
0b | <script size/> |
01 02883000 00 | <line>0</line> (lower limit) |
01 02883200 0284c003 00 | <line>2𝜋</line> (upper limit) |
0d | <symbol size/> |
02862b22 | ∫ (character) |
00 | </integral> |
0a | <normal size/> |
02863d00 | = |
030e0000 | <fraction> |
01 02883100 00 | <line>1</line> (numerator) |
01 | <line> (denominator) |
030d0000 | <root> |
01 | <line> |
12836100 | 𝑎 |
030f0000 | <sup> |
0b | <sup size/> |
11 | <line/> (null subscript) |
01 02883200 00 | <line>2</line> |
00 | </sup> |
0a | <normal size/> |
02861222 | − |
12836200 | 𝑏 |
030f0000 0b 11 01 02883200 00 00 | <sup><sup size/><line/><line>2</line></sup> |
00 | </line> (end radicand) |
11 | <line/> (no degree, i.e., square root) |
00 | </root> |
00 | </line> |
00 | </fraction> |
00 | </line> |
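For unit tests, the hex text in the binary column has to be turned into bytes before it is handed to ConvertEquationFromStdVector(). The helper below is a minimal sketch of that step. HexToBytes is a hypothetical name, and it uses std::uint8_t on the assumption that this matches the Windows BYTE type (unsigned char).

```cpp
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Convert ASCII hex text such as "0a 01 030e0000" to bytes, two hex digits
// per byte. Whitespace between digits is skipped. Throws
// std::invalid_argument on a non-hex character or an odd number of digits.
std::vector<std::uint8_t> HexToBytes(const std::string &hex)
{
    auto nibble = [](char ch) -> int {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        throw std::invalid_argument("not a hex digit");
    };
    std::vector<std::uint8_t> bytes;
    int high = -1;                          // pending high nibble, or -1 if none
    for (char ch : hex)
    {
        if (std::isspace(static_cast<unsigned char>(ch)))
            continue;
        const int value = nibble(ch);
        if (high < 0)
        {
            high = value;
        }
        else
        {
            bytes.push_back(static_cast<std::uint8_t>(high * 16 + value));
            high = -1;
        }
    }
    if (high >= 0)
        throw std::invalid_argument("odd number of hex digits");
    return bytes;
}
```

On Windows the result could then be passed as the EquationNative argument with bVersion = 3 for Equation Editor 3.0 data.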
Format | N-aryand | Math function arg | Sub/sup base |
OMML | explicit | explicit | explicit |
Presentation MathML | no | no | explicit |
Content MathML | explicit | explicit | explicit |
[La]TeX | no | no | no |
UnicodeMath | explicit or implied | explicit or implied | explicit or implied |
MathType/Equation Editor | explicit | no | no |
Nemeth math braille | no | no | no |
It’s clear that a math function argument or an N-aryand should include the first entity that follows the base object. For example, for sin 𝑥, 𝑥 is the argument. Or for the summation \sum_{n=0}^N a_n (in Nemeth math braille ⠐⠨⠠⠎⠩⠝⠀⠨⠅⠀⠴⠣⠠⠝⠻⠁⠰⠝⠐), 𝑎_𝑛 is the summand. But what about the integral \int_0^\infty e^{-x^2}dx (⠮⠰⠴⠘⠠⠿⠐⠑⠘⠤⠭⠘⠘⠆⠐⠙⠭)? If you only include the exponential (⠑⠘⠤⠭⠘⠘⠆⠐), you omit the 𝑑𝑥.
Ideally the N-aryand is delimited explicitly. Presentation MathML has the <mrow> element, which can contain multiple MathML elements. A natural way to represent a math function argument or an N-aryand is to put them inside <mrow>…</mrow> as in (omitting the mml: prefix and xmlns attribute)
<math>
  <msubsup>
    <mo stretchy="false">∫</mo>
    <mn>0</mn>
    <mo>∞</mo>
  </msubsup>
  <mrow>
    <msup>
      <mi>e</mi>
      <mrow>
        <mo>−</mo>
        <msup>
          <mi>x</mi>
          <mn>2</mn>
        </msup>
      </mrow>
    </msup>
    <mi>d</mi>
    <mi>x</mi>
  </mrow>
</math>
In [La]TeX, enclose the N-aryand in {…} as in \int_0^\infty{e^{-x^2}dx}. Notice how much more concise [La]TeX is. Delimiting N-aryands in these ways constitutes best practice. It’s also best practice to delimit math function arguments these ways in MathML and LaTeX.
When the N-aryand isn’t delimited explicitly by notation such as <mrow>…</mrow>, the first math object following the N-ary operator (with limits) should be part of the integrand. If this object is a delimiter object, such as {…} (\{…\} in TeX), a compound N-aryand is well defined. But if it’s a concatenation of math objects as in the integral (⠮⠰⠴⠘⠠⠿⠐⠑⠘⠤⠭⠘⠘⠆⠐⠙⠭), it’s more than just the first object. It’s tempting to choose the concatenation of objects up to a binary operator of precedence of addition or to the end of the expression, whichever comes first. Such concatenation gives the correct result in this case. UnicodeMath has the concept of an argument that consists of such concatenations. This would also work for Nemeth math braille, which has no way of delimiting N-aryands or function arguments explicitly.
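The “concatenate up to an addition-precedence operator” heuristic can be sketched in a few lines. This is only an illustration under simplifying assumptions: the expression after the N-ary operator has already been split into top-level tokens (one string per math object or operator), and only +, the ASCII hyphen, and the Unicode minus sign count as terminators. IsAdditiveOperator and NaryandLength are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True for the operators assumed here to have the precedence of addition:
// "+", ASCII "-", and the Unicode minus sign U+2212 (given as UTF-8 bytes).
bool IsAdditiveOperator(const std::string &token)
{
    return token == "+" || token == "-" || token == "\xE2\x88\x92";
}

// Return how many leading tokens belong to the implied N-aryand. The first
// object always belongs; further objects are concatenated until an
// addition-precedence operator or the end of the expression.
std::size_t NaryandLength(const std::vector<std::string> &tokens)
{
    if (tokens.empty())
        return 0;
    std::size_t count = 1;
    while (count < tokens.size() && !IsAdditiveOperator(tokens[count]))
        ++count;
    return count;
}
```

For the integral above, the tokens e^(-x^2), d, x are all captured, so the 𝑑𝑥 isn’t lost. The minus sign inside the exponent doesn’t end the N-aryand because it is nested inside the first token.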
UnicodeMath attempts to look like a math notation as closely as a linear format can. But when an expression becomes too ambiguous, UnicodeMath encloses the expression in lenticular brackets 〖…〗 (see Sec. 3.4 of that reference). As such, the integral above is written as ∫_0^∞▒〖e^(-x^2 ) dx〗 in UnicodeMath. Here ▒ is the “glue” operator that connects the N-aryand to its large N-ary operator. But clearly UnicodeMath could use its definition of an argument to get the correct integrand in this case. I didn't code this refinement up for Office applications because the input method automatically puts the insertion point into the integrand as soon as a space builds up the integral operator. The same approach is used for entering function arguments. This makes it easy to type sin x+y, where x+y is the argument of the sine. To get out of the N-aryand or function argument, the user types an arrow key or clicks a mouse button. In the absence of the glue operator, it does seem like a good idea for UnicodeMath to capture a concatenation of objects for an N-aryand as it does for arguments of fractions, subscripts, etc.
To get the alt text for an equation image, type F12 in your browser (I use Edge for this). This turns on a window with source browsing capabilities. At the top left side is the inspect tool, a little arrow with a rectangle. Click on the inspect tool and then on an equation. Let’s illustrate using the time-dependent Schrödinger equation, which is described nicely in Wikipedia. We see the figure
Copy the text highlighted in blue. This gives the HTML for the equation image
<img class="mwe-math-fallback-image-inline" aria-hidden="true" style="vertical-align: -2.505ex; width:41.97ex; height:6.343ex;" alt="i\hbar {\frac {\partial }{\partial t}}\Psi (\mathbf {r} ,t)=\left[{\frac {-\hbar ^{2}}{2\mu }}\nabla ^{2}+V(\mathbf {r} ,t)\right]\Psi (\mathbf {r} ,t)" src="https://wikimedia.org/api/rest_v1/media/math/render/svg/f2ae69999ed8b8551b217b9fbdcd8bf73490c82f">
The image alt text is given by the alt="…" field. The LaTeX for the equation is inside the "…". Copy the LaTeX into a Word math zone and build it up in LaTeX mode (type Ctrl+=). You then see
which looks the same as in the browser aside from a change in font.
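If you want to pull the LaTeX out of many such image tags rather than copying by hand, the alt field can be extracted with a simple string search. The sketch below uses a hypothetical helper, ExtractAltText. It assumes the attribute value is double-quoted, as in the tag above, and it doesn’t decode HTML character references such as &amp;amp;.

```cpp
#include <cstddef>
#include <string>

// Return the value of the first alt="..." attribute in an HTML tag, or an
// empty string if there isn't one. Assumes double-quoted attribute values.
std::string ExtractAltText(const std::string &tag)
{
    const std::string key = " alt=\"";
    std::size_t start = tag.find(key);
    if (start == std::string::npos)
        return std::string();
    start += key.size();
    const std::size_t end = tag.find('"', start);
    if (end == std::string::npos)
        return std::string();
    return tag.substr(start, end - start);
}
```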
Back in 2009 I gave a lecture featuring highlights of my PhD advisor Willis Lamb’s life work. I knew his laser theory contributions quite well having written papers and a book with him on laser theory. But I didn’t know his Nobel-prize work on the Lamb shift or his theory of the Mössbauer effect very well. So, I read up on these phenomena in Wikipedia, copied the LaTeX for some equations into an alpha version of PowerPoint 2010, and built them up using an early version of the LaTeX converter code. I gave the lecture at a memorial symposium for Lamb at the University of Arizona’s Optical Sciences Department, wondering if anyone would ask how I prepared the equations in the slides. It was more than a year before Office Math shipped in PowerPoint. Fortunately, no one asked. Perhaps people assumed I had used images from Word or LaTeX. (But how then could the background match the slide pattern so well?)
The information in this post is useful if you’re considering whether to convert MathType and/or Equation Editor OLE objects to native Office math zones, particularly if it becomes easy to convert them. If the conversions are faithful to the original semantics, you gain [La]TeX quality typography, in-line editing and search and support across all platforms including iOS and Android. But if you use features only available with the original OLE objects, you’ll want to stick with MathType or MathType Lite. You can download MathType for free. It downgrades to MathType Lite in 30 days if you don’t pay for it, but that downgrade is a significant upgrade from the Equation Editor. Note that if your computer doesn’t have the MT Extra font needed for some Equation Editor symbols, you can download it from here.
This post doesn’t compare the user interfaces (UI) between the products, although that would make for interesting blog post(s). The UI varies considerably and if you’re used to MathType and its hot keys, you may find it harder to enter math into native Office math zones. But it’s mostly a question of what you’re used to. The Office math autocorrect facility is handy, and you can define your own control sequences. MathType and Office apps support [La]TeX, in case you prefer [La]TeX, and they support math pen input (Windows Math Input Panel).
Here’s the summary comparison table
Format | Long division... | Matrix lines... | User spacing | Symbols | Templates |
MathType | 1 or 2 lines | Row/column | Nudges | Unicode 2.1++ | Yes |
Equation Editor... | 1 or 2 lines | Row/column | Nudges | Unicode 2.1+ | Yes |
Office Math | No | No | Unicode, phantom... | Unicode 11 | Yes |
Both MathType and the Equation Editor support one- and two-line long division expressions. Multiline long-division expressions that have intermediate divisions shown below the baseline can be handled by multiple lines with enough nudges. With lots of effort using a flush-right tab and nudges, I managed to produce the multiline long-division expression
But since this takes a lot of effort, there probably aren’t many multiline long-division expressions in Equation-Editor objects (unless there’s an easier way to create them). Office Math has no provision for long division.
MathType and the Equation Editor support lines above and below any row in a matrix and left and right of any column. The lines are defined by the values: 0 for none, 1 for solid, 2 for dashed, and 3 for dotted. For example, you can put lines before and after every row and column of a 2×2 matrix. Office Math doesn’t support such lines.
Math typography has well-defined rules for horizontal and vertical spacing of symbols and structures as described in the post User Spaces in Math Zones. In most cases, added user spacing detracts from the quality of math typography and is discouraged. Nevertheless, there are occasions where users may want to change the spacing. All three formats have ways to do this, but they are not necessarily interchangeable. In Office Math you can insert any Unicode space to increase spacing and you can remove or add precise spacing via smashes and other phantoms (see Section 3.17 of UnicodeMath). MathType and the Equation Editor don’t have phantoms, but they do have nudges. To nudge text and objects, select what you want to nudge and move it using the arrow keys while holding the Ctrl key down. The nudge offset can vary between −32768 and 32767. For rarely used mathematical special purposes, such flexibility could be useful, and Office Math has no counterpart. I used nudges in simulating the multiline long division above. It can also be useful in nonmathematical contexts. Note that the nudge values do not scale with the font height.
MathType lets users change the math spacing for math quantities, while Office Math gets those values from the math font’s OpenType MATH table. Math fonts such as Cambria Math use values compatible with [La]TeX, the publishing industry’s math standard. MathType lets you define intra-line spacing in multiline equations. Office Math does too, but it’s not currently exposed in the Office apps.
MathType also has a ruler for defining tab settings. In apps like Word, such settings don’t apply in math zones, so this might be a consideration in deciding whether to convert Design Science OLE objects to native math zones. The Equation Editor 3.0 menu doesn’t have a ruler, so it’s not clear how a user could define the tab settings. But the Equation Editor file format includes the tabs and their settings.
MathType and the Equation Editor are based on Unicode 2.0 (1996), while Office Math uses the current Unicode Standard. The math alphanumerics were added in Unicode 3.1 (2002) and many math operators were added in Unicode 3.2 (2002) (see DerivedAge.txt). A few more mathematical symbols have been added since then, notably the long division symbol ⟌... in Unicode 5.1. 364 characters have the math property in Unicode 2.0, while the current Unicode Standard has 2310 math characters. Because of the importance of many of these symbols, particularly the math alphanumerics, MathType has some non-Unicode ways of accessing symbols that weren’t defined in Unicode 2.0. There are also some symbols that MathType supports, e.g., the Apple logo (see character code 0xE7 of MT Extra 4.30), that Office Math doesn’t support natively, although you can insert normal text in a math zone formatted with any font. Office Math supports almost all math symbols in the Unicode math symbol set, which has many symbols not available in MathType. Office Math supports ISO standard math fonts, while MathType does not.
In Office Math if you don’t want to use a math font for symbols that appear in a math font (almost all mathematical symbols), then you have to mark the symbols with the “Text” attribute and you lose automatic mathematical spacing. When the “Text” attribute is used, any characters from any font can be inserted into a math zone.
For a complete listing of MathType symbols, see MTCode Encoding Table and Font Encoding Tables. For descriptions of the Unicode math symbol set see Unicode Technical Report #25 and UnicodeMath. If you’re converting from the Equation Editor or MathType to Office Math, it’s unlikely that you’ll lose any symbols. MathType can’t handle math alphanumerics in the U+1D400..1D7FF block since it can only handle Unicode characters with codes < 0x10000. It simulates common math alphanumerics using font styles with ASCII and Greek characters. The Unicode math alphanumerics are summarized along with MathType support in the following table adapted from Section 2.2 of UTR #25
Math Style | Characters from Basic Set... | Location... | MathType |
plain (upright, serifed) | Latin, Greek and digits | BMP | Yes |
bold | Latin, Greek and digits | Plane 1 | Yes |
italic | Latin and Greek | Plane 1* | Yes |
bold italic | Latin and Greek | Plane 1 | Yes |
script (calligraphic) | Latin | Plane 1* | Yes |
bold script (calligraphic)... | Latin | Plane 1 | No |
Fraktur | Latin | Plane 1* | Yes |
bold Fraktur | Latin | Plane 1 | No |
double-struck | Latin and digits | Plane 1* | Yes |
sans-serif | Latin and digits | Plane 1 | No |
sans-serif bold | Latin, Greek and digits | Plane 1 | No |
sans-serif italic | Latin | Plane 1 | No |
sans-serif bold italic | Latin and Greek | Plane 1 | No |
monospace | Latin and digits | Plane 1 | No |
Here * means that some of the characters are in the Letterlike block U+2100..U+214F.
The following symbols are in MathType’s MTCode Encoding Table, but not in Unicode (although usually there are Unicode symbols with the same semantics). The codes given are in the Unicode Private Use Area and, as such, depend on the font used.
E949 | Lazy S (Unicode considers this to be a glyph variant of reversed tilde U+223D ∽) |
E94E | Alias delimiter |
E954 | Round implies |
E955 | Smile under bar |
E956 | Frown over bar |
E959 | Greater-than almost equal to or less-than |
E95A | Less-than almost equal or greater-than |
E966 | Equals with dotted top line |
E967 | Precedes with colon |
E968 | Succeeds with colon |
E97A | Paired quadruple vertical dots |
E986..E989... | Dashed solidus, dashed backslash, dashed mid-line, dashed vertical bar |
E98E | Vertical bar with double hook |
E98F | Medium dot operator (free radical) |
E996 | Vertical bar over circle |
E997 | Vertical proportional to |
E998 | Black last quarter moon |
E999 | Black first quarter moon |
E99A | Negative sine wave |
E9E0 | Precedes equivalent to or succeeds |
E9E1 | Succeeds equivalent to or precedes |
E9E2 | Precedes almost equal to or succeeds |
E9E3 | Succeeds almost equal to or precedes |
EB00..EB6F | Has quite a few missing arrows and negations |
The Equation Editor 3.0, MathType and Office Math all have template menus for various kinds of fractions, integrals, matrices, accents, subscripts, superscripts, etc. MathType and Word’s Office Math let users save templates and equations for quick recall. In addition, Office Math has the math autocorrect facility, which allows users to assign templates and equations to math-autocorrect control words. For example, “\integral” is predefined to insert the equation
You can add others using the math ribbon Equation Tools/Equations Option dialog.
One cool way to enter an n×m matrix template is to type alt+= to enter a math zone, then \matrix(&…&@...@) where &…& consists of m − 1 &’s and @...@ consists of n − 1 @’s. So a 3×3 matrix template is given by \matrix(&&@@), which you can type pretty quickly. The \matrix() construct is designed to produce a rectangular matrix, so you only have to enter &’s for the first row.
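The counting rule is easy to get wrong by one, so here is a small sketch that builds the template string for a given size. MatrixTemplate is a hypothetical helper, not part of Office. It simply emits m − 1 ampersands followed by n − 1 at-signs.

```cpp
#include <string>

// Build the UnicodeMath template for a matrix with `rows` rows and `cols`
// columns: cols - 1 ampersands followed by rows - 1 at-signs. Returns an
// empty string if either dimension is zero.
std::string MatrixTemplate(unsigned rows, unsigned cols)
{
    if (rows == 0 || cols == 0)
        return std::string();
    return "\\matrix(" + std::string(cols - 1, '&') +
           std::string(rows - 1, '@') + ")";
}
```

For example, MatrixTemplate(3, 3) returns \matrix(&&@@), the 3×3 template mentioned above.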
If you select a pair of equations like
2𝑥 + 3𝑦 = 5
3𝑥 − 4𝑦 = 7
you can ask the math assistant to solve for 𝑥 and 𝑦 getting 𝑥 = 41/17, 𝑦 = 1/17. You can also ask it to show the steps in the derivation by three methods: substitution, matrices, and elimination. It’s like a live teacher! In these simultaneous equations, options don’t appear if you use letters as coefficients.
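For reference, the quoted solution can be checked by hand. The following is my own elimination derivation for this pair of equations, not a transcript of the steps the math assistant displays.

```latex
\begin{align*}
8x + 12y &= 20 && \text{(first equation times 4)} \\
9x - 12y &= 21 && \text{(second equation times 3)} \\
17x &= 41 && \text{(sum of the two lines above)} \\
x &= \tfrac{41}{17} \\
3y &= 5 - 2x = \tfrac{85 - 82}{17} = \tfrac{3}{17} \\
y &= \tfrac{1}{17}
\end{align*}
```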
Two formulas that appear in laser theory as well as other areas of physics are the Gaussian and the Lorentzian
These are normalized so that their integrals from −∞ to ∞ equal 1. Entering these into OneNote and clicking on the Math icon, we can graph them as
Notice how the Gaussian (brown curve) approaches 0 for large |𝑥| much faster than the Lorentzian (blue curve). In the math panel below the graph, there’s an entry window for changing the value of the coefficient 𝑎
For a cool demo showing how you can enter equations with a pen and graph them, check out Mina Spasic’s post on the OneNote math assistant.
Universal OneNote doesn’t have the math ribbon that OneNote 2016 has, but OneNote 2016 doesn’t have the built-in math assistant. If you want to enter equations into either version with a keyboard, type alt+= to insert a math zone and start typing UnicodeMath. The equations build up in front of your eyes! You can switch to LaTeX input by typing 24c9 alt+x to enter Ⓣ or add it to your math autocorrect as \TeX and type that to switch. You can switch back to UnicodeMath by typing 24c1 alt+x to enter Ⓛ. My favorite way to enter math is with a keyboard using UnicodeMath, but admittedly I’m a bit biased. Alternatively, draw in the equations with a pen. At some point people will stop insisting that math is hard!
As noted in Section 3.10 Accent Operators of the UnicodeMath specification, the most common math accents are (along with their TeX names)
These and more accents are described in Section 2.6 Accented Characters and 3.2.7 Combining Marks in Unicode Technical Report #25, Unicode Support For Mathematics. More generally, the Unicode ranges U+0300..U+036F and U+20D0..U+20EF have these and other accents that can be used for math.
The Windows Character Map program shows that the Cambria Math font has all combining marks in the range 0300..036F as well as 20D0..20DF, 20E1, 20E5, 20E6, 20E8..20EA. The range 0300..036F used as math accents in Word looks like
Except for the horizontal overstrikes and the double-character accents shown in red, all these work as math accents in Microsoft Office apps, although many aren’t used in math. In keeping with the Unicode Standard, UnicodeMath represents an accent by its Unicode character, placing the accent immediately after the base character. There’s no need for double-character accents in Microsoft Office math since the corresponding “single” character accents expand to fit their bases as in
In UnicodeMath, this is given by (a+b)~, where ~ can be entered using the TeX control word \tilde. This is simpler than TeX, which uses \widetilde{a+b} for automatically sized tildes rather than \tilde{a+b}.
Some of the combining marks in the range 20D0..20EF also work as accent objects in Office math zones. You can test accents that don’t have TeX control words by inserting a math zone (type alt+=) and typing a non-hex letter followed by the Unicode value, alt+x, space. For example, alt+=, z, 36F, alt+x, space gives
MathML 1 was released as a W3C recommendation in April 1998 as the first XML language to be recommended by the W3C. At that time, Unicode was just starting to take hold as Microsoft Word 97 and Excel 97 had switched to Unicode. [La]TeX was developed before Unicode 1.0, so it relied on control words. Accordingly, it was common practice in 1998 to use control words or common spacing accents to represent accents instead of the Unicode combining marks even though many accents didn’t have a unified standardized representation. Unicode standardized virtually all math accents by using combining marks. One problem with using the combining marks in file formats is that they, well, combine! So, it may be difficult to see them as separate entities unless you insert a no-break space (U+00A0) or space (U+0020) in front of them. UnicodeMath allows a no-break space to appear between the base and accent since UnicodeMath is used as an input format as well as in files. Only programmers need to look at most file formats (HTML, MathML, OMML, RTF), so a reliable standard is more important for file formats than user-friendly presentation.
MathML 3’s operator dictionary defines most horizontal arrows with the “accent” property. In addition, it defines the following accents
02C6 ˆ modifier letter circumflex accent
02C7 ˇ caron
02C9 ˉ modifier letter macron
02CA ˊ modifier letter acute accent
02CB ˋ modifier letter grave accent
02CD ˍ modifier letter low macron
02D8 ˘ breve
02D9 ˙ dot above
02DA ˚ ring above
02DC ˜ small tilde
02DD ˝ double acute accent
02F7 ˷ modifier letter low tilde
0302 ̂ combining circumflex accent
0311 ̑ combining inverted breve
Presumably the operator dictionary should be extended to include more math combining marks and their equivalents, if they exist, with the spacing diacritics in the range U+02C6..U+02DD.
Here’s the MathML for the math object 𝑎̂.
<mml:mover accent="true">
  <mml:mi>a</mml:mi>
  <mml:mo>^</mml:mo>
</mml:mover>
“Office MathML” OMML is the XML used in Microsoft Office file formats to represent most math. It’s an XML version of the in-memory math object model which differs from MathML. The math accent object 𝑎̂ has the following OMML
<m:acc>
  <m:accPr>
    <m:chr m:val=" ̂"/>
    <m:ctrlPr/>
  </m:accPr>
  <m:e>
    <m:r>
      <m:t>𝑎</m:t>
    </m:r>
  </m:e>
</m:acc>
The Rich Text Format (RTF) represents math zones essentially as OMML written in RTF syntax. Regular RTF uses the \uN notation for Unicode characters not in the current code page. The math accent object 𝑎̂ has the RTF
{\macc{\maccPr{\mctrlPr\i\f0\fs20 }{\mchr \u770? }}{\me\i\u-10187?\u-9138?}}
Unicode RTF is easier to read since characters are written in Unicode
{\macc{\maccPr{\mctrlPr\i\f0\fs20 }{\mchr ̂}}{\me\i 𝑎}}
But none of these is as simple as the UnicodeMath 𝑎 ̂ .
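The \uN values in the regular RTF above may look odd: N is a signed 16-bit decimal number, so the combining circumflex U+0302 is written as \u770, and the math italic 𝑎 (U+1D44E), which lies outside the BMP, is written as a UTF-16 surrogate pair of two negative numbers. The sketch below shows the arithmetic. ToRtfUnicode is a hypothetical helper, and it assumes a single “?” fallback character after each \uN, as in the example above.

```cpp
#include <cstdint>
#include <string>

// Write one Unicode code point in RTF \uN? form. N is a signed 16-bit
// decimal value, so code points above U+FFFF become a surrogate pair and
// UTF-16 code units above 0x7FFF come out negative.
std::string ToRtfUnicode(char32_t codePoint)
{
    auto unit = [](std::uint32_t codeUnit) {
        int n = static_cast<int>(codeUnit);
        if (n > 32767)
            n -= 65536;                     // reinterpret as signed 16-bit
        return "\\u" + std::to_string(n) + "?";
    };
    if (codePoint < 0x10000)
        return unit(codePoint);
    const std::uint32_t v = codePoint - 0x10000;
    return unit(0xD800 + (v >> 10)) + unit(0xDC00 + (v & 0x3FF));
}
```

Here ToRtfUnicode(0x1D44E) gives \u-10187?\u-9138?, matching the \me argument in the RTF above.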
The RTF specification doesn’t have a control word for GIF images, although it has \jpegblip for jpeg’s and \pngblip for png’s. One way to persist GIFs would be to define a \gifblip. But this approach has the disadvantage that RTF readers that don’t recognize \gifblip would either discard the image altogether or fall back to an alternate format if one is included in the RTF stream. RichEdit’s earlier choice of converting the GIF to a png saves only the initial frame. A simple observation reveals how to solve both problems: the RTF reader isn’t, and furthermore shouldn’t be, responsible for the image format. Instead the program that interprets the image binary data should recognize the image format and validate the data. This is essential for security reasons. Accordingly, RichEdit just uses the \pngblip control word for a GIF even though the binary data defines a GIF! This interoperates with Word, older RichEdit clients, etc. Human readers of RTF can know the difference if they want to by looking at the image data, which starts with “PNG” for a real png and “GIF” for a GIF. The decoding of the images on Microsoft and Android operating systems is performed by the Windows Imaging Component (WIC), which also validates the images. Currently the RichEdit image facility isn’t supported on Apple platforms, but that may change.
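To see which format a \pngblip payload really holds, you can check the first few bytes of the decoded image data. The sketch below is only a diagnostic aid with hypothetical names (ImageFormat, SniffImageFormat). It isn’t what RichEdit or WIC does and it performs no validation. It relies on the standard file signatures: the 8-byte PNG signature, which begins with the byte 0x89 followed by “PNG”, and the GIF headers “GIF87a” and “GIF89a”.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class ImageFormat { Unknown, Png, Gif };

// Classify image data by its leading signature bytes only.
ImageFormat SniffImageFormat(const std::vector<std::uint8_t> &data)
{
    static const std::uint8_t png[] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    if (data.size() >= sizeof(png) &&
        std::memcmp(data.data(), png, sizeof(png)) == 0)
        return ImageFormat::Png;
    if (data.size() >= 6 &&
        (std::memcmp(data.data(), "GIF87a", 6) == 0 ||
         std::memcmp(data.data(), "GIF89a", 6) == 0))
        return ImageFormat::Gif;
    return ImageFormat::Unknown;
}
```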
When reading a GIF from RTF or via one of the APIs, RichEdit calls WIC to convert the image to a WIC bitmap. RichEdit saves memory space by not storing the original image data, since that data can be faithfully recovered from the WIC bitmap including all metadata.
Animated GIFs vary appreciably in size, ranging from 50 kilobytes for simple cartoon animation like
to several megabytes, such as this 1-megabyte GIF from the movie Singin’ in the Rain
When exported in RTF files, the size doubles since each binary byte is represented by two hexadecimal bytes. So, it’s wise not to insert animated GIFs excessively.
The GIF format is described very well on the web, but here are a few quick thoughts before explaining how GIF animation works in RichEdit. First, GIF metadata is essential for animation. It controls the frame rate on a per-frame basis, the color palette(s) and how the current full image is used in composing the next image. In fact, several partial frames along with the current full image may be combined to create the next displayed frame. Such intermediate frames have “0-delay”, that is, they are included in the composition without being displayed. Some programs display them anyway (see the Windows photo viewer and the animated GIF sample), but I, at least, think that’s misunderstanding the purpose of 0-delay intermediate frames. In fact, as discussed further below, some multiframe GIFs use the extra frames purely for refining the image colors, since a single frame is limited to 8-bit color and one color palette.
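The 0-delay rule can be summarized in a few lines of code. This is a sketch of the policy as I’ve described it, not RichEdit’s implementation: frames with a nonzero delay are displayed, 0-delay frames are only composed, and (as discussed later for GIFs whose frames all have zero delays) only the final composite is shown when every delay is zero. DisplayedFrames is a hypothetical name, and the delays are assumed to be the per-frame values stored in the GIF, in hundredths of a second.

```cpp
#include <cstddef>
#include <vector>

// Given per-frame delays, return the indices of the frames that are
// actually displayed. Frames with zero delay are composed into the image
// but not shown on their own. If every delay is zero, only the final
// composite frame is shown.
std::vector<std::size_t> DisplayedFrames(const std::vector<unsigned> &delays)
{
    std::vector<std::size_t> shown;
    for (std::size_t i = 0; i < delays.size(); ++i)
    {
        if (delays[i] != 0)
            shown.push_back(i);
    }
    if (shown.empty() && !delays.empty())
        shown.push_back(delays.size() - 1);
    return shown;
}
```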
Each animated GIF gets its own timer in this model since use of a shared timer ends up slowing down the animations. The time delays between frames may be the same or they can vary substantially, such as in the GIF
Animation is enabled/disabled via calling ITextDocument2::SetProperty(tomAnimateImages, Value), where tomAnimateImages is given by 0x95 and Value = tomTrue/tomFalse, respectively. A client that enables animation guarantees that the ID2D1RenderTarget supplied in the call to ITextServices2::TxDrawD2D() remains valid between calls. By default, animations are disabled since previous clients don’t know about this requirement.
Animations only work in Direct2D/DirectWrite mode. Initial attempts to get them running in GDI/Uniscribe mode display the animated images correctly except with a black background. This could be fixed if there is a need for animations in GDI mode.
Animations are disabled for selected images. This allows the user to recognize when images are selected. They are also disabled when the RichEdit control loses the focus. If a GIF has a maximum loop count, the animation stops after reaching that count.
Clients can get an image IStream by calling ITextRange::GetEmbeddedObject() to get the image’s IUnknown interface, then calling IUnknown::QueryInterface for an IServiceProvider interface and finally calling IServiceProvider::QueryService() to get the image IStream. This works for all kinds of images, not just GIFs.
GIFs with frames that all have zero frame delays display only the composite final frame. The multiple frames are typically used to get a composite image that has higher-resolution color than possible with a single 8-bit color palette. The Windows photo viewer nevertheless tries to animate such GIFs so the user sees fluctuating colors. PowerPoint presentation mode displays the correct composite image after one animation sequence, but Word only shows the first frame. See High-Color GIF Images for a detailed discussion of the following stationary multiframe GIF with high-resolution color.