Inserting and Getting Math Text in RichEdit

Starting with the Office 2007 RichEdit, it has been possible to enter math using the keyboard and to read and write RTF files that contain math zones. The RichEdit Text Object Model (TOM) ITextRange2 interface has methods to handle math programmatically, such as ITextRange2::BuildUpMath() and ITextRange2::SetInlineObject() and GetInlineObject(). But the methods don’t offer convenient ways to enter and retrieve math in other formats, such as MathML and OMML (Office MathML). This post describes some recent additions to handle math in various formats easily using ITextRange2::SetText2(options, bstr) and GetText2(options, pbstr).

These methods offer a variety of options including the tomConvertRTF (0x2000) option to insert and get RTF strings. This capability is exposed in the XAML text object model of the RichEditBox via the Windows.UI.Text.ITextRange::SetText() and GetText() methods with the FormatRtf option. Officially, RTF is a byte format, that is, the character codes are 8 bits or multiples thereof, and the TOM methods support BYTE strings, even though they are housed in BSTRs. In addition, the methods support UTF-16 strings that can contain any Unicode characters, so you don’t need to use the hard-to-read RTF Unicode \uN control word. On input (SetText2), UTF-16 RTF is recognized automagically. To retrieve UTF-16 RTF, bitwise OR in the tomGetUtf16 flag (0x00020000), that is, call ITextRange2::GetText2(tomConvertRTF | tomGetUtf16, &bstr). International and math UTF-16 RTF is so much easier to read than the standard 8-bit RTF! Internally, RichEdit reads UTF-16 RTF by converting it to UTF-8 RTF, a format introduced by RichEdit 4.1 in 2002, and reading in the resulting UTF-8 RTF.
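To see why UTF-16 RTF is easier on the eyes, here is a minimal sketch of what the same text looks like when it has to be written with the \uN control word. The helper name EscapeForByteRtf is made up for this illustration and is not part of RichEdit. It assumes the usual RTF convention that N is a signed 16-bit decimal followed by a one-character fallback ('?' here).

```cpp
#include <string>

// Hypothetical helper (not a RichEdit API): writes UTF-16 text the way
// 7-bit RTF has to, with every non-ASCII code unit as \uN plus a '?' fallback.
std::string EscapeForByteRtf(const std::u16string& text) {
    std::string out;
    for (char16_t cu : text) {
        if (cu == u'\\' || cu == u'{' || cu == u'}') {
            out += '\\';                       // RTF syntax characters need a backslash
            out += static_cast<char>(cu);
        } else if (cu < 0x80) {
            out += static_cast<char>(cu);      // plain ASCII passes through
        } else {
            int n = cu;                        // \uN takes a signed 16-bit decimal,
            if (n > 32767) n -= 65536;         // so high code units become negative
            out += "\\u" + std::to_string(n) + "?";
        }
    }
    return out;
}
```

For 日本語 this produces \u26085?\u26412?\u-30050?, whereas the UTF-16 RTF you get with tomConvertRTF | tomGetUtf16 can simply contain 日本語.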

Since the tomConvertRTF option made handling RTF easier, we decided to add some more options for the SetText2 and GetText2 methods. Specifically the latest Microsoft Office RichEdit supports the tomConvertMathML (0x00010000), tomConvertOMML (0x00080000), and tomConvertLinearFormat (0x00040000) options for converting MathML, OMML (Office MathML), and the math linear format, respectively. These options let you use RichEdit as a math-zone conversion machine: input math in one format and retrieve it in another.
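The conversion-machine idea can be sketched roughly as follows. To keep the sketch compilable without the Windows headers, RangeLike is a hypothetical stand-in for ITextRange2 that uses std::u16string instead of BSTR and bool instead of HRESULT; MathFormat, OptionFor, and ConvertMath are likewise names invented here. The option values are the ones quoted above. Whether the range covers exactly the inserted math zone after SetText2 is an assumption of the sketch.

```cpp
#include <string>

// Option values as quoted in the text (normally they would come from tom.h).
constexpr long kTomConvertMathML       = 0x00010000;
constexpr long kTomConvertLinearFormat = 0x00040000;
constexpr long kTomConvertOMML         = 0x00080000;

enum class MathFormat { MathML, OMML, LinearFormat };

// Maps a format to the SetText2/GetText2 option that selects it.
constexpr long OptionFor(MathFormat f) {
    return f == MathFormat::MathML ? kTomConvertMathML
         : f == MathFormat::OMML   ? kTomConvertOMML
                                   : kTomConvertLinearFormat;
}

// Hypothetical stand-in for ITextRange2, so the sketch builds anywhere.
struct RangeLike {
    virtual ~RangeLike() = default;
    virtual bool SetText2(long options, const std::u16string& text) = 0;
    virtual bool GetText2(long options, std::u16string* text) = 0;
};

// Insert math in one format, then read the same range back in another.
bool ConvertMath(RangeLike& range, MathFormat from, MathFormat to,
                 const std::u16string& input, std::u16string* output) {
    if (!range.SetText2(OptionFor(from), input))
        return false;                          // input was not accepted
    return range.GetText2(OptionFor(to), output);
}
```

With a real ITextRange2 the two calls would be SetText2(options, bstr) and GetText2(options, &bstr), with the usual HRESULT checks and BSTR cleanup.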

Note that while RTF is a general rich-text format that can include math zones, MathML and OMML only represent math zones. In a fashion similar to RTF, plain text can include math zones in the linear format delimited by square brackets with quills ⁅ (U+2045) and ⁆ (U+2046). In fact, you can copy RichEdit text with math zones to a plain-text editor such as Notepad, copy that plain text back into RichEdit, Select All, and use the Ctrl+Alt+Shift+= hot key to build up all the math zones! Although most rich-text formatting is lost in plain-text copies, the math zones come through with little or no loss.
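Because the delimiters are ordinary Unicode characters, it's easy to locate math zones in such plain text yourself. Here is a minimal sketch; ExtractMathZones is a name invented for this example, and it assumes math zones don't nest and simply stops at an unmatched opening delimiter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the linear-format contents of every math zone delimited by
// U+2045 ... U+2046 in a plain-text string.
std::vector<std::u16string> ExtractMathZones(const std::u16string& text) {
    const char16_t kOpen = 0x2045;             // left square bracket with quill
    const char16_t kClose = 0x2046;            // right square bracket with quill
    std::vector<std::u16string> zones;
    std::size_t pos = 0;
    while ((pos = text.find(kOpen, pos)) != std::u16string::npos) {
        std::size_t end = text.find(kClose, pos + 1);
        if (end == std::u16string::npos)
            break;                             // unmatched opener: stop scanning
        zones.push_back(text.substr(pos + 1, end - pos - 1));
        pos = end + 1;
    }
    return zones;
}
```

Each returned string could then be handed to SetText2 with tomConvertLinearFormat, or just inspected as text.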

The new RichEdit can also copy and paste MathML and OMML using the usual copy/paste hot keys and commands. Older RichEdit versions contain the MathML and OMML converters used by PowerPoint and OneNote, but ironically RichEdit itself didn’t expose MathML/OMML copy/paste functionality, so that’s been remedied.

In a nonmath context, there’s a new SetText2 option, tomConvertRuby (0x00100000), to convert strings like “{…|…}” to ruby inline objects, where the first ellipsis represents the ruby text and the second ellipsis the base text. The ASCII curly braces and vertical bar are translated to the internal ruby-object structure characters U+FDD1, U+FDEF, and U+FDEE, respectively. Alternatively, the string can contain those structure characters directly. If a digit follows the start delimiter (‘{’ or U+FDD1), the digit defines the ruby options:

rubyAlign val          Meaning
center (0)             Center <ruby> with respect to <base>
distributeLetter (1)   Distribute difference in space between longer and shorter text in the latter, evenly between each character
distributeSpace (2)    Distribute difference in space between longer and shorter text in the latter using a ratio of 1:2:1 which corresponds to lead : inter-character : end
left (3)               Align <ruby> with the left of <base>
right (4)              Align <ruby> with the right of <base>

If you add 5 to these values, the ruby object will display the ruby text below the base text instead of above it. For example, calling ITextRange2::SetText2(tomConvertRuby, bstr) with bstr containing the string “{1にほんご|日本語}” inserts a ruby object with the ruby text にほんご above the base text 日本語, using the distributeLetter (1) option.

The string can contain text in addition to ruby objects, and the ruby objects can be nested to create compound ruby objects.
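Putting the ruby string together by hand is error-prone, so here is a small sketch of helpers that build it. RubyAlign, MakeRubyString, and ToStructureChars are names made up for this example, not RichEdit APIs. The digit is the alignment value from the table plus 5 when the ruby text should go below the base text, and the structure-character mapping follows the order given above ('{' to U+FDD1, '}' to U+FDEF, '|' to U+FDEE).

```cpp
#include <string>

// Alignment values from the rubyAlign table.
enum class RubyAlign { Center = 0, DistributeLetter = 1, DistributeSpace = 2,
                       Left = 3, Right = 4 };

// Hypothetical helper: builds "{<digit><ruby>|<base>}" for tomConvertRuby.
std::u16string MakeRubyString(const std::u16string& ruby,
                              const std::u16string& base,
                              RubyAlign align, bool below = false) {
    int digit = static_cast<int>(align) + (below ? 5 : 0);   // 0..9
    std::u16string s = u"{";
    s += static_cast<char16_t>(u'0' + digit);  // always emit the option digit
    s += ruby;
    s += u'|';
    s += base;
    s += u'}';
    return s;
}

// Hypothetical helper: swaps the ASCII delimiters for the structure characters.
// Note that it converts every '{', '|' and '}' in the string, which is only
// appropriate when none of them is meant as literal text.
std::u16string ToStructureChars(std::u16string s) {
    for (char16_t& ch : s) {
        if (ch == u'{')      ch = 0xFDD1;      // start of ruby object
        else if (ch == u'|') ch = 0xFDEE;      // ruby/base separator
        else if (ch == u'}') ch = 0xFDEF;      // end of ruby object
    }
    return s;
}
```

MakeRubyString(u"にほんご", u"日本語", RubyAlign::DistributeLetter) yields the “{1にほんご|日本語}” string used in the example above. Always writing the digit, even for center (0), avoids any ambiguity when the ruby text itself starts with a digit.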


We see that the ITextRange2::SetText2() and GetText2() methods provide helpful conversion facilities. You may have noticed that the very important LaTeX/TeX math format doesn’t have such an option yet: no tomConvertTeX. That’s a serious shortcoming that needs to be addressed.