The time has come to summarize the features added in RichEdit 8, which shipped with Windows 8 and Office 2013. Since so much was added, I wrote a number of blog posts over the last twelve months about the larger RichEdit 8 features. The present post lists those features and then describes some smaller features included in RichEdit 8. Two large features, the Text Object Model Version 2 (TOM 2) and the Windows RT TOM, don’t have separate posts since they’re described in detail in MSDN. Even with those other posts, this post is bigger than usual. Features added in previous versions of RichEdit are described in RichEdit Versions 1.0 through 3.0, RichEdit versions, and RichEdit Versions Update to 7.0.
Office 2013 has undergone a substantial shift to a relatively new display facility, Direct2D, and a new text facility, DirectWrite. These are the display facilities that are used on Windows Phone 8, the new Windows RT slates, and optionally on Windows 7 & 8. For further info, see this post.
This post describes a couple of performance improvements: 1) a more efficient display tree, and 2) a faster rich-text formatting mechanism.
This post describes how RichEdit 8 was enhanced to access the Windows 8 spell-checking and autocorrection components directly.
This post describes how the RichEdit selection and grippers work on Windows 8 touch devices.
This post describes an implementation of Microsoft UI Automation (UIA) that exposes most objects in a RichEdit instance. This is done via the UIA Text Pattern and includes basic character and paragraph formatting, images, OLE objects, math zones, tables, hyperlinks, and the text inside or associated with these objects.
Over the years, the basic edit control has grown in size to accommodate greatly increased functionality, so even a plain-text, single-line control is now pretty large. Clients can benefit from “flyweight RichEdit controls”, which are stored in RichEdit stories (ITextStory, part of TOM 2) and share the properties of the parent ITextServices. RichEdit uses scratch stories internally for math build up/down, to convert MathML into the internal math representation, and to copy rich text when the text in the original copy selection gets changed. And there’s the main flyweight story that’s used by default. So all clients benefit from flyweight controls, even those that don’t use the controls explicitly via the ITextStory interface. For more info, see this post.
Emoji characters posed special challenges due in part to the Unicode unification of 107 Emoji characters with existing characters in the BMP and in part to the 11 keycap Emoji for #, 0, …, 9, which use the U+20E3 keycap combining mark. The original plan was to use the Segoe UI Symbol font for all Emoji, but this font choice is ambiguous for the unified Emoji. RichEdit 8 uses Segoe UI Symbol for all Emoji except the double exclamation mark (U+203C: ‼), which uses the current font if it has this character.
If an ambiguous Emoji character is followed by one of the BMP “Emoji” variation selectors U+FE0E and U+FE0F, RichEdit treats it as an Emoji character. U+FE0E specifies that the character should be rendered using a standard Emoji-capable font, e.g., Segoe UI Symbol, whereas U+FE0F implies that special Emoji rendering should be used. This special rendering is specified, in principle, in a higher-order protocol, but RichEdit 8 doesn’t have such a protocol. For more info, see this post.
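To make the font choice described above concrete, here is a minimal sketch of the decision as I read it from the two preceding paragraphs. It is not RichEdit's code: the function name, the `has_glyph` callback, and the small set of "ambiguous" characters are all assumptions made up for illustration.

```python
VS_TEXT = "\uFE0E"    # variation selector U+FE0E
VS_EMOJI = "\uFE0F"   # variation selector U+FE0F
EMOJI_FONT = "Segoe UI Symbol"

# Illustrative subset only: a few BMP characters that Unicode unified with emoji.
AMBIGUOUS_EMOJI = {"\u203C", "\u2600", "\u260E", "\u2764"}


def choose_font(text, i, current_font, has_glyph):
    """Pick a font name for text[i].

    has_glyph(font, ch) is a hypothetical callback reporting font coverage.
    """
    ch = text[i]
    if ch not in AMBIGUOUS_EMOJI:
        return current_font              # this sketch only models ambiguous emoji
    following = text[i + 1] if i + 1 < len(text) else ""
    if following in (VS_TEXT, VS_EMOJI):
        return EMOJI_FONT                # an explicit selector forces emoji treatment
    if ch == "\u203C" and has_glyph(current_font, ch):
        return current_font              # the double exclamation mark exception
    return EMOJI_FONT
```

The sketch does not distinguish the two selectors, since the text says the special rendering requested by U+FE0F would need a higher-order protocol that RichEdit 8 doesn't have.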
Variation selector sequences posed challenges in both the user interface and in font selection. Such sequences consist of a base character, either in the BMP or a surrogate pair, followed by a variation selector, which can also be in the BMP (U+FE00..U+FE0F) or a surrogate pair (U+E0100..U+E01EF). The keyboard arrow, Delete, and Backspace keys need to treat a VS sequence as a single character. Font binding is tricky, since currently only special fonts have support for VS sequences. Initially we tried font binding the U+E0100..U+E01EF variation selectors to a Japanese font, since the only usage at the time was in Japan. But this was changed to use the font of the base character, since it’s likely that China will define some VS sequences as well. It’s important that the variation selector is in the same character format run as its base character. See also the Emoji entry above, which mentions how VS sequences can help denote how to render emoji characters.
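The "single character" behavior for the arrow keys can be sketched as follows. This is only an illustration working on UTF-16 code units, so that both surrogate-pair base characters and surrogate-pair variation selectors show up; the function names are mine, not RichEdit's.

```python
import struct


def to_utf16_units(s):
    """Return the UTF-16 code units of a Python string as a list of ints."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def code_point_at(units, i):
    """Return (code point, length in code units) for the character at units[i]."""
    u = units[i]
    if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
        return 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00), 2
    return u, 1


def is_variation_selector(cp):
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def next_caret(units, i):
    """Index reached by one right-arrow press from index i.

    Steps over the base character and any variation selector that follows it.
    """
    if i >= len(units):
        return len(units)
    _, n = code_point_at(units, i)
    j = i + n
    if j < len(units):
        cp, m = code_point_at(units, j)
        if is_variation_selector(cp):
            j += m
    return j
```

For example, a BMP ideograph followed by U+E0100 occupies three code units, and `next_caret` moves past all three in one step. Delete and Backspace would need the same grouping, applied in the appropriate direction.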
The Text Object Model (TOM) Version 2 adds the interfaces ITextDocument2, ITextSelection2, ITextRange2, ITextFont2, ITextPara2, ITextStoryRanges2, ITextStrings, ITextStory, and ITextRow. The complete TOM model is defined by tom.idl, which includes tom1.idl. The interfaces are all documented in MSDN.
This post describes the ITextRow table interface, which allows you to insert tables, examine tables, and perform table manipulations such as inserting, deleting, and resizing table columns. Along with the ITextRange Move methods, the ITextRow methods give complete control over RichEdit’s nested table facility.
The Windows RT Text Object Model gives the Windows RT RichEditBox a TOM-like object model. The Windows RT TOM is a subset of the full TOM2 interfaces. It has the following interfaces, all in the Windows.UI.Text namespace: ITextDocument, ITextSelection, ITextRange, ITextCharacterFormat, ITextParagraphFormat, and ITextConstantsStatics. The first five of these interfaces delegate to the TOM2 ITextDocument2, ITextSelection2, ITextRange2, ITextFont2, and ITextPara2, respectively. The large TOM2 enum of values is broken into a set of enums, each oriented towards a particular feature. The Windows.UI.Text.idl file defines the interfaces and enumerations.
For the new immersive environment on tablets and on Windows Phone 8, not only are GDI and Uniscribe absent, so are the functions handled by the venerable user32.dll. That program library includes the Windows functions SendMessage, MessageBox, CreateWindow, etc. A version of RichEdit 8 has been created for the immersive environment. All instances are windowless and use D2D/DWrite for measuring and rendering. The client can still send RichEdit messages via the ITextServices::TxSendMessage() method. The advantage of dropping the traditional user32.dll is the relative simplicity of the model. But doing so omits significant functionality, at least in the initial version. At the same time, the touch functionality is dramatically improved on Windows 8 and in the immersive environment. The immersive version of RichEdit 8 is used by the Windows Store OneNote.
The Windows 8 RichEdit is mostly a subset of the Office RichEdit 8. The features that are included are documented in MSDN. Here’s a list of omitted features:
- Page/Table Services (PTS): multicolumns, math paragraph, tight wrap around objects
- Text trackers
- Blobs (blobs are used internally for png and jpeg images)
- IRichTextProvider (callback interface used by OneNote and OfficeArt to insert rich text into a RichEdit control)
- XML handlers (used by OneNote and OfficeArt to access RichEdit’s MathML converters)
- Various messages
- Quite a few bug fixes (Windows 8 shipped before Office 2013)
The Windows Phone 8 RichEdit is based on the standard RichEdit 8 code base, rather than on the earlier WinCE version. A combination of makefile and conditional compilation instructions control various differences.
The default preferred font table was modified to correspond to the fonts on the phone. Related to this is the need to have a table of fonts to use when a file specifies a font that is on the desktop, but not on the phone. There’s also a last-minute font fallback for East Asian scripts. Many changes from the phone teams have been backported into the main RichEdit code base.
For Windows RT, a special font callback interface, IProvideFontInfo, is defined that is used to replace RichEdit’s built-in font binding with Windows RT’s font binding. A major reason for this replacement is to support Windows RT composite fonts. The IProvideFontInfo interface is obtained by calling ITextHost::QueryInterface for an IProvideFontInfo. It includes the GetRunFontFaceId() method, which returns a font ID given the current font, the font weight, stretch and style, the lcid, a pointer to input characters and character count to be used with the returned font, the current font ID, and an out parameter runCount that gives the character count that the returned font covers. Two known problems exist with this feature: 1) it doesn’t stamp the characters with a CharRep, and 2) the Windows RT font binder doesn’t understand mathematical text. The CharRep is important for BiDi and for font fallback. Hopefully these problems will be addressed in a future release. IProvideFontInfo should not be used in math zones, since font binding in math zones is quite tricky.
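The input-characters/runCount contract of GetRunFontFaceId() can be pictured with a small sketch. This models only that part of the description: the real method takes more parameters (weight, stretch, style, lcid, font IDs) and its signature is not reproduced here. `get_run_font`, `bind_runs`, `toy_font_for_char`, and the font name "HypotheticalCJK" are all invented for illustration.

```python
def get_run_font(text, current_font, font_for_char):
    """Return (font, run_count) for the start of text.

    font is the font chosen for text[0]; run_count is how many leading
    characters that same font covers. font_for_char is a hypothetical
    stand-in for the host's font binder.
    """
    if not text:
        return current_font, 0
    font = font_for_char(text[0], current_font)
    run_count = 1
    while run_count < len(text) and font_for_char(text[run_count], current_font) == font:
        run_count += 1
    return font, run_count


def bind_runs(text, current_font, font_for_char):
    """Split text into (font, substring) runs by calling get_run_font repeatedly."""
    runs = []
    pos = 0
    while pos < len(text):
        font, count = get_run_font(text[pos:], current_font, font_for_char)
        runs.append((font, text[pos:pos + count]))
        pos += count
    return runs


def toy_font_for_char(ch, current_font):
    """Toy binder: CJK unified ideographs go to a made-up font, the rest stay put."""
    if 0x4E00 <= ord(ch) <= 0x9FFF:
        return "HypotheticalCJK"
    return current_font
```

The caller advances by the returned count and asks again, which is why the count has to be an out parameter alongside the font.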
RichEdit font binding uses a character repertoire (CharRep) facility. Character repertoires are often the same as Unicode scripts, but they include other sets such as symbols and emoji. In previous versions of RichEdit, the character-repertoire flags and indices along with the functions that manipulated them were scattered around in several files. Furthermore the variables used had no more space for new character repertoires, such as emoji. Accordingly we needed to generalize the facility.
To this end, we collected the character flags functionality and associated defines into the CCharFlags class, which hides many details from calling code. We used it to add support for 15 new character repertoires bringing RichEdit up to date with the scripts that Windows 8 supports. The scripts added are: Symbol, Emoji, Glagolitic, Lisu, Vai, N’ko, Osmanya, PhagsPa, Gothic, Deseret, Tifinagh, Old Italic, Old Turkic, Bopomofo, and Cyrillic Ext B. More character repertoires can be added easily and, in fact, a number have been added in Windows 8.1.
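As a rough picture of what such a class buys you, here is a sketch of a registry that hands out an index and a flag bit per character repertoire and hides the bit layout from callers. It is not the CCharFlags class: the method names are invented, and Python's unbounded integers stand in for whatever wider storage the real class uses.

```python
class CharFlags:
    """Toy registry mapping character-repertoire names to indices and flag bits."""

    def __init__(self):
        self._index = {}
        self._names = []

    def register(self, name):
        """Add a repertoire if it is new and return its index."""
        if name not in self._index:
            self._index[name] = len(self._names)
            self._names.append(name)
        return self._index[name]

    def flag(self, name):
        """Return the single-bit flag for a registered repertoire."""
        return 1 << self._index[name]

    def mask(self, names):
        """Return the combined flags for several repertoires."""
        result = 0
        for name in names:
            result |= self.flag(name)
        return result

    def names_in(self, flags):
        """List the repertoires present in a flags value, in index order."""
        return [n for i, n in enumerate(self._names) if (flags >> i) & 1]
```

Because callers only ever see names and opaque flag values, adding a repertoire is one `register` call and no caller has to know how many bits are in use.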
If you specify the charset in creating a font, GDI will ensure that you get a font that handles that charset. Admittedly charsets cover only a subset of the world’s languages (no Indic, Syriac, etc.), but they do cover many important languages, notably Chinese, Japanese, and Korean (CJK). It’s really desirable to choose a font for Chinese characters that suits the user: Simplified Chinese, Traditional Chinese, or Japanese. Another trick is if a character is an end-user-defined character (EUDC) in the Unicode Private Use Area, GDI will ensure that you see a glyph by searching through possible EUDC fonts. These characters are not defined in the Unicode Standard, so you can’t use them reliably for text interchange. But they are popular in CJK locales and a given machine may have fonts with the glyphs that the user wants.
DWrite doesn’t offer such automatic font fallback. Accordingly, to handle font fallback better on the DWrite code path, we pass down the current CharRep. This gives access to a default font that is likely to have the character glyphs when the current font does not. Code to handle EUDC for DWrite is included as well.
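The fallback order described here might look something like the sketch below. The order of the checks, the `has_glyph` callback, the EUDC font list, and the per-CharRep default fonts are all assumptions for illustration; the font names are real Windows fonts, but I am not claiming they are the entries RichEdit uses.

```python
# Illustrative defaults only, keyed by a CharRep name.
DEFAULT_FONT_FOR_CHARREP = {
    "Japanese": "Meiryo",
    "SimplifiedChinese": "Microsoft YaHei",
    "TraditionalChinese": "Microsoft JhengHei",
    "Korean": "Malgun Gothic",
}

# Hypothetical list of end-user-defined-character fonts to search.
EUDC_FONTS = ["HypotheticalEUDC"]


def is_private_use(ch):
    """True for characters in the BMP Private Use Area (U+E000..U+F8FF)."""
    return 0xE000 <= ord(ch) <= 0xF8FF


def pick_font(ch, current_font, char_rep, has_glyph):
    """Choose a font for ch, falling back only when the current font lacks a glyph."""
    if has_glyph(current_font, ch):
        return current_font
    if is_private_use(ch):
        for font in EUDC_FONTS:
            if has_glyph(font, ch):
                return font
    fallback = DEFAULT_FONT_FOR_CHARREP.get(char_rep)
    if fallback is not None and has_glyph(fallback, ch):
        return fallback
    return current_font   # nothing better found; the caller shows a missing glyph
```

Passing the CharRep down matters because the same Chinese character should get a different default font depending on whether the user's text is Simplified Chinese, Traditional Chinese, or Japanese.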
Windows 8 generalized its font attributes to have style, weight, and stretch. Actually, GDI’s LOGFONT has always had font weight, but it hasn’t always been consistent about grouping font files that differ only by weight into a font family. For example, Windows 8 considers Arial Black to be the heaviest-weight member of the Arial family rather than an independent font. The only change needed to handle font weight was to expose it in the RTF format with the \fweightN control word. Font style includes upright, italic, and oblique. GDI has always had upright and italic, and used oblique when italic is requested and no corresponding italic font is available. To handle explicit requests for oblique, we added the RTF control word \oblique and an attribute CEM_OBLIQUE. Font stretch didn’t have a representation in RichEdit’s character formatting or in RTF, so we added the RTF control word \fstretchN.
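Here is a small sketch of how a writer might emit these three control words. Several things are assumptions on my part: that the words take the usual RTF backslash form, that N for weight follows the LOGFONT convention (400 regular, 700 bold), that N for stretch follows the DirectWrite 1..9 scale with 5 as normal, and that default values are simply omitted. The function itself is hypothetical.

```python
NORMAL_WEIGHT = 400    # FW_NORMAL in LOGFONT terms
NORMAL_STRETCH = 5     # "normal" on the DirectWrite 1..9 stretch scale


def font_attrs_to_rtf(weight=NORMAL_WEIGHT, stretch=NORMAL_STRETCH, oblique=False):
    """Return RTF control words for the non-default font attributes (assumed policy)."""
    parts = []
    if weight != NORMAL_WEIGHT:
        parts.append("\\fweight%d" % weight)
    if stretch != NORMAL_STRETCH:
        parts.append("\\fstretch%d" % stretch)
    if oblique:
        parts.append("\\oblique")
    return "".join(parts)
```

So a heavy, oblique, normal-width run would come out as `\fweight900\oblique`, with no stretch word written at all.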
In RichEdit 7 and earlier versions, the default preferred font table is created at run time using a set of calls. The table is indexed by the CharRep. In RichEdit 8, most entries are given in a convenient, explicit table. This change facilitated updating the entries to the Windows 8 preferences and creating a modified table for use on Windows Phone 8. There are two kinds of entry: user-interface (UI) and document. Plain-text instances use the UI entries, and rich-text instances use the document entries unless the client has sent an EM_SETLANGOPTIONS message with the IMF_UIFONTS option.
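The UI-versus-document rule is easy to state in code. The sketch below is illustrative only: the table entries are invented examples rather than the actual Windows 8 preferences, and the bit value given for IMF_UIFONTS is a placeholder, so take the real one from richedit.h.

```python
IMF_UIFONTS = 0x0020   # placeholder bit for this sketch; check richedit.h

# Hypothetical table indexed by CharRep name: (UI font, document font).
PREFERRED_FONTS = {
    "Ansi": ("Segoe UI", "Calibri"),
    "Japanese": ("Meiryo UI", "Meiryo"),
}


def preferred_font(char_rep, rich_text, lang_options=0):
    """Return the preferred font for a CharRep.

    Plain-text instances use the UI entry; rich-text instances use the
    document entry unless the IMF_UIFONTS option has been set.
    """
    ui_font, doc_font = PREFERRED_FONTS[char_rep]
    use_ui = (not rich_text) or bool(lang_options & IMF_UIFONTS)
    return ui_font if use_ui else doc_font
```

With an explicit table like this, making a phone variant amounts to swapping in a different dictionary.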
RichEdit’s character formatting includes an LCID, which is being deprecated in favor of the BCP-47 language tags. In particular, Windows RT uses BCP-47 language tags as does the Windows RT Windows.UI.Text.ITextCharacterFormat LanguageTag property. We didn’t want to add a new method to ITextFont2, so we implemented the functionality in classic TOM by adding the flag tomLanguageTag for the ITextRange2::GetText2() and SetText2() methods. The approach uses the OS LCIDToLocaleName and LocaleNameToLCID functions. We also implemented a facility for converting BCP-47 strings that LocaleNameToLCID doesn’t recognize into LCIDs for internal consumption. This facility doesn’t handle arbitrary BCP-47 tags, but it handles those used by Windows 8 that don’t have LCIDs (see following feature).
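The two-level lookup can be sketched as below. On Windows the first table would be replaced by calls to LocaleNameToLCID and LCIDToLocaleName; here a small dictionary stands in for them so the sketch runs anywhere. The second table, its "qaa-Latn" tag, and the internal ID value are made up purely to show the shape of the fallback.

```python
# Stand-in for the OS functions: a few well-known tag/LCID pairs.
OS_LCIDS = {"en-US": 0x0409, "de-DE": 0x0407, "ja-JP": 0x0411}

# Hypothetical internal IDs for tags the OS has no LCID for.
INTERNAL_IDS = {"qaa-Latn": 0x7F000001}

# BCP-47 tags compare case-insensitively, so index them by lowercase form.
_BY_TAG = {}
_BY_ID = {}
for _table in (OS_LCIDS, INTERNAL_IDS):
    for _tag, _value in _table.items():
        _BY_TAG[_tag.lower()] = _value
        _BY_ID[_value] = _tag


def tag_to_id(tag):
    """Return the LCID (or internal ID) for a language tag, or None if unknown."""
    return _BY_TAG.get(tag.lower())


def id_to_tag(value):
    """Return the language tag for an LCID or internal ID, or None if unknown."""
    return _BY_ID.get(value)
```

As in the text, this handles a fixed list of extra tags rather than arbitrary BCP-47 input; an unknown tag just comes back as None.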
Support was added for seven new Windows 8 keyboards that don’t have LCIDs. This involves decoding