RichEdit Character Formatting

RichEdit’s name derives from its ability to represent rich text. Such text consists of text runs with different sets of character and paragraph formatting properties, along with embedded objects such as images. Some discussion of paragraph formatting is given in an earlier post. The present post discusses how character formatting is represented in RichEdit and how it differs from the CSS (Cascading Style Sheets) formatting used, for example, with HTML.

Typical character formatting properties include bold, italic, underline, text color, background color and font. These properties affect how the characters are displayed. Other properties may attach special behaviors to the text in addition to affecting the display. Such properties include the hidden-text, math-zone, protected-text, and hyperlink effects. Much relevant information is given in the documentation of RichEdit’s character formatting APIs, such as the EM_SETCHARFORMAT message and the ITextFont interface.

The APIs can change almost any combination of properties. This capability is implemented by allowing properties to have a NINCH value (no input, no change). As such, any or even all properties can remain unchanged by the API. For EM_SETCHARFORMAT, property NINCHs are represented by zero mask flags in CHARFORMAT::dwMask. For ITextFont duplicates, the ITextFont::Reset(tomUndefined) method sets all properties to the NINCH value.
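The mask idea can be illustrated with a small portable sketch. It does not use the Windows headers: FormatRequest, FullFormat, ApplyRequest and the kMask constants are hypothetical stand-ins for CHARFORMAT, its dwMask field and flags such as CFM_BOLD, and the real structure packs the on/off effects into dwEffects rather than separate bools. The point is only that a zero mask bit leaves the corresponding property alone, whatever value the request happens to carry for it.

```cpp
#include <cstdint>

// Hypothetical mask bits (the real API uses CFM_BOLD, CFM_ITALIC, ... in dwMask).
enum : std::uint32_t {
    kMaskBold      = 1u << 0,
    kMaskItalic    = 1u << 1,
    kMaskUnderline = 1u << 2,
    kMaskSize      = 1u << 3,
};

// A partial request: a property is meaningful only if its mask bit is set.
struct FormatRequest {
    std::uint32_t mask = 0;   // a zero bit means NINCH for that property
    bool bold = false;
    bool italic = false;
    bool underline = false;
    int  sizeTwips = 0;
};

// A fully specified format: every property always has a value.
struct FullFormat {
    bool bold = false;
    bool italic = false;
    bool underline = false;
    int  sizeTwips = 200;     // 10 pt expressed in twips
};

// Copy only the masked properties from the request into the current format.
FullFormat ApplyRequest(FullFormat current, const FormatRequest& req) {
    if (req.mask & kMaskBold)      current.bold      = req.bold;
    if (req.mask & kMaskItalic)    current.italic    = req.italic;
    if (req.mask & kMaskUnderline) current.underline = req.underline;
    if (req.mask & kMaskSize)      current.sizeTwips = req.sizeTwips;
    return current;
}
```

A request with mask set to kMaskBold and bold set to true turns on bold and leaves italic, underline and size exactly as they were. A request with a zero mask changes nothing.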

The ability to specify a partial set of properties is reminiscent of CSS. For CSS, a subset of properties can be applied to a text node in a document tree. Properties that are not specified for a node are inherited from those active at a higher node in the tree. A full set of properties is defined at the highest node (the root). A feature of such a structure is that if a root property is changed, the new property value is used for lower nodes unless one or more intervening nodes have explicit values for that property.
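For comparison with what follows, here is a minimal sketch of that inheritance lookup. It is not how any browser implements CSS; StyleNode and Resolve are hypothetical names, and only two boolean properties are modeled. A node stores a value only when one was set explicitly, and resolution walks toward the root until a value is found.

```cpp
#include <optional>

// A node in a simplified style tree. An empty optional means
// "not specified here, inherit from the parent".
struct StyleNode {
    const StyleNode* parent = nullptr;
    std::optional<bool> bold;
    std::optional<bool> underline;
};

// Resolve one property by walking up the tree to the nearest node
// that specifies it. The root is expected to specify everything.
bool Resolve(const StyleNode& node, std::optional<bool> StyleNode::* prop) {
    for (const StyleNode* n = &node; n != nullptr; n = n->parent) {
        if ((n->*prop).has_value()) {
            return *(n->*prop);
        }
    }
    return false;  // fallback if even the root left the property unset
}
```

With a root that specifies bold and underline as off and a child that sets only bold, Resolve(child, &StyleNode::underline) follows the root. If the root’s underline is later turned on, the child reports underline as on too, because the child never stored its own value.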

RichEdit has a relatively flat model with two formatting levels: the edit instance’s default formatting set and modified sets resulting from invoking formatting APIs on text runs. The former is a little like root-node formatting. But these two levels do not work as a two-level CSS tree. Internally RichEdit stores character formatting properties in a CCharFormat structure, which contains a superset of the public CHARFORMAT2 properties. Unlike CHARFORMAT2, CCharFormat does not have a mask (dwMask); all its properties are defined.

To see how this differs from the CSS inheritance approach, start with an empty instance and insert some plain text like “Hello world” that doesn’t result in any automatic character formatting. That is, all the characters can be displayed using the default font, no complex scripts are involved, no automatic hyperlinks or math zones exist, etc. As such there is a single text run with character formatting completely specified by the default CCharFormat.

Let’s suppose that the default character formatting isn’t bold and apply bold to the “Hello”. That text run will then have a new CCharFormat that’s identical to the default CCharFormat except for being bold. What’s different about this model from CSS is that once a new CCharFormat is created, it remains unchanged if the default properties are changed later on. In this example, if the default CCharFormat is now changed to be underlined, the bold “Hello” will not be underlined, but the “world” will be underlined, since the “world” still uses the default format.
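The “Hello world” scenario can be modeled in a few lines. This is only a sketch of the behavior described above, not RichEdit’s internal code: CharFormatModel, Run and EditModel are hypothetical names, just two properties are tracked, and the two runs are inserted separately rather than produced by splitting a single run.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a fully populated format: no mask, every property defined.
struct CharFormatModel {
    bool bold = false;
    bool underline = false;
};

struct Run {
    std::string text;
    std::size_t formatIndex;  // index into EditModel::formats
};

struct EditModel {
    // formats[0] plays the role of the instance's default format.
    std::vector<CharFormatModel> formats{CharFormatModel{}};
    std::vector<Run> runs;

    // Plain text starts out using the default format.
    void InsertPlainText(const std::string& text) { runs.push_back({text, 0}); }

    // Formatting a run makes a full copy of its current format with one
    // property changed. The copy no longer tracks the default.
    void SetBold(std::size_t runIndex, bool bold) {
        CharFormatModel copy = formats[runs[runIndex].formatIndex];
        copy.bold = bold;
        formats.push_back(copy);
        runs[runIndex].formatIndex = formats.size() - 1;
    }

    // Changing the default affects only runs that still point at formats[0].
    void SetDefaultUnderline(bool underline) { formats[0].underline = underline; }

    const CharFormatModel& FormatOf(std::size_t runIndex) const {
        return formats[runs[runIndex].formatIndex];
    }
};
```

After inserting “Hello” and “ world”, calling SetBold(0, true) and then SetDefaultUnderline(true), run 0 is bold and not underlined, while run 1 is underlined and not bold. That matches the outcome described in the text.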

Another way of saying this is that a CCharFormat is a fully populated character-formatting property set. It’s not a “property bag” that contains only some properties, allowing the remaining properties to be inherited from higher nodes.

A good example of where RichEdit needs property bags is the storage of OpenType properties. Such a property has a 32-bit identifier id and a 32-bit value value. By default the text has no such properties. A generic mode could apply default OpenType properties to the text, e.g., standard English ligatures and pair kerning. To apply discretionary properties, such as elegant swash glyph variants, the RichEdit client could call ITextFont2::SetProperty(id, value). The ITextFont2 interface was defined in RichEdit 6.0 along with several other interfaces, which I’ll post about sometime soon. Up through RichEdit 8.0, which will ship with the next version of Office, support for discretionary OpenType properties isn’t implemented. Of course, many OpenType features have been supported for a while, including those for complex scripts (since RichEdit 3.0) and for mathematical typography (since RichEdit 6.0).

In principle this property-bag approach could be used for the CCharFormat properties instead of creating CCharFormats for text runs. This alternative would allow the default character format properties to be used on text runs that don’t have explicit values for those properties in a property bag. Such a two-level model would not be as fast as using fully defined CCharFormats, since each property lookup may have to consult the bag and then fall back to the defaults, but the flexibility might be worth the performance hit.
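To make the alternative concrete, here is a rough sketch of a two-level lookup built on 32-bit id/value pairs like those described for OpenType properties. Nothing here reflects an actual RichEdit implementation: PropertyBag, Lookup and the two property ids are hypothetical, and a missing property is arbitrarily reported as 0.

```cpp
#include <cstdint>
#include <unordered_map>

// A sparse set of properties: 32-bit id mapped to 32-bit value.
using PropertyBag = std::unordered_map<std::uint32_t, std::uint32_t>;

// Hypothetical property ids, chosen only for this illustration.
constexpr std::uint32_t kPropBold      = 1;
constexpr std::uint32_t kPropUnderline = 2;

// Look in the run's own bag first, then fall back to the defaults.
// Returns 0 when neither level defines the property.
std::uint32_t Lookup(const PropertyBag& runBag,
                     const PropertyBag& defaults,
                     std::uint32_t id) {
    auto it = runBag.find(id);
    if (it != runBag.end()) {
        return it->second;
    }
    it = defaults.find(id);
    return it != defaults.end() ? it->second : 0u;
}
```

If the bag for “Hello” holds only kPropBold set to 1 and the defaults are later changed so that kPropUnderline is 1, then “Hello” resolves to bold and underlined. That is the CSS-like behavior the fully populated CCharFormat model does not provide. The extra hash lookups on each query hint at where the performance cost would come from.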