Digging through old doc files, I ran across the following summary of RichEdit up through Version 3.0. It’s more detailed than my post on RichEdit Versions, so it might be of interest to history buffs, anyhow. And it does describe the riched20.dll that still ships with Windows, mostly for purposes of backward compatibility. I wrote this document back in 1998 in preparing for an internal seminar on RichEdit 3.0. It even mentions that RichEdit 3.x would be an ideal development environment for WYSIWYG editing of built-up mathematical expressions! Sure hit that nail on the head. Naturally the statement “there are three main versions of RichEdit” is quite out of date.
What is RichEdit?
There are three main versions of RichEdit: 1.x, 2.x, and 3.0. Since all are being used, it makes sense to group the RichEdit features as they were introduced by these three versions. In general, RichEdit adds selective character and paragraph formatting along with embedded objects to the plain text editing facilities well-known in system edit controls.
A RichEdit instance consists of a single story, galley-like text that can be exported and imported using plain text or RTF. Each version of RichEdit is a superset of the preceding one, except that only FE builds of RE 1.0 have a vertical text option (a relatively elegant vertical option could be added to RE 3.0 if there’s sufficient demand).
RichEdit 1.0 was originally developed for rich-text email. Major differences between the various builds of RE 1.x and RE 2.0 are that the latter is based on Unicode, is a single world-wide binary (not including BiDi, Thai or Indic scripts), has multilevel undo, has a powerful set of COM interfaces, and is substantially more Word compatible. RE 2.1 adds BiDi capabilities.
Major differences between RE 2.x and RE 3.0 include the latter's better performance, richer text, outline view, zoom, font binding, more powerful IME support, and rich complex script support (BiDi, Indic, and Thai). RE 3.0 is a single, scalable, world-wide binary that offers high performance and substantial Word compatibility in a small package.
RichEdit 2.0 also includes simpler plain-text and single-line controls. RE 3.0 adds rich/plain ListBox and ComboBox controls.
RichEdit 1.0 Features
1. Text Entry/Selection. Mostly standard (system-edit control) selection and entry of text. Selection bar support. Word-wrap and auto-word-select options. Single, double, and triple click selection.
2. ANSI (SBCS and MBCS) editing. No Unicode
3. Basic set of character/paragraph formatting properties
4. Character formatting properties: font facename and size, bold, italic, solid underline, strikeout, protected, link, offset, and text color.
5. Paragraph formatting properties: start indent, right indent, subsequent line offset, bullet, alignment (left, center, right), and tabs.
6. Find forward: includes case-insensitive and match-whole-word options.
7. Message-based interface: almost a superset of the system edit-control message set plus the two OLE interfaces, IRichEditOle and IRichEditOleCallback.
8. OLE embedded objects: requires client collaboration based on IRichEditOle and IRichEditOleCallback interfaces.
9. Right-button menu support: needs IRichEditOleCallback interface.
10. Drag & Drop editing.
11. Notifications: WM_COMMAND messages sent to client plus a number of others. Superset of common-control notifications
12. Single-level undo/redo.
13. Simple vertical text (Far East builds only)
14. IME support. (Far East builds only)
15. WYSIWYG editing using printer metrics. This is needed for WordPad, in particular.
16. Cut/Copy/Paste/StreamIn/StreamOut with plain text (CF_TEXT) or RTF with and without objects.
17. C code base
18. Different builds for different scripts.
RichEdit 2.x Additions
1. Unicode. Big effort needed to maintain compatibility with existing nonUnicode documents, i.e., ability to convert to/from nonUnicode plain and rich text. Substantial effort needed to run correctly on Win95.
2. General international support. General line breaking algorithm (extension of Kinsoku rules), simple font linking, keyboard font switching.
3. FE support. E.g., Level 2 and 3 IME support
4. Find Up as well as down.
5. BiDi support (RichEdit 2.1)
6. Multilevel undo. Extensible undo architecture that allows client to participate in app-wide undo model.
7. Magellan mouse support
8. Dual-font support. Keyboard can automatically switch fonts when active font is inappropriate for current keyboard, e.g., Kanji characters in Times New Roman.
9. Smart font apply. Font change request doesn’t apply Western fonts to FE characters.
10. Improved display. An off-screen bitmap is used when multiple fonts occur on the same line. This allows, for example, the last letter of the word “cool” not to be chopped off.
11. Transparency support. Also in windowless mode.
12. System selection colors. Used for selecting text
13. AutoURL recognition
14. Word edit UI compatibility. Selection, cursor-keypad semantics.
15. Word standard EOP (end-of-paragraph mark: CR). Can also handle CRLF
16. Plain-text controls as well as rich-text. Single character format and single paragraph format.
17. Single-line controls as well as multiline. Truncate at first end-of-paragraph and no word wrap.
18. Accelerator and Password Controls.
19. Scalable architecture to reduce instance size.
20. Windowless operation and interfaces (ITextHost/ITextServices). Added primarily for Forms^3.
21. COM dual interfaces: TOM (Text Object Model). This powerful set of interfaces is described separately.
22. CHARFORMAT2. Added font weight, background color, locale ID, underline type, superscript/subscript (in addition to offset), disabled effect. For RTF roundtripping only, added amount to space between letters, twip size above which to kern character pair, animated-text type, various effects: font shadow/outline, all caps, small caps, hidden, embossed, imprint, and revised.
23. PARAFORMAT2. Added space before/after and Word line spacings. For RTF roundtripping only, added shading weight/style, numbering start/style/tab, border space/width/sides, tab alignment/leaders, various Word paragraph effects: RTL paragraph, keep, keep-next, page-break-before, no-line-number, no-widow-control, do-not-hyphenate, side-by-side.
24. More RTF roundtripping. All of Word’s FormatFont and FormatParagraph properties.
25. Improved OLE support.
26. Code Stability and stabilization. E.g., parameter and object validation, function invariants, re-entrancy guards, object stabilization, etc.
27. Strong testing infrastructure including extensive regression tests and Genesis testing. Shipped with no priority 1 or 2 bugs and not many postponed bugs.
28. Improved Performance. Smaller working set, faster load and redisplay times, etc.
29. C++ code base. The code is written in C++. Provided a solid foundation on which to build RichEdit 3.0.
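To make the EOP item above (item 15) a bit more concrete, here is a minimal sketch of collapsing CRLF pairs to the single-CR end-of-paragraph mark, along with the character-position (cp) mapping that such a change implies. The names NormalizeEop and CpFromCrlfCp are hypothetical, and this is not RichEdit's actual code; it only illustrates why a cp counted in CRLF text differs from the cp of the same character in CR-only text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collapse each CRLF pair to a single CR (the Word-style EOP mark).
// Lone CRs and all other characters pass through unchanged.
std::wstring NormalizeEop(const std::wstring &in)
{
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        out.push_back(in[i]);
        if (in[i] == L'\r' && i + 1 < in.size() && in[i + 1] == L'\n')
            ++i;                            // Skip the LF of a CRLF pair
    }
    return out;
}

// Map a cp counted in the CRLF text to the matching cp in the CR-only text.
long CpFromCrlfCp(const std::wstring &crlfText, long cpCrlf)
{
    assert(cpCrlf >= 0 && (std::size_t)cpCrlf <= crlfText.size());
    long cp = 0;
    for (long i = 0; i < cpCrlf; ++i)
    {
        // An LF that directly follows a CR has no cp of its own after collapsing
        if (crlfText[i] == L'\n' && i > 0 && crlfText[i - 1] == L'\r')
            continue;
        ++cp;
    }
    return cp;
}
```

For the text "a", CR, LF, "b", the letter b sits at cp 3 in the CRLF form and at cp 2 once the pair is collapsed.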
RichEdit 3.0 Feature Additions
1. Zoom. The zoom factor is given by ratio of two longs.
2. Paragraph numbering (single-level). Numeric, upper/lower alphabetic or Roman numeral.
3. Simple tables (no wrap inside cells). Limited UI: no resizing, but can delete/insert rows. With LineServices, can align columns centered, flush right, and decimal. Cells are simulated by tabs, so text tabs and carriage returns are replaced by blanks.
4. Normal and heading styles. Built-in normal style and heading styles 1 through 9 are supported by the EM_SETPARAFORMAT and TOM APIs.
5. Outline view (similar to Word’s). Supports normal style and headings 1 through 9. Can collapse to heading level n, promote/demote headings/text, move paragraphs up/down. Can persist collapse status.
6. More underline types (dashed, dash-dot, dash-dot-dot, dot)
7. Underline coloring. Underlined text can be tagged with one of 15 document choices for underline colors.
8. Hidden text. Marked by CHARFORMAT2 attribute. Handy for roundtripping of information that ordinarily shouldn’t be displayed.
9. More default hot keys, which act as Word’s default hot keys act. E.g., European accent dead keys (US keyboards only) and outline-view hot keys. Number hot key (Ctrl+L) cycles through numbering options available, starting with bullet.
10. Smart-quotes (toggled on/off by Ctrl+") for US keyboards.
11. Soft hyphens. (0xAD in plain text; \- in RTF).
12. Italics Caret/Cursor. Also hand cursor over URLs.
13. LineServices Option: RichEdit 3.0 can use Office’s LineServices component for line breaking and display. This elegant option was added primarily to facilitate handling complex scripts (BiDi, Indic, and Thai). In addition a number of improvements occur for simple scripts, e.g., center, right, and decimal tabs, fully justified text, underline averaging giving a uniform underline even when adjacent text runs have different font sizes. It opens the door to incorporating LineServices FE enhancements, such as Ruby, Warichu, Tatenakayoko, and vertical text. LineServices also paves the way for WYSIWYG editing of built-up mathematical expressions and RichEdit 3.x looks like the ideal development environment for this.
14. Complex Script Support: RichEdit 3.0 will support BiDi (text with Arabic and/or Hebrew mixed with other scripts), Indic (Indian scripts like Devanagari), and Thai. For support of these complex scripts, the LineServices and NT Uniscribe components are used, which run on Win95 and later OSs.
15. Font binding: RichEdit 3.0 will automatically choose an appropriate font for characters that clearly do not belong to the current charset stamp. This is done by assigning charsets to runs and associating fonts with those charsets. Please see the section on Font Binding below.
16. Charset-specific plain-text read/write options, notably ability to read a file using one charset and write it with a different one.
17. UTF-8 RTF. Used preferentially for cut/copy/paste and optionally externally, this file format is substantially more compact than ordinary RTF, faster, and is completely faithful to Unicode.
18. Office 9 IME support (MSIME98). This more powerful IME capability has been factored out into an independent module (see RichEdit Architectural Improvements). Features include:
a. Reconversion - In the past, the user needed to delete the final string first and then type in a new string to get to the correct candidate. This feature enables the user to convert the final string back to composition mode, allowing easy selection of a different candidate string.
b. Document feed - This feature provides IME98 with the text for the current paragraph, which helps IME98 to do more accurate conversion during typing.
c. Mouse Operation - This feature allows the user to have better control over the candidate and UI windows during typing.
d. Caret position - This feature provides the current caret and line information, which IME98 uses to position UI windows (e.g., candidate list).
19. AIMM support. Users can invoke the IE/AIMM object, which enables users to enter Far East characters on US systems (NT4.0 & Win95).
20. More RTF round tripping.
21. Improved 1.0 compatibility mode, e.g., MBCS to/from Unicode character-position (cp) mappings. Is being used to emulate RE 1.0 in NT 5.
22. Increased Freeze Control. The display can be frozen over multiple API calls and then unfrozen to display the updates.
23. Increased Undo Control. Undo can be suspended and resumed (needed for IME).
24. Increase/Decrease Font Size. Increases or decreases font size to one of six standard values (12, 28, 36, 48, 72, 80 pts).
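As an illustration of the zoom item above (item 1), the following sketch scales a length by a zoom factor given as the ratio of two longs. The function names are hypothetical, and the range check assumes the limits documented for the EM_SETZOOM message (ratio strictly between 1/64 and 64); treat it as a sketch of the arithmetic rather than RichEdit's implementation.

```cpp
#include <cassert>

// Returns true if numerator/denominator lies strictly between 1/64 and 64.
// Comparisons are done by cross-multiplying, so no floating point is needed.
bool IsValidZoom(long numerator, long denominator)
{
    if (numerator <= 0 || denominator <= 0)
        return false;
    return 64LL * numerator > denominator && numerator < 64LL * denominator;
}

// Scale a length by numerator/denominator, rounding half away from zero.
// The product is formed in long long to avoid overflowing a 32-bit long.
long ZoomLength(long length, long numerator, long denominator)
{
    assert(IsValidZoom(numerator, denominator));
    const long long scaled = (long long)length * numerator;
    const long long half = denominator / 2;
    return (long)((scaled >= 0 ? scaled + half : scaled - half) / denominator);
}
```

Using a ratio of integers keeps common zoom factors such as 3/2 exact, which a single floating-point factor would not guarantee.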
RichEdit 3.0 Architectural Improvements
1. Input module: IME has been factored out into separate generally usable input module that supports the latest Office 9 IMEs. RichEdit 3.0 itself knows nothing of IMEs! In principle other IME clients can use this input module. Did need to add some methods to RichEdit’s object model (the approach is discussed in a separate section).
2. Virtual Win32 Environment: OS-dependent calls have been separated out into a class of their own. RE 3.0 works in a virtual Win32 with some multilingual enhancements. Most calls are static, so no runtime overhead is encountered. Facilitates building RichEdit with different OSs, e.g., Windows CE.
3. Factored Rich Text status: allows aspects of rich text to be used with plain-text semantics. E.g., multiple fonts, coloring, and underlining. Useful for font binding and IME highlighting. Plain text UI remains the same, so EM_SETCHARFORMAT and EM_SETPARAFORMAT apply to whole control.
4. Dual Line Methods. Lines can be broken, queried, and displayed with or without LineServices. Simple text can be handled with small instance size and higher speed. More sophisticated text can use the elegant LineServices component.
RichEdit 3.0 Performance Improvements and Maintenance
1. Many performance/size improvements.
a) reduced size of (to 1/3) and generalized internal versions of RichEdit 2.0’s character and paragraph formatting structures (CHARFORMAT2 and PARAFORMAT2). Easy to add properties to these important structures, although the additions typically won't be available to the message interface.
b) reduced size of many other structures as well.
c) declared constant data structures const, so that they are included in the code segment and are shared by all active processes.
d) reduced the number of system calls by more caching of frequently used data
e) eliminated redundant code.
2. Faster startup time: most initialization is postponed to the creation of the first control. C runtime is no longer needed.
3. Cleaned up code base. Used the same notation (Hungarian, etc.) for local variable names throughout. Added many new comments and improved many old comments. Counts are now LONGs rather than the nefarious DWORDs, which might be described as “wishful thinking”! Eliminated evolutionary dead code. Simplified C++ model: no more multiple inheritance and almost no operator overloading (except for new and assignment).
4. Numerous bug fixes. Eliminated some memory leaks and reference counting errors. Fixed various bugs postponed from RichEdit 2.x.
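One reading of the "wishful thinking" remark about DWORD counts in item 3 is that an unsigned count silently wraps when a subtraction goes negative, whereas a signed count exposes the problem. The sketch below shows that effect with fixed-width stand-ins for DWORD and LONG; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Character count between two cps using DWORD-like (unsigned) arithmetic.
// If cpMost < cpMin, the result wraps around to a huge positive value.
std::uint32_t CchAsDword(std::uint32_t cpMin, std::uint32_t cpMost)
{
    return cpMost - cpMin;
}

// The same count using LONG-like (signed) arithmetic.
// If cpMost < cpMin, the result is simply negative.
std::int32_t CchAsLong(std::int32_t cpMin, std::int32_t cpMost)
{
    return cpMost - cpMin;
}

// With a signed count, a one-line invariant check can reject a bad range.
bool IsValidCount(std::int32_t cch)
{
    return cch >= 0;
}
```

With cpMin = 5 and cpMost = 3, the unsigned version yields 4294967294, which looks like a legitimate (if enormous) count, while the signed version yields -2 and fails the validity check.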
RichEdit 3.0 Rich System Controls
1. System edit-control mode that emulates the OS edit controls more accurately.
2. ListBox and ComboBox controls similar to system versions, but supporting Unicode and font binding on Win95 as well as on NT. These controls can be made rich, opening the door to substantially more elegant dialogs.
What RichEdit 3.0 Isn't
1. Native HTML control. There are HTML ↔ RTF converters that can be used with RichEdit. There’s the Trident control, which is substantially bigger.... We have a prototype for direct HTML I/O that uses the TOM interfaces, but it hasn’t been tested adequately for general use. This prototype only roundtrips HTML that RichEdit understands.
2. Active X control. We have a prototype RichEdit Active X control (ATL), but it too hasn’t undergone testing. Note there is a RichEdit 1.0 Active X control and in the future there may be a VB control based on RichEdit 3.0.
3. MFC RichEdit class. Note there is a RichEdit 1.0 MFC class.
4. Multistory editor (like Word). Each RichEdit instance corresponds to a single story. Word has many stories, e.g., body text, header, footer, footnote, textbox. A RichEdit instance can be used for any one of those, but to handle more, you need one instance for each story.
RichEdit Clients
Office 97 SDM
Office 9 SDM (3.0)
Office 9 Command Bars (3.0)
Word 97 (non-SDM dialogs)
Default Exchange Client
Outlook 97 body/to/from/subject/notes
Outlook 9 body/to/from/subject/notes
Pocket Word 2.0
WordPad (NT 5.0)
MFC RichText Control
VB RichText Control
Forms^3 97 edit engine
Forms^3 9 edit engine
Layout Control Pack for IE
FrontPage source viewer
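How to Create a RichEdit Instance (1)
    HINSTANCE hRE = LoadLibrary(TEXT("RICHED20"));  // Loading the DLL registers the RichEdit window classes
    if(!hRE)
        ... // Handle load failure
    // The style flags and the rc.left/rc.top origin are example values
    hwndRE = CreateWindow(TEXT("RichEdit20W"), TEXT(""),
        WS_CHILD | WS_VISIBLE | ES_MULTILINE, rc.left, rc.top,
        rc.right - rc.left, rc.bottom - rc.top, hwndParent,
        NULL, hinst, NULL);
    ... // Send messages to hwndRE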
How to Create a RichEdit Instance (2)
A RichEdit control is based on an ITextHost object interacting with an ITextServices object. The latter doesn’t have a window of its own. The CreateWindow() call above creates an ITextHost object, which, in turn, creates an ITextServices object.
Alternatively, you can create an ITextHost object directly that, in turn, creates as many ITextServices objects as you desire. This is the way Forms^3 uses RichEdit for dialogs. It’d also be a great way to make a table object, for which each cell would have its own ITextServices object.
The way to create an ITextServices object is to call the function (it’s a bit complicated, since it allows the object to be aggregated)
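    STDAPI CreateTextServices(
        IUnknown *punkOuter,    // Outer unknown, may be NULL
        ITextHost *phost,       // Client's ITextHost; must be valid
        IUnknown **ppUnk);      // Private IUnknown of text services engine
For example, from inside a class that implements ITextHost:
    IUnknown *pUnk = NULL;
    if(FAILED(CreateTextServices(NULL, this, &pUnk)))
        return E_FAIL;
    hr = pUnk->QueryInterface(IID_ITextServices, (void **)&_pserv);
    pUnk->Release();            // _pserv now holds its own reference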
You can then use the _pserv pointer to call any ITextServices method, including TxSendMessage(), which is a faster way to send messages to the control than the system SendMessage(). But warning: CreateWindow() and the usual message interface are substantially easier to implement, since you don’t have to create an ITextHost object. As shown below, if all you want to do is to use some ITextServices methods, you can get an ITextServices interface to a control created by CreateWindow().
How to use RichEdit
There are five main ways to use a RichEdit 2.x or 3.0 control:
1. Messages (the RichEdit message interface)
2. ITextServices methods
3. Keyboard input including cut/copy/paste
4. File read/write (plain text or RTF)
5. TOM (Text Object Model) methods
The most familiar ways (messages and keyboard) are useful, but may not have the performance or functionality that you need. We describe each of these approaches in the remainder of this talk.
For ordinary keyboard input (not IME), RichEdit acts very similarly to Word. Word has more hot keys, but the cursor keypad and letter/punctuation keys work essentially the same way. Ditto for mouse operations.
RichEdit Message Interface
There are many RichEdit messages. In addition to the system edit control messages defined in winuser.h, there are many new messages defined in richedit.h. All edit messages handled by RichEdit (specifically by ITextServices::TxSendMessage()) are listed below. System edit and RichEdit 1.0 messages are defined in the system SDK. RichEdit 2.0 and 3.0 messages aren’t documented in my copy of the SDK, but should be documented on http://richedit sometime soon, and in the SDK sometime later. Note that a number of RichEdit 1.0 messages have been generalized in later versions. E.g., EM_STREAMIN/OUT take an optional codepage value (which can be 1200, i.e., Unicode, or CP_UTF8, i.e., UTF-8). RichEdit only understands enough about IME messages to know to invoke the IME input module (see Input Module). Hence not all IME messages are listed below.
System edit control messages not handled by RichEdit
System edit control messages handled by RichEdit
RichEdit 1.0 messages
RichEdit 2.0 messages
Far East specific messages (some are RE 1.0)
RichEdit 3.0 messages
BiDi specific messages
Extended edit style specific messages
Outline view message
Message for getting and restoring scroll pos
Zoom and increment/decrement fontsize
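The generalized EM_STREAMIN/OUT code-page option mentioned above can be sketched as follows. The flag values mirror those in richedit.h and winnls.h but are redefined here under hypothetical k-prefixed names so that the fragment compiles without the Windows SDK. The layout assumed is the documented one: format flags in the low word of wParam, code page in the high word, with SF_USECODEPAGE set to say the high word is meaningful.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for SF_TEXT, SF_RTF, SF_USECODEPAGE (richedit.h) and for the
// Unicode and UTF-8 code-page numbers; values copied from the SDK headers.
const std::uint32_t kSfText        = 0x0001;
const std::uint32_t kSfRtf         = 0x0002;
const std::uint32_t kSfUseCodePage = 0x0020;
const std::uint32_t kCpUnicode     = 1200;
const std::uint32_t kCpUtf8        = 65001;

// Build the wParam for EM_STREAMIN / EM_STREAMOUT when a specific code page
// is wanted: code page in the high word, format flags in the low word.
std::uint32_t StreamFormat(std::uint32_t sfFormat, std::uint32_t codepage)
{
    assert(codepage <= 0xFFFF);     // Must fit in the high word
    return (codepage << 16) | kSfUseCodePage | sfFormat;
}
```

In a real client the result would be passed as the wParam of SendMessage(hwndRE, EM_STREAMOUT, ...) along with an EDITSTREAM callback; StreamFormat(kSfText, kCpUtf8) requests plain text written as UTF-8.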
The RTF control words recognized by RichEdit are given below. Not all of these control words are fully implemented, but almost all are round tripped.
adeff, animtext, ansi, ansicpg, b, bgbdiag, bgcross, bgdcross, bgdkbdiag, bgdkcross, bgdkdcross, bgdkfdiag, bgdkhoriz, bgdkvert, bgfdiag, bghoriz, bgvert, bin, blue, box, brdrb, brdrbar, brdrbtw, brdrcf, brdrdash, brdrdashsm, brdrdb, brdrdot, brdrhair, brdrl, brdrr, brdrs, brdrsh, brdrt, brdrth, brdrtriple, brdrw, brsp, bullet, caps, cbpat, cell, cellx, cf, cfpat, clbrdrb, clbrdrl, clbrdrr, clbrdrt, collapsed, colortbl, cpg, cs, deff, deflang, deflangfe, deftab, deleted, dibitmap, disabled, dn, embo, emdash, emspace, endash, enspace, emdash, expndtw, f, fbidi, fchars, fcharset, fdecor, fi, field, fldinst, fldrslt, fmodern, fname, fnil, fonttbl, footer, footerf, footerl, footerr, footnote, fprq, froman, fs, fscript, fswiss, ftech, ftncn, ftnsep, ftnsepc, green, header, headerf, headerl, headerr, highlight, hyphpar, i, impr, info, intbl, keep, keepn, kerning, lang, lchars, ldblquote, li, line, lnkd, lquote, ltrch, ltrdoc, ltrmark, ltrpar, macpict, noline, nosupersub, nowidctlpar, objattph, objautlink, objclass, objcropb, objcropl, objcropr, objcropt, objdata, object, objemb, objh, objicemb, objlink, objname, objpub, objscalex, objscaley, objsetsize, objsub, objw, outl, page, pagebb, par, pard, piccropb, piccropl, piccropr, piccropt, pich, pichgoal, picscalex, picscaley, pict, picw, picwgoal, plain, pmmetafile, pn, pndec, pnindent, pnlcltr, pnlcrm, pnlvlblt, pnlvlbody, pnlvlcont, pnqc, pnqr, pnstart, pntext, pntxta, pntxtb, pnucltr, pnucrm, protect, pwd, qc, qj, ql, qr, rdblquote, red, result, revauth, revised, ri, row, rquote, rtf, rtlch, rtldoc, rtlmark, rtlpar, s, sa, sb, sbys, scaps, sect, sectd, shad, shading, sl, slmult, strike, stylesheet, sub, super, tab, tb, tc, tldot, tleq, tlhyph, tlth, tlul, tqc, tqdec, tqr, trbrdrb, trbrdrl, trbrdrr, trbrdrt, trgaph, trleft, trowd, trqc, trqr, tx, u, uc, ul, uld, uldash, uldashd, uldashdd, uldb, ulhair, ulnone, ulth, ulw, ulwave, up, utf, v, viewkind, viewscale, wbitmap, wbmbitspixel, wbmplanes, wbmwidthbytes, wmetafile, 
xe, zwj, zwnj.
ITextServices Windowless Interface
As described above, you can get an ITextServices interface using CreateTextServices(), but this requires that you implement your own ITextHost object. If you use CreateWindow() instead, you can still use ITextServices methods by using the following code:
    IUnknown *pUnk = NULL;
    SendMessage(hedit, EM_GETOLEINTERFACE, 0, (LPARAM)&pUnk);
    if(!pUnk)
        return E_FAIL;
    hr = pUnk->QueryInterface(IID_ITextServices, (void **)&_pserv);
    .... // Use _pserv methods
All ITextServices methods are typed simply as HRESULT. This differs from standard COM interface functions, which are typed HRESULT STDMETHODCALLTYPE. The methods are:
    TxSendMessage(msg, wparam, lparam, plresult)
    TxDraw(dwDrawAspect, lindex, pvAspect, ptd, hdcDraw,
        hicTargetDev, lprcBounds, lprcWBounds, lprcUpdate,
        pfnContinue, dwContinue, lViewId)
    TxGetHScroll(plMin, plMax, plPos, plPage, pfEnabled)
    TxGetVScroll(plMin, plMax, plPos, plPage, pfEnabled)
    OnTxSetCursor(dwDrawAspect, lindex, pvAspect, ptd,
        hdcDraw, hicTargetDev, lprcClient, x, y)
    TxQueryHitPoint(dwDrawAspect, lindex, pvAspect, ptd,
        hdcDraw, hicTargetDev, lprcClient, x, y, pHitResult)
    TxGetNaturalSize(dwAspect, hdcDraw, hicTargetDev, ptd, dwMode,
        psizelExtent, pwidth, pheight)
Getting to the TOM Interfaces
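    // Skeleton function to manipulate text using TOM ITextRange interface
    HRESULT Manipulate(HWND hedit)
    {
        IUnknown *      punk = NULL;
        ITextDocument * pdoc = NULL;
        ITextRange *    prg  = NULL;
        SendMessage(hedit, EM_GETOLEINTERFACE, 0, (LPARAM)&punk);
        if(!punk) return E_FAIL;
        HRESULT hr = punk->QueryInterface(IID_ITextDocument, (void **)&pdoc);
        if(SUCCEEDED(hr))
            hr = pdoc->Range(0, 0, &prg);   // Empty range at start of document
        ... // Use prg methods, then Release() prg, pdoc, and punk
        return hr;
    }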
Font Binding
RichEdit 3.0 will assign a charset to plain-text characters depending on their context. E.g., Hangul symbols get HANGUL_CHARSET, nonneutral ANSI characters get ANSI_CHARSET in any event, Chinese characters get SHIFTJIS_CHARSET if kana characters are found nearby and GB2312_CHARSET if no kana are found nearby. Greek characters get GREEK_CHARSET, etc. Note that we’re using Unicode internally, so this use of charset differs from the original one used in font specifications. But charset seems to be a pretty good match with what we want, which is a script, and our CHARFORMAT has a well-defined place for the charset. It also helps with some anomalies in Win95, where we can't always use Unicode.
Neutral characters like blanks and digits get assigned a charset depending on their context. For example, a blank surrounded by characters of the same charset gets that charset. More generally neutrals/digits for BiDi text are assigned charsets in a way based on the Unicode BiDi algorithm.
Once charsets are assigned, we scan the text around the insertion point forward and backward to find the nearest fonts that have been used for the charsets. If no font is found for a charset, we use the font chosen by the client for that charset. If the client hasn’t specified a font for the charset, we use the default Office 9 font for that charset. If the client wants some other font, it can always change it, but the hope is that this approach will work most of the time. Our current default font choices are based on the following table:
Western, CE, ME... | Times New Roman
Hence in our default font-binding table (entries have charset, facename, size), we allow ANSI_CHARSET to match all 8 125x charsets, while the appropriate charset matches other fonts on a one-to-one basis. More precisely, we use the ANSI_CHARSET choice whenever no other alternative is found. The client will be able to specify a finer granularity than this, e.g., assign a specific ARABIC_CHARSET for Arabic runs, a specific Greek font for Greek runs, etc. This finer granularity will also be used if a font with the desired charset stamp is found somewhere in the document before the area being font-bound.
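The charset-assignment step described in this section can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not RichEdit's code: the function names, the Unicode ranges used for classification, the kana search window of 8 characters, and the rule that a neutral takes the preceding charset (else the following one, else ANSI) are all choices made for the sketch. The BiDi handling and the subsequent font lookup are omitted. The charset values match wingdi.h but are redefined under hypothetical names so that no Windows headers are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Values of ANSI_CHARSET, SHIFTJIS_CHARSET, HANGUL_CHARSET, GB2312_CHARSET,
// and GREEK_CHARSET from wingdi.h.
const int kAnsi = 0, kShiftJis = 128, kHangul = 129, kGb2312 = 134, kGreek = 161;
// Hypothetical internal markers, resolved before AssignCharsets() returns.
const int kNeutral = -1, kHan = -2;

// Context-free first guess for one character (deliberately coarse ranges).
static int Classify(char32_t ch)
{
    if (ch == U' ' || (ch >= U'0' && ch <= U'9')) return kNeutral;
    if (ch >= 0x0370 && ch <= 0x03FF) return kGreek;
    if (ch >= 0x3040 && ch <= 0x30FF) return kShiftJis;   // Hiragana, katakana
    if (ch >= 0x4E00 && ch <= 0x9FFF) return kHan;        // CJK ideographs
    if (ch >= 0xAC00 && ch <= 0xD7A3) return kHangul;     // Hangul syllables
    return kAnsi;
}

// Returns one charset per character of text. 'window' is how far to look on
// either side of a Han character for kana (an assumed value).
std::vector<int> AssignCharsets(const std::u32string &text, std::size_t window = 8)
{
    const std::size_t n = text.size();
    std::vector<int> first(n);
    for (std::size_t i = 0; i < n; ++i)
        first[i] = Classify(text[i]);

    // Han characters: Japanese if kana are nearby, otherwise GB2312.
    std::vector<int> cs(first);
    for (std::size_t i = 0; i < n; ++i)
    {
        if (first[i] != kHan)
            continue;
        const std::size_t lo = i > window ? i - window : 0;
        const std::size_t hi = std::min(n, i + window + 1);
        const bool kanaNearby =
            std::find(first.begin() + lo, first.begin() + hi, kShiftJis) != first.begin() + hi;
        cs[i] = kanaNearby ? kShiftJis : kGb2312;
    }

    // Neutrals: preceding charset if any, else next non-neutral, else ANSI.
    for (std::size_t i = 0; i < n; ++i)
    {
        if (cs[i] != kNeutral)
            continue;
        if (i > 0)
        {
            cs[i] = cs[i - 1];          // Already resolved; scan is left to right
            continue;
        }
        std::size_t j = 0;
        while (j < n && cs[j] == kNeutral)
            ++j;
        cs[i] = j < n ? cs[j] : kAnsi;
    }
    for (std::size_t i = 0; i < n; ++i)
        assert(cs[i] >= 0);             // No internal markers may remain
    return cs;
}
```

With these rules a blank between two Greek letters gets GREEK_CHARSET, two Han characters followed by a hiragana character all get SHIFTJIS_CHARSET, and the same Han characters with no kana in range get GB2312_CHARSET.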