Math Find/Replace and Rich Text Searches

A number of readers have inquired how to Find/Replace mathematical expressions in Word 2007. This post shows how it could be done nicely, although unfortunately this functionality didn’t make it into Word 2007. A previous post shows how to find simple variables in a math zone. The basic idea of finding more complex expressions is to use a rich-text search.

A rich-text search matches one or more rich-text properties in addition to matching the associated plain text according to various options. The basic algorithm for a rich-text search is to loop on a plain-text search followed by tests for the desired rich text properties. If a plain-text hit also satisfies the rich-text property tests, then a desired rich-text search hit is found.
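The loop just described can be sketched in a few lines of Python. This is only an illustration under assumed names: `rich_find`, the `props_ok` callback, and the per-character `bold` list are hypothetical and are not part of RichEdit or Word.

```python
from typing import Callable, Optional

def rich_find(text: str, needle: str,
              props_ok: Callable[[int, int], bool],
              start: int = 0) -> Optional[int]:
    """Return the index of the first plain-text hit that also passes props_ok."""
    pos = text.find(needle, start)           # plain-text search
    while pos != -1:
        if props_ok(pos, pos + len(needle)):  # rich-text property tests
            return pos                        # rich-text hit
        pos = text.find(needle, pos + 1)      # keep looking past this hit
    return None                               # no more text to search

# Hypothetical rich-text model: one bold flag per character.
text = "cat and cat"
bold = [False] * 8 + [True] * 3              # only the second "cat" is bold

def all_bold(lo: int, hi: int) -> bool:
    return all(bold[lo:hi])

print(text.find("cat"))                  # 0: the plain-text search stops at the first "cat"
print(rich_find(text, "cat", all_bold))  # 8: the rich-text search skips to the bold "cat"
```

Passing the property tests as a callback keeps the loop the same no matter which rich-text properties a particular search cares about.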

To illustrate this approach, consider searching for a mathematical expression. This functionality ships in Office 2007’s RichEdit control, although it’s not used by any applications to date and it’s only partially described in a Microsoft Confidential document. In math built-up format, as distinguished from math linear format, mathematical objects like fractions and subscripts are represented by a start delimiter, the first argument, an argument separator if the object has more than one argument, the second argument, etc., with the final argument terminated by an end delimiter. For example, the fraction a over b is represented in built-up format by {frac a|b} where {frac is the start delimiter, | is the argument separator, and } is the end delimiter. Similarly, the subscript object a_b is represented by {sub a|b}. Here the start delimiter is the same character for all math objects and is the Unicode character U+FDD0 in RichEdit (Word uses a different character). The kind of object is specified by a rich-text object-name property associated with the start delimiter. So in plain text, the built-up forms of the fraction and subscript are identical if the fraction arguments are the same as their subscript counterparts. In the example here, a plain-text search for {frac a|b} matches {sub a|b} as well as {frac a|b}.
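To make the plain-text collision concrete, here is a small Python model of the built-up form. Only the U+FDD0 start delimiter comes from the description above. The separator and end-delimiter code points, the `build` helper, and the dictionary of object names are assumptions made for this sketch.

```python
START = "\uFDD0"  # start delimiter in RichEdit, as stated above
SEP = "\uFDEE"    # assumed stand-in for the argument separator
END = "\uFDEF"    # assumed stand-in for the end delimiter

def build(name, *args):
    """Return (plain_text, names) for one math object.

    names maps the index of the start delimiter to the object name,
    standing in for the rich-text object-name property.
    """
    plain = START + SEP.join(args) + END
    return plain, {0: name}

frac_text, frac_names = build("frac", "a", "b")
sub_text, sub_names = build("sub", "a", "b")

print(frac_text == sub_text)    # True: identical in plain text
print(frac_names == sub_names)  # False: only the object name differs
```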

Searches generally deal with plain text only, so a search for a fraction would match any object with two arguments if the arguments are the same as those of the fraction. A rich-text search is able to match only fractions when searching for fractions and only subscripts when searching for subscripts. In general, only one kind of built-up object is matched.

This is accomplished by a loop that runs a plain-text search and then checks the object-name property of each math object in the search text. If there’s no more text to search, the search fails. If a plain-text match occurs and each math object has the same name in the search text as its counterpart in the target text, the loop exits with a successful search. Otherwise the loop iterates, searching the remaining target text.
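Here is a rough Python sketch of that loop. It isn't RichEdit's implementation. The `find_math` function, the dictionaries that map each start-delimiter index to an object name, and the readable `|` and `}` stand-ins for the separator and end delimiter are all assumptions made for illustration.

```python
START = "\uFDD0"   # start delimiter (same character for every math object)
SEP, END = "|", "}"  # readable stand-ins for the separator and end delimiter

def find_math(target, target_names, search, search_names, start=0):
    """Return the index of the first hit whose object names match, else -1.

    target_names and search_names map the index of each START character
    to the object name attached to it.
    """
    pos = target.find(search, start)              # plain-text search
    while pos != -1:
        if all(target_names.get(pos + i) == name  # object-name checks
               for i, name in search_names.items()):
            return pos                            # successful search
        pos = target.find(search, pos + 1)        # keep iterating
    return -1                                     # no more text: search fails

obj = START + "a" + SEP + "b" + END
target = "x=" + obj + "+" + obj       # a subscript at 2, then a fraction at 8
target_names = {2: "sub", 8: "frac"}
search = obj

print(target.find(search))                                      # 2
print(find_math(target, target_names, search, {0: "frac"}))    # 8
print(find_math(target, target_names, search, {0: "sub"}))     # 2
```

The plain-text search stops at the subscript, while the search for a fraction rejects that hit on its object name and moves on to the fraction.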

This kind of search is a special case of fuzzy rich-text searches in which a single rich-text property has to match for certain text runs, in addition to a plain-text match for the whole search string. Other kinds of rich-text searches require more or all rich-text properties in the search and target texts to match. For years Microsoft Word has offered one such kind of rich-text search: it requires the uniform character formatting of the whole search string to match that of a target hit, in addition to a plain-text match. The math search discussed here differs in that only the math-object start-delimiter name property of each object needs to match.

Find/Replace combines this Find process with the option to replace the found expression with the mathematical expression entered into the Replace text control. A cool way to enter the desired Find and Replace strings in the Find/Replace dialog text fields is to type Alt+= to turn on the math zone in these fields and then type the desired math expressions using the linear format as in Word 2007. The text controls need to be RichEdit rich-text controls to do this. Or you can paste the desired math expressions into the Find and Replace text fields. I'll give a more complete specification for how this is done in RichEdit in a later post.