When Formula Autobuildup Occurs

People, especially testers, often ask when formula autobuildup (FAB) occurs. After using it for a while to enter equations into Word, one gets a feel for how it works, but may still wonder if there’s some well-defined way to predict autobuildup. A general answer is that build up occurs when the user enters a character that is unambiguously not part of the previous mathematical expression(s). As such, the character itself is not part of the resulting built-up expression. Examples are very helpful in giving the idea of when build up happens, but they cannot say when build up occurs in general. To predict it in a particular mathematical context, one needs to understand the relevant linear format rules. That may not be easy, since to make things work naturally, the rules can be quite involved. This post describes some of this methodology in greater detail. For completeness, changes involving italicization of math variables are also covered here, although this isn’t formally part of formula autobuildup. To handle formula autobuildup and italicization, the MathBuildUp function needs to look at every character typed by the user in a math zone.

The linear format grammar is heavily recursive. For example, a subscript expression is defined in terms of a subscript operand, which, in turn, is defined in terms of a more general operand, which, in turn, is defined in terms of a subscript expression. A computer program can keep track of such intricate recursion, but for human beings a simpler precedence-oriented technique can be used that gives the same results. In fact, such a technique has the advantage of being more efficient for the computer as well, so MathBuildUp uses it instead of the linear-format grammar directly.

The operators in the linear format grammar have precedences which, in increasing value, are open, close, list, concatenation, divide, n-ary, subsup, enclosure, and accent. Nominally the deal is that spans of nonoperator characters are pushed onto a rich-text string stack, and operators are pushed onto an operator stack or popped off it to make built-up objects, subject to tailored precedence rules. These rules are enhanced versions of those that one might use to implement a four-function calculator in Computer Science 101. For example, if a concatenation operator like plus is encountered with a divide operator (which has higher precedence than plus) on top of the operator stack, then the corresponding fraction is ready to build up as a built-up 2D fraction object. Operators of precedence “open”, like the left parenthesis, are almost invariably pushed onto the stack, since they represent the start of a subexpression which needs to be calculated. In contrast, operators of type “close” are almost never pushed onto the stack, since they complete a delimited (bracketed) expression and should force build up of all operators back to the corresponding open operator, followed by build up of the delimited expression itself or, in some cases, discarding of the delimiters.
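To make the calculator analogy concrete, here is a minimal sketch of the two-stack idea. It is not MathBuildUp itself: plain Python strings stand in for rich-text strings, only four operators are modeled, the names `Prec`, `OPERATORS`, and `build_up` are made up for illustration, and the input is assumed to be well formed. Built-up fractions are shown as `frac(numerator, denominator)`, and parentheses are always kept, whereas the real code discards delimiters in some cases.

```python
from enum import IntEnum


class Prec(IntEnum):
    """Precedence classes in increasing order, as listed in the text."""
    OPEN = 0
    CLOSE = 1
    LIST = 2
    CONCAT = 3
    DIVIDE = 4
    NARY = 5
    SUBSUP = 6
    ENCLOSURE = 7
    ACCENT = 8


# Assumption: only these four operators are modeled in this sketch.
OPERATORS = {"(": Prec.OPEN, ")": Prec.CLOSE, "+": Prec.CONCAT, "/": Prec.DIVIDE}


def build_up(linear):
    """Return a textual stand-in for the built-up form of a linear string."""
    operands = []   # plain strings stand in for the rich-text string stack
    operators = []  # operator stack

    def reduce_top():
        # Pop one operator and its two operands, push the combined result.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        if op == "/":
            operands.append(f"frac({left}, {right})")
        else:
            operands.append(f"{left}{op}{right}")

    span = ""
    for ch in linear:
        if ch not in OPERATORS:
            span += ch            # accumulate a span of nonoperator characters
            continue
        if span:
            operands.append(span)
            span = ""
        prec = OPERATORS[ch]
        if prec == Prec.OPEN:
            operators.append(ch)  # opens are always pushed
            continue
        # Simplified rule: reduce back to the nearest open while the stacked
        # operator has precedence at least that of the incoming one.
        while operators and operators[-1] != "(" and OPERATORS[operators[-1]] >= prec:
            reduce_top()
        if prec == Prec.CLOSE:
            if operators:         # the top is now the matching "("
                operators.pop()
                operands.append(f"({operands.pop()})")
        else:
            operators.append(ch)
    if span:
        operands.append(span)
    while operators and operators[-1] != "(" and len(operands) >= 2:
        reduce_top()
    return operands[-1] if operands else ""


print(build_up("a/b+c"))    # the plus forces the fraction to build up first
print(build_up("(a+b)/c"))  # the close forces build up back to the open
```

The interesting moment is the arrival of `+` in `a/b+c`: the divide on top of the operator stack outranks it, so the fraction is reduced before the plus is pushed.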

Note that many characters that are not operators in algebra nevertheless behave as operators in the linear format. These include space characters, along with arithmetic operators like +, *, =, etc. Note also that the absolute-value and norm operators require a more complicated formalism to handle (sometimes a ‘|’ acts like an open operator and sometimes like a close). Similarly, period and comma are trickier, since when sandwiched between ASCII digits they’re treated as parts of the surrounding numbers, while otherwise they have the precedence of a concatenation operator like plus.
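The period and comma rule is easy to illustrate in isolation. The following is a rough sketch over a finished string (the real code has to decide as characters are typed); the helper names `is_numeric_separator` and `tokenize` are hypothetical, and only period and comma are treated as operators here.

```python
def is_ascii_digit(ch):
    # str.isdigit() would also accept non-ASCII digits, so test the range.
    return "0" <= ch <= "9"


def is_numeric_separator(text, i):
    """True if text[i] is '.' or ',' sandwiched between ASCII digits."""
    if text[i] not in (".", ","):
        return False
    return (0 < i < len(text) - 1
            and is_ascii_digit(text[i - 1])
            and is_ascii_digit(text[i + 1]))


def tokenize(text):
    """Split text at periods and commas that act as operators."""
    tokens, span = [], ""
    for i, ch in enumerate(text):
        if ch in (".", ",") and not is_numeric_separator(text, i):
            if span:
                tokens.append(span)
                span = ""
            tokens.append(ch)   # behaves like a concatenation operator
        else:
            span += ch          # ordinary character or part of a number
    if span:
        tokens.append(span)
    return tokens


print(tokenize("1,000.5,x"))
```

In `1,000.5,x` the first comma and the period stay inside the number, while the last comma, which is followed by a letter, is split off as an operator.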

Given a degenerate range or insertion point (the clue to attempt autobuildup), the first thing MathBuildUp does is check for conversions to/from math italic. If the character typed is an ASCII or Greek alphabetic with a math italic counterpart, then the character is translated to the math italic version and MathBuildUp returns.
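For the ASCII letters, the translation can be sketched with the Unicode Mathematical Alphanumeric Symbols block. This sketch leaves Greek out and the function name `to_math_italic` is made up for illustration; how MathBuildUp performs the mapping internally is not described here. One wrinkle is worth knowing: the italic small h is not in that block and is encoded as U+210E instead.

```python
import unicodedata

MATH_ITALIC_CAPITAL_A = 0x1D434  # MATHEMATICAL ITALIC CAPITAL A
MATH_ITALIC_SMALL_A = 0x1D44E    # MATHEMATICAL ITALIC SMALL A
PLANCK_CONSTANT = "\u210E"       # serves as the math italic small h


def to_math_italic(ch):
    """Map an ASCII letter to its math italic counterpart, else return ch."""
    if ch == "h":
        return PLANCK_CONSTANT   # U+1D455 is a reserved hole in the block
    if "A" <= ch <= "Z":
        return chr(MATH_ITALIC_CAPITAL_A + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(MATH_ITALIC_SMALL_A + ord(ch) - ord("a"))
    return ch                    # digits, operators, etc. are left alone


for ch in "xhA2":
    converted = to_math_italic(ch)
    print(ch, "->", unicodedata.name(converted))
```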

If the character is ‘_’, ‘^’, or ‘ ’, the preceding character is a math italic, and still more characters precede that character in the document, then a span of math italics is compared to a dictionary of function names. If the span is found there, the name is translated back to ordinary text, e.g., “𝑠𝑖𝑛” is translated back to “sin”, and MathBuildUp returns. Note that the main part of MathBuildUp may also perform such transformations. In Word 2007, users can change the contents of the function-name dictionary.
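Here is a small sketch of that function-name check. The dictionary contents, the names `FUNCTION_NAMES`, `from_math_italic`, and `revert_function_name`, and the choice to look only at the trailing run of math italics are all assumptions made for illustration.

```python
# Assumption: a tiny stand-in for the user-editable function-name dictionary.
FUNCTION_NAMES = {"sin", "cos", "tan", "log", "lim"}


def from_math_italic(ch):
    """Return the ASCII letter for a math italic letter, or None."""
    cp = ord(ch)
    if ch == "\u210E":                 # math italic small h
        return "h"
    if 0x1D434 <= cp <= 0x1D44D:       # math italic capitals A..Z
        return chr(ord("A") + cp - 0x1D434)
    if 0x1D44E <= cp <= 0x1D467:       # math italic smalls a..z
        return chr(ord("a") + cp - 0x1D44E)
    return None


def revert_function_name(text, typed):
    """Given the text before the typed character, undo italics on a name."""
    if typed not in ("_", "^", " "):
        return text
    letters = []
    i = len(text)
    while i > 0:
        plain = from_math_italic(text[i - 1])
        if plain is None:
            break
        letters.append(plain)
        i -= 1
    name = "".join(reversed(letters))
    # At least two characters must precede the typed one.
    if len(name) >= 2 and name in FUNCTION_NAMES:
        return text[:i] + name
    return text


italic_sin = "\U0001D460\U0001D456\U0001D45B"   # math italic s, i, n
print(revert_function_name("2" + italic_sin, " "))
```

Typing a space after the italic letters yields upright “sin”, while a span such as italic “xyz” is left alone because it isn’t in the dictionary.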

If no such translation is made, MathBuildUp establishes the actual range to check for formula build up. It does this by going back to the start of the current argument if the insertion point (IP) is inside a built-up function, or else to the start of the math zone or the hard carriage return (CR) preceding the IP, whichever is closer to the IP. Then the choice is narrowed by advancing to the first major build-up operator. If such an operator is found before getting back to the IP, then the range is expanded backward to include the numerator for a division operator or the script base for ‘_’ or ‘^’. Then build up is attempted on the text in this range.

MathBuildUp scans this range, pushing simple operands (spans of atoms) onto a rich-text string stack and operators onto an operator stack. Unlike the simple calculator case, which uses plain-text operand strings, MathBuildUp needs to use rich-text strings, since the operands can have various combinations of attributes like bold, italic, revision markings, font styling, etc., as well as embedded objects like pictures or already built-up formulas. Formula build up of an expression is usually attempted when a close operator immediately follows the expression, or when the operator isn’t an open and one of the following conditions is true:

· Precedence of the operator is less than that of the previous operator.

· Precedences of the operator and the previous operator both equal concatenation, division, or subsup.
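These conditions fit in a few lines of code. The sketch below encodes only the “usual” rule as stated here, using precedence class names as strings; the names `PRECEDENCE_ORDER`, `SAME_LEVEL_TRIGGERS`, and `should_attempt_build_up` are hypothetical, and the real code has further embellishments.

```python
# Precedence classes in increasing order, as given earlier in the post.
PRECEDENCE_ORDER = ["open", "close", "list", "concatenation", "divide",
                    "n-ary", "subsup", "enclosure", "accent"]
RANK = {name: rank for rank, name in enumerate(PRECEDENCE_ORDER)}

# Equal precedences trigger build up only for these classes.
SAME_LEVEL_TRIGGERS = {"concatenation", "divide", "subsup"}


def should_attempt_build_up(op, prev_op):
    """Decide whether operator class `op`, arriving after `prev_op`,
    should trigger a build-up attempt on the preceding expression."""
    if op == "close":
        return True                       # a close always forces an attempt
    if op == "open":
        return False                      # an open just starts a subexpression
    if RANK[op] < RANK[prev_op]:
        return True                       # e.g. plus arriving after divide
    return op == prev_op and op in SAME_LEVEL_TRIGGERS


print(should_attempt_build_up("concatenation", "divide"))
print(should_attempt_build_up("divide", "concatenation"))
```

So a plus typed after `a/b` triggers an attempt, while a divide typed after `a+b` does not, since the denominator hasn’t been entered yet.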

If the expression so found is valid, the strings contributing to the built-up form are combined appropriately into that form, which becomes the new top of the string stack, and the operator stack is popped accordingly.

After the whole range is processed, if all the text has been found valid and something has been converted, the converted text replaces the corresponding original linearly formatted text.

Alternatively, if something is still invalid about the whole range, but the top string on the stack has been built up correctly, then this string alone replaces the linearly formatted text corresponding to it. So if you type “(a+b/c+”, the fraction builds up even though the expression as a whole isn’t syntactically correct, since there’s an unmatched left parenthesis. For this string to be correct, you need to add a corresponding right delimiter, which can be a \close operator if you don’t want it to display a glyph.

Naturally there are lots of embellishments to these rules and MathBuildUp is an intricate piece of code. But it sure is a joy to use.