When Formula Autobuildup Occurs

People, especially testers, often ask when formula autobuildup (FAB) occurs. After using it for a while to enter equations into Word, one gets a feel for how it works, but may still wonder whether there’s some well-defined way to predict it. A general answer is that build up occurs when the user enters a character that is unambiguously not part of the preceding mathematical expression(s). As such, the character itself is not part of the resulting built-up expression. Examples help convey when build up happens, but they can’t say when it occurs in general. To predict build up in a particular mathematical context, one needs to understand the relevant linear format rules. That may not be easy, since the rules can be quite involved in order to make things work naturally. This post describes some of this methodology in greater detail. For completeness, changes involving italicization of math variables are also covered here, although they aren’t formally part of formula autobuildup. To handle formula autobuildup and italicization, the MathBuildUp function needs to look at every character typed by the user in a math zone.

The linear format grammar is heavily recursive. For example, a subscript expression is defined in terms of a subscript operand, which, in turn, is defined in terms of a more general operand, which, in turn, is defined in terms of a subscript expression. A computer program can keep track of such intricate recursion, but for human beings a simpler precedence-oriented technique can be used that gives the same results. In fact, such a technique has the advantage of being more efficient for the computer as well, so MathBuildUp uses it instead of the linear-format grammar directly.

The operators in the linear format grammar have precedences which, in increasing value, are open, close, list, concatenation, divide, n-ary, subsup, enclosure, and accent. Nominally the deal is that spans of nonoperator characters are pushed onto a rich-text string stack, and operators are pushed onto an operator stack or popped off it making built-up objects, subject to tailored precedence rules. These rules are enhanced versions of those that one might use to implement a four-function calculator in Computer Science 101. For example, if a concatenation operator like plus is encountered with a division operator like ‘/’ (which has higher precedence than plus) on top of the operator stack, then the corresponding fraction is ready to build up as a built-up 2D fraction object. Operators of precedence “open” like the left parenthesis are almost invariably pushed onto the stack, since they represent the start of a subexpression which needs to be calculated. In contrast, operators of type “close” are almost never pushed onto the stack, since they complete a delimited (bracketed) expression and should force build up of all operators back to the corresponding open operator, followed by build up of the delimited expression itself or, in some cases, discarding of the delimiters.
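To make the two-stack idea concrete, here is a minimal calculator-style sketch in Python. It is not MathBuildUp: the operator table is a tiny illustrative subset, plain strings and nested tuples stand in for rich-text strings and built-up objects, the names `Prec`, `OPERATORS`, and `build_up` are made up for this example, and the input is assumed to be well formed.

```python
from enum import IntEnum


class Prec(IntEnum):
    """Operator precedences in increasing order, as listed in the post."""
    OPEN = 0
    CLOSE = 1
    LIST = 2
    CONCAT = 3
    DIVIDE = 4
    NARY = 5
    SUBSUP = 6
    ENCLOSURE = 7
    ACCENT = 8


# Illustrative subset only; the real operator table is much larger.
OPERATORS = {"(": Prec.OPEN, ")": Prec.CLOSE, "+": Prec.CONCAT, "/": Prec.DIVIDE}


def build_up(text):
    """Return a nested-tuple stand-in for the built-up form of well-formed input."""
    operands, ops = [], []

    def reduce_top():
        # Pop one operator and its two operands, push the combined object.
        op = ops.pop()
        right, left = operands.pop(), operands.pop()
        operands.append(("frac" if op == "/" else "row", left, right))

    i = 0
    while i < len(text):
        ch = text[i]
        if ch not in OPERATORS:
            # Collect a span of nonoperator characters as one operand.
            j = i
            while j < len(text) and text[j] not in OPERATORS:
                j += 1
            operands.append(text[i:j])
            i = j
            continue
        prec = OPERATORS[ch]
        if prec == Prec.OPEN:
            ops.append(ch)  # opens are always pushed
        else:
            # Build everything of equal or higher precedence, back to an open.
            while ops and ops[-1] != "(" and OPERATORS[ops[-1]] >= prec:
                reduce_top()
            if prec == Prec.CLOSE:
                ops.pop()  # discard the matching "("
                operands.append(("paren", operands.pop()))
            else:
                ops.append(ch)
        i += 1
    while ops:
        reduce_top()
    return operands.pop()
```

With this sketch, `build_up("a+b/c+d")` builds the fraction `b/c` at the moment the second plus arrives, because plus has lower precedence than the divide sitting on top of the operator stack.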

Note that many characters that are not operators in algebra nevertheless behave as operators in the linear format. This includes space characters, along with arithmetic operators like +, *, =, etc. Note also that the absolute-value and norm operators require a more complicated formalism to handle (sometimes a ‘|’ acts like an open operator and sometimes like a close). Similarly, period and comma are trickier, since when sandwiched between ASCII digits they’re treated as parts of the surrounding numbers, while otherwise they have the precedence of a concatenation operator like plus.
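The period/comma rule can be sketched as a small predicate. This is only an illustration of the "sandwiched between ASCII digits" test on a complete string; the function name is hypothetical, and how the real code handles a separator whose following digit hasn't been typed yet is not covered here.

```python
ASCII_DIGITS = "0123456789"


def is_numeric_separator(text, i):
    """True if text[i] is '.' or ',' with an ASCII digit on both sides."""
    if text[i] not in ".,":
        return False
    return (
        0 < i < len(text) - 1
        and text[i - 1] in ASCII_DIGITS
        and text[i + 1] in ASCII_DIGITS
    )
```

So the comma in `1,000` is part of the number, while the comma in `f(a,b)` would be treated as an operator with concatenation precedence.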

Given a degenerate range or insertion point (the clue to attempt autobuildup), the first thing MathBuildUp does is check for conversions to/from math italic. If the character typed is an ASCII or Greek alphabetic with a math italic counterpart, then the character is translated to the math italic version and MathBuildUp returns.
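For the ASCII half of that translation, a sketch using the Unicode Mathematical Alphanumeric Symbols code points might look like the following. The function name is made up for this example, Greek letters are left out to keep it short, and it says nothing about how Word stores the characters internally.

```python
def to_math_italic(ch):
    """Map an ASCII letter to its Unicode math italic counterpart.

    Characters without a counterpart handled here are returned unchanged.
    """
    if ch == "h":
        # Math italic small h is encoded as U+210E (Planck constant);
        # the slot U+1D455 in the math italic run is reserved.
        return "\u210E"
    if "A" <= ch <= "Z":
        return chr(0x1D434 + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(0x1D44E + ord(ch) - ord("a"))
    return ch
```

For example, `to_math_italic("x")` returns U+1D465 (𝑥), while digits and operators pass through untouched.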

If the character is ‘_’, ‘^’, or ‘ ’, and the preceding character is a math italic and still more characters precede that character in the document, then a span of math italics is compared to a dictionary of function names. If found, the name is translated back to ordinary text, e.g., math-italic “𝑠𝑖𝑛” is translated back to upright “sin”, and MathBuildUp returns. Note that the main part of MathBuildUp may also perform such transformations. In Word 2007, users can change the contents of the function-name dictionary.
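Here is a rough sketch of that function-name check. The dictionary contents, the function names, and the "span of at least two letters" reading of the condition are all assumptions for illustration; the sketch also only matches the whole trailing italic span, which may be simpler than what MathBuildUp does.

```python
# Illustrative entries only; in Word 2007 the user can edit the real list.
FUNCTION_NAMES = {"sin", "cos", "tan", "log", "ln", "lim", "exp"}


def from_math_italic(ch):
    """Return the ASCII letter for a math italic character, else None."""
    if ch == "\u210E":  # math italic h lives at the Planck constant code point
        return "h"
    cp = ord(ch)
    if 0x1D434 <= cp <= 0x1D44D:
        return chr(ord("A") + cp - 0x1D434)
    if 0x1D44E <= cp <= 0x1D467:
        return chr(ord("a") + cp - 0x1D44E)
    return None


def restore_function_name(text, typed):
    """text is the content before the insertion point, typed the new character.

    If typed is '_', '^', or space and text ends in a math italic span that
    spells a known function name, return text with that span made upright.
    """
    if typed not in ("_", "^", " "):
        return text
    start = len(text)
    letters = []
    while start > 0:
        plain = from_math_italic(text[start - 1])
        if plain is None:
            break
        letters.append(plain)
        start -= 1
    name = "".join(reversed(letters))
    if len(name) >= 2 and name in FUNCTION_NAMES:
        return text[:start] + name
    return text
```

Typing a space after “2𝑠𝑖𝑛” would give back “2sin” here, while a span like “𝑥𝑦” is left alone because it isn't in the dictionary.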

If no such translation is made, MathBuildUp establishes the actual range to check for formula build up. It does this by going back to the start of the current argument if the insertion point (IP) is inside a built-up function or else to the start of the math zone or hard carriage return (CR) preceding the IP, whichever is closest to the IP. Then the choice is narrowed by advancing to the first major build-up operator. If such an operator is found before getting back to the IP, then the range is expanded backward to include the numerator for a division operator or the script base for ‘_’ or ‘^’. Then build up is attempted on the text in this range.
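The range selection can be approximated on plain text as follows. This is a loose sketch under several assumptions: `text` stands for the math zone contents, `arg_start` is a hypothetical way of saying "the IP is inside a built-up argument starting here", only ‘/’, ‘_’, and ‘^’ count as major build-up operators, the operand in front of the operator is either an alphanumeric run or a parenthesized group, and returning `None` when no such operator is found is my own choice.

```python
MAJOR_OPERATORS = "/_^"  # assumption: only the operators the post names


def buildup_range(text, ip, arg_start=None):
    """Return (start, end) of the span to try building up, or None.

    text      -- plain-text stand-in for the math zone contents
    ip        -- insertion point (index into text)
    arg_start -- start of the enclosing built-up argument, if any (hypothetical)
    """
    if arg_start is not None:
        start = arg_start
    else:
        # Start of the math zone (0) or just past the nearest hard CR.
        start = text.rfind("\r", 0, ip) + 1

    # Narrow: advance to the first major build-up operator before the IP.
    op = next((i for i in range(start, ip) if text[i] in MAJOR_OPERATORS), None)
    if op is None:
        return None  # assumption: nothing to build up

    # Expand backward over the operand in front of the operator
    # (the numerator for '/', the script base for '_' or '^').
    begin = op
    if begin > start and text[begin - 1] == ")":
        depth = 0
        while begin > start:
            begin -= 1
            if text[begin] == ")":
                depth += 1
            elif text[begin] == "(":
                depth -= 1
                if depth == 0:
                    break
    else:
        while begin > start and text[begin - 1].isalnum():
            begin -= 1
    return begin, ip
```

For `x=a+b/c+` with the IP at the end, this picks out `b/c+`, so the leading `x=a+` never has to be rescanned.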

MathBuildUp scans this range, pushing simple operands (spans of atoms) onto a rich-text string stack and operators onto an operator stack. Unlike the simple calculator case which uses plain-text operand strings, MathBuildUp needs to use rich-text strings, since the operands can have various combinations of attributes like bold, italic, revision markings, font styling, etc., as well as embedded objects like pictures or already built-up formulas. Formula build up of an expression is usually attempted when a close operator immediately follows the expression, or when the operator isn’t an open and one of the following conditions is true:

· Precedence of the operator is less than that of the previous operator

· Precedences of the operator and the previous operator both equal concatenation, division, or subsup.
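The trigger conditions above can be written down as one small predicate. This is a direct transcription of the stated rule, not the actual MathBuildUp code (the post says build up is "usually" attempted under these conditions, so the real thing has exceptions); the names `Prec` and `should_attempt_build_up` are made up for the example.

```python
from enum import IntEnum


class Prec(IntEnum):
    """Operator precedences in increasing order, as listed in the post."""
    OPEN = 0
    CLOSE = 1
    LIST = 2
    CONCAT = 3
    DIVIDE = 4
    NARY = 5
    SUBSUP = 6
    ENCLOSURE = 7
    ACCENT = 8


def should_attempt_build_up(op, prev_op):
    """Decide whether the incoming operator triggers build up of prev_op."""
    if op == Prec.CLOSE:
        return True   # a close immediately following the expression
    if op == Prec.OPEN:
        return False  # opens never trigger build up
    if op < prev_op:
        return True   # lower precedence than the previous operator
    # Equal precedence triggers only for these three classes.
    return op == prev_op and op in (Prec.CONCAT, Prec.DIVIDE, Prec.SUBSUP)
```

For instance, a plus arriving after a divide returns `True`, while a divide arriving after a plus returns `False` and is simply pushed.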

If the expression so found is valid, the strings contributing to the built-up form are combined appropriately into the built-up form, which ends up becoming the top of the string stack and the operator stack is popped accordingly.

After the whole range is processed, if all the text has been found valid and something has been converted, the converted text replaces the corresponding original linearly formatted text.

Alternatively if something is still invalid about the whole range, but the top string on the stack has been built up correctly, then this string alone replaces the linearly formatted text corresponding to this string. So if you type “(a+b/c+” the fraction builds up even though the expression as a whole isn’t syntactically correct, since there’s an unmatched left parenthesis. For this string to be correct, you need to add a corresponding right delimiter, which can be a \close operator if you don’t want it to display a glyph.
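The “(a+b/c+” case can be traced with a stripped-down sketch that tracks which span of linear text each operand came from. Everything here is a simplification for illustration: only ‘(’, ‘+’, and ‘/’ are recognized, close delimiters are left out entirely (so any ‘(’ counts as unmatched), a tuple marks a built-up object, and `replacement_span` is a made-up name.

```python
PREC = {"(": 0, "+": 3, "/": 4}  # open, concatenation, divide


def replacement_span(text):
    """Return (start, end) of the linear text to replace, or None."""
    operands = []  # (start, end, value); a tuple value marks a built-up object
    ops = []
    built = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch not in PREC:
            j = i
            while j < len(text) and text[j] not in PREC:
                j += 1
            operands.append((i, j, text[i:j]))
            i = j
            continue
        if ch != "(":
            # An incoming '+' or '/' builds any pending fraction on top.
            while ops and ops[-1] == "/" and len(operands) >= 2:
                ops.pop()
                _, end, den = operands.pop()
                start, _, num = operands.pop()
                operands.append((start, end, ("frac", num, den)))
                built = True
        ops.append(ch)
        i += 1
    if not built:
        return None  # nothing was converted
    if "(" not in ops:
        return (0, len(text))  # whole range valid: replace all of it
    # Whole range invalid: fall back to the top string if it was built up.
    start, end, value = operands[-1]
    return (start, end) if isinstance(value, tuple) else None
```

For “(a+b/c+” the result is the span covering `b/c`, which matches the behavior described above: the fraction builds up even though the left parenthesis is still unmatched.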

Naturally there are lots of embellishments to these rules and MathBuildUp is an intricate piece of code. But it sure is a joy to use.

Comments (4)

  1. math poweruser says:

    I just want to say that the equation editor is a huge pain to use!

    What’s the point of having wysiwyg functionality if it doesn’t even let you select what you want (for copy & paste or edit)? Half of the time (especially with integrals) I can’t select certain part of the equation, because word suddenly selects the whole thing or some random parts. This is totally annoying and forces me to retype most of the stuff I could just have copy pasted!

    Second, the parsing of hand typed formulas is totally unusable. You should do it similar to open office where you can use {} to specify which parts are affected. In word, when I type sqrt and hit space, all I get is an empty square root with no way of adding something to it.

    It should be possible to type formulas in one go without hitting space each time. Example:

    int from a to b {1 over {1 - x^2}} sqrt{x} dx

    I hope these issues get fixed asap.

  2. Robert says:

    The new math support in Word is a great improvement over the equation editor in previous versions of Office. The linear format can be very useful once you get used to it (math poweruser, try this: int_a^b 1/(1-x^2) sqrt(x) dx).

    I am having more trouble with the buildup form. I think it is confusing that bases for subscript and nary operators are part of the expression itself. In an expression like x_(n_i) you now get three caret positions between x and n although the two letters are right next to each other. Also the caret could be much more responsive than it is (seems to be a general problem of Word) and it could better adjust itself to the height of the expression being typed.

    In effect you often do not know where you are. Word tries to alleviate this by shading the current field, but this only seems to add to confusion because the shading is too similar to the selection highlight.

    BTW, thanks for your detailed explanation of OMML and math RTF. It helped me a lot in adding math RTF to my rtf writer.

  3. math poweruser says:

    Ok I agree it’s not all that bad and it’s true it’s a step up from previous versions.

    I was just very frustrated when writing above post because of word not letting me select certain parts of my formulas for copy & paste.

    Btw, is there a way to double underline an equation (such as the result)? When using the regular underline button (choosing double underline style) the whole equation gets underlined and the lines usually go through the middle of it, not below it.

    I found you can apply two times an underbar (from the accent menu), but it doesn’t look quite right because the lines are too spread out.

  4. MurrayS3 says:

    Formula autobuildup and the WYSIWYG editing are different than what people are used to, but once you get into them, they are way faster than previous approaches. The linear format also tends to resemble a true mathematical notation much more than other formats that use a myriad {} to delineate operands, e.g., TeX.

    The reason that Word seems to suddenly select the whole math object is because you’ve selected one of the structure characters of that object, i.e., the opening delimiter, or an operand delimiter. At this point it’s necessary to select the whole object to have an unambiguous selection.

    RichEdit’s shading differs from Word in a way that reveals more clearly what’s going on. As with Word, you get dark shading for the innermost argument containing the insertion point. This shading is embedded in lighter shading that reveals the function to which the argument belongs. I like this approach better than Word’s (I implemented it 🙂), but the Word folks felt it was too busy looking when combined with the math-zone acetate (Word’s rectangle around the math zone). You can see the RichEdit approach with the Microsoft Math graphing calculator that ships with the Encarta Student Edition.