Linear Format Notations for Mathematics

08/30/2010

I have been having a great discussion with Christian Lerch about computer-oriented mathematical notations. He has a program that lets you input MathML using a pure ASCII syntax, similar to ASCIIMathML. A lightly commented EBNF grammar of his MathEL language as currently implemented (still beta and evolving a bit) is given in https://km-works.eu/mathel-interactive/img/MathEL-ebnf.txt. Christian has a more elaborate reference manual in the works. One interesting feature is that he omits the leading backslash in the TeX-like names for symbols. This approach is definitely more readable and is the choice made by eqn/troff, which is distributed with Unix. TeX loomed so large in our thinking that we just automatically used the [La]TeX names by default. Office’s autocorrect facility allows users to define any names they want, so users can add names without leading backslashes. Perhaps I can add an option that does not require the backslashes in math zones.

Another approach is discussed, along with many earlier mathematical notations, in Stephen Wolfram's article Mathematical Notation: Past and Future (2000). Wolfram’s discussion is both edifying and entertaining to read. I think he makes considerable progress with Mathematica’s notation, including introducing the five special characters we ended up adding to Unicode (U+2145..U+2149) for things like differential d (U+2146). However, his thinking is a little constrained by a very desirable goal: whatever math you write should be computable by Mathematica.

Something like Presentation MathML has sufficient flexibility to produce mathematics that cannot be computed without additional context and semantics. Once you allow a notation to have such ambiguity, you can go farther than Wolfram is willing to go. Wolfram calls the linear format “StandardForm” and the presentation format “TraditionalForm”.

I think that a linear format resembles traditional math notation far more closely if one uses Unicode symbols whenever possible. Having an autocorrection facility translate ASCII names for operators and Greek letters into Unicode symbols provides a convenient way to input the desired symbols, but the real format is defined in terms of Unicode symbols, not the ASCII names for these symbols. In addition to looking more mathematical, the notation is then automatically globalized, aside from some Arabic locales. The [La]TeX ASCII symbol names have a minor English-language prejudice. Hopefully this bias isn’t too objectionable to non-English-language speakers. The Unicode symbols, by contrast, are international by their very nature. An international linear format for math should therefore use Unicode symbols rather than ASCII, although it is fine to use the latter for input.
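The name-to-symbol translation described above can be sketched in a few lines of Python. This is only an illustration, not how Office implements autocorrect: the table is a small hypothetical subset, the function name autocorrect and the require_backslash flag are invented for the example, and the text is translated in one pass rather than as the user types.

```python
import re

# Hypothetical subset of an autocorrect table: ASCII name -> Unicode symbol.
AUTOCORRECT = {
    "alpha": "\u03B1",   # Greek small letter alpha
    "beta": "\u03B2",    # Greek small letter beta
    "infty": "\u221E",   # infinity
    "int": "\u222B",     # integral
    "sum": "\u2211",     # n-ary summation
    "times": "\u00D7",   # multiplication sign
}

# A candidate name is an optional backslash followed by a run of ASCII letters.
_NAME = re.compile(r"\\?([A-Za-z]+)")


def autocorrect(text: str, require_backslash: bool = True) -> str:
    """Replace known ASCII names with their Unicode symbols.

    With require_backslash=True only TeX-style names such as \\int are
    translated; with False, bare names such as int are translated too.
    Unknown names are left untouched.
    """
    def replace(match):
        has_backslash = match.group(0).startswith("\\")
        if require_backslash and not has_backslash:
            return match.group(0)
        return AUTOCORRECT.get(match.group(1), match.group(0))

    return _NAME.sub(replace, text)
```

With this table, autocorrect(r"\int_-\infty^\infty") and autocorrect("int_-infty^infty", require_backslash=False) both yield ∫_-∞^∞. Either way the stored text consists of Unicode symbols; the ASCII names exist only on the input side.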

In addition, it is nice to have little fix-ups that reduce the number of brackets/parens needed to override operator precedence. For example, int_-infty^infty should be okay, since it’s clear by context that the - in -infty must be a unary minus. Also, it’s nice to have a^b^c mean something (treat it right-associatively), unlike in TeX, where it’s an error. OTOH, a^b_c is different, since TeX’s meaning for it is quite handy. Expressions like 1/.2 should create a fraction with .2 in the denominator rather than two sentence fragments. So the ASCII period and comma end up being handled differently when part of a number than in ordinary text. You also need a way to bind in n-aryands, like integrands and summands. So I introduced the glue operator U+2592 (▒), which both glues the n-aryand to the n-ary operator and terminates a limit, if one appears. To keep things looking natural, I considered a simple operand to be an alphanumeric span, rather than TeX’s single letter. The expression a_123 is a sub 123 rather than a sub 1 multiplied by 23. A complex operand can include parenthesized expressions so that f(x)/g(x) builds up to the ratio of f(x) to g(x).
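A few of these rules can be illustrated with a toy tokenizer in Python. This is a rough sketch under simplifying assumptions, not the Office parser: the names TOKEN, tokenize and parse_power are made up for the example, a span is taken to be a letter followed by letters or digits, and only a chain of ^ operators is nested (a^b_c, unary minus and the glue operator are out of scope).

```python
import re

# An operand is either a number or an alphanumeric span. A period or comma
# belongs to a number only when a digit follows it; otherwise it is treated
# as a separate one-character token, like any other operator.
TOKEN = re.compile(
    r"""
    (?P<number>\d+(?:[.,]\d+)*|\.\d+)   # 123, 1.5, 1,000, .2
  | (?P<span>[A-Za-z][A-Za-z0-9]*)      # a, x1, abc
  | (?P<op>\S)                          # any other non-space character
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split a linear-format string into operand and operator tokens."""
    return [m.group(0) for m in TOKEN.finditer(text)]


def parse_power(tokens):
    """Nest a chain of '^' right-associatively.

    ['a', '^', 'b', '^', 'c'] becomes ('^', 'a', ('^', 'b', 'c')).
    """
    base = tokens[0]
    if len(tokens) >= 3 and tokens[1] == "^":
        return ("^", base, parse_power(tokens[2:]))
    return base
```

Here tokenize("a_123") gives ['a', '_', '123'], so the whole of 123 is the subscript, and tokenize("1/.2") gives ['1', '/', '.2'], so .2 is a single denominator. A trailing period, as in tokenize("x=1."), stays a separate token, and parse_power(tokenize("a^b^c")) returns ('^', 'a', ('^', 'b', 'c')).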

Another requirement is to be able to round-trip the presentation (Professional/built-up) format through the linear format and back. This feature is very handy, since it lets us represent essentially arbitrary mathematical expressions/equations in “plain” text. One particular use is for user-definable autocorrect entries. For example, you can assign to \binomial the linear format (a+b)^n=∑_(k=0)^n ▒(n¦k)a^k b^(n-k) and presto! When you type \binomial followed by <space> or carriage return, it builds up to the binomial formula. Well, not yet in Word 2010 (you need to select Professional Format explicitly), but it does in PowerPoint 2010 and OneNote 2010.
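For readers who think in [La]TeX, the built-up form of that entry should correspond to the following, reading ▒ as the glue before the summand and (n¦k) as the binomial coefficient. This is my transcription for comparison, not output produced by Office:

```latex
(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}
```

The parentheses around k=0 and n-k in the linear format only group the limit and the exponent, so they disappear on build-up, much as TeX's braces do.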

For special edge cases, I needed a couple of somewhat arcane constructs, like the lenticular brackets 〖〗 for round-tripping compound arguments: if parentheses were used for grouping there, the parentheses would be displayed. The result is that the Unicode linear format is a very general mathematical notation, but it is not a context-free grammar. This would certainly upset Wolfram, and it does make the format harder to implement and maintain in its full generality, although it is quite efficient in practice. It is the closest to a true mathematical notation that I could define. Let me hasten to add that many, many people, inside and outside Microsoft, have contributed ideas that shaped the version that Microsoft Office 2010 offers.

Bertrand Russell once wrote, “A good notation has a subtlety and suggestiveness which at times make it seem almost like a live teacher…and a perfect notation would be a substitute for thought.” The Unicode linear format certainly isn’t a substitute for thought, but it is a more mathematically natural notation than previously available on computers.
