Linear Format Notations for Mathematics


I have been having a great discussion with Christian Lerch about computer-oriented mathematical notations. He has a program that lets you input MathML using a pure ASCII syntax, similar to ASCIIMathML. A lightly commented EBNF grammar of his MathEL language as currently implemented (still in beta and evolving a bit) is available at http://km-works.eu/mathel-interactive/img/MathEL-ebnf.txt, and Christian has a more elaborate reference manual in the works. One interesting feature is that he omits the leading backslash in the TeX-like names for symbols. This approach is definitely more readable, and it is the choice made by eqn/troff, which is distributed with Unix. TeX loomed so large in our thinking that we automatically used the [La]TeX names by default. Office’s autocorrect facility lets users define any names they want, so users can add names without leading backslashes. Perhaps I can add an option that does not require the backslashes in math zones.
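The backslash option amounts to a small change in how a typed word is looked up. Here is a minimal Python sketch, assuming a tiny hypothetical name table (`MATH_NAMES`) and a hypothetical `require_backslash` flag. It is not Office’s autocorrect implementation, just an illustration of the idea:

```python
# Hypothetical name table; a real math autocorrect list has hundreds of entries.
MATH_NAMES = {
    "alpha": "\u03b1",  # α
    "beta": "\u03b2",   # β
    "infty": "\u221e",  # ∞
    "int": "\u222b",    # ∫
    "sum": "\u2211",    # ∑
}


def autocorrect_word(word: str, require_backslash: bool = True) -> str:
    # A leading backslash always marks a control word, e.g. "\alpha".
    if word.startswith("\\"):
        name = word[1:]
    # Without the backslash requirement, a bare "alpha" is a candidate too.
    elif not require_backslash:
        name = word
    else:
        return word  # ordinary text is left alone
    # Unknown names come back unchanged rather than raising an error.
    return MATH_NAMES.get(name, word)


print(autocorrect_word("\\alpha"))                         # α
print(autocorrect_word("alpha"))                           # alpha
print(autocorrect_word("alpha", require_backslash=False))  # α
```

With the flag off, bare words such as “sum” or “int” would also be converted, which is one reason such an option would make the most sense only inside math zones.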

Another approach, along with many earlier mathematical notations, is discussed in Stephen Wolfram’s article “Mathematical Notation: Past and Future” (2000). Wolfram’s discussion is both edifying and entertaining to read. I think he makes considerable progress with Mathematica’s notation, including introducing the five special characters we ended up adding to Unicode (U+2145..U+2149), such as the differential d (U+2146). However, his thinking is a little constrained by a very desirable goal: whatever math you write should be computable by Mathematica.
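For readers who want to check those code points, this short Python snippet lists the five double-struck italic letters in U+2145..U+2149 using the standard `unicodedata` module. The helper name `double_struck_italics` is just for illustration:

```python
from __future__ import annotations

import unicodedata


def double_struck_italics() -> list[tuple[str, str, str]]:
    # (code point label, character, Unicode character name) for U+2145..U+2149.
    return [
        (f"U+{cp:04X}", chr(cp), unicodedata.name(chr(cp)))
        for cp in range(0x2145, 0x214A)
    ]


for label, ch, name in double_struck_italics():
    print(label, ch, name)
```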

Something like Presentation MathML has sufficient flexibility to produce mathematics that cannot be computed without additional context and semantics. Once you allow a notation to have such ambiguity, you can go farther than Wolfram is willing to go. Wolfram calls the linear format “StandardForm” and the presentation format “TraditionalForm”.

I think that a linear format resembles traditional math notation far more closely if one uses Unicode symbols whenever possible. Having an autocorrection facility translate ASCII names for operators and Greek letters into Unicode symbols provides a convenient way to input the desired symbols, but the real format is defined in terms of Unicode symbols, not the ASCII names for these symbols. In addition to looking more mathematical, the notation is then automatically globalized, aside from some Arabic locales. The [La]TeX ASCII symbol names have a minor English-language bias; hopefully this bias isn’t too objectionable to non-English speakers. The Unicode symbols are international by their very nature. As such, an international linear format for math should use Unicode symbols rather than ASCII, although it is fine to use the latter for input.

In addition, it is nice to have little fix-ups that reduce the number of brackets/parentheses needed to overrule operator precedence. For example, int_-infty^infty should be okay, since it’s clear from context that the minus sign in -infty must be a unary minus. It’s also nice to have a^b^c mean something (treat it as right associative, that is, a^(b^c)), unlike in TeX, where it’s an error. OTOH, a^b_c is different, since TeX’s meaning for it (a base with both a superscript and a subscript) is quite handy. Expressions like 1/.2 should create a fraction with .2 in the denominator rather than being split at an apparent sentence-ending period. So the ASCII period and comma end up being handled differently when part of a number than in ordinary text.

You also need a way to bind n-aryands, like integrands and summands. So I introduced the glue operator U+2592 (▒), which both glues the n-aryand to the n-ary operator and terminates a limit, if one appears. To keep things looking natural, I considered a simple operand to be an alphanumeric span, rather than TeX’s single character. The expression a_123 is a sub 123 rather than a sub 1 multiplied by 23. A complex operand can include parenthesized expressions so that f(x)/g(x) builds up to the ratio of f(x) to g(x).
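To make these fix-ups concrete, here is a minimal Python sketch of a build-up pass for a tiny subset of the notation. It is a hypothetical illustration, not Office’s build-up engine: it emits LaTeX-like strings instead of a display tree, and it ignores the glue operator, n-ary operators and most precedence rules. It shows the behaviors described above: repeating ^ (or _) nests to the right, mixed ^ and _ attach to the same base as in TeX, a minus sign right after ^ or _ is unary, a period counts as part of a number only when a digit follows, an operand is an alphanumeric span optionally followed by (...), and the outer parentheses of a script or fraction operand are dropped. It assumes balanced parentheses.

```python
from __future__ import annotations


class BuildUp:
    """Hypothetical mini build-up: linear-format subset -> LaTeX-like text."""

    def __init__(self, text: str):
        self.s = text
        self.i = 0

    def peek(self, k: int = 0) -> str:
        j = self.i + k
        return self.s[j] if j < len(self.s) else ""

    def starts_operand(self) -> bool:
        c = self.peek()
        return c.isalnum() or c == "(" or (c == "." and self.peek(1).isdigit())

    def primary(self) -> tuple[str, bool]:
        # Returns (text, is_bare_group); a bare (...) group comes back unwrapped.
        if self.peek() == "(":
            self.i += 1
            inner = self.expr()
            if self.peek() == ")":
                self.i += 1
            return inner, True
        start = self.i
        # Alphanumeric span; "." belongs to it only before a digit (".2", "1.5").
        while self.peek().isalnum() or (self.peek() == "." and self.peek(1).isdigit()):
            self.i += 1
        text = self.s[start:self.i]
        # A span directly followed by (...) is one operand, e.g. f(x).
        while text and self.peek() == "(":
            inner, _ = self.primary()
            text += "(" + inner + ")"
        return text, False

    def operand(self) -> tuple[str, bool]:
        # A "-" right after ^ or _ can only be a unary minus (int_-infty).
        if self.peek() == "-":
            self.i += 1
            text, group = self.primary()
            return "-" + ("(" + text + ")" if group else text), False
        return self.primary()

    def script_arg(self, op: str) -> str:
        text, group = self.operand()
        # The same script operator again nests to the right: a^b^c = a^(b^c).
        if self.peek() == op:
            self.i += 1
            if group:
                text = "(" + text + ")"
            return text + op + "{" + self.script_arg(op) + "}"
        return text  # a bare (...) argument loses its parentheses

    def scripted(self) -> tuple[str, bool]:
        text, group = self.primary()
        used: set[str] = set()
        # ^ and _ on one base combine as in TeX: a^b_c has both scripts.
        while self.peek() in ("^", "_") and self.peek() not in used:
            op = self.peek()
            used.add(op)
            self.i += 1
            if group:
                text, group = "(" + text + ")", False
            text += op + "{" + self.script_arg(op) + "}"
        return text, group

    def term(self) -> str:
        num, group = self.scripted()
        # Bare (...) numerators/denominators arrive unwrapped: (a+b)/c.
        while self.peek() == "/":
            self.i += 1
            den, _ = self.scripted()
            num, group = "\\frac{" + num + "}{" + den + "}", False
        return "(" + num + ")" if group else num

    def expr(self) -> str:
        out = []
        while self.peek() and self.peek() != ")":
            if self.starts_operand():
                out.append(self.term())
            else:  # operators, spaces and sentence periods pass through
                out.append(self.peek())
                self.i += 1
        return "".join(out)


def build_up(linear: str) -> str:
    return BuildUp(linear).expr()


print(build_up("a^b^c"))             # a^{b^{c}}
print(build_up("a_123"))             # a_{123}
print(build_up("1/.2"))              # \frac{1}{.2}
print(build_up("f(x)/g(x)"))         # \frac{f(x)}{g(x)}
print(build_up("int_-infty^infty"))  # int_{-infty}^{infty}
```

For example, `build_up("(a+b)^2")` keeps the parentheses, since they are not the whole operand of the script, while `build_up("b^(n-k)")` drops them and gives `b^{n-k}`.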

Another requirement is to be able to round-trip the presentation (Professional/built-up) format through the linear format and back. This feature is very handy, since it lets us represent essentially arbitrary mathematical expressions/equations in “plain” text. One particular use is for user-definable autocorrect entries. For example, you can define \binomial as the linear format (a+b)^n=∑_(k=0)^n (n¦k)a^k b^(n-k) and presto! When you type \binomial followed by a space or carriage return, it builds up to the binomial formula. Well, not yet in Word 2010 (there you need to select Professional Format explicitly), but it does in PowerPoint 2010 and OneNote 2010.
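For reference, here is what that linear-format entry builds up to, written in LaTeX. The parentheses around k=0 and n-k only group the limit and the exponent, and (n¦k) becomes a binomial coefficient:

```latex
(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}
```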

For special edge cases, I needed a couple of somewhat arcane constructs, like the lenticular brackets 〖〗 for round-tripping compound arguments, since if parentheses were used there, they would be displayed. The result is that the Unicode linear format is a very general mathematical notation, but its grammar is not context-free. This would certainly upset Wolfram, and it does make the format harder to implement and maintain in its full generality, although it is quite efficient in practice. It is the closest to a true mathematical notation that I could define. Let me hasten to add that many, many people, inside and outside Microsoft, have contributed ideas that shaped the version that Microsoft Office 2010 offers.

Bertrand Russell once wrote, “A good notation has a subtlety and suggestiveness which at times make it seem almost like a live teacher…and a perfect notation would be a substitute for thought.” The Unicode linear format certainly isn’t a substitute for thought, but it is a more mathematically natural notation than previously available on computers.


Comments (9)

  1. Greg Hullender says:

    I use the linear format almost exclusively, and to make it easier, I've defined a large number of keyboard shortcuts. For example, "ctrl-G a" gives me a Greek alpha. I find I like that a lot better than typing the longer TeX-inspired names, although it does mean memorizing a number of control sequences.

    Has anyone else done this in a systematic way? Ideal, for me, would be a math keyboard with all the extra characters nicely labeled, something about as complex as the old APL keyboards.

    –Greg

  2. MurrayS says:

    Math keyboards are an old favorite of mine. My PS Technical Word Processor assigned the characters on the symbol ball/daisy to alt key combinations. Script characters used the ScrollLock key. Ctrl characters were used for navigation and other editing purposes, so they weren't available for symbols. Italic was determined by context. My custom math character set had 512 characters. It predated Unicode by several years.

    Early on in my years at Microsoft, I wrote a facility to allow users to define keyboard hot keys including arbitrary combinations of left/right ctrl/shift/alt keys as well as the NumLock/ScrollLock. This approach is described a bit in Section 4.2 of the linear format paper (http://www.unicode.org/…/UTN28-PlainTextMath-v3.pdf). The product never shipped, since I had to work on RichEdit, another favorite of mine.

    With Windows it isn't quite as simple, because alt keys are typically assigned to useful menu hot keys and ctrl keys have handy purposes such as cut/copy/paste. So it seemed to me that TeX-like names were a reasonable compromise. Originally Knuth, too, had a custom math keyboard and character set, but decided to use the ASCII names instead.

    With autocompletion, the TeX-like names could be entered substantially faster. With the current math autocorrect facility, you can define simpler names, such as a for alpha. But maybe someday I'll dust off my user-definable keyboards and symbol boxes. A math keyboard could become active only in math zones, which would reduce the conflict with standard hot keys.

  3. Hermann Klinke says:

    I just discovered Office math capabilities (I use OneNote 2010 only, though). It is definitely the best math editor I have seen, and while it can be improved (like the placement of the cursor right after applying an AutoCorrection), I don't see how one could enter math any more efficiently than the way it's implemented in Office. For example, I created an AutoCorrect list that allows me to enter math in real time during a lecture, so it has to be extremely efficient. None of my AutoCorrect entries requires more than 2 characters, and they are really easy to remember (only 4 rules to remember). I also don't use any keyboard shortcuts so far except the one for math zones (ALT+=). For instance, I append a double quote to a letter to turn it Greek (e.g., a" for alpha), or I append a dot to turn letters into common double-struck letters or other common letter-like symbols. It's still a work in progress, though, since I am still reading up on the unofficial documentation and there are still a few symbols I haven't added yet. I'd love to do a guest post on this when I am done, since I believe there is no faster, easier-to-remember way to enter math than the AutoCorrect list I am working on.
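    Suffix rules like these are easy to express as a lookup. Here is a minimal Python sketch, assuming two hypothetical rules modeled on the comment above (a letter plus `"` gives the Greek letter, a capital plus `.` gives a double-struck letter). The letter pairings follow the old Symbol-font layout and are an assumption, not the commenter's actual list:

```python
# Hypothetical Latin-to-Greek pairing (old Symbol-font layout); an assumption.
GREEK = dict(zip("abgdezhqiklmnxprstufcyw",
                 "αβγδεζηθικλμνξπρστυφχψω"))

# Common double-struck capitals from the Letterlike Symbols block.
DOUBLE_STRUCK = {
    "C": "\u2102", "H": "\u210d", "N": "\u2115", "P": "\u2119",
    "Q": "\u211a", "R": "\u211d", "Z": "\u2124",
}


def expand_suffix(token: str) -> str:
    # Only two-character tokens (letter + suffix) are candidates, so
    # ordinary words are never touched.
    if len(token) == 2:
        letter, suffix = token
        if suffix == '"' and letter.lower() in GREEK:
            greek = GREEK[letter.lower()]
            # Keep the case of the typed letter: G" gives capital gamma.
            return greek.upper() if letter.isupper() else greek
        if suffix == "." and letter in DOUBLE_STRUCK:
            return DOUBLE_STRUCK[letter]
    return token


print(expand_suffix('a"'))  # α
print(expand_suffix("R."))  # ℝ
```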

    Are there plans to release official documentation on all the math features of Office and what feature is available in what product and what version?

  4. MurrayS says:

    Your math autocorrect list seems very efficient and useful. It'd be cool to have a blog post on it.

    More documentation will become available as time goes on. Most of the functionality is in Word 2007/2010, OneNote 2010, and applications that incorporate the 2010 version of OfficeArt, such as PowerPoint 2010 and Excel 2010. There are some differences (mostly in the user interfaces) and documenting them would make for a good blog post as well.

  5. EdB says:

    Greg: I defined a Math keyboard for using with Word 2007 when I was an undergrad, although I didn't actually produce a physically-labeled one. It's fallen out of use for me since I switched to Mac, but I could get a very good speed entering equations with it (often such that Word couldn't keep up with my typing) and would rarely need to resort to the ribbon (only for esoteric symbols and matrices; even then I would use the keyboard to navigate it). It's available online (free) at http://ed.mvps.org/physics/

  6. Yanming says:

    more compatibility with latex please

    As a long-time latex user, I was excited to find out (only recently) that Office has started to support a markup language that shares many features with latex. I think it is a big step in the right direction.

    One thing I'd really appreciate is for the linear format to be more compatible with latex. I have briefly read some docs on the linear format and understand that it has a different design philosophy and goals than latex. I can also see how the linear format tries to improve on latex. But on the other hand, for many users like me who are very familiar with latex, it would be a big advantage if the two formats were as similar as possible. For example, I can understand that it might be more natural for some people to type "X hat " rather than "hat X", and I can appreciate the simplicity of using a/b instead of dfrac{a}{b}, but I cannot see any real advantage of using parentheses in the places where latex uses braces.

    Another solution might be to allow different dialects. Too many dialects obviously will unnecessarily complicate matters, but given that people who type equations a lot are likely to be familiar with latex, to allow a latex dialect might be a big plus.

  7. MurrayS says:

    I agree that it would be great to have a LaTeX input option. While I personally prefer the current linear format since it's a more mathematical notation and somewhat more efficient, I'm also a strong supporter of standards and conventions. In particular, if you have LaTeX in your fingers, why should you have to learn a new notation? LaTeX works fine for you. Naturally we have to prioritize this feature request (and it's a common request) against other things we're doing.

  8. Tom Dietterich says:

    First let me say how much I love the new math mode. I have a question for which I haven't been able to find the answer in the various documents and blog posts. How do I enter the equivalent of latex: hat{x}? If I type hat(, this creates a container with a hat over it, but the parenthesis is still there, so I need to delete the parenthesis, then hit the backarrow key to position the cursor inside the container to type the "x". I'm sure there is a better way to do this.

  9. MurrayS says:

    Thanks for the compliment! Office math follows the Unicode custom of placing the combining mark after the expression that gets the combining mark. So zhat<space space or operator> puts the hat over the z. (a+b)hat puts a wide hat over the a+b as described in the linear format document (http://www.unicode.org/…/UTN28-PlainTextMath-v3.pdf) in Section 3.10 Accent Operators.
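    The postfix convention can be sketched in a few lines of Python. The helper `apply_hat` is hypothetical, not Office code: for a single base it appends U+0302 COMBINING CIRCUMFLEX ACCENT after the base, which is the Unicode ordering described in the reply, and for a parenthesized operand it drops the parentheses and returns a LaTeX `\widehat` purely for display, since no single combining mark spans an arbitrary expression. It treats any trailing "hat" as the accent name, so a word like "that" would be misread; a real implementation relies on the control-word marker instead.

```python
import unicodedata

COMBINING_HAT = "\u0302"  # U+0302 COMBINING CIRCUMFLEX ACCENT


def apply_hat(linear: str) -> str:
    # Hypothetical helper: build up "<operand>hat", with the accent name
    # following its operand as in Unicode's combining-mark order.
    if not linear.endswith("hat") or len(linear) == len("hat"):
        return linear
    operand = linear[: -len("hat")]
    # A parenthesized operand loses its parentheses and gets a wide hat,
    # shown here in LaTeX notation.
    if operand.startswith("(") and operand.endswith(")"):
        return "\\widehat{" + operand[1:-1] + "}"
    # Otherwise the combining mark goes right after the base character.
    return operand + COMBINING_HAT


print(unicodedata.normalize("NFC", apply_hat("zhat")))  # ẑ (U+1E91)
print(apply_hat("(a+b)hat"))                             # \widehat{a+b}
```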