LineServices

One of the key technologies behind the high-quality display of mathematical text in Word 2007 and RichEdit 6.0 is a special component called LineServices along with its sibling Page/TableServices. In addition to handling math display, various versions of LineServices are responsible for line layout in Word, Publisher, RichEdit, PowerPoint, Internet Explorer, FrontPage, Visio, and Vista. It was developed by one of the most amazing teams at Microsoft. Because LineServices is used by components like RichEdit and Vista controls, it’s indirectly available to developers outside Microsoft. The low-level interfaces to run it directly are tricky to use and in general aren’t documented very completely. Microsoft developers who use LineServices generally consult with the LineServices team.

This post tells some of how LineServices came to be and developed over time. It all started in 1994 with one of Microsoft’s most talented engineers, Eliyezer Kohen, a Turkish computer scientist who obtained his PhD at the Technische Hochschule in Zurich with Niklaus Wirth (author of Pascal, among other things). Eliyezer had led the two-man team (with Greg Hitchcock) that developed the Microsoft TrueType rasterizer as well as the two-man team (with Dean Ballard) that developed the original OpenType specification. Peter Pathe was heading Word at the time and wanted to improve Word’s line layout. He figured that Eliyezer could get a team going to do this in time for Word 97.

Eliyezer was convinced he could factor out Word’s line layout, but because of Word’s mind-boggling backward compatibility requirements and the lack of adequate help from the Word team, he refused to agree to the Office 97 time frame. I was working on RichEdit 2.0 next door to Eliyezer back then and you should have heard the prolonged arguments between Eliyezer and Peter! In that time frame, Eliyezer actually had a native American working with him named Lennox Brassel along with Ping Wong. Then he hired his first St. Petersburg mathematician, Sergey Genkin, who was recommended by another St. Petersburg mathematician already at Microsoft, Andrei Burago (more about Andrei in a bit). Here I use the term mathematician to mean someone who majored in mathematics, even if they switched to computer science afterwards. Sergey’s first job after arriving in the USA in 1991 was back East working on a TeX-compatible system, a very useful background for his later work on the math handlers.

Eliyezer needed additional developer cycles, so he asked Sergey if he knew any more smart guys back in St. Petersburg. Sure enough, Igor Zverev could come, and RichEdit was fortunate enough to have Igor’s and Andrei’s services for a while in developing RichEdit 2.0. Not long after, yet another St. Petersburg mathematician, Victor Kozyrev, joined the team. The team developed LineServices 2.0, which shipped first with a little program called Greetings Workshop in 1997. It also shipped with Publisher in 1998.

LineServices 3.0 was developed and shipped with Word 2000, Internet Explorer 5.0, RichEdit 3.0, and PowerPoint 2000. In addition to Western text, LineServices supported several special layout objects: the reverse object for BiDi text, the ruby object for East Asian phonetic annotations, tatenakayoko (Japanese for “horizontal in vertical”, but sounds better than HIV), and warichu (two lines in one). From my point of view, ruby was just a fraction in disguise, so naturally I was very excited to see that LineServices could do such a beautiful job displaying ruby. The initial ruby handler was developed by Rick Sailor.

LineServices handles lots of tricky problems with text layout and calls the client back for all the information it needs. LineServices never makes any operating system calls directly and it can run on any operating system that has a C compiler. It’s written in C in what amounts to a well-defined subset of C++, complete with opaque pointers that would be the this pointers if C++ were used. Maybe someday the team will upgrade to C++, which is used by almost all Office applications today. The team has the strange habit of seriously designing a product before ever writing one line of code. What’s even stranger is that when they do finally write the code, it has very few bugs in it. My own approach is to dive in writing code and then use the well-known physicist approach to evaluating things called successive approximations. I can’t figure out everything in advance, so I try something and then gradually improve on it. Those guys figure out most of the design without writing any code.
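As a rough illustration of the callback style described above, here is a toy sketch in Python. It is not LineServices code and does not reflect its real interfaces: the names `LayoutContext`, `MeasureFn`, and `break_lines` are hypothetical, and the greedy word-wrapping is a deliberately simple stand-in for real line layout. The point is only that the layout routine asks the client for every measurement and never touches the operating system itself.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical callback type: the client measures a run of text and returns its width.
MeasureFn = Callable[[str], float]


@dataclass
class LayoutContext:
    """Plays the role of the opaque context pointer a C API would pass around."""
    measure: MeasureFn   # client-supplied; the engine never measures text itself
    space_width: float   # width of an inter-word space, also supplied by the client


def break_lines(ctx: LayoutContext, text: str, max_width: float) -> List[str]:
    """Greedy line breaking that obtains every width from the client callbacks."""
    lines: List[str] = []
    current: List[str] = []
    current_width = 0.0
    for word in text.split():
        w = ctx.measure(word)
        # Width of the line if this word is appended to it.
        needed = w if not current else current_width + ctx.space_width + w
        if current and needed > max_width:
            # The word does not fit: close the current line and start a new one.
            lines.append(" ".join(current))
            current, current_width = [word], w
        else:
            current.append(word)
            current_width = needed
    if current:
        lines.append(" ".join(current))
    return lines


if __name__ == "__main__":
    # A toy client in which every character is one unit wide.
    ctx = LayoutContext(measure=lambda s: float(len(s)), space_width=1.0)
    for line in break_lines(ctx, "the quick brown fox jumps", 10.0):
        print(line)
```

Because all measurement lives in the client's callbacks, the same `break_lines` routine works unchanged for a screen, a printer, or a test harness with fake metrics, which is the portability benefit the callback design buys.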

After Office 2000, Eliyezer & Co. embarked on Page/TableServices, which was natural. Eliyezer had started with characters in implementing TrueType, then progressed to lines with LineServices, and pages and tables were the next items on the layout hierarchy. To pull that off the team needed the help of another St. Petersburg mathematician, Anton Sukhanov, who had been a whiz kid in Russia winning various computer-science puzzle competitions. (Actually all these St. Petersburg people had been involved in math competitions). Anton enhanced the ruby handler among other things. The team developed PTS as it’s called and revised LineServices to work well with it.

About that time I simply couldn’t stand not having some math layout any longer, so in February 2001 I wrote a math handler for LineServices patterned after the ruby handler. While I was at it, I installed the ruby handler in a recursive way, so that you could have ruby objects nested like continued fractions. This upset the authors of the HTML ruby specification, since they said ruby was not supposed to be nested. Nevertheless the easiest way to display complex ruby is to use one level of nesting, but I digress. My simple math handler showed that LineServices could do mathematics, although my spacing was pretty awful. More precisely, spacing was delegated to the user, who unfortunately seldom knows what correct spacing is.

A really cool thing about my LineServices math handler was that it convinced people that we had the infrastructure to lay out math. Fortunately I didn’t appreciate at the time how hard it would be to lay out math as well as TeX. Else I might not have been able to persuade people to work on it. It seems that most things that are really worthwhile are way harder than you think they’ll be when you start working on them.

Of course, Eliyezer didn’t need any convincing; he and his team had designed LineServices so that it could be used to lay out math. They just didn’t tell anyone! So after Office 2003 shipped, Eliyezer convinced Bill Gates and Steven Sinofsky that we should handle math layout. Chris Pratley helped by demoing OneNote for Bill and others using my simple formula autobuildup and LineServices math handler (OneNote uses RichEdit for editing and display of text at the paragraph level). Eliyezer started studying mathematical typography in earnest. He got his hands on every math typography book he could find, but by far the most influential and useful was Donald Knuth’s The TeXbook. After a few months Eliyezer enlisted the help of Victor and Andrei to design a math handler. Andrei had been working in the Windows font group, a background that would prove to be incredibly helpful on the math project. Andrei’s eyes for high-quality typography and his understanding of mathematics and TeX were crucial to the success of the project.

They’d often come into my office announcing they were planning to do this and that and I would sometimes protest that what they had in mind wasn’t workable from a client’s, e.g., RichEdit’s, point of view. So they’d revise the plan and eventually they had a math handler design that was compatible with my general approach and offered really high quality mathematical typography. While this was going on, I was developing the formula autobuildup input method using RichEdit and my old math handler, since the LineServices team hadn’t written any code yet.

Since TeX was so valuable in the design process, Eliyezer wanted to talk with Knuth, who happened to be an old friend of Eliyezer’s PhD advisor, Niklaus Wirth. Butler Lampson arranged a meeting in November, 2003 and Eliyezer, Andrei, Victor, and I had the good fortune to spend an extraordinary afternoon with Donald Knuth at his home on the Stanford University campus. Among many things, Donald showed us how he uses TeX to typeset his papers and books exactly the way he wants them to look. He applies special tweaks to achieve perfection, such as “smashing the descender” on one radicand to make a sum of square roots line up in a pleasing way and shimming characters to place them more beautifully in a formula. This interaction inspired us to think that we could automate some of Donald’s tweaks using special OpenType math tables and associated code, such as the “cut-ins” described in one of my earlier posts.

Eliyezer’s health gradually declined and he decided to retire after the initial math handler design. Sergey Genkin took over leadership of the math project. Sergey had been the lead developer on LineServices since the middle of 1998. One day in the summer of 2004, Sergey, Victor, and Andrei came into my office all excited and announced that they had been able to display the mathematical expression a+b! It was a real achievement, since the spacing on each side of the + was the desired 4/18 em and a lot of code had checked out correctly. One of the things they soon discovered was that LineServices alone was not adequate to lay out high-quality mathematics: you need PTS too!

The problem is that in a computer window, unlike on a printed page, the layout width can vary substantially from one invocation to another. Hence a program has to be able to break equations automatically to accommodate different window widths. TeX delegates most equation breaking to the user, but that’s not a good option for browsers, slide shows, and other windowed uses. Also, in general you need PTS to handle placement of equation numbers. Yet another brilliant St. Petersburg mathematician had joined the PTS team, namely Alexander Vaschillo, and he implemented equation breaking, alignment, and numbering. Also, although LineServices had been designed with math in mind, it was necessary to generalize both its code and PTS’s code to give the math handlers the power they needed. Igor generalized LineServices and Anton generalized PTS accordingly. I brought in two shelves of my mathematics books, one shelf oriented toward theoretical physics and one toward pure math. Many of the books predated computer typesetting and a number were in languages other than English. It was very useful to have these books in refining our ideas.
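To give a feel for what breaking an equation to fit a window width involves, here is a minimal Python sketch. It is an illustration only, not the PTS algorithm: the function name `break_equation`, the `BREAK_BEFORE` set, and the rule of breaking only immediately before `+`, `-`, or `=` are all assumptions made for the example, and each token's width is simply supplied by the caller.

```python
from typing import List, Sequence, Tuple

# Assumed set of symbols we allow a break before (illustrative only).
BREAK_BEFORE = {"+", "-", "="}


def break_equation(tokens: Sequence[Tuple[str, float]],
                   max_width: float) -> List[List[str]]:
    """Greedily split (symbol, width) tokens into rows no wider than max_width,
    breaking only immediately before a symbol in BREAK_BEFORE."""
    rows: List[List[str]] = []
    row: List[str] = []   # tokens already committed to the current row
    row_width = 0.0
    seg: List[str] = []   # tokens collected since the last legal break point
    seg_width = 0.0

    def flush_segment() -> None:
        nonlocal row, row_width, seg, seg_width
        # Start a new row if the pending segment would overflow the current one.
        if row and row_width + seg_width > max_width:
            rows.append(row)
            row, row_width = [], 0.0
        row = row + seg
        row_width += seg_width
        seg, seg_width = [], 0.0

    for sym, width in tokens:
        if sym in BREAK_BEFORE and seg:
            flush_segment()   # a legal break point precedes this symbol
        seg.append(sym)
        seg_width += width
    if seg:
        flush_segment()
    if row:
        rows.append(row)
    return rows


if __name__ == "__main__":
    toks = [(s, 1.0) for s in ["a", "+", "b", "+", "c", "=", "d"]]
    for width in (3.0, 5.0, 7.0):
        print(width, break_equation(toks, width))
```

A segment wider than `max_width` is left on a row of its own and simply overflows; a real layout engine would also have to handle alignment across rows and the placement of equation numbers, as described above.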

At this point one can understand better how we came to use OMML (Office MathML) as a file format for mathematics rather than MathML. OMML is a pretty close representation of the LineServices/PTS (PTLS) math objects. These math objects were created after extensive study of mathematical typography, rather than by study of MathML. It’s natural to have a file format that mirrors one’s internal formats. In addition, we needed to be able to put any inline text and objects inside math zones. MathML cannot currently handle embedding of other XML namespaces, at least in a way that’s exposed to an XML parser.

Naturally there’s more to the math story such as the interactions with the Word team (who developed OMML, integrated the PTS and LineServices math handlers into Word 2007, and provided quantities of feedback), the math font teams responsible for the Cambria font collection and the math OpenType library, and the test teams. But those very important interactions have to wait for future posts.