LineServices


One of the key technologies behind the high-quality display of mathematical text in Word 2007 and RichEdit 6.0 is a special component called LineServices, along with its sibling Page/TableServices. In addition to handling math display, various versions of LineServices are responsible for line layout in Word, Publisher, RichEdit, PowerPoint, Internet Explorer, FrontPage, Visio, and Vista. It was developed by one of the most amazing teams at Microsoft. Because LineServices is used by components like RichEdit and Vista controls, it’s indirectly available to developers outside Microsoft. The low-level interfaces to run it directly are tricky to use and in general aren’t documented very completely. Microsoft developers who use LineServices generally consult with the LineServices team.

This post tells some of the story of how LineServices came to be and how it developed over time. It all started in 1994 with one of Microsoft’s most talented engineers, Eliyezer Kohen, a Turkish computer scientist who obtained his PhD at the Technische Hochschule in Zurich under Niklaus Wirth (author of Pascal, among other things). Eliyezer had led the two-man team (with Greg Hitchcock) that developed the Microsoft TrueType rasterizer as well as the two-man team (with Dean Ballard) that developed the original OpenType specification. Peter Pathe was heading Word at the time and wanted to improve Word’s line layout. He figured that Eliyezer could get a team going to do this in time for Word 97.

Eliyezer was convinced he could factor out Word’s line layout, but because of Word’s mind-boggling backward-compatibility requirements and the lack of adequate help from the Word team, he refused to agree to the Office 97 time frame. I was working on RichEdit 2.0 next door to Eliyezer back then, and you should have heard the prolonged arguments between Eliyezer and Peter! In that time frame, Eliyezer had a Native American named Lennox Brassel working with him, along with Ping Wong. Then he hired his first St. Petersburg mathematician, Sergey Genkin, who was recommended by another St. Petersburg mathematician already at Microsoft, Andrei Burago (more about Andrei in a bit). Here I use the term mathematician to mean someone who majored in mathematics, even if he switched to computer science afterwards. Sergey’s first job after arriving in the USA in 1991 was back East working on a TeX-compatible system, a very useful background for his later work on the math handlers.

Eliyezer needed additional developer cycles, so he asked Sergey if he knew any more smart guys back in St. Petersburg. Sure enough, Igor Zverev could come, and RichEdit was fortunate enough to have Igor’s and Andrei’s services for a while in developing RichEdit 2.0. Not long after, yet another St. Petersburg mathematician, Victor Kozyrev, joined the team. The team developed LineServices 2.0, which shipped first with a little program called Greetings Workshop in 1997. It also shipped with Publisher in 1998.

LineServices 3.0 was developed and shipped with Word 2000, Internet Explorer 5.0, RichEdit 3.0, and PowerPoint 2000. In addition to Western text, LineServices supported several special layout objects: the reverse object for BiDi text, the ruby object for East Asian phonetic annotations, tatenakayoko (Japanese for “horizontal in vertical”, but it sounds better than HIV), and warichu (two lines in one). From my point of view, ruby was just a fraction in disguise, so naturally I was very excited to see that LineServices could do such a beautiful job displaying ruby. The initial ruby handler was developed by Rick Sailor.

LineServices handles lots of tricky problems in text layout and calls the client back for all the information it needs. LineServices never makes any operating system calls directly, and it can run on any operating system that has a C compiler. It’s written in C in what amounts to a well-defined subset of C++, complete with opaque pointers that would be the this pointers if C++ were used. Maybe someday the team will upgrade to C++, which is used by almost all Office applications today. The team has the strange habit of seriously designing a product before ever writing one line of code. What’s even stranger is that when they do finally write the code, it has very few bugs in it. My own approach is to dive in writing code and then use the physicist’s well-known method of successive approximations: I can’t figure out everything in advance, so I try something and then gradually improve on it. Those guys figure out most of the design without writing any code.

After Office 2000, Eliyezer & Co. embarked on Page/TableServices, which was a natural next step. Eliyezer had started with characters in implementing TrueType, then progressed to lines with LineServices, and pages and tables were the next items in the layout hierarchy. To pull that off, the team needed the help of another St. Petersburg mathematician, Anton Sukhanov, who had been a whiz kid in Russia winning various computer-science puzzle competitions. (Actually, all these St. Petersburg people had been involved in math competitions.) Anton enhanced the ruby handler, among other things. The team developed PTS, as it’s called, and revised LineServices to work well with it.

About that time I simply couldn’t stand not having some math layout any longer, so in February 2001 I wrote a math handler for LineServices patterned after the ruby handler. While I was at it, I installed the ruby handler in a recursive way, so that you could have ruby objects nested like continued fractions. This upset the authors of the HTML ruby specification, since they said ruby was not supposed to be nested. Nevertheless, the easiest way to display complex ruby is to use one level of nesting, but I digress. My simple math handler showed that LineServices could do mathematics, although my spacing was pretty awful. More precisely, spacing was delegated to the user, who unfortunately seldom knows what the correct spacing is.

A really cool thing about my LineServices math handler was that it convinced people that we had the infrastructure to lay out math. Fortunately, I didn’t appreciate at the time how hard it would be to lay out math as well as TeX does. Otherwise I might not have been able to persuade people to work on it. It seems that most things that are really worthwhile are way harder than you think they’ll be when you start working on them.

Of course, Eliyezer didn’t need any convincing; he and his team had designed LineServices so that it could be used to lay out math. They just didn’t tell anyone! So after Office 2003 shipped, Eliyezer convinced Bill Gates and Steven Sinofsky that we should handle math layout. Chris Pratley helped by demoing OneNote for Bill and others using my simple formula autobuildup and LineServices math handler (OneNote uses RichEdit for editing and display of text at the paragraph level). Eliyezer started studying mathematical typography in earnest. He got his hands on every math typography book he could find, but by far the most influential and useful was Donald Knuth’s The TeXbook. After a few months Eliyezer enlisted the help of Victor and Andrei to design a math handler. Andrei had been working in the Windows font group, a background that would prove to be incredibly helpful on the math project. Andrei’s eye for high-quality typography and his understanding of mathematics and TeX were crucial to the success of the project.

They’d often come into my office announcing they were planning to do this and that and I would sometimes protest that what they had in mind wasn’t workable from a client’s, e.g., RichEdit’s, point of view. So they’d revise the plan and eventually they had a math handler design that was compatible with my general approach and offered really high quality mathematical typography. While this was going on, I was developing the formula autobuildup input method using RichEdit and my old math handler, since the LineServices team hadn’t written any code yet.

Since TeX was so valuable in the design process, Eliyezer wanted to talk with Knuth, who happened to be an old friend of Eliyezer’s PhD advisor, Niklaus Wirth. Butler Lampson arranged a meeting in November 2003, and Eliyezer, Andrei, Victor, and I had the good fortune to spend an extraordinary afternoon with Donald Knuth at his home on the Stanford University campus. Among many things, Donald showed us how he uses TeX to typeset his papers and books exactly the way he wants them to look. He applies special tweaks to achieve perfection, such as “smashing the descender” on one radicand to make a sum of square roots line up in a pleasing way, and shimming characters to place them more beautifully in a formula. This interaction inspired us to think that we could automate some of Donald’s tweaks using special OpenType math tables and associated code, such as the “cut-ins” described in one of my earlier posts.
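For readers who know LaTeX, the amsmath package exposes a version of the descender-smashing tweak directly: \smash[b]{...} typesets its argument but hides its depth, so the descender of y no longer pushes its radical lower than its neighbors. The snippet below is ordinary amsmath usage, shown only to illustrate the kind of manual adjustment described above; it is not a claim about how Knuth tweaked any particular formula or about how Word implements the automated version.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Default: the descender of y makes its radical reach lower than the others.
\[ \sqrt{x} + \sqrt{y} + \sqrt{z} \]
% Smashing the bottom (descender) of y lets the three radicals line up.
\[ \sqrt{x} + \sqrt{\smash[b]{y}} + \sqrt{z} \]
\end{document}
```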

Eliyezer’s health gradually declined, and he decided to retire after the initial math handler design. Sergey Genkin took over leadership of the math project. Sergey had been the lead developer on LineServices since the middle of 1998. One day in the summer of 2004, Sergey, Victor, and Andrei came into my office all excited and announced that they had been able to display the mathematical expression a+b! It was a real achievement, since the spacing on each side of the + was the desired 4/18 em and a lot of code had checked out correctly. One of the things they soon discovered was that LineServices alone was not adequate to lay out high-quality mathematics: you need PTS too!

The problem is that in a computer window, unlike on a printed page, the layout width can vary substantially from one invocation to another. Hence a program has to be able to break equations automatically to accommodate different window widths. TeX delegates most equation breaking to the user, but that’s not a good option for browsers, slide shows, and other windowed usages. Also, in general you need PTS to handle placement of equation numbers. Yet another brilliant St. Petersburg mathematician had joined the PTS team, namely Alexander Vaschillo, and he implemented equation breaking, alignment, and numbering. Also, although LineServices had been designed with math in mind, it was necessary to generalize both its code and PTS’s code to give the math handlers the power they needed. Igor generalized LineServices and Anton generalized PTS accordingly. I brought in two shelves of my mathematics books, one shelf oriented toward theoretical physics and one toward pure math. Many of the books predated computer typesetting, and a number were in languages other than English. It was very useful to have these books in refining our ideas.

At this point one can understand better how we came to use OMML (Office MathML) as a file format for mathematics rather than MathML. OMML is a pretty close representation of the LineServices/PTS (PTLS) math objects. These math objects were created after extensive study of mathematical typography, rather than by study of MathML. It’s natural to have a file format that mirrors one’s internal formats. In addition, we needed to be able to put any inline text and objects inside math zones. MathML cannot currently handle embedding of other XML namespaces, at least in a way that’s exposed to an XML parser.

Naturally there’s more to the math story such as the interactions with the Word team (who developed OMML, integrated the PTS and LineServices math handlers into Word 2007, and provided quantities of feedback), the math font teams responsible for the Cambria font collection and the math OpenType library, and the test teams. But those very important interactions have to wait for future posts.

 


Comments (13)

  1. Ian Easson says:

    While this is all very interesting, I would like to know when Word’s layout of ordinary text is going to be improved. Even hyphenation is turned off by default (and a user can even choose not to install it!). For justified paragraphs, the layout is particularly bad; the user has to know about the truly obscure option to "Use full justification like WordPerfect 5.x" to get even a rough approximation of what can be achieved in text layout with something like TeX. Even in Word 2007, which was supposed to expose to the user all the formerly obscure capabilities of Word through the Ribbon, all the text layout options are buried many levels deep (and not even accessible via the Ribbon) where no sane user would even find them.

    How about some attention to this issue? OK, rant over; I got that off my chest.

    By the way, good work on the math layout side.

  2. MurrayS3 says:

    Ian, some people around here are probably going to ask me how much I paid you to post that comment. LineServices, PTS and our font technologies support all you ask for and more. But it takes time to integrate these features reliably into Word. As you can imagine, adding the math support to Word 2007 was a tour de force in itself, so adding the globalized optimal paragraph algorithm and full OpenType support was too much to fit into the Office 2007 schedule. As they say, Rome wasn’t built in a day. Please do keep on ranting. As they also say, the squeaky wheel gets the oil.

  3. anony.muos says:

    PLEASE ADD FULL OPENTYPE SUPPORT IN WORD 2007’S SUCCESSOR. PEOPLE WILL BE RELIEVED.

  4. Lionel Fourquaux says:

    > Ian, some people around here are probably going to ask me how much I paid you to post that comment.

    You don’t have to pay, I’m giving you a "Me too" for free. Please add a better justification algorithm in Word!

    > various versions of LineServices are responsible for line layout in Word, Publisher, RichEdit, PowerPoint, Internet Explorer, FrontPage, Visio, and Vista

    Is there some chance that the version used by Internet Explorer could be upgraded to support math display? MathML support (builtin, à la Mozilla/Firefox, not using a third party binary behavior) is requested from time to time, and it looks like you have already done most of the work. It would be great to get equation support in HTML e-mails using Windows Mail.

  5. MurrayS3 says:

    Lionel, I agree it would be great if Internet Explorer could display MathML with Word 2007’s typographic quality. And it’s true that most of the hard work is already done. I guess we have to keep our fingers crossed. Btw, you do have equation support in Outlook 2007 email. We use it a lot in discussing math-oriented subjects.

  6. I have talked about math in Unicode before, like in For those who enjoy mathematics (or, ‘Also new in

  7. This post discusses aspects of Word’s first math editing and display facility: the EQ field. This field

  9. Thanks for a very interesting article. By the way, I really enjoyed reading all of your posts. It’s interesting to read ideas and observations from someone else’s point of view… makes you think more.

    So please keep up the great work. Greetings.

  10. Henrik Holmegaard says:

    ISO-IEC Technical Report 15285 discusses three composition models: (a) models in which a character code directly designates a glyph code and its glyph geometry, (b) the intelligent composition model of the Unicode Consortium, and (c) the composition model developed by one of the ISO Standing Committees, within which ISO-IEC 10646 character codes are replaced by ISO-IEC 10036 glyph codes. Model (a) is a truly terrible technology, and model (c) is not applicable since the registration authority (AFII) is disbanded and the registration discontinued.

    So, Apple has had model (b) since 1992 and Microsoft has had model (b) since a point in time about five years later, but Apple and Microsoft do not support the intelligent composition model at the top level of operating system user interfaces. There are a range of reasons for this, unfortunately.

    ISO-IEC Technical Report 15285 points out that if ISO-IEC 10646 implementation level 3 is invoked then non-unique spelling is involved in the intelligent composition model. Non-unique spelling was supported by the American National Standards Institute (rather, the American National Standards Institute vetoed unique spelling).

    There is no rendering specification for the SFNT Spline Font file format, so there is no way to tell in purchasing a type product if it provides for non-unique spelling or indeed for private spelling in the CMAP Character Map. There is also no way to tell in purchasing a type product which Common Locale Data Repository character repertoires it supports for the official writing systems of the European Union.

    There is not even a way to tell if the CLDR data is correct since official orthographic organizations are not involved in proofing the data – it is privately provided by enthusiasts some of whom can’t spell in their primary writing systems. Microsoft does not use the Common Locale Data Repository, so there is no common repository for foreign software publishers in the United States of America.

    There are also problems with marrying normative code space to illustrative data space. First, Microsoft, which holds the chair of ISO-IEC 10646, has opposed long identifiers in writing systems other than English. This is not helpful, since Apple has implemented a command Show Character Selected in Application to support interactive identification of input character codes for output glyph codes.

    Second, at the end of the day, what is and what is not a logical marriage of code space to data space is contextual. Æ, æ is a monophthong in Danish and Norwegian that follow Old English, unsurprisingly, but a diphthong in French, so the diphthong ligature substitution in Apple and Microsoft implementations is illogical if it is not contextualised in terms of writing spaces, which do not exist in the Unicode imaging architecture.

    These and other problems call for a conference, as model (a) is a disaster that should be disparaged publicly (and should have been disparaged a decade ago), while model (b) is immature and incapable of multilingual interactivity at this point.

    Best wishes,

    Henrik Holmegaard

    technical writer, mag.scient.soc.

  11. The earlier post Breaking Equations into Multiple Lines describes equation line breaking and alignment.

  12. In Windows 7, WordPad has undergone many improvements even though it uses RichEdit 4.1+ for editing and