More Word Feedback

Works – I am afraid I know next to nothing about Works, except that we ship a converter they give us in the Word box.

Chinese on the Mac. It’s funny you mention that, because a friend of mine who is the PM in charge of MacWord is not only Chinese originally, but he is absolutely passionate (and I mean crazy-passionate) about supporting Unicode and Asian languages on the Mac, and is more or less single-handedly responsible for getting that into the 2004 product by browbeating those around him in MacBU. My understanding is that 2004 does support Chinese, BTW. If you find it surprising that it took a while to support Chinese in MacWord given the huge population of Chinese speakers, you need to factor in the tiny percentage of those people who have Macs (then multiply that by the percent that don’t pirate software, and you get the real market for Chinese in MacWord). Fundamentally, without a business case, things only get done out of passion (like my friend’s).

Reveal codes in Word. Well, we get that request a lot of course. The internal architectures of WordPerfect and Word are essentially totally different. WordPerfect has a tagged format system, which most of us are familiar with now from HTML, although it predated HTML by a long shot (I don’t know if it was influenced by SGML or not – maybe a WP person could tell us? My guess is not). So to “reveal codes”, WP just shows its internal format. Word, however, is designed as a set of “objects” with properties. To make something Bold in WP, you (effectively) put Bold tags around the ends of it. In Word, that “run” of text is assigned the property “Bold”. Actually, there is some indirection involved. Any run of text in Word with unique properties has a unique “property bag” assigned to it. The property bag is defined elsewhere in the document. If more runs of text are created that use the same format, the property bag is reused by reference – that is, the text is assigned the properties from bag #427, and somewhere else #427 is defined as bold, green, italic, etc. Many different runs of text can refer to bag #427. Same for paragraphs, sections, and so on. That’s a lot of gobbledygook to say that there are no “codes” to reveal. If you use the “Format/Reveal Formatting” feature in Word2002 or 2003, what you see is the contents of the property bag for the text you had your insertion point in, and you can then change them. So, asking for reveal codes is sort of like asking a Mazda rotary engine owner if you can see the pistons in his engine. They don’t exist. Generating a tagged format like HTML or XML from Word is therefore an export/conversion process, where these object-property sets have to be converted into the serial form that the markup languages use. Likewise, import means converting these sets of tags into properties assigned to runs of text. I believe you can read more about the architecture of Word if you do a Google search on Charles Simonyi. He was the architect for the original WinWord – it was his second go-round (at least) since he came from Xerox where he had worked on word processing tools of similar design (so I am told).
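
To make the property-bag idea concrete, here is a minimal sketch in Python. It is not Word’s actual data structure: the class names, the three properties, and the `to_tagged` export are all made up for illustration. It only shows runs pointing at shared bags by number, and why producing tags is a conversion step rather than a view of the internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyBag:
    """Hypothetical bag of character properties (illustration only)."""
    bold: bool = False
    italic: bool = False
    color: str = "auto"


class Document:
    """Toy document: runs of text refer to property bags by id."""

    def __init__(self):
        self.bags = []       # bag table; the list index is the bag id
        self._bag_ids = {}   # PropertyBag -> id, so equal bags are reused
        self.runs = []       # list of (text, bag_id) pairs

    def add_run(self, text, **props):
        bag = PropertyBag(**props)
        bag_id = self._bag_ids.get(bag)
        if bag_id is None:
            # First time this combination of properties is seen.
            bag_id = len(self.bags)
            self.bags.append(bag)
            self._bag_ids[bag] = bag_id
        self.runs.append((text, bag_id))
        return bag_id

    def to_tagged(self):
        """Export step: convert run/bag pairs into a serial, tagged form."""
        out = []
        for text, bag_id in self.runs:
            bag = self.bags[bag_id]
            if bag.bold:
                text = f"<b>{text}</b>"
            if bag.italic:
                text = f"<i>{text}</i>"
            out.append(text)
        return "".join(out)


doc = Document()
first = doc.add_run("Hello ", bold=True)
middle = doc.add_run("plain ")
last = doc.add_run("world", bold=True)   # reuses the bag created for "Hello "
tagged = doc.to_tagged()
```

In this toy model the two bold runs share one bag, and the tags only come into existence when `to_tagged` is called.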

Some people pointed out that Open Office is not an *exact* clone of Office. That wasn’t my point – all I was saying was that as a designer, I am interested in innovative, clever, usable designs to solve problems. When I looked at Open Office 1.1.1 the other day, nothing jumped out at me. If there are some neat designs in there, please share the details.

Creating and modifying Word binary docs outside of Word. Well, I think from a technical perspective that’s a risky proposition. We wouldn’t try it, that’s for sure. It’s the kind of thing you can get sort of working but it never leaves that stage due to the complexity involved. That’s why our binary save converter for Word95 format was actually a version of Word95 hooked up to read RTF and spit out 95 *.doc. Creating a Word binary from scratch is tough. RTF is used for this purpose instead since it is easier to deal with than Word binary for apps other than Word (remember that is why we created it – it stands for Rich Text interchange Format). The new XML format is designed for exactly that purpose – and it is easier to work with than RTF. You can create the WordML doc (or even a minimal subset) on a server using XML tools, then send the XML to Word on the client and Word will load it up. If you’re missing a lot of the Word specific stuff, that’s OK – Word will fill in the missing bits with defaults. In fact, you can skip generating the doc on the server if you want – just generate an XML data file in your own schema and provide an XSLT for Word to use when opening the file. That pushes a lot of the processing onto the client.
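
As a rough sketch of the “minimal subset” idea, the Python below builds a tiny WordML document using only the standard library’s XML tools. The namespace URI, the `w:wordDocument`/`w:body`/`w:p`/`w:r`/`w:t` element names, and the `mso-application` processing instruction are written from memory of the Word 2003 format and should be checked against the published schema. The `build_wordml` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed WordML (Word 2003) namespace; verify against the schema reference.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a tag name with the WordML namespace."""
    return f"{{{W_NS}}}{tag}"


def build_wordml(paragraphs):
    """Return a minimal WordML document with one run per paragraph."""
    root = ET.Element(w("wordDocument"))
    body = ET.SubElement(root, w("body"))
    for text in paragraphs:
        p = ET.SubElement(body, w("p"))   # paragraph
        r = ET.SubElement(p, w("r"))      # run of text
        t = ET.SubElement(r, w("t"))      # the text itself
        t.text = text
    xml_body = ET.tostring(root, encoding="unicode")
    # The processing instruction is what is assumed to route the file to Word.
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n'
        + xml_body
    )


wordml = build_wordml([
    "Hello from the server.",
    "Word fills in the rest with defaults.",
])
```

No styles, fonts, or section properties are written here, which is the point: everything left out is meant to fall back to defaults when the file is opened.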

BTW, a lot of the confusion around XML in Word2003 was that people thought it was just a file format – probably because Open Office uses it that way, and the long tradition of SGML which was so document focused. To us, WordML is handy, but what is really cool is the support for schemas that customers or other developers define. WordML in this sense acts as the “envelope” for the “letter” that is the real customer data. The Pro version of Office allows you to tag up a document in Word (although we think that’s a pretty unfriendly thing to ask a normal user to do, it is the first step for developers). More interestingly, as a developer, you can build structured templates using your own schema, and have users create docs using them that are pre-structured. You can hook the save event to get the document out as XML, and then you’re off and running. There’s a thing on the client called the “schema-library” that associates XML namespaces of your choice with XSL files, solutions, etc. This means once you’re set up in the schema-library, you can dump blobs of XML to Word (via e-mail attachments, or code), and Word will check the XML you provide – find the associated files to deal with it locally, and transform that XML using a presentation that can also retain the XML markup you supplied. Note this important difference – this is not converting one schema into another like a file converter (although it can be used that way) – it is generating presentation to wrap around the actual customer data, which is retained in the resulting file.

To be clear, if you’re thinking only in terms of file format, then the XML you’re imagining has things like “bold”, “italic”, “indent”, etc. And then the conversions you imagine are sort of like converting “<b>” into “<bold>” or whatever. This is necessary stuff but not all that exciting. What I’m talking about are schemas of the form “customer ID”, “quantity”, “price”, etc. These are database schemas with semantic markup of the data. Without support for this sort of thing in the application, then XML really is just another file format. A handy one to parse outside the creating app – no question there – but the exciting bit is when you can hook business data into your documents, modify it in the context of the tools you are familiar with, and print or save or update the database – whatever. This can cut out a lot of steps in today’s workflow, and not only be faster but also reduce error.
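
Here is a small sketch of why semantic markup matters to code outside the application. The `urn:example:orders` namespace, the element names, and the sample values are invented for illustration and do not come from any real schema. Because the data is tagged as “quantity” and “price” rather than “bold” and “italic”, a program can pull business values straight out of the document.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical customer-defined schema; names and values are made up.
ORDER_XML = """
<order xmlns="urn:example:orders">
  <customerID>C-1042</customerID>
  <item><quantity>3</quantity><price>19.99</price></item>
  <item><quantity>1</quantity><price>5.00</price></item>
</order>
"""

NS = {"o": "urn:example:orders"}


def order_total(xml_text):
    """Return (customer id, order total) read from the semantic markup."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for item in root.findall("o:item", NS):
        qty = int(item.findtext("o:quantity", namespaces=NS))
        price = Decimal(item.findtext("o:price", namespaces=NS))
        total += qty * price
    return root.findtext("o:customerID", namespaces=NS), total


customer, total = order_total(ORDER_XML)
```

With presentation-only markup the same numbers would just be text in a paragraph, and the code would have to guess which text meant what.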

Working on XML in Word2003 was a blast – it seemed like every week we’d come up with a new amazing thing you could do with it. The last two or three years have been some of the most fun I’ve ever had at work – OneNote was of course a thrill ride, and the XML stuff in Word2003 really was breaking exciting new ground.

Comments (35)

  1. With WordML in there, how long until Word’ll be able to open/save OASIS Open Office format documents?

  2. Hrm… Something went wrong there…

  3. Alex says:

    This is like pointing out the obvious, but asking a slashdot idiot to share the details is more like asking him/her for more bashing. A slashdot idiot will never share the detail, if he/she could, he/she would already share it. They just love to accuse others, they can’t stand up against serious arguments.

    Watch out for more bashing.

    I have used OO, the only attractive thing about it is that it is a cheap solution. That’s pretty much it, it clearly doesn’t compete against Office. It became better than Office, because it is open source. Slashdot idiots love to claim that every open source app out there is somehow better than the closed ones, there are idiots who even claim that Gimp is better than Photoshop. It just doesn’t make sense.

  4. Russ C says:

    It looks like you’re the only one doing any bashing here, Alex. Please stop trolling and be civil.

    I read Slashdot avidly and haven’t used either Linux or Apple in years … It’s just a good idea to be aware of everyone’s opinion, and I certainly don’t consider myself to be an idiot.

  5. Simon says:

    What is your opinion of Mellel? I used Word for a long time (high school and college), and switched last year because I went to OS X and didn’t want to pay as much for MS Word (Mellel is $25.)

  6. Alex says:

    re:Russ C

    That’s the beauty of being a Slashdot idiot, you just don’t know that you are one.

    I also read Slashdot, if you think for example Microsoft should die, or Bill Gates should die, you are certainly an idiot. I am also a Linux user, being a Linux user doesn’t make you an idiot, agreeing on most of the stuff claimed on slashdot certainly does. So stop being an idiot and think a little.

  7. Jeremy P says:

    I remember I commented on the guy who was all bent over on revealing codes. :)) I wonder if he knew he didn’t have a point.

    I agree with what Alex said in his final paragraph.

    The first paragraph however seems to be just a repetition of the idea that there are people who can only talk about and can’t see past the myth that all Microsoft is capable of doing is squashing the little guy and making less than desirable products—remember I said MYTH. I think they make great products and have excellent business sense (not a perfect business nor perfect products) but where is it written things are supposed to be perfect?

    I guess it’s about time I say this turned out to be an interesting blog.

  8. The amusing thing is, you can get a pretty good feel for the formatting just simply by selecting "Reveal Formatting" from the Format menu.

  9. Chris, you made an excellent point earlier: something along the lines of "there are 400 million Word users, and if 1% of them hate it and post feedback in my blog, that’s 4 million posts". People are more inclined to speak up with their negative feedback than their positive, simply because the anger of negative opinion is a far more seductive motivation to speak out. So if your feedback sounds far too negative, just remember (according to your own calculations) that there are 399 of us happy customers to every disgruntled person that posts here.

    Also- I never in my wildest dreams expected to read the candid history of MS Word written by a current MS employee on his personal weblog. Thank you for putting in the time to write it- I appreciate that your personal time must be at a premium right now.

  10. Color me the "ordinary user," although I’m a Certified Word/Excel Expert. The last year I have had to learn XML schemas to do my work and though I initially protested WordML because it wasn’t vanilla XML, I’ve learned better. Another paradigm-changing moment occurred when Word 2003 arrived and it was far better than I expected – the file format was intact, most every bug had been fixed, and the Help File wasn’t as bad as I thought it would be (moving it through the task pane).

    Just a side note, and one you’ve undoubtedly heard for a decade: The reason the Word Count feature is ESSENTIAL is that my own work as a writer demands that I stay within certain limits, such as 200, 800, 1500, or 4000 words. Going over any given number (even if my editor is using OOo or WP) returns a nasty message for rewrite. So for every journalist, writer, student, and teacher who uses Word, Word Count is the equivalent of having a watch that keeps time accurately – when you need it, it becomes very important.

  11. Tonetheman says:

    WordML is the way to go for Word. Period. This is the type of stuff that should have been done long ago. Actually you could argue that for most products that allow users to manipulate content like Word. Anyway today I will not bash MS or Word, WordML is cool and really useful. Keep it up!!!!

  12. D. Brian Ellis says:

    Bravo Chris. Another informative article. You never cease to amaze me. I find the object model behind Word you describe fascinating. Question: How does that relate to interoperability with the other Office suite products? Do Excel, etc. use a similar model but with different objects and attributes, or do they simply receive a file labeled "Word File Format: Import This…"? Also, how does this model handle the newer Code-Behind model of the 2K3 suite? For instance, how is a C# code-behind handled in a .doc and when transferred to Excel? Thanks, keep up the good work!


  13. Kent says:

    Chris, your candor is right in line with what the Channel 9 folks are trying to do. No BS!

  14. RobSaunders says:

    Thanks Chris, I completely understand since you’re not involved with Works…

    To date I have only enjoyed using three WordProcessors.

    ClarisWorks (3/4), Word 2000/2002, and Word 2001/OSX.

    That should be a good enough compliment 😉

  15. Brian Johnson says:

    Hello Chris,

    I see you’re responsible for Word too so I have an interesting feature request:

    Why isn’t it possible to create a glossary for the document I write in Word? I’m talking about something to enable me to mark the word I just entered and, by right-clicking it or using a keyboard shortcut, enter an explanation. The glossary at the end of the document (or wherever I want it) would be built automatically (sorted, laid out, etc.). I’m NOT talking about user dictionaries in Word, in which you can enter the words that Word’s dictionary doesn’t know. That such a mature word processing application (my Word 2003 reads 11.5604.5606 – so it’s a major version 11) still doesn’t have such a feature is a pity. As a software developer I (after your excellent articles now even better) understand how MS decides on future features, but it’s unbelievable that such a feature "has not made it".

    What do you say?

  16. Brandon Paddock says:

    I’m guessing that a part of that feature would be to show a tooltip or something with the appropriate definition when someone is reading the document?

    It sounds like an interesting proposal but I’m just trying to figure out the usefulness. I mean: why this would be better than just adding a glossary to the end of your file by, well, typing it.

    I’m betting some of what you want could be done by a developer using VSO and Word 2003, though it’s really hard to say without knowing more about the "features" of this, um, feature 🙂

  17. Matt says:

    About your point that you can’t do reveal codes in Word, I think you’re missing a subtle point: WP can reveal codes because it’s a direct visual reflection of their implementation. Word’s implementation is different, but it doesn’t actually imply that it would be impossible to support a reveal codes-like user interface, does it? It would be a pain to keep the two models in sync, but there is a pretty direct mapping between the tagging-a-run-of-text model for the UI and the bag-of-properties-for-a-run-of-text model that forms the implementation.

    Not impossible, just a pain to build and to maintain.

    Would it really be easier? That’s harder to say.

  18. Russ C says:

    re: Alex

    Which part of saying that I avidly read Slashdot, implies that I agree, hand on heart with anything that is posted there ?

  19. Russ C says:

    Or indeed everything 🙂

  20. David Mooney says:

    Another vote on the glossary feature. It would be nice if it had the side effect of acting like the glossary worked in the old .hlp help system. In those days, the help browser would underline defined words with a green dashed underline and if you clicked on it it would pop up a small window with the definition. It was a great feature which seems to have been forgotten in later generations of help systems.

    Anyway, glossaries are a common "feature" in corporate documents and to have a Word feature like Insert Table of Contents, but Insert Table of Defined Terms (aka a Glossary).

  21. These days the glossaries (in help) seem to be inline. Nice, but there’s a possibility that terms will change in different ways on different help pages. It’s a side-effect of going to HTML for help, I think, and probably the only downside. (Dealing with old-style Help authoring was hell.)

    I’d love to see that glossary feature myself. Still, as long as you’re careful, you can use styles to do something along the lines of HTML’s definition lists.

  22. David Mooney says:

    Gah! Insert "would be cool" before that last period. 🙁

  23. Alex says:

    re:Russ C

    I didn’t call you an idiot, did I? I mentioned the Slashdot idiots, and you came in and objected to me as if you are a Slashdot idiot. If you were not a Slashdot idiot, then why did you object? If you at least read Slashdot you should have known what I am talking about when I refer to a Slashdot idiot.

  24. Tom_Yardley says:

    I love wordperfect because of the reveal codes feature, your description that Word has no codes to reveal was eye-opening.

    Maybe I’ll give some of your software a try.

  25. Dave says:

    Check out

    for a third-party product that "reveals codes".

  26. WP says:

    "So if your feedback sounds far too negative, just remember (according to your own calculations) that there are 399 of us happy customers to every disgruntled person that posts here."

    Get real. What’s the saying: 1 negative comment = the way 100 people feel?

  27. WP, logically, you can both be right. The numbers we’re talking about are in the millions, so it is easy to have 100 negative people for each negative comment and 399 positive people.

    Actually, the ratio for positive comments is way higher than for negative. People with a negative experience tend to complain ~10 times as much as people with a positive experience exclaim about their happiness. So you might want to multiply the positive posters here by 1000, if you were so inclined.

    Of course, the reality is that the people who read and comment on this blog are in no way an accurate statistical sample of the real user base.

  28. Simon Marks says:

    If you’re just looking to put in definitions of words, then use the Research Task Pane in Word 2003 (and a bunch of other Office 2003 products). If you Alt-click on any word in a document, you get a bunch of info about the word, including spelling, translation, thesaurus, etc. For free you can access the Encarta Dictionary. It’s a relatively easy task to create your own service that can be accessed if you want a glossary for particular terms (we have an Internal Microsoft Glossary for all the arcane TLAs and code-names whose meanings no one can remember).

  29. Sam Smith says:

    I would just like to second the opinion that OpenOffice format documents should be recognized as a serious format and supported as one, although I can understand reasons for not doing this.