Knowledge is the "key" when talking about XML support in Office

There has been a lot of speculation over what the XML support in Office really entails. I think knowledge is really the key here, and I ask all of you who are interested to take some time to actually get your feet wet and understand the facts. I’ve made a number of posts over the past 3 months where I’ve tried to help explain the basics of the existing XML support in Office 2003 because that really helps to better understand what’s coming in Office ’12’. Conversations are obviously more constructive when we all understand what’s going on (and aren’t just basing our knowledge on random blogs and articles).

For those of you with Office 2003, you can get started right away. For those who don’t have Office 2003, there are free online labs where you can try out Office 2003 online.

The posts below should help you understand what you can do, and how easy it is. I think if people played around with this more, we’d have fewer inaccuracies like the recent Myth of the Binary Key.

Here are the posts I made on Word’s XML support:

  1. Intro to Word XML Part 1: Simple Word document – In this entry, I show how easy it is to create a basic Word XML document from scratch. You can just use a text editor like Notepad to get started and voilà, you’ve made your own Word document.

  2. Intro to Word XML Part 2: Simple Formatting – In this entry, I build off the first document and show how formatting is applied to text. It’s important to see how this works, because it’s a different model than that used in HTML. There is a really flat hierarchy of objects that each have properties associated with them. It’s slightly confusing at first, but once you understand it, it makes dealing with formatting really easy.

  3. Intro to Word XML Part 3: Using Your Own Schema – This is where it gets exciting (at least for me). You can use your own XML tags to add much more meaning to the documents. This allows for true interoperability of your documents with any system. It’s a core part of our XML support.

  4. Intro to Word XML Part 4: Schema Validation – If you want to run validation against the XML you put in the documents, you can create a schema and give that schema to Word. We will validate the XML that you put in the document against the schema and report those errors.

  5. Intro to Word XML Part 5: Opening custom XML – If you are already dealing with XML files, you can open those directly in Word for display and editing. You can either use the built-in default view that Word applies, or you can create your own XSLT that creates a custom view on that XML. This is a great way of getting XML into a document.

  6. Intro to Word XML Part 6: Locking down your XML structures – Once you’ve created a Word document that has your XML in it as well, you may want to lock that document down so that people editing that document don’t accidentally change the structures you’ve applied. You can use the range level protection functionality to lock the XML tags down, and only let the end users edit the content inside the tags.
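
To make Parts 1 and 2 a little more concrete, here is a minimal sketch (not taken from the posts themselves) that builds a one-paragraph Word 2003 XML document in Python. The function name `build_word_xml` and the `(text, bold)` input shape are made up for illustration. The namespace, the `mso-application` processing instruction, and the `w:p` / `w:r` / `w:rPr` / `w:t` elements are the ones Word 2003 XML uses. Note how formatting lives in a run's properties (`w:rPr`) next to the text rather than in nested tags as in HTML, which is the flat model described in Part 2.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML (WordprocessingML 2003).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_word_xml(runs):
    """Build a one-paragraph document from (text, bold) tuples.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("w", W_NS)

    def w(tag):
        # Qualify a local name with the WordML namespace.
        return "{%s}%s" % (W_NS, tag)

    doc = ET.Element(w("wordDocument"))
    # Keep leading/trailing spaces inside w:t significant.
    doc.set("{%s}space" % XML_NS, "preserve")
    body = ET.SubElement(doc, w("body"))
    para = ET.SubElement(body, w("p"))

    for text, bold in runs:
        run = ET.SubElement(para, w("r"))
        if bold:
            # Formatting is a property of the run, not a wrapper element.
            props = ET.SubElement(run, w("rPr"))
            ET.SubElement(props, w("b"))
        ET.SubElement(run, w("t")).text = text

    header = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        # This processing instruction tells Windows to open the file in Word.
        '<?mso-application progid="Word.Document"?>\n'
    )
    return header + ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(build_word_xml([("Hello, ", False), ("world", True)]))
```

Saving the printed output as a `.xml` file encoded as UTF-8 should give you something Word 2003 can open directly, and the same markup can just as easily be typed by hand in a text editor.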

I hope these can help with better understanding what you can do with the XML support in Word. Of course these are just introductions, and you can do a whole lot more. Let me know if there are other topics you’d like to see me go into. Have a great weekend.


Comments (7)

  1. David Thielen says:


    Can you explain the data structure for <w:pict><w:binData w:name="wordml://08000001.wmz">?

    When there is an OLE object in a WordML file, along with the data that is understood only by the application that created the object, there is also a bitmap of the drawn object – so it can display on systems that do not have the program that created the object installed.

    I created an example (using the equation editor) and the result is:



    <v:shapetype …
    <w:binData w:name="wordml://08000001.wmz">H4sIAAAA …
    <v:shape …
    <o:OLEObject …



    The binData is probably uuencoded. But I decoded it and it doesn’t seem to be any known format – I even tried it as a metafile. (My tests were to decode it and then try to load it in PaintShop Pro using each format. I also looked for the EMF signature bytes – they aren’t in there.)

    One person posted that it is a zip of the emf file – but WinZip had no luck with it either.

    So my question is, what is the format of this image?

    thanks – dave

  2. David: my guess is it’s an OLE DocFile (structured storage – think filesystem-in-a-file, metadata in the era before NTFS streams). The format isn’t publicly documented, but Microsoft licenses the documentation and even the parser, and you can find unofficial documentation pretty much anywhere.

  3. Yuki says:

    Bury this format already. Nobody wants it.

  4. FARfetched says:

    Dave T, I suspect that the embedded object is Base64 encoded rather than UUencoded, and Hyperion is probably right about it being an OLE DocFile (although I would have guessed EMF too & it might be an EMF file inside the OLE wrapper).

    Yuki, obviously somebody wants it, or there wouldn’t be implementation questions and examples. Sure, I’m leaning toward ODF myself for a number of reasons. If I do go with ODF, I expect I’ll have to deal with MS Office XML/HTML on occasion — and being able to transform the content into (and out of) my workflow is what’s important. If Microsoft is making it easier than it used to be, that’s wonderful.

  5. all hail chern says:

    rofflecopter, that’s hilarious!

    i work as a sysadmin in a well-known university and i think it would be wonderful if the good work of becta was extended! i’ll certainly be among those pushing hard for open formats (and not the idiotic microsoft interpretation thereof) throughout academia.

  6. BrianJones says:

    David, it’s actually just a zlib compressed wmf file.

    You can even ungzip an emz/wmz into an emf/wmf on the command line. To convert back, just gzip the file.

    In the new O12 formats, we’ll just keep them as emf/wmf since we are getting compression from the package format (ZIP). In the case of HTML and our XML though, we wanted to get some additional file size improvements.
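
For anyone who wants to try this from a script rather than the command line, here is a rough sketch of the decode path discussed in this thread, assuming the `w:binData` text is Base64 and the decoded bytes are a gzip stream. The `H4sIAAAA` prefix quoted above is consistent with that: the gzip magic bytes `1f 8b 08` encode to `H4sI` in Base64. The function names `decode_bin_data` and `encode_bin_data` are hypothetical, and the payload in the usage lines is a stand-in, not a real metafile.

```python
import base64
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def decode_bin_data(b64_text):
    """Turn the text content of a w:binData element into raw wmf/emf bytes."""
    # Line breaks and indentation are common in the XML; drop them first.
    raw = base64.b64decode("".join(b64_text.split()))
    if raw[:2] != GZIP_MAGIC:
        raise ValueError("payload does not start with the gzip magic bytes")
    return gzip.decompress(raw)


def encode_bin_data(metafile_bytes):
    """Reverse direction: gzip the metafile bytes, then Base64-encode them."""
    return base64.b64encode(gzip.compress(metafile_bytes)).decode("ascii")


if __name__ == "__main__":
    # Stand-in bytes for illustration only (not a valid metafile).
    fake_metafile = b"\xd7\xcd\xc6\x9a" + b"\x00" * 32
    encoded = encode_bin_data(fake_metafile)
    print(encoded[:4])
    print(decode_bin_data(encoded) == fake_metafile)
```

The magic-byte check is there so that a payload in some other container fails loudly instead of producing garbage.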