Intro to Word XML Part 5: Opening custom XML

[This post was removed due to legal concerns]

Comments (5)

  1. David Giusto says:


    The fallback transform C:\Program Files\Microsoft Office\OFFICE11\XML2WORD.XSL

    is not used if your XML contains a non-namespaced <body> tag.

    The simple case:





    Yields very different results than:





    It turns out that if <body> appears anywhere in the XML stream, Word takes only the content (all the text() nodes) up to the closing </body> and nothing after it. Open this with Word and look for "MNOP QRST":











    Now change <body> to <xbody> and try it again.

    This is not really an issue unless you grab part of an HTML page or your data model includes <body> – Just keepin’ you accurate.


    P.S. Be sure to read my comment on the 7/26 topic:

  2. BrianJones says:

    Hey Dave, that’s very observant. Have you figured out yet why that’s happening? It’s because Word thinks the file is an HTML file, and not an XML file.

    In Word, we don’t really pay attention to the file extension. Instead we sniff through the file and see if we can figure out what it is. Take a .doc file and rename it to .xml. It will still open in Word without a problem.

    If you add the xml declaration <?xml version="1.0"?> to the top of your example file, then we’ll know it’s not an HTML document and open it properly. This would also happen if you’d used a namespace for the body tag.

    I also saw your comment on the other post yesterday. It was a great comment, and there were a number of things you said that I agreed with. Over the coming months I hope to drill in a lot deeper on subjects like bullets and numbers and complex formatting so other people can better understand how it works and how to take advantage of it. Thanks for your feedback!
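
The fix described in the comment above (putting an XML declaration at the top of the file so that Word does not sniff it as HTML) can be sketched in a few lines of Python. This is only an illustration: the helper name `ensure_xml_declaration` and the sample markup are hypothetical, since the original example files did not survive in this thread, and the code does not reproduce Word's actual detection logic.

```python
import re

XML_DECL = '<?xml version="1.0"?>'

def ensure_xml_declaration(text: str) -> str:
    """Return text with an XML declaration on top, adding one if it is missing."""
    # The declaration must be the very first thing in the file,
    # so drop any leading byte-order mark and whitespace first.
    stripped = text.lstrip("\ufeff").lstrip()
    # Match "<?xml" followed by whitespace, so that other processing
    # instructions such as <?xml-stylesheet ...?> are not mistaken for it.
    if re.match(r"<\?xml\s", stripped):
        return stripped
    return XML_DECL + "\n" + stripped

# Hypothetical sample with a non-namespaced <body> element.
sample = "<doc><body>ABCD</body><tail>MNOP QRST</tail></doc>"
fixed = ensure_xml_declaration(sample)
print(fixed)
```

Running the helper on a file that already starts with a declaration leaves it unchanged, so it is safe to apply to every file before handing it to Word.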


  3. David Giusto says:

    Brian, I knew there would be a reasonable explanation for this; I almost got there with the reference to HTML. It still seems odd that the content before the <body> is included but the content after </body> is not. I guess that if I check the HTML spec I will find that the <head> tag and friends may allow content. But that’s a topic for another day.

    So how did I get here? I was playing with the example on <w:cfChunk> from 7/20.

    It seems that your WordML pseudo code has a non-namespaced body tag – you get the picture.

    I had just read John Durant’s blog on the cfChunk topic about an hour before yours and was left looking for an actual example, since John’s description was a bit ambiguous. You know us engineers: it all has to be very specific. Pictures are good, examples are better. Thanks for the examples!! In all fairness, John does give one of my projects a plug in his 8/9 blog topic on CALS tables.

    I have a comment about cfChunk but I’ll post it in that blog-lette so it’s in the right context. You may have to hop around a bit to follow my train of thought since the topics cross over so much and I want to post in the correct topic stream as I digress.

    Back on topic? – Bullets are not bad and numbered lists are a bit dicey; it’s the hybrids and multi-level lists that will be a challenge to describe in the forum. There is an excellent explanation of this topic in the book by Simon, Evan, and Mary referred to in the 7/8 topic by Evan himself.

    The book is here: the sample chapter is a must read. At least read the last paragraph on page 67 (book not pdf page) it is eloquent!

    My issue with lists is that there are multiple ways to define the same structure, which makes it difficult to convert the XML effectively. I’m being careful not to use ‘transform’ here, since the circular links between Styles, List instances, and List definitions are fairly complex for XSLT. These three objects are the poster children for the argument that WordML is like a relational database. It is possible to create an XML instance that is schema compliant but not valid as a Word document if all these pointers and links don’t jibe. This is completely understandable when one realizes that WordML is based on an object model and is much more than just a ‘document instance’.

    I can’t tell you how many deer-in-the-headlights looks I get when I try to explain the difference between Document XML and Database XML, or between a hierarchy and a relational model, to someone who is familiar with traditional XML document publishing. It always comes down to terms that they can understand – I will say “it’s like the difference between newspapers and magazines”. They are both words printed on paper, but they have very different editorial, production, and printing processes, with very different applications and target audiences.
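
The point made earlier in this comment, that a document can be schema compliant yet invalid because the links between Styles, List instances, and List definitions do not line up, amounts to a referential-integrity check of the kind a relational database would enforce. A minimal sketch follows, assuming a simplified in-memory model: the function name, the dictionary layout, and all ids are hypothetical and are not WordML's real element or attribute names.

```python
def find_dangling_links(list_defs, list_instances, styles, paragraphs):
    """Report references that point at ids which do not exist.

    list_defs:      set of list-definition ids
    list_instances: dict of list-instance id -> list-definition id
    styles:         dict of style name -> list-instance id (or None)
    paragraphs:     list of (style name, list-instance id or None)
    """
    problems = []
    # Every list instance must point at an existing list definition.
    for inst_id, def_id in list_instances.items():
        if def_id not in list_defs:
            problems.append(f"list instance {inst_id} -> missing list definition {def_id}")
    # A style that carries list formatting must point at an existing instance.
    for name, inst_id in styles.items():
        if inst_id is not None and inst_id not in list_instances:
            problems.append(f"style {name!r} -> missing list instance {inst_id}")
    # Paragraphs reference both a style and, optionally, a list instance.
    for index, (style, inst_id) in enumerate(paragraphs):
        if style not in styles:
            problems.append(f"paragraph {index} -> missing style {style!r}")
        if inst_id is not None and inst_id not in list_instances:
            problems.append(f"paragraph {index} -> missing list instance {inst_id}")
    return problems

# Hypothetical data: each structure is well-formed on its own,
# but several of the cross-references are broken.
list_defs = {0}
list_instances = {1: 0, 2: 7}
styles = {"Normal": None, "ListBullet": 1, "ListNumber": 3}
paragraphs = [("ListBullet", 1), ("Heading9", None), ("Normal", 5)]

problems = find_dangling_links(list_defs, list_instances, styles, paragraphs)
for line in problems:
    print(line)
```

A schema validator looks at each element in isolation and would accept data like this; only a pass that follows the ids across the three collections catches the broken pointers.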

  4. What a busy week. I’ve been trying to keep up with all the news while also getting ready for PDC (and…

  5. I’ve had a few folks ask me about the XML format from Word 2003, and whether or not it would be supported…