Capturing XML and Inserting XML in a Word Document


In almost any solution you build in Word, there will be times when you want either to capture the content of part of the document or to insert additional content at some location in it. One scenario I’ve seen for this is a clause library, where people store a collection of document fragments that can be dynamically inserted into a new document based on what the user is writing about.

In Word, it’s easy both to store these fragments and to insert them into new documents. I already talked a bit about how the cfChunk element makes it easy to insert these fragments on a server (or anywhere else the Word OM isn’t available). If your solution runs inside Word, though, and has access to the OM, you can use the .xml property and the .insertXML method of the Range object.


range.xml


Take any document and open it in Word 2003. Make a selection that you want to store, and then bring up the Visual Basic Editor (VBE) with Alt + F11. Go to the Immediate window (Ctrl + G), type msgbox selection.range.xml, and press Enter. The message box can’t display the whole output, but you can see that you get back a string that represents your selection in the WordprocessingML format. You can store this string as its own XML document if you want to reuse it in other documents. You can also load it into an XML DOM, make modifications to it, and then reinsert it into the document. You insert XML with the .insertXML method.
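As a minimal sketch of the "load it into an XML DOM and modify it" step, here is Python's standard-library xml.etree.ElementTree standing in for the DOM (a swapped-in tool; from VBA you might use MSXML instead). The replace_text function is hypothetical; it assumes you have already copied the Range.XML string out of Word, and its result would go back in through .insertXML.

```python
import xml.etree.ElementTree as ET

# Namespace of Word 2003 WordprocessingML (the "w:" prefix in Range.XML output).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def replace_text(wordml: str, old: str, new: str) -> str:
    """Hypothetical helper: replace `old` with `new` in every <w:t> text node.

    ElementTree does not keep the <?xml?> declaration or the
    <?mso-application?> processing instruction in its output.
    """
    root = ET.fromstring(wordml)
    for t in root.iter(f"{{{W_NS}}}t"):
        if t.text:
            t.text = t.text.replace(old, new)
    return ET.tostring(root, encoding="unicode")
```

Word often splits a phrase across several <w:r> runs, so a per-node replace like this one can miss text that spans runs.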


range.insertXML


Take the same document and place your cursor where you want to insert some rich text. Go back to the Immediate window in VBE, type Selection.Range.InsertXML ("<foo>My text</foo>") and press Enter. Your text (and the XML tag <foo>) is inserted in the document at the cursor. If you don’t see the <foo> tag, press Ctrl + Shift + X to turn on the XML tag view.
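InsertXML expects a well-formed XML string, so if the text comes from user input, characters such as & and < have to be escaped first. A minimal Python sketch of building such a string (the data_only_xml helper is hypothetical and assumes tag is a valid XML element name):

```python
import xml.etree.ElementTree as ET


def data_only_xml(tag: str, text: str) -> str:
    """Hypothetical helper: build a one-element string such as <foo>My text</foo>.

    ElementTree escapes &, < and > in the text, so the result stays well-formed.
    `tag` is assumed to be a valid XML element name.
    """
    element = ET.Element(tag)
    element.text = text
    return ET.tostring(element, encoding="unicode")
```

For example, data_only_xml("foo", "My text") returns <foo>My text</foo>, and data_only_xml("foo", "Smith & Sons <draft>") returns <foo>Smith &amp; Sons &lt;draft&gt;</foo>.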


In this example we inserted data-only XML. You can also insert WordprocessingML if you want to apply formatting. With the insertXML method you can insert as much rich content as you want; it could even be an entire document. This gives the users of your solutions a quick way to build up a document from preexisting content.
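To give a sense of what formatted input looks like, here is a sketch that builds a minimal Word 2003 WordprocessingML document containing one bold paragraph. It assumes that a w:wordDocument root with a w:body is enough for InsertXML to treat the string as WordprocessingML (it omits styles, fonts, and the <?mso-application?> processing instruction); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of Word 2003 WordprocessingML.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a tag name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def bold_paragraph_wordml(text: str) -> str:
    """Hypothetical helper: a minimal document with one paragraph whose single run is bold."""
    doc = ET.Element(w("wordDocument"))
    body = ET.SubElement(doc, w("body"))
    paragraph = ET.SubElement(body, w("p"))
    run = ET.SubElement(paragraph, w("r"))
    run_props = ET.SubElement(run, w("rPr"))
    ET.SubElement(run_props, w("b"))  # <w:b/> with no value turns bold on
    text_node = ET.SubElement(run, w("t"))
    text_node.text = text
    return ET.tostring(doc, encoding="unicode")
```

The returned string is what you would pass to Selection.Range.InsertXML from VBA.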


-Brian

Comments (14)

  1. wpoust says:

    I hope the RichTextBox provides an XML format in the near future. Currently, we insert either plain text or RTF from it into a Word range. The issue we run into is that we only want some of the font attributes, like bold, underline, and italics, and there’s no easy way to modify the RTF.

    If we could get XML from the RichTextBox, we might be able to easily perform a transform on it and insert the XML into the Word doc.

  2. David Giusto says:

    I must have forgotten about range.xml, or maybe it’s just that I haven’t used it in a while. .insertXML, on the other hand, is something I use regularly. I assume range.xml is what the XML ToolBox uses for View XML. If so, there are a few consistency issues between what you see in the View XML window and what you see if you save the same document as XML and view that. For instance, if multiple list instances make up a single multi-level list, View XML shows a single list instance, but the saved file contains the multiple instances. There are a number of these unexplained differences between viewed XML and saved XML. Any insight?

    Is the source for the XML ToolBox available? It does a number of things I would have liked to see examples of. Back in January we got the party line on this topic from John Durant in his reply to Michael. I’m guessing it hasn’t changed.

    http://blogs.msdn.com/johnrdurant/archive/2005/01/20/357781.aspx

    OK, so now you have explained <w:cfChunk> (see 7/20), which can be used in the source XML stream, and range.insertXML, which can be used in a program running behind Word. Do these two actually have the same functionality when it comes to merging the style information of the inserted XML with the main document instance? Just puttin’ 2 & 2 together here.

  3. Ankesh Mehta says:

    How do I do this? I have a large document, and I want to insert smart tags (ns0:documentSection) into it when a button on the actions task pane is clicked.

    Say the document looks like this:

    Heading 1

    some text

    Heading 1.1

    some more text

    Heading 1.2

    some more text

    Heading 2

    text here.

    I want to tag this document as follows:

    <documentSection>

    Heading 1

    some text

    <documentSection>Heading 1.1

    some more text

    </documentSection>

    <documentSection>Heading 1.2

    some more text

    </documentSection>

    </documentSection>

    <documentSection>

    Heading 2

    text here.

    </documentSection>

    I want this so that, if the user deletes Heading 1 (using a tree control on the document actions pane), Heading 1.1, Heading 1.2, and all of their text are deleted as well.

    Thanks,

    Ankesh

  4. David Giusto says:

    Ankesh, using the tree control in the document actions pane for this is OK, but there is no context: you cannot see the heading numbers, etc. You can select the range of the XML tag and all its contents and then press Delete to remove the section. If your sections are longer than a screen, you will not see the section numbers and will have to scroll to make sure you have selected the correct content. It sure would be nice if the view scrolled to the beginning of the selection, not the end, when you use the task pane to select XML nodes. It would also be nice to have some context there; all you will see is a long list of <documentSection> tags.

    All that aside this is how you do it:

    The hard part is finding all the ranges where you want to insert your tags. You will need a combination of range.xml and range.insertXML in something like this:

    range.InsertXML(range.XML, "C:\insertDocSect.xsl")

    insertDocSect.xsl is a transform you need to write; it should add an <ns0:documentSection> element inside <w:body>.

    This is basically the same as selecting a range and applying an XML element from the right-click context menu when a schema is attached. Of course, you will have to insert a top-level tag around the entire document before inserting more than one <ns0:documentSection> tag. I think you can do this without having a schema attached. This can be done as a smart document or just a Word plug-in.

  5. David Giusto says:

    Ankesh – The way to do it without a transform is to use this:

    Dim node As Word.XMLNode = range.XMLNodes.Add(Name:="documentSection", Namespace:="http://yourSpaceHere")

    To add attributes use this:

    Dim attr As Word.XMLNode = node.Attributes.Add("some_name", Nothing)

    attr.NodeValue = "some value"

    Still the hard part is finding all the ranges.

  6. Bob Kotl says:

    I’d like to post some of my observations on the InsertXML method. I develop an automated workstation for people who process rather large documents in WordprocessingML format (up to 40 MB). Some operations take the entire document content as XML, perform some modifications, and insert the modified content back into Word with the InsertXML method. After roughly 500 megabytes have passed through InsertXML, the method throws an exception, and almost nothing can be performed in Word afterward…

    Is there some kind of volume "limit" on InsertXML?

  7. Markus says:

    InsertXML only seems to apply to a single range. But what if I have a document (.doc) with a schema defined, with some text outside the XML tags, and I want to fill the document with business data inside the tags? Do I have to assign XML to every node and range individually, or can I somehow assign an XML data file as a data source for the document?

  8. Brad Morgan says:

    Hi Brian,

    I just discovered your blog and have found a lot of very useful information.

    One point on the capturing of XML chunks from a Word document. I’m working with a document that contains a lot of styles. When I use range.xml to capture a fragment of the document, all of the parent document’s style, font, and list information is included in the resulting WordML document. With the example I’m working with, capturing one line of text results in a 73KB XML file.

    When capturing XML, it would be nice to have the option to include only styles, font definitions and list definitions that are actually used in that part of the document in order to keep the resulting XML size smaller.

    Thanks for keeping us posted on all of the latest Office developments. Very much appreciated.

  9. Carmenm says:

    I would like to use selection.insertxml(str, transform) to insert XML into my Word document. str is a string containing XML, such as str="<tag1>some text</tag1>". I have an XSL file containing a WordprocessingML transformation that I would like to apply to str when inserting it into the document. When I try to do this, I get compile errors in my VBA macro. Any suggestions would be appreciated.

  10. BrianJones says:

    Markus, this will be possible in Office 12. See this post: http://blogs.msdn.com/brian_jones/archive/2006/01/09/CustomXML1.aspx

    Brad, we’re actually working on improving this in Office 12. It should be set up so that you’ll only get the styles that are in use in the range you selected.

    I’m glad you’re enjoying the posts!

    Carmenm, I’m not sure what the problem is. Have you tried opening the XML file in Word as a separate file and applying the XSLT? You should try to narrow it down a bit more and see whether it’s an issue with your XSLT, with Word, or with the insertXML method.

    Sorry I can’t help more.

    -Brian

  11. Jesse says:

    Hi Brian,

    Is there a way on the XML side to completely replace the WordML of a target document with the WordML of a source document via automation?

    What I’d like to do is (a) receive a WordML string from somewhere that represents a complete WordML document, (b) create a new Word document via automation, and (c) use the insertXML() method to completely replace everything in the empty document (the target) with everything in the source WordML string. It does not work. I’m just wondering whether the basic idea is feasible, or whether some alternate approach exists; I’d like to edit WordML documents locally without using local files in the process. Thanks

  12. BrianJones says:

    Jesse,

    I’m curious why you’re saying that it’s not working now. You should be able to do insertXML on document.range and it should replace everything. It is working, but just not quite as you expect?

    -Brian

  13. Keith says:

    Anybody out there get a yes/no message box when using Word.Range.XML telling you "Recording clipboard style sheet will require copying many styles. Do you want to use Normal style instead?" Any ideas about how to disable this alert while answering "No" to preserve the styles?

    Thanks.

  14. Louie says:

    I have Word 2003 on my XP SP2 notebook; it came as part of Microsoft Office Basic 2003. I’ve also installed the Office 2003 PIAs so I can use the insertXML method to insert XML into a Word document. When I use this method, I get the following error: "This method or property is available only in Microsoft Office Professional Edition 2003". Is this truly a requirement for using this method, or is it a bug?
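Comments 4 and 5 both say the hard part of Ankesh’s request (comment 3) is finding the ranges for the nested documentSection tags. One way to compute them, sketched here under assumptions, is to read each paragraph’s outline level (for example from Paragraph.OutlineLevel in the Word OM, mapping body text to None) and let each heading’s section run until the next heading at the same or a higher level. The list-of-levels input and the section_spans function are hypothetical.

```python
from typing import List, Optional, Tuple


def section_spans(levels: List[Optional[int]]) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: compute nested heading sections from outline levels.

    levels[i] is the outline level of paragraph i (1 = top-level heading),
    or None for body text. Each heading's section runs from the heading up to,
    but not including, the next heading at the same or a higher level (a number
    less than or equal to its own), or to the end of the document.
    Returns (start, end, level) tuples with end exclusive, sorted by start,
    so outer sections come before the sections nested inside them.
    """
    spans: List[Tuple[int, int, int]] = []
    open_sections: List[Tuple[int, int]] = []  # stack of (start, level)
    for i, level in enumerate(levels):
        if level is None:
            continue  # body text belongs to whatever section is open
        # A new heading closes every open section at the same or a deeper level.
        while open_sections and open_sections[-1][1] >= level:
            start, open_level = open_sections.pop()
            spans.append((start, i, open_level))
        open_sections.append((i, level))
    # Sections still open at the end run to the last paragraph.
    while open_sections:
        start, open_level = open_sections.pop()
        spans.append((start, len(levels), open_level))
    return sorted(spans)
```

For Ankesh’s example (Heading 1, text, Heading 1.1, text, Heading 1.2, text, Heading 2, text), section_spans([1, None, 2, None, 2, None, 1, None]) returns [(0, 6, 1), (2, 4, 2), (4, 6, 2), (6, 8, 1)], which matches the nesting he describes. Each (start, end) pair could then be turned into a Range covering those paragraphs and tagged with XMLNodes.Add, as in comment 5.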