Create a rich Word document based on your own custom XML (without the need for XSLT)


 

I hope everyone had a great new year. Sorry I’ve taken so much time off from blogging. I was pretty busy last week just getting caught up on e-mail. For those of you who posted comments, or sent comments to me directly, I’ll try to get to them all (sorry it’s taking so long). Last month was such a busy month with all the traveling for our work in Ecma and family time for the holidays that I quickly fell behind. Beta 1 of Office has been out for a couple months now, and I haven’t posted much content to help people use some of the new XML functionality in Office 12. Today, I want to post an example Word document that leverages the new storage we provide for custom XML and the integration of that XML store with a new feature called content controls. Anyone who has Beta 1 should be able to try this out.


There were a large number of scenarios we looked at when we first started our move towards strong custom XML support back towards the end of Office XP. Some of them were around making document generation much easier and more reliable. Other scenarios were around making Office documents integrate in richer ways with business processes. There were a number of exciting scenarios here, but this first example I’m going to show is really more around document generation. We often see people use the mail merge functionality in Word for more than just creating letters. It allows you to import data to create a document driven by that data. We’ve also seen people do this in Word 2003 using the XML file format in combination with XSLTs. We had been a bit naive in thinking that there would be a lot of folks out there building XSLTs for transforming their data into a rich Word document. There are plenty of people willing to do this, but it’s a lot of work, and often too advanced for the majority of people trying to build a solution.


Generate a rich document based on Custom XML without an XSLT!


I have an example I’ve demo’d at a number of conferences that I wanted everyone to get a chance to play around with. If you grab this ZIP file: http://jonesxml.com/resources/xmlMapping1.zip you’ll see a Word document and an XML file called “item1.xml”. Go ahead and open the Word document in Beta 1 and take a look. I have a couple things I’d like you to try:



  1. Close down the file in Word, and make a copy of the Word document. For the new copy, rename the extension of the file to “.zip”

  2. Crack open the file and navigate to the “customXML” folder. Notice the part called “item1.xml”. If you open that you’ll see an XML file with a number of custom XML tags that I created, but they are all empty.

  3. Open the “item1.xml” file that was in the original ZIP file you downloaded. Notice that it’s in the same namespace as the xml file you looked at in step 2, but it has values for each XML node.

  4. Delete the item1.xml file from the Word document from step 2, and replace it with the extra one from step 3.

  5. Now, change the extension of the Word document back to .docx and open it in Word. Notice that the document now has all the values from that new item1.xml file displayed directly inline in the Word document (you can open the original Word document as well if you want to compare the differences).

  6. Make some changes to those values and save the file again. Change the extension back to ZIP and go to the “item1.xml” part again and you’ll see that the XML file has the updated values based on the changes you made.

This is new functionality that leverages a couple of new features. Content controls, the custom XML store, and the ability to map the content controls to nodes in the custom XML store all combine to give you this powerful separation of data and view.
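If you would rather script steps 1 through 5 than rename files by hand, here is a minimal sketch in Python. It treats the Word document as an ordinary ZIP package and swaps one part for new bytes. The function name `replace_part` is hypothetical, and the part path you pass in (for example `customXml/item1.xml`) is an assumption: check the exact folder name and casing inside your own copy of the file, since it may differ between builds.

```python
import io
import zipfile


def replace_part(docx_bytes: bytes, part_name: str, new_data: bytes) -> bytes:
    """Return a copy of a ZIP-based document with one part's bytes swapped out.

    part_name is the path inside the package, e.g. "customXml/item1.xml"
    (an assumed path; look inside your own file to confirm it).
    """
    out = io.BytesIO()
    replaced = False
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == part_name:
                # This is the part we want to replace with the filled-in XML.
                data = new_data
                replaced = True
            # Every other part is copied over unchanged.
            dst.writestr(info.filename, data)
    if not replaced:
        raise KeyError(f"part not found in package: {part_name}")
    return out.getvalue()
```

To use it, read the Word document and the filled-in item1.xml as bytes, call `replace_part`, and write the result to a new .docx file. Working on a copy, as in step 1, is still a good idea.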


Content Controls


Even without the XML mapping, the new set of features in Word called content controls make it much easier to structure a rich Word solution. Go ahead and open the original document you’d downloaded again. Notice that in the 2nd paragraph, you can only edit within specific regions. In that 2nd paragraph, there are a number of “content controls”, and then the entire paragraph has been “grouped”. By grouping the 2nd paragraph when I created the document, I made it so that the look and boilerplate text couldn’t be changed, and instead only the content of the controls could be edited. Some of the controls are just plain text, but notice that there are other types of controls as well. The date for example, has a calendar control that will drop down:


Developer Tools


There are a number of available content controls:



  1. Plain Text – The name is somewhat misleading. This control will take on the formatting that is applied to it while in design mode, so the template author can set up the look, and the end user can only edit the contents.

  2. Picture – This control can only contain a picture. When the user clicks on it, the “insert picture” dialog appears.

  3. Drop Down List – This one behaves similarly to the plain text control, since you can first set up what formatting you want applied, but in addition, you can also specify a list of values that the user is allowed to choose from.

  4. Calendar – The user will be given a calendar control to pick the date. You have a number of options here for how the date is formatted (M/d/yyyy; dddd, MMMM dd, yyyy; etc.).

  5. Combo Box – Just like a Drop Down List, except that the user can type in their own values as well as choose from a list you define.

  6. Rich Text – Behaves just like any other text in Word.

  7. Building Blocks – This is another new feature that I’ll talk about later since it really deserves its own post(s).

These new controls and the new “grouping” functionality make it really easy to design a template where you have some structured islands of information you want the user to fill out. Each control has its own independent settings as to whether it’s editable and whether or not it can be deleted. You can also specify placeholder text to be displayed when the control is empty.


If you are building a solution, the controls are also really helpful because they can be given unique names that you can use to easily address them in the Object Model. That also makes it really easy to get at them in the file format, since each control will be marked with XML structure. The part that I find most exciting about the controls though, is that you can map these controls to XML nodes in your own schema as we saw in this example.
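To give a feel for what "marked with XML structure" means in the file format, here is a rough sketch in Python that lists the content controls in a document part along with the XPath each one is mapped to. It makes some assumptions that are not spelled out in this post: that controls are stored as `w:sdt` elements whose `w:sdtPr` holds a `w:alias` (the name) and a `w:dataBinding` element like the one quoted in the comments below, and that the namespace is the one from the final Open XML standard, which may not match what Beta 1 writes. The function name `list_content_controls` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the WordprocessingML namespace from the final Open XML standard.
# A Beta 1 file may use a different namespace URI.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_content_controls(document_xml: str) -> list[dict]:
    """Return the name and mapped XPath of each content control, in document order."""
    root = ET.fromstring(document_xml)
    controls = []
    for sdt in root.iter(f"{{{W}}}sdt"):
        props = sdt.find(f"{{{W}}}sdtPr")
        if props is None:
            continue
        alias = props.find(f"{{{W}}}alias")
        binding = props.find(f"{{{W}}}dataBinding")
        controls.append({
            # The friendly name given to the control, if any.
            "name": alias.get(f"{{{W}}}val") if alias is not None else None,
            # The XPath into the custom XML part, or None if the control is unmapped.
            "xpath": binding.get(f"{{{W}}}xpath") if binding is not None else None,
        })
    return controls
```

Running this over the main document part of the sample file should show which controls are mapped and which node in item1.xml each one points at. Controls without a mapping come back with an `xpath` of `None`.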


Insert your own content controls


While I’ll need to cover this in more detail later, I did want to quickly explain how you can insert your own content controls. The first thing you’ll need to do is make sure that you have the “Developer” tab showing in the ribbon. You can do this by going to File -> Word Options, and under the view settings choose “Developer Tools”:


Developer Tools


Now, click on the “Developer” tab, and you’ll see a chunk called “content controls”


Developer Tools




With this, you can insert new content controls, as well as modify the properties of existing ones. Go ahead and play around with that a bit, and I’ll post some more information later on ways to work with the controls. Some of the other topics I’ll try to cover in the future in this area are:



  1. Using XML mapping and schema to drive the content for drop down controls. If you have a schema restriction, we can automatically use those restrictions to populate the dropdown list.

  2. Using locking and groups to structure the document.

  3. Using building blocks to generate rich structured document fragments that can be easily inserted into a document and automatically bind to the custom XML already present.

  4. Binding content controls to document properties and SharePoint data. Have you ever had a document library in SharePoint and wanted the ability to map the column values directly into the content of the document? Well, now you can set it up so that if the values are changed in SharePoint they will be reflected directly in the document, and if they are changed in the document, they will be reflected in the SharePoint library.

  5. Programmatic access to the custom XML store. You can set up all the mappings with the content controls, and then just program directly against the XML data. Anytime the user changes the values of one of the controls, it’s automatically pushed back into the node it’s mapped to, and an event is thrown. If you make a change to a node programmatically, then any content control mapped to that node will be automatically updated. This allows you to write your solution directly against the data, instead of against Word’s objects.

-Brian


(I almost forgot… go Seahawks!)

Comments (32)

  1. Do the content controls cover images? I’m hunting for a simple way to replace an existing formatted image in a document with a new image that gets the same layout and formatting and XML ought to make that much easier!

  2. BrianJones says:

    Hi Mary, there is a content control of type "image", but I’m not sure if that’s what you’re asking for exactly. Do you mean that you want to do this in just one specific file? Or do you want an easy way of doing this in multiple files? What is it about your scenario that’s currently giving you trouble? Is the resizing the problem?

    -Brian

  3. Darryl Hover says:

    I’ve been waiting for this file since the PDC. 🙂 Thanks Brian!

  4. magua says:

    All I want to see is the code, which makes more sense than commentary

  5. itsadok says:

    I think this is the most exciting feature of Office 12 I’ve seen yet, and that’s saying a lot! It’s the missing link between Word templates and custom database applications, and I’m sure it will be one of those things we’ll wonder how we ever managed without.

    This serves to further underline the sore lack of a proper Word 12 blog – after three "hello world" posts and one post ironically claiming there’s nothing you can add to Word, the guy disappeared. "Joe Friend" sounds like a made up name anyway…

    http://blogs.msdn.com/joe%5Ffriend/

  6. BrianJones says:

    I’ve been giving Joe a hard time about getting his blog going. He definitely has been intending to provide more information but it’s a really busy time for all of us in Office.

    I’ll let Joe know you’d like to see some content though. 🙂

    I’m glad you’re excited about the content controls. They are some of my favorite features. One limitation of the content control XML mapping capability that I do want to point out, though, is that it doesn’t support repeating content. So it’s really valuable when connecting to a specific row of a database, but if you want more traditional reporting functionality then you’ll hit that limitation pretty quickly. It’s something I really wish we’d been able to do, but we just didn’t have the time to fit it in. It’s still an extremely powerful feature, and if you really want repeating content you can write some custom logic, but it won’t work automatically.

    -Brian

  7. Chris Evans says:

    Brian, can content controls contain other content controls?

    — If so, that might be a workaround for repeated content.

  8. rodrigo says:

    This is a great post, thanks.

    I added a new content control to the document, but it didn’t show up in customxml/item1.xml… could you please give some instructions on how I should proceed to accomplish this?

  9. rodrigo says:

    One more thing: will we have content controls in PowerPoint 12 too?

  10. BrianJones says:

    Chris, content controls of type "rich text" can contain other content controls, but that doesn’t help as much with repeating as it does with "grouping". By creating a group, it is easier to repeat a structured chunk, but it wouldn’t happen automatically.

    Rodrigo, you need to add the XML part through the Object Model. Once you do that, you can programmatically set the content control to be bound to a node in your XML part.

    PowerPoint 12 will not have content controls, but it does support the storage of separate XML parts. You won’t have the controls to bind to though…

    -Brian

  11. Randy Brown says:

    Brian,

    Can you clarify the following:

    "One thing I do want to point out though as a limitation of the content control XML mapping capability is that it doesn’t support repeating content."

    Does this mean that if I have a content control in several places in a Word doc that represents the same info, and I programmatically change the XML node that the content is based on, not all content controls will update throughout the doc? Or are you saying that you cannot throw an IEnumerable type of object (like a dataset with multiple rows) at the XML source and expect the content controls to account for all the rows of data?

    Thanks Brian…

  12. BrianJones says:

    Hey Randy, it’s the latter.

    If you are going more for a DB reporting type of scenario (with multi rows), then the mapping probably won’t work for you. You could probably get it to work, but it would require a good amount of additional code.

    You can have multiple controls mapped to one node, and they will all stay in sync.

    -Brian

  14. Zwah says:

    – Rodrigo, you need to add the XML part through the Object Model. Once you do that, you can programatically set the content control to be bound to a node in your XML part.

    Can you provide more information on this please? In this XML I have found items like

        <w:dataBinding w:prefixMappings="xmlns:ns0='http://contoso.com/2005/contracts/commercialSale'" w:xpath="/ns0:contract[1]/ns0:placeExecuted[1]" w:storeItemID="{C65DD089-F388-4A84-8443-BC4CB07DEB45}" />

    which appear to be what is required to map the XML to the content controls, but I can’t work out how to actually do it.

  15. Ed Richard says:

    Thanks for this info Brian, wish I’d seen this before DevCon 2006 so I could have asked you about it. As it turns out I only started looking for info on Content Controls after the conference 😉

    But I agree, a very exciting addition; however, not having repeating groups is a disappointment.

    Anyway, now we do really need a post on mapping,… please?

  16. Links to blog posts that contain useful technical information for developers.  Open XML is a new standard, but there’s some good information already available if you know where to look.

  17. willib says:

    Hello

    will it be possible to map to boolean XML datatypes?

  18. I returned to Microsoft  (after a 7 year hiatus) in late 2003 just as the Office 2007 effort was getting…

  19. Joe Friend has finally made it public that there will be built in blog functionality in Word 2007! I…

  20. In general, I prefer to sit back and watch the comments come in and respond to them in future blog posts….

  21. Chris Evans says:

    Hey Brian — I’ve been playing with the Content Controls and the Rich Text control doesn’t seem to allow for multiple lines of Rich Text.  Am I doing something wrong?

  22. Chris Evans says:

    Er’… Sorry Brian… Must have been some kind of bug.  I dropped a new one and it allowed multiple lines at that point

  23. Mahesh says:

    Hi,

    Do you know how to insert the content of another Word file into a Word file using WordprocessingML in Word 2003?

    I have a scenario where I have one XML file with content, which I convert to WordML using XSLT. That XML file also holds the path of another Word file whose content I have to add to the target WordML file.

    So I am looking for this: is it possible to insert the content of that Word file, using its file path, into the WordML by applying XSLT?

    u can send me mail at mpatil@investec.co.za

    Thanks in advance

    Rgds

    Mahesh

  24. This is the third post by Zeyad Rajabi who owns the XHTML output from Word’s new blogging feature. In…

  25. Martin Nuss says:

    Hi,

    I am working on the same problem as rodrigo and Zwah.

    "- Rodrigo, you need to add the XML part through the Object Model. Once you do that, you can programatically set the content control to be bound to a node in your XML part."

    Can you provide more information on this please? In this XML I have found items like

       <w:dataBinding w:prefixMappings="xmlns:ns0='http://contoso.com/2005/contracts/commercialSale'" w:xpath="/ns0:contract[1]/ns0:placeExecuted[1]" w:storeItemID="{C65DD089-F388-4A84-8443-BC4CB07DEB45}" />

    which appear to be what is required to map the XML to the content controls, but I can’t work out how to actually do it.

    I would be very thankful for some detailed information about the link between the content control and the customXML property…

    Thanks in advance

    Martin

  26. If you’re heading out to TechEd this week like I am, you should definitely plan on attending Tristan…

  27. karthikonmsdn says:

    Dear All,

    One thing I would like to clarify: is it possible to open and edit an Office 2007 generated Word document in a previous version of Office? [Maybe this question is silly.]

  28. Dominique says:

    I can’t figure out how one can do point 3 discussed in point 3.

    "Using building blocks to generate rich structures document fragments that can be easily inserted into a document and automatically bind to the custom XML already present"

    A building block can contain content controls, but the declarative mapping of your custom XML data to existing content controls happens in the document.xml. This is OK for existing controls. But when a user inserts a building block with a content control, how can you map in the document.xml to a control that wasn’t there in the first place?

  29. Denise says:

    In Word 2003, is there a way when creating/editing xml documents based on a custom schema to have drop-down boxes with the enumeration values from the schema?

    Thanks,

    Denise

  30. Bruce says:

    I have been sending out newsletters created in Word 2002.   I was able to create great looking emails and send them to a short list of people without doing mailmerge.

    Using Office 2002 I was able to click on the email icon and place the document in the body of my email.  When I do this in Office 2007 it automatically attaches the document to the email.

    Is there a way to use a document as the email body in Office 2007 without putting the document as an attachment?

  31. I’ve talked a lot about the value of "Custom Schema" support in Office. Anytime I give talks on the file