OpenXML: How to refresh a field when the document is opened


I was working on an internal project a while ago, and one of the requirements was to
generate a fancy Word document.  The idea was that all of the editing of the text, code
samples, etc. would be done in the application, and then the user could just export
it to Word to add any finishing touches and send it off to the customer.  The final
report needed to include section headers, page breaks, a table of contents, etc.
There are a number of ways we could have accomplished the task.  There’s
the Word automation stuff that relies upon a COM-based API, there’s the method
of just creating an HTML document and loading that into Word, and then finally there’s
the Open XML API.  Now, someone had previously hacked up a version of this export
functionality using the Word automation stuff, but considering we’re often dealing with
1,000+ page documents, it turned out to be a little slow.  Also,
there are some restrictions around using the automation libraries in a server context.

Lastly, since my OpenXML kung-fu is strong, I thought I would take the opportunity
to implement a better, more flexible and much faster solution.  For
those just starting out, Brian and Zeyad’s excellent blog on the topic is invaluable.

One of the requirements for the export operation was to have Word automagically refresh
the table of contents (and other fields) the first time the document is opened. 
This was something that took a bit of time to research, but you really end up with
two options:

w:updateFields Element

The “w:updateFields” element is a document-level element that is set in the document
settings part and tells Word to update all of the fields in the document:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:settings xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:updateFields w:val="true" />
</w:settings>

If you’re wondering what the document settings part is, just rename a Word doc from
“blah.docx” to “blah.docx.zip” and extract it to
a folder on your computer.  In the new folder is a directory called “word”.
In that directory, you should see a file called “settings.xml”:

That file holds all of the document-level settings for your docx.  There’s
some really great stuff in here.
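
If you’d rather not rename and unzip by hand, here is a minimal sketch of reading the settings part from code.  It assumes the part sits at the conventional path “word/settings.xml” (strictly speaking, the package relationships decide where it lives), and the names DocxInspector and ReadSettingsXml are hypothetical, made up for this illustration.  It only uses System.IO.Compression from the base class library:

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class DocxInspector
{
    // Returns the raw XML of word/settings.xml, or null when the package
    // has no entry at that (assumed, conventional) path.
    public static string ReadSettingsXml(Stream docxStream)
    {
        // A .docx file is a ZIP archive, so it can be opened without renaming it.
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/settings.xml");
            if (entry == null)
            {
                return null;
            }

            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

To try it, pass in something like File.OpenRead("blah.docx") and write the returned string to the console.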

If you’d like to use the OpenXML SDK to set that value
(and you’d be crazy not to), here’s some sample code:

using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// path is the full path to the .docx file; "true" opens it for editing.
using (WordprocessingDocument document = WordprocessingDocument.Open(path, true))
{
    // Locate the document settings part (settings.xml).
    DocumentSettingsPart settingsPart =
        document.MainDocumentPart.GetPartsOfType<DocumentSettingsPart>().First();

    // Create object to update fields on open
    UpdateFieldsOnOpen updateFields = new UpdateFieldsOnOpen();
    updateFields.Val = new DocumentFormat.OpenXml.OnOffValue(true);

    // Insert object into settings part.
    settingsPart.Settings.PrependChild<UpdateFieldsOnOpen>(updateFields);
    settingsPart.Settings.Save();
}
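
A document generated from scratch may not have a settings part yet, and one that has been exported before may already contain a w:updateFields element.  The following is a sketch of a more defensive variant, not the code used in the original project.  The class and method names (FieldRefresh, EnableUpdateFieldsOnOpen) are hypothetical; the SDK types are the same ones used in the sample above:

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class FieldRefresh
{
    // Sets w:updateFields to true, creating the settings part or the
    // element only when they are missing.
    public static void EnableUpdateFieldsOnOpen(WordprocessingDocument document)
    {
        MainDocumentPart mainPart = document.MainDocumentPart;

        // Reuse the existing settings part, or add an empty one.
        DocumentSettingsPart settingsPart = mainPart.DocumentSettingsPart;
        if (settingsPart == null)
        {
            settingsPart = mainPart.AddNewPart<DocumentSettingsPart>();
            settingsPart.Settings = new Settings();
        }

        // Reuse an existing w:updateFields element so it is never duplicated.
        UpdateFieldsOnOpen updateFields =
            settingsPart.Settings.GetFirstChild<UpdateFieldsOnOpen>();
        if (updateFields == null)
        {
            updateFields = new UpdateFieldsOnOpen();
            settingsPart.Settings.PrependChild(updateFields);
        }

        updateFields.Val = new OnOffValue(true);
        settingsPart.Settings.Save();
    }
}
```

Calling it twice on the same document leaves a single w:updateFields element in place.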

w:dirty Attribute

This attribute is applied to the field you would like to have refreshed when the document
is opened in Word.  It tells Word to refresh only this field the next time the
document is opened.  For example, if you want to apply it to a complex field like your
table of contents, just find the field’s opening w:fldChar (the one with
w:fldCharType="begin") and add that attribute:

<w:r>
  <w:fldChar w:fldCharType="begin" w:dirty="true"/>
</w:r>

For a simple field, like the document author, you’ll want to add it to the w:fldSimple
element, like so:

<w:fldSimple w:instr="AUTHOR \* Upper \* MERGEFORMAT" w:dirty="true">
  <w:r>
    ...
  </w:r>
</w:fldSimple>
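
The same attribute can be set through the OpenXML SDK instead of by hand.  Here is a minimal sketch that flags every field under a given element; DirtyFields and MarkAllDirty are hypothetical names, and marking all fields (rather than picking out one) is a simplification for the example:

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DirtyFields
{
    // Sets w:dirty="true" on every "begin" w:fldChar and on every
    // w:fldSimple found under root. Returns how many fields were flagged.
    public static int MarkAllDirty(OpenXmlElement root)
    {
        int count = 0;

        // Complex fields: the attribute goes on the opening marker only.
        foreach (FieldChar marker in root.Descendants<FieldChar>())
        {
            if (marker.FieldCharType != null &&
                marker.FieldCharType.Value == FieldCharValues.Begin)
            {
                marker.Dirty = new OnOffValue(true);
                count++;
            }
        }

        // Simple fields: the attribute goes on the w:fldSimple element itself.
        foreach (SimpleField field in root.Descendants<SimpleField>())
        {
            field.Dirty = new OnOffValue(true);
            count++;
        }

        return count;
    }
}
```

A typical call would pass document.MainDocumentPart.Document and then call Save() on it.  Fields in headers and footers live in separate parts, so this sketch would not reach them unless it is also run against those parts.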

A caveat or two

Both of these methods will work just fine in Word 2010. 

In Word 2007, though, you need to clear out the cached contents of the field before the user
opens the document.  For example, with a table of contents, Word will normally
cache the rendered contents of the TOC between the field’s begin and end fldChar elements.
This is good, normally, but here it causes a problem.

For example, in a very simple test document, you would see the following cached data
(i.e., the Heading 1 and Heading 2 entries):

<w:p w:rsidR="00563999" w:rsidRDefault="00050B09">
  ...
  <w:r>
    <w:fldChar w:fldCharType="begin"/>
  </w:r>
  <w:r w:rsidR="00563999">
    <w:instrText xml:space="preserve"> TOC \* MERGEFORMAT </w:instrText>
  </w:r>
</w:p>
<w:p w:rsidR="00F77370" w:rsidRDefault="00F77370">
  ...
  <w:r>
    ...
    <w:t>Heading 1</w:t>
  </w:r>
  ...
</w:p>
<w:p w:rsidR="00F77370" w:rsidRDefault="00F77370">
  ...
  <w:r>
    ...
    <w:t>Heading 2</w:t>
  </w:r>
  ...
</w:p>
<w:p w:rsidR="00F77370" w:rsidRDefault="00F77370">
  ...
  <w:r>
    <w:rPr>
      <w:noProof/>
    </w:rPr>
    <w:fldChar w:fldCharType="end"/>
  </w:r>
</w:p>

After you clear out the schmutz, you end up with just the begin element, the definition
of the TOC and the end element:

<w:p w:rsidR="00563999" w:rsidRDefault="00563999">
  ...
  <w:r>
    <w:fldChar w:fldCharType="begin"/>
  </w:r>
  <w:r>
    <w:instrText xml:space="preserve"> TOC \* MERGEFORMAT </w:instrText>
  </w:r>
</w:p>
<w:p w:rsidR="00B63C3C" w:rsidRDefault="00563999" w:rsidP="00B63C3C">
  <w:r>
    <w:fldChar w:fldCharType="end"/>
  </w:r>
  ...
</w:p>

Once you’ve made the updates, you can safely open up your file in Word 2007 and your
fields will update when the document opens.
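
If you would rather trim the cached data with the OpenXML SDK than by hand, here is a rough sketch of one way to do it.  It is not necessarily how the tip was originally implemented.  It assumes each cached result starts at a w:fldChar with w:fldCharType="separate", that field markers sit in their own runs, and that it is fine to drop paragraphs left with no runs (unless they carry a section break).  It ignores w:fldSimple fields and can leave empty w:hyperlink wrappers behind.  The names CachedFieldResults and Strip are hypothetical, so try it on a copy of your document first:

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class CachedFieldResults
{
    // Removes the "separate" marker and everything after it, up to but not
    // including the matching "end" marker, for each outermost complex field
    // under root. Returns the number of runs removed.
    public static int Strip(OpenXmlElement root)
    {
        var doomed = new List<Run>();
        int depth = 0;        // number of fields currently open
        int resultDepth = 0;  // depth of the field whose result we are inside; 0 = none

        foreach (Run run in root.Descendants<Run>())
        {
            FieldChar marker = run.GetFirstChild<FieldChar>();
            bool typed = marker != null && marker.FieldCharType != null;

            if (typed && marker.FieldCharType.Value == FieldCharValues.Begin)
            {
                if (resultDepth > 0) doomed.Add(run);  // nested field inside a cached result
                depth++;
            }
            else if (typed && marker.FieldCharType.Value == FieldCharValues.Separate)
            {
                if (resultDepth == 0) resultDepth = depth;  // cached result starts here
                doomed.Add(run);
            }
            else if (typed && marker.FieldCharType.Value == FieldCharValues.End)
            {
                if (resultDepth == depth) resultDepth = 0;  // keep this closing marker
                else if (resultDepth > 0) doomed.Add(run);
                if (depth > 0) depth--;
            }
            else if (resultDepth > 0)
            {
                doomed.Add(run);  // ordinary run holding cached text
            }
        }

        foreach (Run run in doomed)
        {
            Paragraph paragraph = run.Ancestors<Paragraph>().FirstOrDefault();
            run.Remove();

            // Drop paragraphs left with no runs, unless they carry a section break.
            if (paragraph != null && paragraph.Parent != null &&
                !paragraph.Descendants<Run>().Any() &&
                !paragraph.Descendants<SectionProperties>().Any())
            {
                paragraph.Remove();
            }
        }

        return doomed.Count;
    }
}
```

You would call it with document.MainDocumentPart.Document.Body and then save the document.  Runs that come before the separate marker, such as the w:instrText definition, are left alone.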

Big thanks to Zeyad for his tip on trimming out the schmutz.

Just to stress, this is improved in Word 2010 and you no longer need to clear
out the cached data in your fields.

Enjoy!

