Mail Merging Into a Single Word Document Using XML Literals

With the release of Microsoft Office 2007 we can work with a much simpler, standard, XML format called Open XML which opens the door for many types of applications that cannot work via COM. What if you needed to build a scalable web service that processes many documents in high volume? What if you wanted to quickly read or write to these formats from a client application but wanted to have minimal dependencies on other applications? These types of programs do not want to require Microsoft Office be installed to run. The cool thing is you have the tools already with Visual Basic 2008. XML Literals are an easy way to manipulate any kind of XML, including Open XML.

I’ve written before on how to use XML Literals to manipulate Word documents, if you missed them:

If you aren’t familiar with XML Literals or Open XML then I’d suggest reading the article in the first bullet above first. It’s a great intro into XML Literals, Open XML and it also demonstrates a couple practical ways of reading and writing to Word document formats directly.

Recently I got a great question on how to change the program outlined in the last bullet above to merge letters we were creating on the fly from a database into a single document instead of multiple documents like we’re doing. This would make it much easier for an end user to print them. That’s such an obvious thing I can’t believe I didn’t think of that! For those of you that aren’t familiar with what we did I’ll give a quick recap of that application.

Creating Word Documents with XML Literals

What I wanted to do is take all the Customers in the Northwind database who had some orders shipping today and send them thank you letters. To get started I created a new Word 2007 document with the letter text and some placeholder field names to indicate where I want the data:

image

Save the document (I named it Doc1.docx) and then rename the .docx extension to .zip and you can drill into the package and see a bunch of XML documents inside it. If we drill down through the zip file we'll see that the text we just typed is located in the \word\document.xml file. Copy this xml and paste it into the Visual Basic editor, it will infer it as an XDocument object, and then you can use embedded expressions to replace the placeholders. (If I’ve lost you, read this article which explains it step-by-step.)

In order to create letters for our customers in the database, we want to end up with a collection of XDocuments with our data merged into them. I created a simple class called Letter that has two properties, CustomerID As String and Document As XDocument. Then I wrote a LINQ query to select the data and embed it into the document (I’ve omitted all the WordProcessingML for clarity, take a loot at the code sample for the whole listing.)

 Dim letters = _
    From Order In db.Orders _
    Join Employee In db.Employees _
        On Order.EmployeeID Equals Employee.EmployeeID _
    Where Order.OrderDate IsNot Nothing AndAlso _
        Order.ShippedDate IsNot Nothing AndAlso _
        Order.ShippedDate.Value.Date >= #1/1/2007# _
    Let DateOrder = Order.OrderDate.Value.ToShortDateString _
    Let DateShip = Order.ShippedDate.Value.ToShortDateString _
    Select New Letter With { _
        .CustomerID = Order.Customer.CustomerID, _
        .Document = <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <w:document ...
    <w:p w:rsidR="00705CFF" w:rsidRDefault="00112228" w:rsidP="00AA2EC6">
        <w:r>
            <w:br/>
        </w:r>
        <w:r w:rsidR="007A5236">
            <w:t xml:space="preserve">Dear </w:t>
        </w:r>
            <w:proofErr w:type="spellStart"/>
        <w:r>
            <w:t> <%= Order.Customer.ContactName %> </w:t>
        </w:r>
        <w:proofErr w:type="spellEnd"/>
            <w:r w:rsidR="00AA2EC6">
                <w:t>,</w:t>
            </w:r>
     </w:p>
     <w:p w:rsidR="00E04FB0" w:rsidRDefault="00AA2EC6" w:rsidP="00AA2EC6">
        <w:r>
            <w:t xml:space="preserve">We’d like to inform you that the order you placed on </w:t>
        </w:r>
        <w:proofErr w:type="spellStart"/>
            <w:r w:rsidR="00112228">
                <w:t> <%= DateOrder %> </w:t>
            </w:r>
        <w:proofErr w:type="spellEnd"/>
            <w:r w:rsidR="00806521">
                <w:t xml:space="preserve"> has shipped on </w:t>
            </w:r>
        <w:proofErr w:type="spellStart"/>
            <w:r w:rsidR="00112228">
                <w:t> <%= DateShip %> </w:t>
            </w:r>
... </w:document>}

The query returns an IEnumerable(Of Letter) objects with the Document property set to the document.xml data for that customer. The next step involved taking the template Doc1.docx file, making a copy of it and then simply replacing the document.xml part inside the package with the one here in the Letter class for each letter. However, what I did was create separate letters for each customer. Instead we want to create one .docx file with ALL the letter data merged inside.

Merging into a Single Word Document

The first thing we need to do is insert a page break between all our letters. The easiest thing to do is to open back up your template Doc1.docx file in Word and insert a page break, save it, and then look at the document.xml again in the package. You should see this WordProccessingML element inserted near the end:

 <w:p w:rsidR="00622A50" w:rsidRDefault="00622A50">
    <w:r>
        <w:br w:type="page"/>
    </w:r>
</w:p>

Paste this into the query above in the document exactly where you pulled it out of the document.xml.

Next we have a couple options on how to manipulate the package (docx file). I’ve shown how to use the Open XML SDK before to manipulate documents but in this simple case we can use the System.IO.Packaging classes directly. This is because all we’re doing is working with text in the document. If we were working with images or embedded objects it would be easier to use the SDK. In the next post we’ll add photos of the employees on each order into the document, but for this post let’s just focus on how to merge the text.

So the first thing we need to do is Import some namespaces including an XML namespace that we’re going to use:

 Imports <xmlns:w="https://schemas.openxmlformats.org/wordprocessingml/2006/main">
Imports System.IO
Imports System.IO.Packaging

Next we need to set up some variables and copy the Doc1.docx template to a new file I’m calling AllLetters.docx.

 Dim uri As New Uri("/word/document.xml", UriKind.Relative)
Dim contentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
Dim sourceFile = CurDir() & "\Doc1.docx"
Dim letterDir = CurDir() & "\Letters\"
My.Computer.FileSystem.CreateDirectory(letterDir)

Dim mergeFile = letterDir & "AllLetters.docx"
My.Computer.FileSystem.CopyFile(sourceFile, mergeFile, True)

Now we can open the template package and load the main document part as an XElement:

 Using p As Package = Package.Open(mergeFile)

    'get the main document part (document.xml)
    Dim mainPart = p.GetPart(uri)
    Dim mainDocumentXML As XElement




    Using sr As New StreamReader(mainPart.GetStream)
        mainDocumentXML = XElement.Load(sr)
    End Using

The next part is where it gets fun. A word document has a top element structure like this:

 <w:document>
    <w:body>
       ...
    </w:body>
</w:document>

So what we need to do is grab the body of the template, replace it with the first customer’s document body we have and then append the rest of the elements inside the bodies of the rest of the documents. A word document can only have one <w:body> element for it to be legal. The way we get the <w:body> XElement from the document is using the descendants syntax with the three dot notation (…):

     Dim mainBody = mainDocumentXML...<w:body>.First()
    Dim i = 0

    For Each letter In letters
        Dim nextBody = letter.Document...<w:body>.First()
        If i = 0 Then
            'Replace the first body contents in the template
            mainBody.ReplaceNodes(nextBody.Elements())
        Else
            'Append the new contents for the rest of the customers
            mainBody.Add(nextBody.Elements())
        End If
        i += 1
    Next

Now that we’ve built up the right body with all our letter data in it, we can replace the main document.xml part in the package and close it.

     'Delete the current document.xml file in the template
    p.DeletePart(uri)
    'Replace that part with our new merged XDocument
    Dim replace As PackagePart = p.CreatePart(uri, contentType)

    Using sw As New StreamWriter(replace.GetStream())
        mainDocumentXML.Save(sw)
        sw.Close()
    End Using

    p.Close()
End Using

Now when we run this, all the letters will be mail merged into the same document for easy printing:

image

As I mentioned if we have embedded images or objects that we need to replace it gets a little trickier. In the DevProConnections Article I have an example of how to create multiple documents with embedded pictures of the employee’s photo. In the next post I’ll show you how we can use the Open XML SDK to create a single document with embedded pictures as well.

Until then, have a look at the complete code I put up on Code Gallery.

[UPDATE: Merging Text & Photos into a Word Document using Open XML SDK]

Enjoy!