Merging Text & Photos into a Word Document using Open XML SDK

Yesterday I posted about how we could create a letter generator (mail merge) that took data from a database to create a Word 2007 document using the System.IO.Packaging class. I showed how to take data from Northwind using a single LINQ query to create XDocument objects representing letters to customers and then create a single document of all the data using XML Literals.

Today I want to show how we can embed images into the documents as well as text. Since we’re using Northwind for this example, we’ll use the Employee.Photo field from that database.

The Letter Template with Photo

The first thing to do is create a letter template with our field placeholders as well as an image placeholder. We’ll also need to insert a page break at the end because we’re going to merge all the letters into a single document for easy printing:

image

The next step is to grab the document.xml from this docx file. Because it’s an Open XML package, as I showed in yesterday’s post, we can just rename this file with a .zip extension temporarily and drill into the word\document.xml. However, there’s an easier way to work with these packages by installing the Open XML package editor Visual Studio add-in from the VSTO Power Tools download.

imageOnce you install the VSTO Power tools, save this template in your project. Then you can just double-click on the .docx file in the Solution Explorer and it will open the package editor (it also works with .pptx and .xslx files). Now you can drill into the document.xml directly in Visual Studio and inspect all the parts of the package easily.

The difference between this template and the template we used yesterday is there is an image imbedded into the document. So you will see an image embedded into the package in the media folder. This is referred to as an “image part”. This part is linked to the “main document part” which is the word\document.xml file. The Open XML package editor shows the links (relationships) to all the parts. When you look at the relationship properties you’ll see that it has a unique ID. This is the same ID that you will see in the main document part at the point where we embedded the image, in our case it’s set to rId4. (I’ve omitted all the WordProccessingML for clarity):

 <?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:document ...
  <w:body> ...
         <w:drawing> ...
               <a:graphic xmlns:a="https://schemas.openxmlformats.org/drawingml/2006/main">
              <a:graphicData uri="https://schemas.openxmlformats.org/drawingml/2006/picture">
                <pic:pic xmlns:pic="https://schemas.openxmlformats.org/drawingml/2006/picture">
                  <pic:nvPicPr>
                    <pic:cNvPr id="0" name="image1.png"/>
                    <pic:cNvPicPr/>
                  </pic:nvPicPr>
                  <pic:blipFill>
                    <a:blip r:embed="rId4"  cstate="print"/>
                    <a:stretch>
                      <a:fillRect/>
                    </a:stretch>
                  </pic:blipFill>
                  <pic:spPr>
                    <a:xfrm>
                      <a:off x="0" y="0"/>
                      <a:ext cx="1829055" cy="2124372"/>
                    </a:xfrm>
                    <a:prstGeom prst="rect">
                      <a:avLst/>
                    </a:prstGeom>
                  </pic:spPr>
                </pic:pic>
              </a:graphicData>
            </a:graphic> ...
          
        </w:drawing> ...
         <w:p w:rsidR="00622A50" w:rsidRDefault="00622A50">
            <w:r>
                <w:br w:type="page"/>
            </w:r>
        </w:p> ...
   </w:body>
</w:document>

Installing the Open XML SDK

As I mentioned yesterday, manipulating the package directly with the System.IO.Packaging classes gets a little bit tricky when you have to start adding embedded images and objects and linking them together. Instead, it’s easier to use the Open XML SDK to do this because it will handle linking new image parts we add to the package automatically for us. The SDK provides a strongly-typed way of accessing the parts in a package so it cuts down on the amount of code you have to write yourself.  Version 1.0 requires only .NET 2.0, Version 2 requires .NET 3.5 and uses LINQ to provide more functionality when querying the parts. This version is currently a CTP. For what we’re doing you can use either version but I have Version 2 installed.

The assembly that is installed in the GAC when you install the SDK is called DocumentFormat.OpenXml.dll and you’ll need to add a project reference to this. When you distribute your application you can simply xcopy the assembly with your application for easy deployment. (After you add the reference, select it in the Server Explorer and then in the Properties window set Copy Local to True in order to have it outputted to the \bin folder when you build your application.)

Now we can add the namespace imports we’ll be working with at the top of our program, including the DocumentFormat.OpenXml.Packaging namespace:

 Imports <xmlns:w="https://schemas.openxmlformats.org/wordprocessingml/2006/main">
Imports System.Xml
Imports System.IO
Imports DocumentFormat.OpenXml.Packaging

Modifying the Letter Class and LINQ Query to Get the Photo

In order to create letters for our customers in the database, we want to end up with a collection of XDocuments with our data merged into them. I created a Letter class that has two properties, CustomerID As String and Document As XDocument. We’re also going to add a new property for the photo byte array called EmployeePhoto As Byte() and OrderID as Integer. Next we need to modify the query so that it selects the photo as well as add the above WordProcessingML into our XML Literal to specify where the image should appear in the document. Because the relationship IDs must be unique, I’m going to use the OrderID as the suffix for the relationship ID for each of the images we want to embed. Like before, we can use embedded expressions to merge the data into the right places in the XDocument.

(I’ve omitted all the XML with the embedded expressions for clarity, take a loot at the code sample for the whole listing.) The bolded areas show the modified sections of the query:

 Dim letters = _
  From Order In db.Orders _
  Join Employee In db.Employees _
    On Order.EmployeeID Equals Employee.EmployeeID _
  Where Order.OrderDate IsNot Nothing AndAlso _
    Order.ShippedDate IsNot Nothing AndAlso _
    Order.ShippedDate.Value.Date >= #1/1/1998# _
  Let DateOrder = Order.OrderDate.Value.ToShortDateString _
  Let DateShip = Order.ShippedDate.Value.ToShortDateString _
  Select New Letter With { _
    .CustomerID = Order.CustomerID, _
     .OrderID = Order.OrderID, _<br>    .EmployeePhoto = If(Employee.Photo IsNot Nothing, Employee.Photo.ToArray, Nothing), _ 



    .Document = <?xml version="1.0" encoding="utf-8" standalone="yes"?>
        <w:document ... 
            <w:body> ...
                <w:p w:rsidR="00A9498E" w:rsidRDefault="00A9498E" w:rsidP="00AA2EC6">
                    <w:r>
                        <w:t xml:space="preserve">If you have any issues please contact </w:t>
                    </w:r>
                    <w:proofErr w:type="spellStart"/>
                    <w:r>
                        <w:t><%= Employee.FirstName & " " & Employee.LastName %></w:t>
                    </w:r>
                    <w:proofErr w:type="spellEnd"/>
                        <w:r>
                            <w:t xml:space="preserve"> at</w:t>
                        </w:r>
                        <w:r w:rsidR="005F35D6">
                            <w:t xml:space="preserve"> ext.</w:t>
                        </w:r>
                        <w:r>
                            <w:t xml:space="preserve"></w:t>
                        </w:r>
                        <w:r w:rsidR="005F35D6">
                            <w:t><%= Employee.Extension %></w:t>
                        </w:r>
                        </w:p> ...
                         <w:drawing > ...                            <pic:blipFill
 > ...<br>                                <a:blip r:embed=<%=  "rId" & Order.OrderID %>  /> ...                        </w:drawing
 > ...                        <w:p w:rsidR="00622A50" w:rsidRDefault="00622A50" 
 ><br>                            <w:r ><br>                                <w:br w:type="page"  /><br>                            </w:r ><br>                        </w:p > ... 
                    </w:body>
                </w:document>}

The query returns an IEnumerable(Of Letter) objects with the Document property set to the document.xml data for that customer and the EmployeePhoto set to the photo byte array. If we wanted to create separate documents (which is exactly what I demonstrated in my DevProConnections article) it’s actually pretty simple because all we need to do is copy the template and then replace the document.xml and the image data, we don’t have to mess with the relationship ID at all. However if we want to create a single document with all our letters we need to delete the placeholder image part and add new unique image parts for the Letter.EmployeePhoto.

Adding and Linking the Images into a Single Word Document

Because relationship IDs must be unique in a document I used the OrderID as the embedded expression for the r:embed attribute in the XML Literal above. You can choose to use any unique ID you want but it must start with an alpha character, that’s why I prefixed it with “rId”. When you use the Open XML SDK to add the new image parts we have to specify the relationship ID to use. To update the code we wrote yesterday to work with the Open XML SDK and also work with images we first need to add a call to open the Word document template, grab the main document part and then delete the image part by calling DeletePart:

 'Place the letters in a separate directory
Dim sourceFile = CurDir() & "\LetterTemplate.docx"
Dim letterDir = CurDir() & "\Letters\"
My.Computer.FileSystem.CreateDirectory(letterDir)

Dim mergeFile = letterDir & "AllLetters.docx"
My.Computer.FileSystem.CopyFile(sourceFile, mergeFile, True)

'Open the Document template using the Open XML SDK
Using wordDoc = WordprocessingDocument.Open(mergeFile, True)

    'get the main document part (document.xml)
    Dim mainPart = wordDoc.MainDocumentPart()

    'Delete the placeholder image from the template so we can replace it with the real photos
    mainPart.DeletePart("rId4")

This opens the Word document as read/write and then deletes the placeholder image. You should also notice at this point that the code is a bit simpler to open the package than yesterday because we don’t have to declare the relative URI’s or content types. The Open XML SDK helps us here. Next we need to load the document.xml into an XElement object and grab the body:

     Dim mainDocumentXML As XElement
    Using sr As New StreamReader(mainPart.GetStream)
        mainDocumentXML = XElement.Load(sr)
    End Using

    Dim mainBody = mainDocumentXML...<w:body>.First()
    Dim i = 0

Now we can loop through all the letters, grab the body of those XDocuments and then add an image part. We need to specify the content type and the relationship ID that must match what we embedded in the XML Literal in our query. Then we can either replace the main document body with the first letter (overwrite our placeholder data) otherwise we append the rest of the letter contents into the document:

     For Each letter In letters
        Dim nextBody = letter.Document...<w:body>.First()
        'Add the new imagePart. The SDK takes care of linking this properly.
        Dim imagePart = mainPart.AddImagePart("image/png", "rId" & letter.OrderID) 

        If i = 0 Then
           'Replace the first body contents in the template
           mainBody.ReplaceNodes(nextBody.Elements())
        Else
           'Otherwise append the next letter contents
           mainBody.Add(nextBody.Elements())
        End If

Next we need to feed the byte array data into the ImagePart. Note that since I’m using Northwind for this example there is an OLE header on those images that we must strip off that is 78 bytes. If you are using your own images stored in SQL Server you won’t have to do that:

         'Now feed the byte array into the ImagePart
        If letter.EmployeePhoto IsNot Nothing Then
            'Using ms As New System.IO.MemoryStream(letter.EmployeePhoto.ToArray)
            'NOTE: Northwind Photos have an OLE header on them that we must strip off.
            ' You don't have to do this if you are using your own SQL-server stored images
            Using ms As New System.IO.MemoryStream(letter.EmployeePhoto, 78, _
                                                   letter.EmployeePhoto.Length - 78)
                imagePart.FeedData(ms)
            End Using
        End If

        i += 1
    Next

Now that we’ve built up the document XML, added our image parts and linked them properly, the final step is to replace the document part with our XML just like before:

     'Replace the document part with our XML
    Using sw As New StreamWriter(mainPart.GetStream(FileMode.Create))
        mainDocumentXML.Save(sw)
    End Using

End Using

Now when we run the application you will see a single document with all the letters, including photos, that’s much easier to print than multiple files.

image

There’s Always Room for Improvement

LINQ & Northwind savvy folks may notice that I’m actually creating duplicates of the employee photos. Since I’m flattening out the data in my query to have 1 photo per order there could be an unnecessary amount of duplicate images. This can cause the document to get bloated. To be more efficient we should reuse the relationship ID’s to the images if they are the same on multiple orders. One approach would be to modify the LINQ query and Letter object to select the EmployeePhoto and then have a collection of order & document data under that. I’ll leave that experiment up to you :-)

Take a look at the full code listing for this example as well as the complete project with other Open XML examples here on Code Gallery.

Enjoy!