OBA Part 3 – Storing and Reading Data in Word Documents

In my last few posts we’ve been building a simple Office Business Application (OBA) for the new Northwind Traders. If you missed them:

In this post I’m going to talk about how we can create a purchase order in Word 2007 that contains data about the items being purchased and how we can query that data and place it into our database. We’ll use this code as a basis for our SharePoint Workflow which we will build out in the next post.


If you recall our architecture diagram of our Northwind Traders OBA involved our Sales Reps submitting purchase orders as Word 2007 documents up to SharePoint which kicked off a workflow to parse the document and update the database with the order data through our data service.

This allows us to store the unstructured document on SharePoint and the structured order data in our database.

However before we build out the SharePoint Workflow we need a clean way to store and then retrieve the structured order data inside the Word 2007 document. Since Word 2007 documents are Open XML we can use the Open XML SDK and LINQ to XML to easily parse the document. (I’ve talked about how to manipulate documents with Open XML SDK before here.)

Content Controls

One way to store data in an Word 2007 document is by using content controls. These allow you to define specific data areas/fields in the document which are then bound to XML that is placed inside the document. When users enter data into these areas of the document the data is stored as a CustomXML Part inside the document. You can use Visual Studio to create content controls and map them to XML or you can use Word itself. There’s also a nifty tool called the Word 2007 Content Control Toolkit that makes the mapping more visual. I’d also highly recommend installing the VSTO Power Tools as well which includes a VS Add-In for manipulating Open XML documents. This allows you to look inside the document easily to inspect all the parts directly within Visual Studio.

So the first thing to do is to create a purchase order template and lay out the content controls on the document. We’ll create something very simple using Microsoft Word 2007. On the Developer tab you will see the Controls section. There you can choose which types of controls to lay out on the document. Click the properties button to assign a friendly title and tag to the control. Here I’ve laid out the minimum information we’ll need to submit an order to the system:


Users can write anything else around the content controls but the system only cares about capturing the data we’ve specified. This gives us the ability to store structured and unstructured data completely inside the .docx file.

Creating and Mapping the XML

Now we are ready to map the values of the content controls to some custom XML. The XML document for our order looks like this. (Note that there are 10 <OrderDetail> elements I just snipped them for brevity):

<OrderEntry xmlns="urn:microsoft:examples:oba">
  <CustomerID />
  <OrderDate />
  <Shipper />
      <ProductName />
      <Quantity />
</OrderDetails> </OrderEntry>

Now open up the Word 2007 Content Control Toolkit an open the OrderEntryTemplate.docx. Under Actions select "Create a new Custom XML Part", switch to edit view, and then paste in the XML:


Next switch to Bind View and then drag the elements onto the content controls on the left. Make sure you select the element first and then drag it.


Once you’re done, save the document and then you can open it in Visual Studio if you’ve loaded the VSTO Power Tools. This will show the Open XML parts of the document and you can expand the customXml folder and see that our XML has been added to the document.


If you back into Word and fill out the content controls and then view the document in Visual Studio again, you will see that the item1.xml custom XML part will contain the data we entered.

Now that we have a purchase order template we can give this to our sales reps who can collaborate with our high volume customers via email to fill it out. They can then submit the purchase orders to a SharePoint list that can run a workflow to extract the order data and update the database through the data service.

Using the Open XML SDK to Retrieve the Order Data

The easiest way to programmatically manipulate Office 2007 Open XML documents is by using the Open XML SDK. Once you install it you can then Add Reference to the DocumentFormat.OpenXML assembly. In order to use LINQ to XML you’ll also need a reference to System.Core and System.Xml.Linq. These are imported automatically when you create a new project in Visual Studio 2008. You’ll also need to add a Service Reference to the ADO.NET Data Service like I’ve shown before. So let’s start simple and just create a console application for now called NorthwindOrderDocParser. Later we’ll talk about moving this to a SharePoint workflow.

Before we start parsing the document let’s create a couple simple classes that store the data we’re extracting from our document.

''' <summary>
''' These classes represents the order data that is inside the Word Document. 
''' </summary>
''' <remarks></remarks>
Public Class DocumentOrderData

    Sub New(ByVal customerID As String, ByVal orderDate As Date, ByVal shipperName As String)
        _CustomerID = customerID
        _OrderDate = orderDate
        _Shipper = shipperName
    End Sub

    Private _CustomerID As String
    Public Property CustomerID() As String
            Return _CustomerID
        End Get
        Set(ByVal value As String)
            _CustomerID = value
        End Set
    End Property

    Private _OrderDate As Date
    Public Property OrderDate() As Date
            Return _OrderDate
        End Get
        Set(ByVal value As Date)
            _OrderDate = value
        End Set
    End Property

    Private _Shipper As String
    Public Property Shipper() As String
            Return _Shipper
        End Get
        Set(ByVal value As String)
            If value Is Nothing OrElse value.Trim = "" Then
                value = "Speedy Express"
            End If
            _Shipper = value
        End Set
    End Property

    Private _details As New List(Of Detail)
    Public ReadOnly Property Details() As List(Of Detail)
            Return _details
        End Get
    End Property

    Public Class Detail
        Sub New(ByVal productName As String, ByVal quantity As Short)
            _ProductName = productName
            _Quantity = quantity
        End Sub

        Private _ProductName As String
        Public Property ProductName() As String
                Return _ProductName
            End Get
            Set(ByVal value As String)
                _ProductName = value
            End Set
        End Property

        Private _Quantity As Short
        Public Property Quantity() As Short
                Return _Quantity
            End Get
            Set(ByVal value As Short)
                _Quantity = value
            End Set
        End Property
    End Class
End Class

Next, let’s add a schema for the OrderEntry XML data that is contained in the document. This will give us IntelliSense on our XML when we’re using LINQ to XML. We can just open the document in Visual Studio like before and copy the OrderEntry XML data into the clipboard. Then we can Add a new XML to Schema Item and paste into the Wizard’s dialog box. This will infer the schema and place the XSD file into the project automatically for us. Notice that I specified a namespace on our OrderEntry XML data. We now can import this namespace into our main program along with a few other .NET namespaces we’ll need:

'Reference to our data service and data entities:
Imports NorthwindOrderDocParser.NorthwindService
'Open XML SDK:
Imports DocumentFormat.OpenXml.Packaging
Imports System.IO
'Default XML Namespace:
Imports <xmlns="urn:microsoft:examples:oba">

We are almost ready to start writing our main program to parse the purchase order. First we need a test document. For this test I filled out the following information in a document called MyTestOrder.docx.


Now we can write our main program:

Module Module1

    Sub Main()
            Dim docFile = My.Computer.FileSystem.GetFileInfo("MyTestOrder.docx")

            Dim docData As DocumentOrderData
            Using sr = docFile.OpenRead()
                'Attempt to parse the document for order data
                docData = ParseOrderDocument(sr)
            End Using

            If docData IsNot Nothing Then
                Dim employeeEmail = "sales@nwtraders.com"

                'Attempt to add the order data through the service
                AddNewOrder(docData, employeeEmail)
                Console.WriteLine("Order saved successfully.")
                Console.WriteLine("No order data was found in the document.")
            End If

        Catch ex As Exception
            Console.WriteLine("Order could not be processed." & vbCrLf & ex.ToString())
        End Try

    End Sub

The ParseOrderDocument method is going to need to grab the XML data from our Custom XML parts as we iterate over the part collection. It’s a collection because there can actually be many Custom XML definitions in our document. In order to make grabbing the XML data from the parts easier let’s create an Extension method that extends the OpenXMLPart type. I like to place Extension methods in a separate file called Extensions.vb:

Imports DocumentFormat.OpenXml.Packaging
Imports System.IO
Imports System.Xml

Module Extensions

    ' Create an extension method so we can easily access the part XML
    <System.Runtime.CompilerServices.Extension()> _
    Function GetXDocument(ByVal part As OpenXmlPart) As XDocument
        Dim xdoc As XDocument

        Using sr As New StreamReader(part.GetStream())
            xdoc = XDocument.Load(XmlReader.Create(sr))
        End Using

        Return xdoc
    End Function
End Module

Now we can go back to our main Module1 and add a the ParseOrderDocument method. Notice that I’m using the Extension method we created in the For Each part… loop to return the custom XML as an XDocument. Then I use the child axis property <OrderEntry> (displayed in IntelliSense as I type the query) to see if the element exists. Also notice that since I imported our XML namespace at the top of the file it will only return an <OrderEntry> element in that namespace. So we’re safe not to clash with other custom XML that may be added to the document by other processes.

''' <summary>
''' Attempts to parse the word document for order data and returns an order
'''  object with all the required. The document must have a customXML part 
'''  that adheres to the OrderEntry.xsd
''' </summary>
''' <param name="docStream">The document to parse</param>
''' <returns>The order data contained in the document</returns>
''' <remarks></remarks>
Function ParseOrderDocument(ByVal docStream As Stream) As DocumentOrderData
    Dim orderData As DocumentOrderData = Nothing

        'Use the Open XML SDK to open the document and access parts easily
        Dim wordDoc = WordprocessingDocument.Open(docStream, False)

        Using wordDoc
            'Get the main document part (document.xml)
            Dim mainPart = wordDoc.MainDocumentPart
            Dim docXML As XElement = Nothing

            'Find the order data custom XML part
            For Each part In mainPart.CustomXmlParts
                docXML = part.GetXDocument.<OrderEntry>.FirstOrDefault()
                If docXML IsNot Nothing Then
                    Exit For
                End If

            If docXML Is Nothing Then
                Throw New InvalidOperationException("This document does not contain order entry data.")
            End If

            'Grab the order data fields from the XML 
            Dim customerID = docXML.<CustomerID>.Value.Trim()
            Dim orderDate = docXML.<OrderDate>.Value.Trim()
            Dim shipper = docXML.<Shipper>.Value.Trim()

            If customerID <> "" AndAlso IsDate(orderDate) Then
                'Create and fill the DocumentOrderData  
                orderData = New DocumentOrderData(customerID, CDate(orderDate), shipper)

                For Each item In docXML.<OrderDetails>.<OrderDetail>
                    'Grab order details data fields
                    Dim product = item.<ProductName>.Value.Trim()
                    Dim quantity = item.<Quantity>.Value.Trim()

                    If product <> "" AndAlso IsNumeric(quantity) Then
                        'Add a new DocumentOrderData.Detail for each product found
                        orderData.Details.Add(New DocumentOrderData.Detail(product, CShort(quantity)))
                    End If

            End If
        End Using

    Catch ex As Exception
        Throw New InvalidOperationException("Could not process this document.", ex)
    End Try

    Return orderData
End Function

Updating the Database through the Data Service

Now that we have our document parsed we’re just left with adding the data through our data service.  What we need to do is query the reference data (entities) that we’ll need to properly associate on our Order. For instance Order will need a reference to the Customer, the Employee and the Shipper. Then each Order_Detail will need a reference to the Product entity. Notice that we’re passing the employee email address into this method so that we can associate the sales rep with the order. If you recall we had to add this field to the Customer and Employee tables in Northwind. (For this test program I’m hard-coding the value but later we’ll get this information from the Outlook client when it submits the order to SharePoint.)

Once we have these entities queried and returned from the service we can link them up properly and add our new Order and Order_Details to the data service. For more information on updating data and setting proper linkage to entities returned from an ADO.NET data service read this post and this one.

    ''' <summary>
    ''' Adds a new order through the ADO.NET Data service and sets up all the required
    ''' associations to related entities.
    ''' </summary>
    ''' <param name="docData">The order data</param>
    ''' <param name="employeeEmail">EmailAddress of sales representitve</param>
    ''' <remarks></remarks>
    Private Sub AddNewOrder(ByVal docData As DocumentOrderData, ByVal employeeEmail As String)

        Dim ctx As New NorthwindEntities(New Uri("http://localhost:1234/Northwind.svc/"))
        Dim cust As Customer
            'Try to retrieve the customer
            cust = (From c In ctx.Customers _
                     Where c.CustomerID = docData.CustomerID).FirstOrDefault()

        Catch ex As Exception
            Throw New InvalidOperationException("Invalid customer ID.")
        End Try

        If cust IsNot Nothing Then

            Dim ship = (From s In ctx.Shippers _
                        Where s.CompanyName = docData.Shipper).FirstOrDefault()

            'Email Address will come from our Outlook client/sales person 
            Dim emp = (From e In ctx.Employees _
                       Where e.EmailAddress = employeeEmail).FirstOrDefault()

            Dim o As New Order()

            o.OrderDate = docData.OrderDate
            o.RequiredDate = Now.AddDays(2)
            o.ShipAddress = cust.Address
            o.ShipCity = cust.City
            o.ShipCountry = cust.Country
            o.ShipName = cust.ContactName
            o.ShipPostalCode = cust.PostalCode
            o.ShipRegion = cust.Region
            o.Freight = 25


            o.Customer = cust
            ctx.SetLink(o, "Customer", cust)

            If ship IsNot Nothing Then
                o.Shipper = ship
                ctx.SetLink(o, "Shipper", ship)
            End If

            If emp IsNot Nothing Then
                o.Employee = emp
                ctx.SetLink(o, "Employee", emp)
            End If

            o.Order_Details = New System.Collections.ObjectModel.Collection(Of Order_Detail)

            For Each item In docData.Details
                Dim productName = item.ProductName.ToLower()
                Dim product = (From p In ctx.Products _
                               Where p.ProductName.ToLower() = productName).FirstOrDefault()

                If product IsNot Nothing Then
                    'Create a detail for each product being ordered
                    Dim detail As New Order_Detail()

                    detail.Quantity = item.Quantity
                    detail.UnitPrice = If(product.UnitPrice.HasValue, _
                                          product.UnitPrice.Value, 1D)


                    detail.Product = product
                    ctx.SetLink(detail, "Product", product)

                    detail.Order = o
                    ctx.SetLink(detail, "Order", o)

                    ctx.AddLink(o, "Order_Details", detail)
                End If
            'Saving in Batch mode will update the data inside a database transaction
            'This will throw an exception if the service can't save the Order
        End If
    End Sub
End Module

When we run this program we will see that Customer ALFKI now has a new Order and 4 Order Details entered into the database. Since we’re sending the updates in Batch mode this will cause our order data to be properly wrapped in a database transaction.


Next post we’ll talk about how we can create a SharePoint workflow to run this code when order documents are added to a SharePoint list. However, if SharePoint is not a requirement of your system (maybe you have no need to collaborate on documents or store this unstructured data) you could easily add this code directly to the Outlook client we built in the previous post.

I updated the sample on Code Gallery with this project so have a look.