In my last few posts we’ve been building a simple Office Business Application (OBA) for the new Northwind Traders. If you missed them:
- OBA Part 1 – Exposing Line-of-Business Data
- OBA Part 2 – Building and Outlook Client against LOB Data
In this post I’m going to talk about how we can create a purchase order in Word 2007 that contains data about the items being purchased and how we can query that data and place it into our database. We’ll use this code as a basis for our SharePoint Workflow which we will build out in the next post.
If you recall our architecture diagram of our Northwind Traders OBA involved our Sales Reps submitting purchase orders as Word 2007 documents up to SharePoint which kicked off a workflow to parse the document and update the database with the order data through our data service.
This allows us to store the unstructured document on SharePoint and the structured order data in our database.
However before we build out the SharePoint Workflow we need a clean way to store and then retrieve the structured order data inside the Word 2007 document. Since Word 2007 documents are Open XML we can use the Open XML SDK and LINQ to XML to easily parse the document. (I’ve talked about how to manipulate documents with Open XML SDK before here.)
One way to store data in an Word 2007 document is by using content controls. These allow you to define specific data areas/fields in the document which are then bound to XML that is placed inside the document. When users enter data into these areas of the document the data is stored as a CustomXML Part inside the document. You can use Visual Studio to create content controls and map them to XML or you can use Word itself. There’s also a nifty tool called the Word 2007 Content Control Toolkit that makes the mapping more visual. I’d also highly recommend installing the VSTO Power Tools as well which includes a VS Add-In for manipulating Open XML documents. This allows you to look inside the document easily to inspect all the parts directly within Visual Studio.
So the first thing to do is to create a purchase order template and lay out the content controls on the document. We’ll create something very simple using Microsoft Word 2007. On the Developer tab you will see the Controls section. There you can choose which types of controls to lay out on the document. Click the properties button to assign a friendly title and tag to the control. Here I’ve laid out the minimum information we’ll need to submit an order to the system:
Users can write anything else around the content controls but the system only cares about capturing the data we’ve specified. This gives us the ability to store structured and unstructured data completely inside the .docx file.
Creating and Mapping the XML
Now we are ready to map the values of the content controls to some custom XML. The XML document for our order looks like this. (Note that there are 10 <OrderDetail> elements I just snipped them for brevity):
<OrderEntry xmlns="urn:microsoft:examples:oba"> <CustomerID /> <OrderDate /> <Shipper /> <OrderDetails> <OrderDetail> <ProductName /> <Quantity /> </OrderDetail>
<OrderDetail>... </OrderDetails> </OrderEntry>
Now open up the Word 2007 Content Control Toolkit an open the OrderEntryTemplate.docx. Under Actions select "Create a new Custom XML Part", switch to edit view, and then paste in the XML:
Next switch to Bind View and then drag the elements onto the content controls on the left. Make sure you select the element first and then drag it.
Once you’re done, save the document and then you can open it in Visual Studio if you’ve loaded the VSTO Power Tools. This will show the Open XML parts of the document and you can expand the customXml folder and see that our XML has been added to the document.
If you back into Word and fill out the content controls and then view the document in Visual Studio again, you will see that the item1.xml custom XML part will contain the data we entered.
Now that we have a purchase order template we can give this to our sales reps who can collaborate with our high volume customers via email to fill it out. They can then submit the purchase orders to a SharePoint list that can run a workflow to extract the order data and update the database through the data service.
Using the Open XML SDK to Retrieve the Order Data
The easiest way to programmatically manipulate Office 2007 Open XML documents is by using the Open XML SDK. Once you install it you can then Add Reference to the DocumentFormat.OpenXML assembly. In order to use LINQ to XML you’ll also need a reference to System.Core and System.Xml.Linq. These are imported automatically when you create a new project in Visual Studio 2008. You’ll also need to add a Service Reference to the ADO.NET Data Service like I’ve shown before. So let’s start simple and just create a console application for now called NorthwindOrderDocParser. Later we’ll talk about moving this to a SharePoint workflow.
Before we start parsing the document let’s create a couple simple classes that store the data we’re extracting from our document.
''' <summary> ''' These classes represents the order data that is inside the Word Document. ''' </summary> ''' <remarks></remarks> Public Class DocumentOrderData Sub New(ByVal customerID As String, ByVal orderDate As Date, ByVal shipperName As String) _CustomerID = customerID _OrderDate = orderDate _Shipper = shipperName End Sub Private _CustomerID As String Public Property CustomerID() As String Get Return _CustomerID End Get Set(ByVal value As String) _CustomerID = value End Set End Property Private _OrderDate As Date Public Property OrderDate() As Date Get Return _OrderDate End Get Set(ByVal value As Date) _OrderDate = value End Set End Property Private _Shipper As String Public Property Shipper() As String Get Return _Shipper End Get Set(ByVal value As String) If value Is Nothing OrElse value.Trim = "" Then value = "Speedy Express" End If _Shipper = value End Set End Property Private _details As New List(Of Detail) Public ReadOnly Property Details() As List(Of Detail) Get Return _details End Get End Property Public Class Detail Sub New(ByVal productName As String, ByVal quantity As Short) _ProductName = productName _Quantity = quantity End Sub Private _ProductName As String Public Property ProductName() As String Get Return _ProductName End Get Set(ByVal value As String) _ProductName = value End Set End Property Private _Quantity As Short Public Property Quantity() As Short Get Return _Quantity End Get Set(ByVal value As Short) _Quantity = value End Set End Property End Class End Class
Next, let’s add a schema for the OrderEntry XML data that is contained in the document. This will give us IntelliSense on our XML when we’re using LINQ to XML. We can just open the document in Visual Studio like before and copy the OrderEntry XML data into the clipboard. Then we can Add a new XML to Schema Item and paste into the Wizard’s dialog box. This will infer the schema and place the XSD file into the project automatically for us. Notice that I specified a namespace on our OrderEntry XML data. We now can import this namespace into our main program along with a few other .NET namespaces we’ll need:
'Reference to our data service and data entities: Imports NorthwindOrderDocParser.NorthwindService 'Open XML SDK: Imports DocumentFormat.OpenXml.Packaging Imports System.IO 'Default XML Namespace: Imports <xmlns="urn:microsoft:examples:oba">
We are almost ready to start writing our main program to parse the purchase order. First we need a test document. For this test I filled out the following information in a document called MyTestOrder.docx.
Now we can write our main program:
Module Module1 Sub Main() Try Dim docFile = My.Computer.FileSystem.GetFileInfo("MyTestOrder.docx") Dim docData As DocumentOrderData Using sr = docFile.OpenRead() 'Attempt to parse the document for order data docData = ParseOrderDocument(sr) sr.Close() End Using If docData IsNot Nothing Then Dim employeeEmail = "firstname.lastname@example.org" 'Attempt to add the order data through the service AddNewOrder(docData, employeeEmail) Console.WriteLine("Order saved successfully.") Else Console.WriteLine("No order data was found in the document.") End If Catch ex As Exception Console.WriteLine("Order could not be processed." & vbCrLf & ex.ToString()) End Try Console.ReadLine() End Sub
The ParseOrderDocument method is going to need to grab the XML data from our Custom XML parts as we iterate over the part collection. It’s a collection because there can actually be many Custom XML definitions in our document. In order to make grabbing the XML data from the parts easier let’s create an Extension method that extends the OpenXMLPart type. I like to place Extension methods in a separate file called Extensions.vb:
Imports DocumentFormat.OpenXml.Packaging Imports System.IO Imports System.Xml Module Extensions ' Create an extension method so we can easily access the part XML <System.Runtime.CompilerServices.Extension()> _ Function GetXDocument(ByVal part As OpenXmlPart) As XDocument Dim xdoc As XDocument Using sr As New StreamReader(part.GetStream()) xdoc = XDocument.Load(XmlReader.Create(sr)) sr.Close() End Using Return xdoc End Function End Module
Now we can go back to our main Module1 and add a the ParseOrderDocument method. Notice that I’m using the Extension method we created in the For Each part… loop to return the custom XML as an XDocument. Then I use the child axis property <OrderEntry> (displayed in IntelliSense as I type the query) to see if the element exists. Also notice that since I imported our XML namespace at the top of the file it will only return an <OrderEntry> element in that namespace. So we’re safe not to clash with other custom XML that may be added to the document by other processes.
''' <summary> ''' Attempts to parse the word document for order data and returns an order ''' object with all the required. The document must have a customXML part ''' that adheres to the OrderEntry.xsd ''' </summary> ''' <param name="docStream">The document to parse</param> ''' <returns>The order data contained in the document</returns> ''' <remarks></remarks> Function ParseOrderDocument(ByVal docStream As Stream) As DocumentOrderData Dim orderData As DocumentOrderData = Nothing Try 'Use the Open XML SDK to open the document and access parts easily Dim wordDoc = WordprocessingDocument.Open(docStream, False) Using wordDoc 'Get the main document part (document.xml) Dim mainPart = wordDoc.MainDocumentPart Dim docXML As XElement = Nothing 'Find the order data custom XML part For Each part In mainPart.CustomXmlParts docXML = part.GetXDocument.<OrderEntry>.FirstOrDefault() If docXML IsNot Nothing Then Exit For End If Next If docXML Is Nothing Then Throw New InvalidOperationException("This document does not contain order entry data.") End If 'Grab the order data fields from the XML Dim customerID = docXML.<CustomerID>.Value.Trim() Dim orderDate = docXML.<OrderDate>.Value.Trim() Dim shipper = docXML.<Shipper>.Value.Trim() If customerID <> "" AndAlso IsDate(orderDate) Then 'Create and fill the DocumentOrderData orderData = New DocumentOrderData(customerID, CDate(orderDate), shipper) For Each item In docXML.<OrderDetails>.<OrderDetail> 'Grab order details data fields Dim product = item.<ProductName>.Value.Trim() Dim quantity = item.<Quantity>.Value.Trim() If product <> "" AndAlso IsNumeric(quantity) Then 'Add a new DocumentOrderData.Detail for each product found orderData.Details.Add(New DocumentOrderData.Detail(product, CShort(quantity))) End If Next End If wordDoc.Close() End Using Catch ex As Exception Throw New InvalidOperationException("Could not process this document.", ex) End Try Return orderData End Function
Updating the Database through the Data Service
Now that we have our document parsed we’re just left with adding the data through our data service. What we need to do is query the reference data (entities) that we’ll need to properly associate on our Order. For instance Order will need a reference to the Customer, the Employee and the Shipper. Then each Order_Detail will need a reference to the Product entity. Notice that we’re passing the employee email address into this method so that we can associate the sales rep with the order. If you recall we had to add this field to the Customer and Employee tables in Northwind. (For this test program I’m hard-coding the value but later we’ll get this information from the Outlook client when it submits the order to SharePoint.)
Once we have these entities queried and returned from the service we can link them up properly and add our new Order and Order_Details to the data service. For more information on updating data and setting proper linkage to entities returned from an ADO.NET data service read this post and this one.
''' <summary> ''' Adds a new order through the ADO.NET Data service and sets up all the required ''' associations to related entities. ''' </summary> ''' <param name="docData">The order data</param> ''' <param name="employeeEmail">EmailAddress of sales representitve</param> ''' <remarks></remarks> Private Sub AddNewOrder(ByVal docData As DocumentOrderData, ByVal employeeEmail As String) Dim ctx As New NorthwindEntities(New Uri("http://localhost:1234/Northwind.svc/")) Dim cust As Customer Try 'Try to retrieve the customer cust = (From c In ctx.Customers _ Where c.CustomerID = docData.CustomerID).FirstOrDefault() Catch ex As Exception Throw New InvalidOperationException("Invalid customer ID.") End Try If cust IsNot Nothing Then Dim ship = (From s In ctx.Shippers _ Where s.CompanyName = docData.Shipper).FirstOrDefault() 'Email Address will come from our Outlook client/sales person Dim emp = (From e In ctx.Employees _ Where e.EmailAddress = employeeEmail).FirstOrDefault() Dim o As New Order() o.OrderDate = docData.OrderDate o.RequiredDate = Now.AddDays(2) o.ShipAddress = cust.Address o.ShipCity = cust.City o.ShipCountry = cust.Country o.ShipName = cust.ContactName o.ShipPostalCode = cust.PostalCode o.ShipRegion = cust.Region o.Freight = 25 ctx.AddToOrders(o) o.Customer = cust ctx.SetLink(o, "Customer", cust) If ship IsNot Nothing Then o.Shipper = ship ctx.SetLink(o, "Shipper", ship) End If If emp IsNot Nothing Then o.Employee = emp ctx.SetLink(o, "Employee", emp) End If o.Order_Details = New System.Collections.ObjectModel.Collection(Of Order_Detail) For Each item In docData.Details Dim productName = item.ProductName.ToLower() Dim product = (From p In ctx.Products _ Where p.ProductName.ToLower() = productName).FirstOrDefault() If product IsNot Nothing Then 'Create a detail for each product being ordered Dim detail As New Order_Detail() o.Order_Details.Add(detail) detail.Quantity = item.Quantity detail.UnitPrice = If(product.UnitPrice.HasValue, _ product.UnitPrice.Value, 1D) ctx.AddToOrder_Details(detail) detail.Product = product ctx.SetLink(detail, "Product", product) detail.Order = o ctx.SetLink(detail, "Order", o) ctx.AddLink(o, "Order_Details", detail) End If Next 'Saving in Batch mode will update the data inside a database transaction 'This will throw an exception if the service can't save the Order ctx.SaveChanges(Services.Client.SaveChangesOptions.Batch) End If End Sub End Module
When we run this program we will see that Customer ALFKI now has a new Order and 4 Order Details entered into the database. Since we’re sending the updates in Batch mode this will cause our order data to be properly wrapped in a database transaction.
Next post we’ll talk about how we can create a SharePoint workflow to run this code when order documents are added to a SharePoint list. However, if SharePoint is not a requirement of your system (maybe you have no need to collaborate on documents or store this unstructured data) you could easily add this code directly to the Outlook client we built in the previous post.
I updated the sample on Code Gallery with this project so have a look.