Modifying Open XML Documents that are in SharePoint Document Libraries using Web Services

A basic operation when using the Open XML SDK with SharePoint web services is to get a document from a document library, modify it using the Open XML SDK (and LINQ to XML), and save it back to the document library.  This post describes how to do this, and provides a sample in C#.

It is simple to extend this sample to iterate through all documents in a library, apply some changes to each one, and save them back.  In an upcoming post, I’ll present a sample to ‘sanitize’ (remove comments, accept revisions, and remove personal information) all documents in a document library.  This is pretty useful.  I keep a library of documents that I send externally as needed, and it’s always best to not have personal information embedded in the documents.  By running this upcoming sample, I can regularly check to make sure that the document library is clean, even if other folks are editing documents in the library.

For a brief tutorial on SharePoint web services, see “Getting Started with SharePoint (WSS) Web Services using LINQ to XML”.  For this example, you need to add two references to web services (both Lists and Copy).  The procedure for adding a reference to the Copy web service is the same as adding a reference to the Lists web service.

This code uses the Open XML SDK.  Remember to add a reference to the Open XML SDK assembly.  This code uses V1 of the SDK.  It should work with V2 CTP but I haven't tried it.

The code references the System.IO.FileFormatException class, which is in the WindowsBase assembly, so add a reference to it.

This code uses the technique of converting XmlNode to XElement (and back again), as detailed in “Convert XElement to XmlNode (and Convert XmlNode to XElement)”, so that we can use LINQ to XML instead of XmlDocument.
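
To make the conversion concrete, here is a minimal sketch that turns an XmlNode into an XElement and then queries it with LINQ to XML, much as the complete listing does with the result of GetListCollection.  The sample XML, the FindListId helper, and the class name are hypothetical; the XML only approximates the shape of a real response, and the GUIDs are made up.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XmlNodeConversionSketch
{
    // Same technique as the GetXElement extension method in the complete
    // listing: write the XmlNode into a new XDocument and return its root.
    public static XElement GetXElement(this XmlNode node)
    {
        XDocument xDoc = new XDocument();
        using (XmlWriter xmlWriter = xDoc.CreateWriter())
            node.WriteTo(xmlWriter);
        return xDoc.Root;
    }

    // Hypothetical helper: returns the ID attribute of the List element
    // with the given title, or null if there is no such element.
    public static string FindListId(XmlNode listCollection, string title)
    {
        XNamespace s = "http://schemas.microsoft.com/sharepoint/soap/";
        return listCollection.GetXElement()
            .Elements(s + "List")
            .Where(l => (string)l.Attribute("Title") == title)
            .Select(l => (string)l.Attribute("ID"))
            .FirstOrDefault();
    }

    // Hypothetical sample data standing in for a web service response.
    public static XmlNode LoadSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<Lists xmlns='http://schemas.microsoft.com/sharepoint/soap/'>" +
            "<List Title='Open XML Documents' ID='{11111111-2222-3333-4444-555555555555}' />" +
            "<List Title='Tasks' ID='{AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE}' />" +
            "</Lists>");
        return doc.DocumentElement;
    }

    public static void Main()
    {
        Console.WriteLine(FindListId(LoadSample(), "Open XML Documents"));
    }
}
```

Note that the XNamespace has to match the namespace URI in the XML exactly; if it differs in any character, Elements(s + "List") returns nothing.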

One important aspect of the code is that you retrieve the document as a byte array:

ModifyDoc.CopyWebService.FieldInformation[] fields;
byte[] byteArray;
copy.GetItem(url, out fields, out byteArray);

After retrieving the byte array, you can write the byte array to a MemoryStream, and use the MemoryStream to open an in-memory Open XML document.  After modifying the in-memory document, you can convert it back to a byte array and serialize back to the SharePoint document library.  The technique is described in the post, “Working with In-Memory Open XML Documents”.
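
One detail worth illustrating is why the byte array is written into a MemoryStream rather than wrapped by one.  The sketch below uses only System.IO, and its class and method names are hypothetical.  It shows that a MemoryStream created with the parameterless constructor can grow, while one constructed directly over a byte array has a fixed capacity, which matters when a modified document ends up larger than the original.

```csharp
using System;
using System.IO;

public static class ResizableStreamSketch
{
    // Copies the bytes into a MemoryStream created with the parameterless
    // constructor, which can grow when more data is written to it.
    public static MemoryStream ToResizableStream(byte[] byteArray)
    {
        MemoryStream mem = new MemoryStream();
        mem.Write(byteArray, 0, byteArray.Length);
        return mem;
    }

    // Appends one byte at the end of the stream and reports whether the
    // stream was able to grow to hold it.
    public static bool TryAppendByte(MemoryStream mem, byte value)
    {
        try
        {
            mem.Seek(0, SeekOrigin.End);
            mem.WriteByte(value);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown when the stream cannot expand.
            return false;
        }
    }

    public static void Main()
    {
        byte[] byteArray = { 1, 2, 3 };
        using (MemoryStream fixedSize = new MemoryStream(byteArray))
        using (MemoryStream resizable = ToResizableStream(byteArray))
        {
            Console.WriteLine(TryAppendByte(fixedSize, 4));
            Console.WriteLine(TryAppendByte(resizable, 4));
            Console.WriteLine(resizable.ToArray().Length);
        }
    }
}
```

In the complete listing, the same pattern appears as mem.Write(...) followed, after the modification, by mem.ToArray() to get the bytes to send back.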

Here is the code to serialize it back to the SharePoint document library:

string[] urls = { url };
ModifyDoc.CopyWebService.CopyResult[] copyResults;
copy.CopyIntoItems(url, urls, fields, mem.ToArray(), out copyResults);

Now that we’ve covered these basics, in the near future I'll show how to use SharePoint web services and the Open XML SDK to do some more interesting things.

Here is the complete listing (the code is added as an attachment to this post):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

namespace ModifyDoc
{
    public static class MyExtensions
    {
        public static XDocument GetXDocument(this OpenXmlPart part)
        {
            XDocument xdoc = part.Annotation<XDocument>();
            if (xdoc != null)
                return xdoc;
            using (StreamReader sr = new StreamReader(part.GetStream()))
            using (XmlReader xr = XmlReader.Create(sr))
                xdoc = XDocument.Load(xr);
            part.AddAnnotation(xdoc);
            return xdoc;
        }

        public static void PutXDocument(this OpenXmlPart part)
        {
            XDocument xdoc = part.GetXDocument();
            if (xdoc != null)
            {
                // Serialize the XDocument object back to the package.
                using (XmlWriter xw =
                    XmlWriter.Create(part.GetStream
                    (FileMode.Create, FileAccess.Write)))
                {
                    xdoc.Save(xw);
                }
            }
        }

        public static string StringConcatenate(
            this IEnumerable<string> source)
        {
            return source.Aggregate(
                new StringBuilder(),
                (s, i) => s.Append(i),
                s => s.ToString());
        }

        public static XElement GetXElement(this XmlNode node)
        {
            XDocument xDoc = new XDocument();
            using (XmlWriter xmlWriter = xDoc.CreateWriter())
                node.WriteTo(xmlWriter);
            return xDoc.Root;
        }

        public static XmlNode GetXmlNode(this XElement element)
        {
            using (XmlReader xmlReader = element.CreateReader())
            {
                XmlDocument xmlDoc = new XmlDocument();
                xmlDoc.Load(xmlReader);
                return xmlDoc;
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            string documentLibraryName = "Open XML Documents";
            string documentName = "Test.docx";

            XNamespace s = "http://schemas.microsoft.com/sharepoint/soap/";
            XNamespace rs = "urn:schemas-microsoft-com:rowset";
            XNamespace z = "#RowsetSchema";

            // Make sure that you use the correct namespace, as well as the correct reference
            // name. The namespace (by default) is the same as the name of the application
            // when you created it. You specify the reference name in the Add Web Reference
            // dialog box.
            //
            // Namespace  Reference Name
            //  |           |
            //  V           V
            ModifyDoc.ListsWebService.Lists lists =
                new ModifyDoc.ListsWebService.Lists();

            // Fix Namespace and Reference Name for the Copy web service too
            ModifyDoc.CopyWebService.Copy copy =
                new ModifyDoc.CopyWebService.Copy();

            // Update the following URL to point to the Lists web service for
            // your SharePoint site.
            lists.Url = "https://localhost/_vti_bin/Lists.asmx";

            lists.Credentials = System.Net.CredentialCache.DefaultCredentials;
            copy.Credentials = System.Net.CredentialCache.DefaultCredentials;

            XElement listCollection = lists.GetListCollection().GetXElement();

            // get the node for the library that we want
            XElement library = listCollection
                .Elements(s + "List")
                .Where(l => (string)l.Attribute("Title") == documentLibraryName)
                .FirstOrDefault();

            if (library == null)
            {
                Console.WriteLine("Library {0} doesn't exist.", documentLibraryName);
                Environment.Exit(0);
            }

            // get the ID of the library
            string libId = (string)library.Attribute("ID");

            XElement item = GetItemByLinkFilename(lists, libId, documentName);

            if (item == null)
            {
                Console.WriteLine("Document {0} doesn't exist.", documentName);
                Environment.Exit(0);
            }

            // get the document from the doc library as a byte array
            string url = item.Attribute("ows_EncodedAbsUrl").Value;

            ModifyDoc.CopyWebService.FieldInformation[] fields;
            byte[] byteArray;
            copy.GetItem(url, out fields, out byteArray);

            // create a memory stream from the byte array
            using (MemoryStream mem = new MemoryStream())
            {
                mem.Write(byteArray, 0, byteArray.Length);
                try
                {
                    // create a WordprocessingDocument from the memory stream
                    using (WordprocessingDocument wordDoc =
                        WordprocessingDocument.Open(mem, true))
                    {
                        XNamespace w =
                            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

                        // modify the document as necessary
                        // for this example, we'll insert a simple paragraph at the
                        // beginning of the document
                        XDocument doc = wordDoc.MainDocumentPart.GetXDocument();
                        doc.Element(w + "document")
                            .Element(w + "body")
                            .AddFirst(
                                new XElement(w + "p",
                                    new XElement(w + "r",
                                        new XElement(w + "t", "Hello, there")
                                    )
                                )
                            );

                        // write the XDocument back into the Open XML document
                        wordDoc.MainDocumentPart.PutXDocument();
                    }

                    // use the Copy web service to save the document back to the
                    // document library.
                    string[] urls = { url };
                    ModifyDoc.CopyWebService.CopyResult[] copyResults;
                    copy.CopyIntoItems(url, urls, fields, mem.ToArray(), out copyResults);
                }
                catch (System.IO.FileFormatException e)
                {
                    // document is invalid
                    Console.WriteLine(e);
                    Environment.Exit(0);
                }
            }
        }

        private static XElement GetItemByLinkFilename(
            ModifyDoc.ListsWebService.Lists lists, string libId,
            string documentName)
        {
            XNamespace z = "#RowsetSchema";

            // get the XElement for the row that contains info about the document
            // that we want to modify
            XElement queryOptions = new XElement("QueryOptions",
                new XElement("Folder"),
                new XElement("IncludeMandatoryColumns", false)
            );
            XElement viewFields = new XElement("ViewFields");
            XElement item = lists.GetListItems(libId, "", null,
                viewFields.GetXmlNode(), "", queryOptions.GetXmlNode(), "")
                .GetXElement()
                .Descendants(z + "row")
                .Where(i => (string)i.Attribute("ows_LinkFilename") == documentName)
                .FirstOrDefault();
            return item;
        }
    }
}

ModDocument.cs
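
The document modification step in the listing can also be tried on its own, without a SharePoint site or the Open XML SDK.  The following is a minimal sketch under that assumption: it applies the same LINQ to XML paragraph insertion to a small, hand-written WordprocessingML fragment.  The sample markup, the class name, and the helper names are hypothetical; a real main document part contains much more than this.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class InsertParagraphSketch
{
    // The WordprocessingML main namespace; it must match the document exactly.
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Inserts a paragraph with a single run of text at the start of the body,
    // using the same query as the complete listing.
    public static void InsertFirstParagraph(XDocument doc, string text)
    {
        doc.Element(w + "document")
            .Element(w + "body")
            .AddFirst(
                new XElement(w + "p",
                    new XElement(w + "r",
                        new XElement(w + "t", text))));
    }

    // Hypothetical stand-in for the XDocument of a main document part.
    public static XDocument LoadSample()
    {
        return XDocument.Parse(
            "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:body><w:p><w:r><w:t>Existing text</w:t></w:r></w:p></w:body>" +
            "</w:document>");
    }

    // Returns the text of the first paragraph in the document.
    public static string FirstParagraphText(XDocument doc)
    {
        return doc.Descendants(w + "p").First().Value;
    }

    // Returns the number of paragraphs in the document.
    public static int ParagraphCount(XDocument doc)
    {
        return doc.Descendants(w + "p").Count();
    }

    public static void Main()
    {
        XDocument doc = LoadSample();
        InsertFirstParagraph(doc, "Hello, there");
        Console.WriteLine(FirstParagraphText(doc));
    }
}
```

Trying the query in isolation like this is a quick way to confirm that the namespace and element names are right before running against a live document library.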