This post presents a custom application page in SharePoint that uses Open XML, the Open XML SDK and LINQ to XML to accept revisions, remove comments, and remove personal information from an Open XML word processing document.
Another approach with interesting possibilities is to create SharePoint workflows that query and modify Open XML documents. For example, you could write code that ensures that no document in a SharePoint document library contains comments, revisions, or personal information. I’ll present this code in the future.
The most interesting characteristic of this code is that before Open XML, it would have been very difficult or impossible to implement reliably on a SharePoint server. If you wanted to implement this for binary Office files, you would need a library for accessing those binary documents, and that library might not be suitable for deployment on a SharePoint server. I have heard of people installing a copy of Office on their server. This has huge performance implications; in addition, it does not conform to the 2007 Office system license agreement. By contrast, the approach presented in this post is clean, performs well, and has no adverse licensing implications.
To make development as easy as possible, I created a class, OpenXmlInfo, which has methods to query and modify the Open XML document. I developed the class using a console application. Once the class was coded and debugged, it was a simple matter to use it in the SharePoint custom application page. The class contains six static methods: three to query a document and three to modify it. Here are the signatures of the methods:
public static bool InspectForComments(WordprocessingDocument document)
public static bool InspectForRevisions(WordprocessingDocument doc)
public static bool InspectForPersonalInfo(WordprocessingDocument document)
public static void RemoveComments(WordprocessingDocument document)
public static void AcceptRevisions(WordprocessingDocument doc)
public static void RemovePersonalInfo(WordprocessingDocument document)
These methods are based on code presented in earlier blog posts.
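To give a sense of what a query/modify pair over the markup can look like with LINQ to XML, here is a minimal sketch. It is not the code of OpenXmlInfo: the class name CommentMarkupSketch and its methods are hypothetical, and they work on an XDocument that is assumed to already hold the XML of the main document part. A complete implementation would also need to delete the comments part itself, which this sketch leaves out.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical illustration only; not the OpenXmlInfo class from this post.
public static class CommentMarkupSketch
{
    // Main WordprocessingML namespace.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Elements in the main document part that anchor or reference a comment.
    private static IEnumerable<XElement> CommentMarkup(XDocument mainPartXml)
    {
        return mainPartXml.Descendants().Where(e =>
            e.Name == W + "commentRangeStart" ||
            e.Name == W + "commentRangeEnd" ||
            e.Name == W + "commentReference");
    }

    // Query: true if the markup contains any comment anchors or references.
    public static bool HasCommentMarkup(XDocument mainPartXml)
    {
        return CommentMarkup(mainPartXml).Any();
    }

    // Modify: remove the comment anchors and references in place.
    // Extensions.Remove snapshots the query first, so removing while
    // enumerating is safe here.
    public static void RemoveCommentMarkup(XDocument mainPartXml)
    {
        CommentMarkup(mainPartXml).Remove();
    }
}
```

Working against a plain XDocument keeps the logic easy to exercise from a console application before it is wired into a SharePoint page.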
The custom application page (OpenXmlInspector.aspx) contains a few C# methods. They are pretty straightforward.
To open the document using the Open XML SDK, the code does the following:
Once the code has the document in an SPFile object, it uses the OpenBinary() method to get a byte array that contains the document. However, the Open XML SDK needs a stream to instantiate the document, so the code creates a MemoryStream from the byte array.
The code then creates the WordprocessingDocument from the stream.
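The steps above can be sketched roughly as follows. The class and method names are hypothetical, and this is not the code from OpenXmlInspector.aspx. One deliberate difference: a MemoryStream constructed directly over a byte array cannot grow, which matters when the document is opened for editing, so the sketch copies the bytes into an expandable stream instead. Writing the result back with SPFile.SaveBinary is likewise an assumption about how the changes might be persisted.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using Microsoft.SharePoint;

// Hypothetical illustration only; names are not from the attached code.
public static class SPFileOpenXmlSketch
{
    // Copy the bytes into an expandable stream. new MemoryStream(bytes)
    // would be fixed-size and could not grow while the package is edited.
    public static MemoryStream ToResizableStream(byte[] bytes)
    {
        MemoryStream stream = new MemoryStream();
        stream.Write(bytes, 0, bytes.Length);
        stream.Position = 0;
        return stream;
    }

    // Open the SPFile as a WordprocessingDocument, run the supplied
    // action on it, and save the (possibly modified) bytes back.
    public static void Process(SPFile file, Action<WordprocessingDocument> action)
    {
        byte[] bytes = file.OpenBinary();
        using (MemoryStream stream = ToResizableStream(bytes))
        {
            using (WordprocessingDocument doc =
                WordprocessingDocument.Open(stream, true))
            {
                action(doc);
            }
            // Disposing the document flushes the package to the stream.
            // Assumption: changes are persisted by overwriting the file.
            file.SaveBinary(stream.ToArray());
        }
    }
}
```

With a helper like this, a call such as SPFileOpenXmlSketch.Process(file, OpenXmlInfo.RemoveComments) would apply one of the modify methods listed earlier to a document in the library.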
The feature adds a menu item (Inspect Open XML Document) to the ECB menu. (The ECB menu is the drop-down menu that you get for each document in a document library.)
To use this code, you will need to install the Open XML SDK.
For a good screencast on creating a feature in SharePoint, see Ted Pattison’s webcasts. In addition, there are some great resources at http://www.microsoft.com/click/SharePointDeveloper/.
Note that this code is just a demonstration of using the Open XML SDK and LINQ to XML to query and modify documents; you would want to make modifications to this code before deploying it in a production environment. For instance, the code does not validate that the document is an Open XML document before querying it. I’m presenting it as an example of the type of development that you can do using these technologies.
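As one example of the kind of hardening meant here, a cheap pre-check can reject obviously wrong input before the SDK is asked to open it. The sketch below is an assumption on my part, not something in the attached code: the class and method names are hypothetical, and it only checks the file extension and the ZIP local file header signature that Open XML packages, being ZIP archives, normally start with. Passing this check does not prove that the file is a valid word processing document, so the call that opens the document should still be wrapped in exception handling.

```csharp
using System;

// Hypothetical illustration only; not part of the attached code.
public static class OpenXmlPreCheckSketch
{
    // Returns true if the name and leading bytes are at least consistent
    // with an Open XML word processing package (.docx or .docm).
    public static bool LooksLikeWordprocessingPackage(string fileName, byte[] bytes)
    {
        if (fileName == null || bytes == null)
        {
            return false;
        }

        bool extensionOk =
            fileName.EndsWith(".docx", StringComparison.OrdinalIgnoreCase) ||
            fileName.EndsWith(".docm", StringComparison.OrdinalIgnoreCase);

        // ZIP local file header signature: 'P' 'K' 0x03 0x04.
        bool zipSignatureOk =
            bytes.Length >= 4 &&
            bytes[0] == 0x50 && bytes[1] == 0x4B &&
            bytes[2] == 0x03 && bytes[3] == 0x04;

        return extensionOk && zipSignatureOk;
    }
}
```

In the application page, the inputs could come from the SPFile itself, for example LooksLikeWordprocessingPackage(file.Name, file.OpenBinary()).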
Code is attached.