Packages and Parts

(Update January 7, 2010 - I've written an MSDN article: Essentials of Open Packaging Conventions, which gives a fairly comprehensive overview of them.)

My first thought when reading about packages was that I wanted a more LINQ to XML friendly way to deal with an Open XML package. I wanted an object graph that contains all the relationships and all the parts, with every XML part read into a LINQ to XML tree that is part of that object graph. A fair number of queries need to access multiple parts simultaneously, so if we read the entire document into memory in this fashion, we can write queries over all parts using LINQ to XML to our heart’s content. Another page on this blog shows an implementation of a class that does this. I plan to extend and improve on this class (and other similar ones) over the next few months.


On this page, I am going to present a bit of information about packages and parts. Primarily, I assembled this information for myself, to teach myself about packages and parts. I figured that if it was helpful to me, it might be to others as well.

Following is a short list that contains most of what you need to know about packages and parts:

·         In the Package (the zip file), there is a collection of PackagePart objects (the files).

·         You could iterate through the parts themselves (Package.GetParts), but the spec doesn’t fix the names or locations of content parts (the main document part of a .docx isn’t guaranteed to be /word/document.xml, for example), so you shouldn’t rely on part names directly. Instead, you iterate through relationships, starting with the Package.GetRelationships method, which returns a PackageRelationshipCollection, and access each PackagePart through a PackageRelationship object.

·         The Package itself contains a collection of relationships (PackageRelationshipCollection), and any individual PackagePart can also contain a collection of relationships (also a PackageRelationshipCollection). The main document body is the part targeted by one of the relationships in the Package’s own collection. For a WordprocessingML document, the PackagePart for the main document body has its own collection of relationships, which point to parts such as styles, themes, footnotes, and custom XML.

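·         Also, a PackageRelationship has a RelationshipType string (a URI, such as http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument for the main document part) that will be one of a known set of values. You can pass this string to the GetRelationshipsByType method of a Package or PackagePart to find a particular relationship.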

·         In the zip file, there is a file named “[Content_Types].xml” that records the content type of every part in the package: Default elements map a file extension to a content type, and Override elements map an individual part name to a content type. This file is not itself a part, and no PackageRelationship points to it. Unlike the parts themselves, the [Content_Types].xml item must have a specific hard-coded name and location, so that a program consuming the file can find it first. When using the classes in System.IO.Packaging, you don’t deal very much with the contents of this file; it is updated automatically when you create parts (the content type is an argument to CreatePart), and you never explicitly refer to it when iterating through relationships. But it is there if you look directly in the zip file.

·         As you iterate through relationships, you need to assemble the Uri of the related part from the relationship’s SourceUri and TargetUri properties; PackUriHelper.ResolvePartUri does this for you. The example below shows how.

·         Once you have assembled the correct Uri for the related part, you can get the PackagePart by calling Package.GetPart.

·         Once you have the PackagePart, you can then iterate through that part’s PackageRelationshipCollection.

·         You can also retrieve the ContentType property from the PackagePart, which tells you whether the PackagePart contains XML. The example below keeps a HashSet<string> of known content types that contain XML, and before loading an XDocument, the code verifies that the part’s content type is in that set.

·         Also, once you have the part, you can get a stream to read the part. And from that stream, you can create an XmlReader, and from that XmlReader, you can instantiate a LINQ to XML tree.

·         The classes in the System.IO.Packaging namespace are in the WindowsBase assembly. Don’t forget to add a reference to this assembly or you won’t be able to compile. (On .NET Core and later, the same classes ship in the System.IO.Packaging NuGet package instead.)
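Taken together, the bullets above describe one path from a package to a part’s XML: pick a relationship by its type, resolve its target to a part Uri, get the part, and load its stream into an XDocument. Here is a minimal sketch of that path. To keep it self-contained it first builds a tiny one-part package in memory; the part name /word/document.xml, the sample XML, and the helper names (OpcSketch, BuildSamplePackage, LoadTargetOfType) are illustrative assumptions, not part of the original example. The relationship type and content type strings are the standard WordprocessingML ones.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class OpcSketch
{
    // Standard relationship type linking a package to its main part.
    public const string OfficeDocumentRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    const string MainPartContentType =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";

    // Hypothetical helper: builds a one-part package in memory so the
    // sketch runs without a .docx on disk, and returns its bytes.
    public static byte[] BuildSamplePackage()
    {
        using (var ms = new MemoryStream())
        {
            using (Package pkg = Package.Open(ms, FileMode.Create, FileAccess.ReadWrite))
            {
                Uri partUri = new Uri("/word/document.xml", UriKind.Relative);
                PackagePart part = pkg.CreatePart(partUri, MainPartContentType);
                using (Stream s = part.GetStream())
                    new XDocument(new XElement("document", new XElement("body"))).Save(s);
                // Package-level relationship: its SourceUri will be "/".
                pkg.CreateRelationship(partUri, TargetMode.Internal, OfficeDocumentRelType);
            }
            // Disposing the Package flushed it; ToArray works even if ms was closed.
            return ms.ToArray();
        }
    }

    // Finds the first package-level relationship of the given type,
    // resolves its target to a part Uri, and loads that part into an
    // XDocument. Returns null if there is no such internal relationship.
    public static XDocument LoadTargetOfType(Package pkg, string relType)
    {
        PackageRelationship rel = pkg.GetRelationshipsByType(relType).FirstOrDefault();
        if (rel == null || rel.TargetMode != TargetMode.Internal)
            return null;

        Uri partUri = PackUriHelper.ResolvePartUri(rel.SourceUri, rel.TargetUri);
        PackagePart part = pkg.GetPart(partUri);

        using (Stream s = part.GetStream(FileMode.Open, FileAccess.Read))
        using (XmlReader xr = XmlReader.Create(s))
        {
            return XDocument.Load(xr);
        }
    }
}
```

To use it, open the bytes with Package.Open(new MemoryStream(bytes), FileMode.Open, FileAccess.Read) and call LoadTargetOfType(pkg, OpcSketch.OfficeDocumentRelType); with a real .docx you would open the file instead and skip BuildSamplePackage. Looking the part up by relationship type, rather than by the name /word/document.xml, is what keeps the code independent of where a producer chose to store the part.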
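To see that [Content_Types].xml is maintained for you, the following sketch creates a package with System.IO.Packaging (passing the content type only to CreatePart), then reopens the resulting bytes as a plain zip archive with System.IO.Compression and reads the [Content_Types].xml entry directly. It assumes .NET Framework 4.5 or later (with a reference to System.IO.Compression) or .NET Core; the class name ContentTypesSketch and the part name /data/sample.xml are illustrative.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.IO.Packaging;
using System.Linq;
using System.Xml.Linq;

public static class ContentTypesSketch
{
    // Creates a one-part package in memory. The code never writes
    // [Content_Types].xml itself; the Package does that on dispose.
    public static byte[] BuildPackage()
    {
        using (var ms = new MemoryStream())
        {
            using (Package pkg = Package.Open(ms, FileMode.Create, FileAccess.ReadWrite))
            {
                PackagePart part = pkg.CreatePart(
                    new Uri("/data/sample.xml", UriKind.Relative), "application/xml");
                using (Stream s = part.GetStream())
                    new XDocument(new XElement("sample")).Save(s);
            }
            return ms.ToArray();
        }
    }

    // Reads [Content_Types].xml straight out of the zip and returns one
    // line per mapping: "Default <extension> = <type>" or
    // "Override <part name> = <type>".
    public static string[] ReadContentTypes(byte[] packageBytes)
    {
        using (var ms = new MemoryStream(packageBytes))
        using (var zip = new ZipArchive(ms, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("[Content_Types].xml");
            using (Stream s = entry.Open())
            {
                XDocument doc = XDocument.Load(s);
                return doc.Root.Elements()
                    .Select(e => e.Name.LocalName + " " +
                        ((string)e.Attribute("Extension") ?? (string)e.Attribute("PartName")) +
                        " = " + (string)e.Attribute("ContentType"))
                    .ToArray();
            }
        }
    }
}
```

Printing the lines returned by ReadContentTypes(BuildPackage()) shows an entry mapping the part (or its .xml extension) to application/xml. Whether the Package writes a Default or an Override element for a given part is its own decision, which is one more reason to leave this file to the Package rather than edit it by hand.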

Example

The following example shows about the simplest possible use of the System.IO.Packaging classes to retrieve the relationships in an Open XML package. Its main purpose is to show the mechanics of accessing the contents of an Open XML package.

Note that the Package class implements IDisposable, hence the using statement.

The easiest way to write the code that dumps all relationships in an Open XML package is as a recursive function. With each level of recursion, the indent level is incremented, resulting in indented output that is easy to read.

You can point this app at any Open XML document (WordprocessingML, SpreadsheetML, or PresentationML), and it will dump a bunch of information to the console.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
using System.IO.Packaging;
class Program
{
    // Part names whose relationships have already been dumped.
    // Relationships can form cycles (in PresentationML, for example, a
    // slide layout targets its slide master and the master targets each
    // of its layouts), so each part's own relationships are dumped only
    // the first time the part is reached; otherwise the recursion would
    // never terminate.
    private static HashSet<string> VisitedParts =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    private static void DumpRelationshipCollection(
        PackageRelationshipCollection prc, int indentLevel)
    {
        string indent = "".PadRight(indentLevel * 2);
        foreach (var r in prc)
        {
            Console.WriteLine("{0}Id: {1}", indent, r.Id);
            Console.WriteLine("{0}RelationshipType: {1}", indent,
                r.RelationshipType);
            Console.WriteLine("{0}SourceUri: {1}", indent, r.SourceUri);
            Console.WriteLine("{0}TargetUri: {1}", indent, r.TargetUri);
            Console.WriteLine("{0}TargetMode: {1}", indent, r.TargetMode);
            if (r.TargetMode == TargetMode.Internal)
            {
                // assemble the Uri for the part from the source Uri
                // ("/" for package-level relationships) and the target Uri
                Uri partUri = PackUriHelper.ResolvePartUri(
                    new Uri(r.SourceUri.ToString(), UriKind.Relative),
                    r.TargetUri);
                Console.WriteLine("{0}PartUri: {1}", indent, partUri);
                // get the part from the package, given the Uri
                PackagePart part = r.Package.GetPart(partUri);
                Console.WriteLine("{0}ContentType: {1}", indent,
                    part.ContentType);
                // if the part contains XML
                if (XmlContentTypes.Contains(part.ContentType))
                {
                    // read the part into an XDocument, and calculate the
                    // count of the nodes; the using statements close the
                    // part stream and the reader when done
                    using (Stream s = part.GetStream(FileMode.Open, FileAccess.Read))
                    using (XmlReader xr = XmlReader.Create(s))
                    {
                        int nodeCount =
                            XDocument.Load(xr).DescendantNodes().Count();
                        Console.WriteLine(
                            "{0}XDocument descendant node count: {1}",
                            indent, nodeCount);
                    }
                }
                Console.WriteLine();
                // if the part has relationships, dump them recursively,
                // but only on the first visit to this part
                if (VisitedParts.Add(partUri.ToString()))
                    DumpRelationshipCollection(
                        part.GetRelationships(), indentLevel + 1);
            }
            else
                Console.WriteLine();
        }
    }

    static void Main(string[] args)
    {
        string filename = "Test.docx";
        //string filename = "OfficeXMLMarkupExplained_en.docx";
        //string filename = "Book1.xlsx";
        using (Package p = Package.Open(
            filename, FileMode.Open, FileAccess.Read))
        {
            DumpRelationshipCollection(p.GetRelationships(), 0);
        }
    }
    public static HashSet<string> XmlContentTypes = new HashSet<string>
    {
        "application/vnd.openxmlformats-officedocument.custom-properties+xml",
        "application/vnd.openxmlformats-officedocument.customXmlProperties+xml",
        "application/vnd.openxmlformats-officedocument.drawing+xml",
        "application/vnd.openxmlformats-officedocument.drawingml.chart+xml",
        "application/vnd.openxmlformats-officedocument.drawingml.chartshapes+xml",
        "application/vnd.openxmlformats-officedocument.drawingml.diagramColors+xml",
        "application/vnd.openxmlformats-officedocument.drawingml.diagramData+xml",
        "application/vnd.openxmlformats-officedocument.drawingml.diagramLayout+xml",
        "application/vnd.openxmlformats-officedocument.drawingml.diagramStyle+xml",
        "application/vnd.openxmlformats-officedocument.extended-properties+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.commentAuthors+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.comments+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.handoutMaster+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.notesMaster+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.notesSlide+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.presentationProperties+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.slide+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.slideMaster+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.slideshow.main+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.slideUpdateInfo+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.tableStyles+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.tags+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.template.main+xml",
        "application/vnd.openxmlformats-officedocument.presentationml.viewProps+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.calcChain+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.chartsheet+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.comments+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.connections+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.dialogsheet+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.externalLink+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.pivotCacheDefinition+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.pivotCacheRecords+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.pivotTable+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.queryTable+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.revisionHeaders+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.revisionLog+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheetMetadata+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.table+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.tableSingleCells+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.userNames+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.volatileDependencies+xml",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml",
        "application/vnd.openxmlformats-officedocument.theme+xml",
        "application/vnd.openxmlformats-officedocument.themeOverride+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.glossary+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.endnotes+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.footnotes+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.webSettings+xml",
        "application/vnd.openxmlformats-package.core-properties+xml",
        "application/vnd.openxmlformats-package.digital-signature-xmlsignature+xml",
        "application/xml"
    };
}

Now that we have the mechanics down, we can do the fun stuff – pull it all into memory in a LINQ friendly way using a class that is implemented in a Functional Programming (FP) style. And then we can write a few queries.
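As a rough idea of the shape this can take (this is not the FP-style class mentioned above, just a sketch under stated assumptions), the following code walks relationships from the package root, loads every reachable XML part into an XDocument keyed by part name, and visits each part only once so that relationship cycles don’t cause trouble. The names PackageXml and LoadAllXmlParts are hypothetical, and it treats any content type ending in "+xml" or "/xml" as XML instead of using a fixed list like the HashSet in the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class PackageXml
{
    // Assumption: content types such as "application/xml" or
    // "application/vnd...+xml" identify XML parts.
    static bool IsXml(string contentType) =>
        contentType.EndsWith("+xml", StringComparison.OrdinalIgnoreCase) ||
        contentType.EndsWith("/xml", StringComparison.OrdinalIgnoreCase);

    // Walks relationships from the package root (depth first, using an
    // explicit stack) and loads every reachable XML part into an
    // XDocument, keyed by part name.
    public static Dictionary<string, XDocument> LoadAllXmlParts(Package pkg)
    {
        var result = new Dictionary<string, XDocument>(StringComparer.OrdinalIgnoreCase);
        var visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var pending = new Stack<PackageRelationship>(pkg.GetRelationships());

        while (pending.Count > 0)
        {
            PackageRelationship r = pending.Pop();
            if (r.TargetMode != TargetMode.Internal)
                continue;

            Uri partUri = PackUriHelper.ResolvePartUri(r.SourceUri, r.TargetUri);
            // Skip parts already seen (cycles, shared parts) and dangling targets.
            if (!visited.Add(partUri.ToString()) || !pkg.PartExists(partUri))
                continue;

            PackagePart part = pkg.GetPart(partUri);
            if (IsXml(part.ContentType))
            {
                using (Stream s = part.GetStream(FileMode.Open, FileAccess.Read))
                using (XmlReader xr = XmlReader.Create(s))
                    result[partUri.ToString()] = XDocument.Load(xr);
            }

            foreach (PackageRelationship child in part.GetRelationships())
                pending.Push(child);
        }
        return result;
    }
}
```

Because every XML part lands in one dictionary, a single LINQ query such as parts.Values.SelectMany(d => d.Descendants()) ranges over all of them at once, which is exactly the kind of cross-part query that motivated reading the whole package into memory.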