Comparison of altChunk to the DocumentBuilder Class

There are two ways to assemble multiple Open XML word processing documents into a single document: the altChunk element and the DocumentBuilder class.

This post compares and contrasts these two approaches.

altChunk Relies on the Consuming Application to do the Document Assembly

The biggest difference between the two approaches is that the altChunk technique relies on the consuming application to merge the documents.  The gist of the altChunk approach is that you embed the entire source document (whether it is another Open XML document, an HTML document, or simply text) as a binary part in the new document.  You then insert altChunk markup that refers to the binary part at the place where you want the inserted document to end up.  Compliant consuming applications are not required to recognize altChunk.  Novell Open Office, which can consume Open XML documents, doesn’t process altChunk.  In contrast, DocumentBuilder assembles a new document that contains basic Open XML word processing markup that can be consumed by a wide variety of consuming applications, such as Novell Open Office, the Open XML Document Viewer, and more.  I’ve written a wide variety of Open XML SDK / LINQ to XML examples that consume Open XML word processing documents, and those examples don’t process the altChunk element.
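As a rough illustration of the markup involved, the following Python sketch adds the two pieces that an altChunk import needs: a relationship that points at the embedded part, and a w:altChunk element that refers to that relationship by ID.  It works on in-memory XML only.  The function name, the relationship ID, and the part name (afchunk1.docx) are made up for the example, and a real package would also need the embedded part itself plus a matching content type entry.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
AF_CHUNK_TYPE = R + "/aFChunk"  # relationship type used for altChunk parts

# Keep the conventional prefixes when the trees are serialized.
ET.register_namespace("w", W)
ET.register_namespace("r", R)


def add_alt_chunk(body, rels, rel_id, target):
    """Add an altChunk reference to a w:body element and the matching
    Relationship to a Relationships element (hypothetical helper)."""
    # 1. The relationship from the main document part to the embedded part.
    rel = ET.SubElement(rels, f"{{{PKG_REL}}}Relationship")
    rel.set("Id", rel_id)
    rel.set("Type", AF_CHUNK_TYPE)
    rel.set("Target", target)

    # 2. The altChunk element that refers to the relationship by ID.
    chunk = ET.Element(f"{{{W}}}altChunk", {f"{{{R}}}id": rel_id})

    # A trailing w:sectPr has to remain the last child of the body,
    # so the chunk goes in front of it when one is present.
    children = list(body)
    if children and children[-1].tag == f"{{{W}}}sectPr":
        body.insert(len(children) - 1, chunk)
    else:
        body.append(chunk)
    return chunk


# Usage on a tiny stand-in for a real document body.
body = ET.fromstring(f'<w:body xmlns:w="{W}"><w:p/><w:sectPr/></w:body>')
rels = ET.fromstring(f'<Relationships xmlns="{PKG_REL}"/>')
add_alt_chunk(body, rels, "AltChunkId1", "afchunk1.docx")
print(ET.tostring(body, encoding="unicode"))
```

Nothing in this sketch merges any content.  The merge only happens if the application that later opens the document chooses to follow the relationship and import the part.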

altChunk Inserts Entire Documents

Another difference is that altChunk can only insert an entire document at a specified point.  You can’t pick and choose a subset of a source document to insert into the newly assembled document.  In contrast, DocumentBuilder allows you to specify a range of content from the source document to be inserted into the newly assembled document.  For example, you can assemble a new document from paragraphs 1-3 from one document, paragraphs 5-8 from a second document, and the entire contents of a third document.  Technically, with DocumentBuilder, you are not specifying paragraphs from the source document – you specify a range of child elements of the <w:body> element.  Child elements can include tables and content controls, for instance.
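To make the idea of a range of child elements concrete, here is a small Python sketch that picks a run of block-level children out of a w:body element.  It is not DocumentBuilder's code or API: the function name and the zero-based start/count convention are assumptions for illustration, and the sketch does none of the work needed to carry styles, numbering, images, and other interrelated markup across documents.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def body_range(document_xml, start, count):
    """Return `count` block-level children of w:body, beginning at the
    zero-based index `start` (hypothetical helper)."""
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W}}}body")
    # The section properties element is not content, so leave it out.
    blocks = [c for c in body if c.tag != f"{{{W}}}sectPr"]
    return blocks[start:start + count]


# A stand-in main document part: paragraph, table, paragraph, content control.
doc = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p/><w:tbl/><w:p/><w:sdt/><w:sectPr/>"
    "</w:body></w:document>"
)

# The selection is by position among the body's children, so it can
# begin at a table and take in whatever block follows it.
selected = body_range(doc, 1, 2)
print([el.tag.split("}")[1] for el in selected])
```

Because the range counts children of the body rather than paragraphs, a table or a content control occupies one position in the range no matter how many paragraphs it contains.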

Inserting the entire document can be an issue when you require fine-grained control of the newly assembled document.  For instance, in Word 2007, you can’t create a document that contains just a table.  The document will also always contain an empty paragraph that follows the table.  This means that if you are merging two documents that each contain just a table, the resulting document will contain a blank paragraph between the two tables.  This was one of the issues called out by a number of folks who are using altChunk, as seen in comments on my altChunk post.

altChunk Code is Simpler

As you can see from the above-mentioned blog posts, the code that you need to write to take advantage of altChunk is pretty simple.  In contrast, DocumentBuilder contains about 1000 lines of source code to resolve issues of interrelated markup.

altChunk can Merge HTML Documents

You can use altChunk to convert HTML to Open XML word processing documents.  In contrast, DocumentBuilder has no capabilities for working with HTML markup – you can only assemble multiple Open XML word processing documents into a new document.
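The following Python sketch shows what such a document might look like at the package level: a minimal word processing document whose body is a single altChunk that refers to an embedded HTML part.  It uses only the standard zipfile module.  The function name, part names, and relationship IDs are made up for the example, and the HTML is assumed to be UTF-8.  The package only carries the HTML along; any conversion to word processing markup is done by a consuming application that supports altChunk when it opens the file.

```python
import io
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    f'<Types xmlns="{PKG}/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="html" ContentType="text/html"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    f'<Relationships xmlns="{PKG}/relationships">'
    f'<Relationship Id="rId1" Type="{R}/officeDocument" '
    'Target="word/document.xml"/>'
    "</Relationships>"
)

DOCUMENT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    f'<Relationships xmlns="{PKG}/relationships">'
    f'<Relationship Id="htmlChunk1" Type="{R}/aFChunk" '
    'Target="afchunk1.html"/>'
    "</Relationships>"
)

# The whole body is one altChunk that points at the HTML part.
DOCUMENT = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    f'<w:document xmlns:w="{W}" xmlns:r="{R}">'
    '<w:body><w:altChunk r:id="htmlChunk1"/></w:body>'
    "</w:document>"
)


def html_to_docx_bytes(html):
    """Wrap an HTML string in a minimal .docx package (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", DOCUMENT)
        package.writestr("word/_rels/document.xml.rels", DOCUMENT_RELS)
        # The HTML is stored as-is; nothing is converted at this point.
        package.writestr("word/afchunk1.html", html.encode("utf-8"))
    return buffer.getvalue()


if __name__ == "__main__":
    data = html_to_docx_bytes("<html><body><p>Hello</p></body></html>")
    with open("from-html.docx", "wb") as f:
        f.write(data)
```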

Performance Differences

If you are assembling a large number of documents using altChunk, the creation of the new document will be very fast, and the main document part of the assembled document will be very small – the body of the document consists of an altChunk element for each document.  Then, when you open the assembled document in Microsoft Word 2007, it will take a bit longer than normal to open.  While it is opening, if the documents are large enough to take time to import, you can see a progress bar at the bottom of the Word window repeatedly run from 0% to 100%, once for each imported document.  When assembling a document using DocumentBuilder, the assembly itself will take a bit longer – processing time is proportional to the total size of the imported chunks.  Then when Word opens the document, normal opening times apply.