The Easy Way to Assemble Multiple Word Documents

One of the most common requests we hear about word processing documents is for a way to merge multiple documents into a single document. Today, I am going to show you how to use altChunks and version 2 of the Open XML SDK to create a robust document assembly solution in fewer than thirty lines of code.

Scenario – Document Assembly

Imagine a scenario where I'm a developer for a book publisher that specializes in educational books. In my company, one or more authors typically write the content for a specific chapter of a given book, and each chapter is written as a separate document. In this case, my company wants to publish a book on the solar system, divided into chapters that correspond to individual components of the solar system, such as the sun and each of the planets. My company has asked me to write a solution that merges all of these chapter documents into one final document, the book.

Solution

Before I get into the details of my solution, I want to talk about the two approaches I can take to solve this problem:

  1. Use altChunks to merge documents together
  2. Manually merge documents together

The first option, using altChunks, is by far the easiest method for merging multiple documents together. I think of altChunks as the "easy button" when it comes to importing external files into a document. Not only can altChunks import other WordprocessingML documents, but they can also import HTML, XML, RTF, or plain text.

Manually merging multiple documents together is feasible, but requires you to handle a number of issues. For example, you will need to manually merge and deal with conflicts related to styles, bullets and numbering, comments, headers and footers, etc. Perhaps sometime in the future I will write a series of posts talking about how to merge documents manually.

Eric White has already written a blog post on how to use altChunks for document assembly using version 1 of the Open XML SDK. My post will talk about using version 2 of the SDK.

If you just want to jump straight into the code, feel free to download this solution here.

Step 1 – Create a Template

Those of you who have read my previous posts will know that setting up the right template is the first, and probably the most important, step in creating an Open XML format solution. This scenario is no exception.

The best way to handle this scenario is to create a template that represents the final look of the book I want to create. Each chapter is then merged into a specific location within that template. I can mark those locations with content controls, which provide an easy mechanism for specifying semantic regions within a document. In other words, content controls allow me to uniquely identify a specific region within a document.

In this case, I am going to add content controls within my template document that have the name of the chapter I want to add at that location. For example, as shown in the screenshot below, I have a content control that has the name "Earth." This name indicates that the chapter titled "Earth" needs to be merged in this location of the template.

Step 2 – Find Specific Content Controls

Now that I have set up the template, I need to programmatically locate content controls based on their alias (name), which represents the title of the chapter I want to merge. This task is pretty easy with version 2 of the SDK. Once I open a word processing document, I can find all content controls, represented as SdtBlock, whose alias value appears in the name of the source file I want to merge into the template, with the following code:

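    using System.Collections.Generic;
    using System.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    static void MergeSourceDocument(string sourceFile, string destinationFile)
    {
        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open(destinationFile, true))
        {
            MainDocumentPart mainPart = myDoc.MainDocumentPart;

            // Find content controls whose Alias value appears in the source
            // file name; controls without an alias are skipped
            List<SdtBlock> sdtList = mainPart.Document.Descendants<SdtBlock>()
                .Where(s =>
                {
                    SdtProperties props = s.SdtProperties;
                    Alias alias = props == null ? null : props.GetFirstChild<Alias>();
                    return alias != null && sourceFile.Contains(alias.Val.Value);
                })
                .ToList();
            ...
        }
    }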
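To make the matching rule concrete: the lambda above treats a content control as a match when its alias appears anywhere in the source file path. The sketch below isolates that test using plain strings only, and contrasts it with a stricter variant that compares the alias against the file name without its extension. The helper names MatchesByContains and MatchesByFileName and the sample paths are hypothetical; they are not part of the SDK or of the downloadable solution.

```csharp
using System;
using System.IO;

static class AliasMatching
{
    // Substring rule, as in the lambda above, plus a guard against an
    // empty alias (every string "contains" the empty string).
    public static bool MatchesByContains(string sourceFile, string alias)
    {
        return !string.IsNullOrEmpty(alias) && sourceFile.Contains(alias);
    }

    // Stricter alternative (an assumption, not the post's method): the alias
    // must equal the file name without its extension, ignoring case.
    public static bool MatchesByFileName(string sourceFile, string alias)
    {
        string chapterName = Path.GetFileNameWithoutExtension(sourceFile);
        return string.Equals(chapterName, alias, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        // Hypothetical chapter files
        Console.WriteLine(MatchesByContains("chapters/Earth.docx", "Earth"));       // True
        Console.WriteLine(MatchesByContains("chapters/Earthquakes.docx", "Earth")); // True
        Console.WriteLine(MatchesByFileName("chapters/Earthquakes.docx", "Earth")); // False
        Console.WriteLine(MatchesByFileName("chapters/Earth.docx", "Earth"));       // True
    }
}
```

The substring rule is fine when chapter names do not overlap, as in the solar system book. If one chapter name is a prefix of another file name, the stricter comparison avoids merging a chapter into the wrong content control.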

Step 3 – Add altChunk and Swap Out Content Control

I now need to swap out the found content control with the actual document I want to merge using altChunks. Merging documents using altChunks is pretty easy and consists of the following tasks:

  1. Add an altChunk part to the package
  2. Feed data from the intended merged document into the altChunk part
  3. Add an altChunk reference in the main document part

The following code accomplishes those three tasks as well as the task to swap out the content control for the altChunk:

    if (sdtList.Count != 0)
    {
        string altChunkId = "AltChunkId" + id;
        id++;

        // 1. Add an altChunk part to the package
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
            AlternativeFormatImportPartType.WordprocessingML, altChunkId);

        // 2. Feed the source document into the part, then release the file
        using (FileStream sourceStream = File.Open(sourceFile, FileMode.Open))
        {
            chunk.FeedData(sourceStream);
        }

        // 3. Replace each content control with its own altChunk element;
        // a single element instance cannot be inserted in more than one place
        foreach (SdtBlock sdt in sdtList)
        {
            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;

            OpenXmlElement parent = sdt.Parent;
            parent.InsertAfter(altChunk, sdt);
            sdt.Remove();
        }
        ...
    }
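To assemble the whole book, the merge step above is repeated once per chapter against a copy of the template, so the template itself stays untouched. The sketch below shows one way that loop might look. BookAssembler, Assemble, and all file names are hypothetical, and the merge step is passed in as a delegate so the example runs without the SDK; in a real run you would pass a method with the same shape as MergeSourceDocument(sourceFile, destinationFile).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BookAssembler
{
    // Copies the template to outputFile, then calls merge(chapter, outputFile)
    // once for each chapter file, in the order given.
    public static void Assemble(string templateFile, string outputFile,
        IEnumerable<string> chapterFiles, Action<string, string> merge)
    {
        File.Copy(templateFile, outputFile, true);
        foreach (string chapter in chapterFiles)
        {
            merge(chapter, outputFile);
        }
    }

    static void Main()
    {
        // Work in a scratch directory with a placeholder template file
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string template = Path.Combine(dir, "Template.docx");
        string output = Path.Combine(dir, "Book.docx");
        File.WriteAllText(template, "placeholder");

        // Stand-in merge step that only records which chapters were requested
        var merged = new List<string>();
        Assemble(template, output, new[] { "Sun.docx", "Earth.docx" },
            (source, destination) => merged.Add(source));

        Console.WriteLine(string.Join(", ", merged)); // Sun.docx, Earth.docx
        Console.WriteLine(File.Exists(output));       // True

        Directory.Delete(dir, true);
    }
}
```

Because each chapter is placed by the alias of its content control, the order of the chapter list does not determine the order of chapters in the book; the template does.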

End Result

Putting everything together and running my code, I end up with a solar system book that is broken down into chapters, one for each component of the solar system. Using altChunks automatically ensures the following:

  • The final document has consistent styles applied
  • Images, comments, tracked changes, etc. are all included as part of my merged document
  • Bullets and numbering just work

Here is a screenshot of the final solar system document:

[updated 1/9 due to bug in code - SdtProperties.GetFirstChild<Alias>() is the correct syntax]  

Zeyad Rajabi