How to Use altChunk for Document Assembly


Merging multiple word processing documents into a single document is something that many people want to do.  An application built for attorneys might assemble selected standard clauses into a contract.  An application built for book publishers can assemble chapters of a book into a single document.  This post explains the semantics of the altChunk element, and provides some code using the Open XML SDK that shows how to use altChunk.

Instead of using altChunk, you could write a program to merge the Open XML markup for documents.  You would need to deal with a number of issues, including merging style sheets and resolving conflicting styles, merging the comments from all of the documents, merging bookmarks, and more.  This is doable, but it’s a lot of work.  You can use altChunk to let Word 2007 do the heavy lifting for you.

altChunk is a powerful technique.  It’s a tool that should be in every Open XML developer’s toolbox.  In an upcoming post, I’ll show an example of the use of altChunk in a SharePoint application.  You can create compelling document assembly solutions in SharePoint using altChunk.

Overview of the altChunk Markup

The altChunk markup tells the consuming application to import content into the document.  This behavior is not required for a conforming application – a conforming application is free to ignore the altChunk markup.  However, the standard recommends that if the application ignores the altChunk markup, it should notify the user.  Word 2007 supports altChunk.

To use altChunk, you do the following:

  • You create a new part in the package.  The part can have a number of content types, listed below.  When you create the part, you assign a unique ID to the part.
  • You store the content that you want to import into the part.  You can import a variety of types of content, including another Open XML word processing document, HTML, or text.
  • The main document part has a relationship to the alternative format part.
  • You add a w:altChunk element at the location where you want to import the alternative format content.  The r:id attribute of the w:altChunk element identifies the chunk to import.  The w:altChunk element is a sibling to paragraph elements (w:p).  You can add an altChunk element at any point in the markup that can contain a paragraph element.
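
The uniqueness requirement in the first step is easy to trip over when the document already contains relationships, or when several chunks are added in a loop.  As a minimal sketch that is not part of the original sample, the hypothetical helper below returns the first unused ID of the form AltChunkId1, AltChunkId2, and so on.  How the existing IDs are collected depends on the SDK version in use, so the helper simply takes them as a parameter:

```csharp
using System;
using System.Collections.Generic;

public static class AltChunkIds
{
    // Hypothetical helper: returns the first ID of the form prefix + N
    // (N = 1, 2, 3, ...) that does not appear in existingIds.
    public static string NextUnusedId(IEnumerable<string> existingIds,
        string prefix = "AltChunkId")
    {
        // Case-insensitive comparison is a deliberately conservative choice:
        // it also avoids IDs that differ from an existing one only by case.
        var taken = new HashSet<string>(existingIds, StringComparer.OrdinalIgnoreCase);
        int n = 1;
        while (taken.Contains(prefix + n))
            n++;
        return prefix + n;
    }
}
```

For example, NextUnusedId(new[] { "rId1", "AltChunkId1" }) returns "AltChunkId2".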

A few options for content types that can be imported into a document are:

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml

The alternative format content part contains an Open XML document in binary form.

  • application/xhtml+xml

The alternative format content part contains an XHTML document.

  • text/plain

The alternative format content part contains text.

There are more than these three options; the code presented in this post shows how to implement altChunk for these three types of content.

The altChunk markup in the document looks like this:

<w:p>
  <w:r>
    <w:t>Paragraph before.</w:t>
  </w:r>
</w:p>
<w:altChunk r:id="AltChunkId1" />
<w:p>
  <w:r>
    <w:t>Paragraph after.</w:t>
  </w:r>
</w:p>
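
The w: and r: prefixes in this markup stand for the WordprocessingML main namespace and the relationships namespace.  The following sketch uses LINQ to XML to build the same three siblings in memory, which makes the namespace of each element and attribute explicit.  The class and method names are made up for illustration:

```csharp
using System.Xml.Linq;

public static class AltChunkMarkup
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Builds a w:p element containing a single run of text.
    public static XElement Paragraph(string text)
    {
        return new XElement(W + "p",
            new XElement(W + "r",
                new XElement(W + "t", text)));
    }

    // Builds a w:body whose children are: paragraph, altChunk, paragraph.
    public static XElement BuildBody(string altChunkId)
    {
        return new XElement(W + "body",
            // Declare the prefixes so the serialized XML uses w: and r:.
            new XAttribute(XNamespace.Xmlns + "w", W),
            new XAttribute(XNamespace.Xmlns + "r", R),
            Paragraph("Paragraph before."),
            new XElement(W + "altChunk",
                new XAttribute(R + "id", altChunkId)),
            Paragraph("Paragraph after."));
    }
}
```

Calling AltChunkMarkup.BuildBody("AltChunkId1").ToString() serializes the fragment, with the w:altChunk element sitting between the two paragraphs as a sibling.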


altChunk: Import Only

One important note about altChunk – it is used only for importing content.  If you open the document using Word 2007 and save it, the newly saved document will not contain the alternative format content part, nor the altChunk markup that references it.  Word saves all imported content as paragraph (w:p) elements.  The standard requires this behavior from a conforming application.
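
Because Word replaces the altChunk markup when it saves, one way to tell whether a document has already been through that import is to look for w:altChunk elements that remain in the main document part.  The sketch below assumes you have already loaded the part into an XDocument; the class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AltChunkInspector
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Returns the r:id of every w:altChunk element still present in the markup.
    // An empty list means no import is pending in this part.
    public static List<string> PendingAltChunkIds(XDocument mainDocument)
    {
        return mainDocument
            .Descendants(W + "altChunk")
            .Select(e => (string)e.Attribute(R + "id"))
            .Where(id => id != null)
            .ToList();
    }
}
```

Descendants is used rather than Elements so that an altChunk nested deeper in the markup, for example inside a table cell, is also found.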

Using altChunk

The following screen-clipping shows a simple word processing document.  It has a heading, a paragraph styled as Normal, and a comment:

The following screen-clipping shows another word processing document, with content that we want to insert into the first document.

After running the example program included with this post, the resulting document looks like the following.  Notice that the resulting document has comments from both of the source documents:

The following example shows how to merge two Open XML documents using altChunk.  It uses V1 of the Open XML SDK, and LINQ to XML:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        XNamespace w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XNamespace r =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open("Test.docx", true))
        {
            string altChunkId = "AltChunkId1";
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
              "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
              altChunkId);
            using (FileStream fileStream =
                File.Open("TestInsertedContent.docx", FileMode.Open))
                chunk.FeedData(fileStream);
            XElement altChunk = new XElement(w + "altChunk",
                new XAttribute(r + "id", altChunkId)
            );
            XDocument mainDocumentXDoc = GetXDocument(myDoc);
            // Add the altChunk element after the last paragraph.
            mainDocumentXDoc.Root
                .Element(w + "body")
                .Elements(w + "p")
                .Last()
                .AddAfterSelf(altChunk);
            SaveXDocument(myDoc, mainDocumentXDoc);
        }
    }

    private static void SaveXDocument(WordprocessingDocument myDoc,
        XDocument mainDocumentXDoc)
    {
        // Serialize the XDocument back into the part
        using (Stream str = myDoc.MainDocumentPart.GetStream(
            FileMode.Create, FileAccess.Write))
        using (XmlWriter xw = XmlWriter.Create(str))
            mainDocumentXDoc.Save(xw);
    }

    private static XDocument GetXDocument(WordprocessingDocument myDoc)
    {
        // Load the main document part into an XDocument
        XDocument mainDocumentXDoc;
        using (Stream str = myDoc.MainDocumentPart.GetStream())
        using (XmlReader xr = XmlReader.Create(str))
            mainDocumentXDoc = XDocument.Load(xr);
        return mainDocumentXDoc;
    }
}
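
The example above always appends the chunk after the last paragraph.  Since an altChunk element can go anywhere a paragraph can, a variation is to place it after a particular paragraph.  The sketch below is an illustration rather than part of the attached sample: the helper name and the idea of locating the paragraph by a marker string are assumptions, and it only considers paragraphs that are direct children of w:body:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AltChunkPlacement
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Inserts <w:altChunk r:id="..."/> after the first body-level paragraph
    // whose text contains marker. Returns false if no paragraph matches.
    public static bool InsertAfterParagraphContaining(
        XElement body, string marker, string altChunkId)
    {
        XElement anchor = body
            .Elements(W + "p")
            .FirstOrDefault(p =>
                // A paragraph's text may be split across several runs,
                // so concatenate every w:t before searching.
                string.Concat(p.Descendants(W + "t").Select(t => t.Value))
                    .Contains(marker));
        if (anchor == null)
            return false;
        anchor.AddAfterSelf(
            new XElement(W + "altChunk",
                new XAttribute(R + "id", altChunkId)));
        return true;
    }
}
```

In the program above, this could replace the Last().AddAfterSelf(altChunk) call by passing mainDocumentXDoc.Root.Element(w + "body") as the body argument.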


To use altChunk with HTML, the code looks like this:

// w and r are the XNamespace values declared in the previous example;
// GetXDocument and SaveXDocument are the helper methods defined there.
using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("Test3.docx", true))
{
    string html =
      @"<html>
            <head/>
            <body>
                <h1>Html Heading</h1>
                <p>This is an html document in a string literal.</p>
            </body>
        </html>";
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        "application/xhtml+xml", altChunkId);
    // Write the HTML string into the alternative format import part.
    using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
    using (StreamWriter stringStream = new StreamWriter(chunkStream))
        stringStream.Write(html);
    XElement altChunk = new XElement(w + "altChunk",
        new XAttribute(r + "id", altChunkId)
    );
    XDocument mainDocumentXDoc = GetXDocument(myDoc);
    // Add the altChunk element after the last paragraph.
    mainDocumentXDoc.Root
        .Element(w + "body")
        .Elements(w + "p")
        .Last()
        .AddAfterSelf(altChunk);
    SaveXDocument(myDoc, mainDocumentXDoc);
}
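
The content type used here is application/xhtml+xml, and XHTML is XML, so a string destined for the part can be checked for well-formedness before it is written.  This is only a sketch with a hypothetical helper name; the post does not say how Word reacts to malformed markup, so treat the check as a convenience for catching typos early rather than a documented requirement:

```csharp
using System.Xml;
using System.Xml.Linq;

public static class XhtmlCheck
{
    // Returns true if markup parses as well-formed XML, false otherwise.
    // Note: named HTML entities such as &nbsp; are not declared in plain XML,
    // so markup that uses them is reported as not well-formed.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

The HTML string in the example above passes this check; a string with an unclosed p element does not.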


Using V2 of the Open XML SDK:

// Requires these using directives at the top of the file:
//   System.IO, System.Linq (for Last),
//   DocumentFormat.OpenXml.Packaging, DocumentFormat.OpenXml.Wordprocessing
using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("Test1.docx", true))
{
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.WordprocessingML, altChunkId);
    using (FileStream fileStream = File.Open("TestInsertedContent.docx", FileMode.Open))
        chunk.FeedData(fileStream);
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    // Add the altChunk element after the last paragraph.
    mainPart.Document
        .Body
        .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
    mainPart.Document.Save();
}


The attached code shows examples of placing an Open XML document, HTML, and text into an alternative format import part.  I’ve provided two versions of the example – one using V1 of the Open XML SDK (and LINQ to XML), and another using V2 of the Open XML SDK.

altChunk.zip

Comments (54)

  1. Ed Cawthorne says:

    Hi Eric,

    Interesting post.

    I have just been reading up on OpenXML and it looks like a great solution to my document assembly problem.

    Is it possible to combine Excel tables/charts and PowerPoint slides into a Word document using Open XML?

    Clearly altChunk wouldn’t be the method, as it only works with Word/XML/XHTML files, but would it work for Excel/PowerPoint elements embedded into Word?

    Ed

  2. teltest says:

    Hi Eric,  Great bit of code – nearly exactly what I was looking for.  I seem to have a problem though if each sub-document has a different header – the headers seem to get lost.  Any ideas?

    Terry

  3. Anand says:

    Hi Eric,

    Thanks for the code sample.

    I am facing a problem with the bullets & numbering when using altChunk to merge two word documents (office 2003 .doc documents converted to .docx using OFC.exe). The code I am using is given below.

    string oriDoc = @"C:\Final.docx";
    string mergedDocPath = @"C:\A.docx";
    using (WordprocessingDocument doc = WordprocessingDocument.Open(oriDoc, true))
    {
        IEnumerator<Locked> enumerator = doc.MainDocumentPart.StyleDefinitionsPart.Styles.Descendants<Locked>().GetEnumerator();
        while (enumerator.MoveNext() == true)
            enumerator.Current.Val = BooleanValues.True; //Tried using False as well, but it doesnt make sense here.
        doc.MainDocumentPart.Document.Save();
        Paragraph paragraph = doc.MainDocumentPart.Document.Descendants<Paragraph>().Last();
        AlternativeFormatImportPart importPart = doc.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML);
        using (StreamReader reader = new StreamReader(mergedDocPath, true))
            importPart.FeedData(reader.BaseStream);
        AltChunk altChunk = new AltChunk();
        altChunk.AltChunkProperties = new AltChunkProperties();
        altChunk.AltChunkProperties.MatchSource = new MatchSource();
        altChunk.AltChunkProperties.MatchSource.Val = BooleanValues.True;//Tried using False as well
        altChunk.Id = doc.MainDocumentPart.GetIdOfPart(importPart);
        paragraph.InsertAfterSelf(altChunk);
        doc.MainDocumentPart.Document.Save();
    }

    A.docx originally looks like this,

    —————————————————————————————————-

    Diggity dog

    ffdgfgfdg

    first time dfvidjgldgdgm

    dfsfsdfgdgdfgghfgghfh:

    1.       Zoom vroom

    2.       doom boom

    3.       dhgfhfghgfhfghgfhgfhgfh dsfsfsddfsfdsfgdsfdgffg fgffdgdghfdh fsfgf fdsfsfsdfdsf:

    4.       Sweeetdfvggdggf

    a.       dfsfvcff

                                                                  i.      why Go

                                                                ii.      jeremy

                                                               iii.      black

    —————————————————————————————————-

    After merging the formatting becomes like this,

    —————————————————————————————————-

    Diggity dog

    ffdgfgfdg

    first time dfvidjgldgdgm

    dfsfsdfgdgdfgghfgghfh:

    • Zoom vroom

    • doom boom

    • dhgfhfghgfhfghgfhgfhgfh dsfsfsddfsfdsfgdsfdgffg fgffdgdghfdh fsfgf fdsfsfsdfdsf:

    • Sweeetdfvggdggf

    • dfsfvcff

    • why Go

    • jeremy

    • black

    —————————————————————————————————-

    Something similar happens to bullets too. The bullets style changes to the bullets styling of "Final.Docx".

    On checking the afchunk the bullet & numbering were correct, which indicates that the parent document superimposes its bulleting & numbering on the chunk.

    I thought about setting DocumentProtection.Enforcement and DocumentProtection.Formatting to false. Also I tried setting AutoFormatOverride.Val to false. But I couldn’t find a way to do that. Also will setting these help?

    Also does setting AltChunk.Id manually rather than by using MainDocumentPart.GetIdOfPart cause a difference?

    If the above method does not work, should I instead take all the Styles from the second document and merge them into the first document? Although this does not look like the right way to go about doing things.

    Thanks,

    Anand.

  4. Doug Mahugh says:

    Stephen McGibbon has screenshots of the Open XML and ODF support coming in Windows 7 Wordpad , as announced

  5. Hi, Ed, Terry, and Anand,

    Thanks for the great questions.  I’ll be responding to these, but it may be as late as the end of next week, due to schedule constraints.  Thanks for your patience.

    -Eric

  6. Following PDC 2008 and the Open XML workshop given by Microsoft in Redmond ( Doug , a thousand apologies once again

  7. I received this message privately, but the question and the response are relevant to many, so including it here.

    Question:

    I’m attempting to merge multiple documents (which contain rows of a table) into a single document.  When the merge process happens, I get what looks to be a paragraph marker between my table rows (so there’s visual separation between the rows of the table, which isn’t what I want).

    Any thoughts on how to modify altChunk’s behavior to not include the document delimiter between the documents that it merges?

    My response:

    I’ve seen this same behavior, and as far as I know, this is behavior that is not configurable in Word.  I’ll check, but would guess that this can’t be changed.

    The solution to this is to write some utility that can move content between docs (not using altChunk).  I’m starting on the prep work for this.  See this post:

    http://blogs.msdn.com/ericwhite/archive/2008/11/03/inserting-deleting-moving-paragraphs-in-open-xml-wordprocessing-documents.aspx

    -Eric

  8. Knut Hamang says:

    Eric, I am having the exact same problem as Anand. Maybe you have a good solution for this.

    Rather strange that it is not possible to do inline numbering type in the document.xml itself.

  9. KumaAnith says:

    Hi,

    I have a few questions:

    1. Is it possible to view an altChunk from Word 2007, or can it be viewed only in XML format?

    2. Can we insert the contents in between the documents?

  10. One of the most common requests we hear related to word processing documents is the ability to merge

  11. Ernest says:

    Hi,

    I am trying to merge documents and then do some string replacement using this code: http://www.codeproject.com/KB/office/OfficeTokenReplacement.aspx

    It’s doing some regex on the whole XML,

    but when I use a chunk, the embedded content in the unzipped package is under AltChunk1.docx, and I have to unzip that afterwards.

    I first tried with the PDC source code

    //Find all content controls in document
    List<SdtBlock> sdtList = mainPart.Document
        .Descendants<SdtBlock>().Where(s => sourceFile
            .Contains(s.SdtProperties
                .GetFirstChild<Alias>().Val.Value)).ToList();
    //Go through all the content controls
    if (sdtList.Count != 0)
    {
        string altChunkId = "AltChunkId" + id;
        id++;
        //Add altchunk into document
        AlternativeFormatImportPart chunk =
            mainPart.AddAlternativeFormatImportPart(
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
            altChunkId);
        //stream data from source file into altchunk
        chunk.FeedData(File.Open(sourceFile, FileMode.Open));
        //Create new altchunk element
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        //Swap out content control for altchunk
        foreach (SdtBlock sdt in sdtList)
        {
            OpenXmlElement parent = sdt.Parent;
            parent.InsertAfter(altChunk, sdt);
            sdt.Remove();
        }
        //Save
        mainPart.Document.Save();
    }

    but I only have paragraph and no SdtBlock ?

    Could you please help me !!

  12. siri says:

    Hi Eric …Can this be used with word 2003?

  13. Ernest Bariq says:

    How can I insert an AltChunk at a special place ?

  14. S K Tripathi says:

    Hi Eric, nice code sample. I am using some HTML as an altChunk. It works for plain HTML, but if the HTML contains images, the images do not come through. I understand the problem to be that the images are not in the scope of the document. As you mentioned, when a document with an altChunk is saved by MS Word 2007, it converts all of the altChunk content to WordML. My question is whether we can do the same (converting HTML to WordML).

    It will be a great help for my project.

  15. Resolution ================ Step 1: Open a new Microsoft Word 2007 document and type A B C Save the document

  16. rama says:

    Hi Eric, I’m getting following error when trying to execute the above code in an aspx page. may I know what is causing this issue.

    Thanks,

    Rama

    Server Error in ‘/TMS’ Application.

    ‘AltChunkId14’ ID conflicts with the ID of an existing relationship for the specified source.

    Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

    Exception Details: System.Xml.XmlException: ‘AltChunkId14’ ID conflicts with the ID of an existing relationship for the specified source.

    Source Error:

    Line 78:                             string altChunkId = "AltChunkId" + loop;

    Line 79:                             MainDocumentPart mainPart = myDoc.MainDocumentPart;

    Line 80:                             AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(

    Line 81:                               "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",

    Line 82:                               altChunkId);

    Source File: c:InetpubwwwroottmstrunkwebTariffTariff.aspx.cs    Line: 80

    Stack Trace:

    [XmlException: ‘AltChunkId14’ ID conflicts with the ID of an existing relationship for the specified source.]
      MS.Internal.IO.Packaging.InternalRelationshipCollection.ValidateUniqueRelationshipId(String id) +634905
      MS.Internal.IO.Packaging.InternalRelationshipCollection.Add(Uri targetUri, TargetMode targetMode, String relationshipType, String id, Boolean parsing) +210
      System.IO.Packaging.PackagePart.CreateRelationship(Uri targetUri, TargetMode targetMode, String relationshipType, String id) +62
      DocumentFormat.OpenXml.Packaging.OpenXmlPart.CreateRelationship(Uri targetUri, TargetMode targetMode, String relationshipType, String id) +36
      DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.AttachChild(OpenXmlPart part, String rId) +88
      DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.InitPart(T newPart, String contentType, String id) +246
      DocumentFormat.OpenXml.Packaging.MainDocumentPart.AddAlternativeFormatImportPart(String contentType, String id) +47
      Systrends.TMS.Web.Tariff.Tariff.<Page_Load>b__1(<>f__AnonymousType1`3 mergingdocuments) in c:InetpubwwwroottmstrunkwebTariffTariff.aspx.cs:80
      System.Array.ForEach(T[] array, Action`1 action) +47
      Systrends.TMS.Web.Tariff.Tariff.Page_Load(Object sender, EventArgs e) in c:InetpubwwwroottmstrunkwebTariffTariff.aspx.cs:66
      System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr fp, Object o, Object t, EventArgs e) +14
      System.Web.Util.CalliEventHandlerDelegateProxy.Callback(Object sender, EventArgs e) +35
      System.Web.UI.Control.OnLoad(EventArgs e) +99
      System.Web.UI.Control.LoadRecursive() +50
      System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +627

    Version Information: Microsoft .NET Framework Version:2.0.50727.3053; ASP.NET Version:2.0.50727.3053

  17. Hi Rama,

    Somehow you are getting duplicate rIDs for the altChunk that you are adding.  rIDs need to be unique – there are a variety of ways to enforce this.  It isn’t a problem when creating a document from scratch, but when modifying an existing document, you need to take care that you only add new parts with unique rIDs.  Does this help you with your issue?

    -Eric

  18. Kulio says:

    Hi Eric,

    I’ve really appreciated your article and I have one question: regarding the "altChunk: Import Only" section, is there any way to avoid this peculiar behaviour?

    In other words, is there an altChunk property or another markup that can be used to embed external sources (i.e. html files) avoiding them to be totally erased from the archive after the first saving?

    Many thanks,

    Kulio.

  19. Hi Kulio,

    Unfortunately, the behavior can’t be changed.  When you open the document in Word, the embedded external source is removed from the package.

    -Eric

  20. There are two ways to assemble multiple Open XML word processing documents into a single document: altChunk,

  21. DocumentBuilder is an example class that’s part of the PowerTools for Open XML project that enables you

  22. Ramesh says:

    I have used your V2 code to merge the documents, and I added my page to a SharePoint site, but SharePoint is not recognizing the Last() method in the following line:

    InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());

    I am getting the following exception.

    ‘System.Collections.Generic.IEnumerable<DocumentFormat.OpenXml.Wordprocessing.Paragraph>’ does not contain a definition for ‘Last’   at System.Web.Compilation.AssemblyBuilder.Compile()

  23. Hi Ramesh, you need to include a "using System.Linq;" using statement.

    -Eric

  24. rpallothu says:

    hi Eric,

    Thanks for your response.

    but i used System.Xml.Linq namespace.

    I used System.Xml.Linq and the DocumentFormat.OpenXml DLL to merge the Office documents, which works fine in the 3.5 framework. When I bind my page to the SharePoint site, I get an exception saying

    ‘System.Collections.Generic.IEnumerable<DocumentFormat.OpenXml.Wordprocessing.Paragraph>’ does not contain a definition for ‘Last’   at System.Web.Compilation.AssemblyBuilder.Compile()

    Code snippet:

    using (WordprocessingDocument myDoc =
        WordprocessingDocument.Open("Desc.docx", true))
    {
        string altChunkId = "AltChunkId" + i;
        MainDocumentPart mainPart = myDoc.MainDocumentPart;
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
            AlternativeFormatImportPartType.WordprocessingML, altChunkId);
        using (FileStream fileStream = File.Open("Temp.docx", FileMode.Open))
            chunk.FeedData(fileStream);
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        mainPart.Document
            .Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
        mainPart.Document.Save();
    }

    Note: when I change my application’s framework version to 3.0, I get the same exception locally that I got in SharePoint.

    Does this mean that SharePoint doesn’t support 3.5 framework DLLs?

    Please advise.

  25. Hi Ramesh, the Enumerable.Last extension method is in the System.Linq namespace, not System.Xml.Linq.  By default, the SharePoint project doesn’t include a using for System.Linq.  So to use the Last extension method, you need to add that using statement.

    In general, when you get build errors like this, take a look at the MSDN docs on the class/method/type.  The docs always tell you which assembly the class is in, and what namespace the class is in.  Then, you can add appropriate references and using statements.  Make sense?

    -Eric

  26. rmagill says:

    Good stuff, very helpful.   I have a slight twist to this I am working on, maybe someone can help.  

    Instead of opening existing files and merging them, I am programmatically creating WordProcessingDocuments using C# in .NET.  Then based on various conditions I may or may not want to combine them and then stream them out as a single document.    

    So instead of adding data to the stream in the form of:

    Stream fileStream = System.IO.File.Open(fileName, FileMode.Open);

    chunk.FeedData(fileStream);

    I tried to do this:

    Stream stream = wordDoc.MainDocumentPart.GetStream();

    chunk.FeedData(stream);

    This compiles, but when I try to open the final document, I get a message that the docx can’t be opened because of problems with the contents.  Any ideas?

  27. Hi rmagill,  quick question – are you properly disposing of all of your streams?  That could very well cause this problem.  Another debugging technique for a situation like this – read streams to byte arrays – as necessary, you can create a non-resizable memory stream from a byte array using one of the MemoryStream constructors (I believe that the memory stream uses the passed in byte array as its backing store).  You can then examine this byte array to see what’s different.

    It’s best to always use a ‘using’ block for every object that implements IDisposable:

    private static void SaveXDocument(WordprocessingDocument myDoc,
        XDocument mainDocumentXDoc)
    {
        // Serialize the XDocument back into the part
        using (Stream str = myDoc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write))
        using (XmlWriter xw = XmlWriter.Create(str))
            mainDocumentXDoc.Save(xw);
    }

    -Eric

  28. Kulio says:

    Hi Eric,

    I thank you for your last answer.

    I’ve solved the problem creating the html files ‘on the fly’ and using a kind of ‘custom marker’ in the document that is replaced at runtime with the proper altchunk reference tag.

    Now I am wondering if there is a way to embed also a css stylesheet for the html files.

    The stylesheet file is placed in a directory "word/html".

    I’ve found out that if I insert "<Default Extension="css" ContentType="text/css" />"

    into [Content_Types].xml I get no error message when opening the docx.

    However the CSS is ignored in the docx file.

    On the contrary, if I integrate the styles in a <style> tag inside the html file, the proper style is displayed correctly inside the docx file.

    Many thanks,

    Kulio.

  29. Hi Kulio,

    From the dev team:

    Word only supports a few content types for altChunks. Word does support HTML and MHT, which is why putting them in a <style> tag worked. For HTML, Word only reads the HTML file itself and not any supporting files in the package. So if you have any external stylesheets, images, etc. MHT might be the best route.

    -Eric

  30. Balazs says:

    I’ve had no trouble getting this working.  However, the one difficulty I’m having is this:

    If I have a hyperlink in my source HTML that looks like this: <a href="myimage.jpeg">, and I have added myimage.jpeg, is there any way that I can get my hyperlink to refer to that image?  Currently the URL is resolved to "directoryTheDocumentIsIn/myimage.jpeg". I’m not sure whether the HyperlinkBase extended property could be used for this…I can’t figure out a way.

    Also, I’m a little confused. I was under the impression that with altchunk, Word does a one-time conversion of the content and does away with the source altchunk file.  However, I find that even after opening the docx file several times, the altchunk file remains, and document.xml still contains the <altchunk> tag, rather than any imported html.

  31. AltChunk rocks! However, I can’t get it to work inside a headerpart. Inside the maindocumentpart it works perfect. Is this supported at all?

    You can check the broken document here: http://blogs.infosupport.com/cfs-file.ashx/__key/CommunityServer.Blogs.Components.WeblogFiles/porint/order.docx

  32. Sanjay says:

    Beautiful work Eric.  Worked like a charm.

    I have captured content using InfoPath forms in Moss and wished to export the content to a word document.  the content control do not allow you to map the content directly by including the customxml parts.  This option sure worked.

    Cheers

  33. Maria Hsiung says:

    Thanks for the awesome post!

    I’m also trying to assemble different types of office documents (excel, word, powerpoint) into a single document.  Your post really helped me with combining word documents, but I’m not sure how to proceed with the other types (excel, powerpoint).  Any suggestions?

  34. Jakob Flygare says:

    What is best practice for assembling a document from a database source?

    Is it to use content controls? To use AltChunks? To use content controls replaced at runtime with AltChunks? To use custom markup replaced at runtime with database content, e.g. http://geekswithblogs.net/DanBedassa/archive/2009/01/16/dynamically-generating-word-2007-.docx-documents-using-.net.aspx or http://msdn.microsoft.com/en-us/library/cc850835(office.14).aspx?

  35. kmote says:

    I need to create a .docx file from an html document that includes image tags with a src element pointing to a url. Is there any way make to sure that images contained in the HTML document are put into the Word package, so that Word will render the images without internet connectivity? When I read your "altChunk: Import Only" section, I was hopeful that Word might actually accomplish this for me. I can’t get that to happen, however. Am I missing something?

    Also, you mention in that section that you must "open the file and save it" for the altchunk stuff to be removed. Is there any way to do that programmatically (rather than requiring the user to open and save)? In fact, I’ve had to actually EDIT & save before the "auto-pruning" would take place. Any suggestions?

    Thanks!

  36. hi Kmote,

    I don’t know of any way to accomplish what you’re trying to do.  As far as I know, you are correct – you must make a small change in the file and save it.

    Ultimately, the solution to this is to have an html => open xml converter in code that you can modify for your specific needs.  I have this on my todo list – but it will be some time before I can get to this.

    I wish I had a better answer for you, but I don’t.

    -Eric

  37. Matt says:

    Eric,

    Thanks for this post! Any advice on merging Excel documents into a Word document? I have been searching the internet up and down and your blog is by far the best.

    Thanks,

    Matt

  38. ted says:

    Hi, really good post, I tried to import a mht file, but it doesn’t work. Do you know the content type for this?

  39. Fantastic Eric. Just what I needed. I can’t tell you how much time and grief you probably saved me… Thanks!

  40. Jeff1234 says:

    Hi Eric,

    I was planning to use altChunk to insert HTML text into a Word template that has custom XML tags (the pink tags from a schema). My requirement is that the user will create a template with XML tags. I will read the tag name using XPath or LINQ to XML and replace the node with an altChunk. But since we cannot use custom XML tags (pink tags) as per Gray’s blog, what is the alternative solution? How can I map a content control to my XML schema tag so that I can insert the altChunk? There would be several such tags, and each will have different HTML text.

    Here is the link from Gray’s blog

    http://blogs.technet.com/gray_knowlton/archive/2009/12/23/what-is-custom-xml-and-the-impact-of-the-i4i-judgment-on-word.aspx?CommentPosted=true#commentmessage

    Any help is appreciated.

  41. Chris says:

    Hi Eric,

    Looks like it’s been a while since your last post. Like many others, I am trying to merge the headers in as well. I can merge the documents with no problems, but only the first document’s header and footer get saved to the final document. Is there a way around this?

    Chris

  42. Chris says:

    Hi Eric,

    Yes we have looked at DocumentBuilder but as we are using xml documents, streams and office documents it is not really suitable.

    I have decided to take the longer route of copying everything manually into each document (we loop through them, depending on how many are selected), making sure that all style references, header references, footer references, … are preserved.

    I have been using the Reflector tool to see how this is created, but I cannot seem to find the rsid values for each paragraph to add to the properties. Below is what I have so far:

    Dim paraRef = mainPart.GetIdOfPart(mainPart.Document.MainDocumentPart)
    Dim para As Paragraph = New Paragraph With {.RsidParagraphAddition = paraRef}
    Dim parid As String = mainPart.Document.MainDocumentPart.GetIdOfPart(mainPart.Document.MainDocumentPart)
    Dim headid As String = mainPart.Document.MainDocumentPart.GetIdOfPart(mainPart.Document.MainDocumentPart.HeaderParts).ToString
    Dim footid As String = mainPart.Document.MainDocumentPart.GetIdOfPart(mainPart.Document.MainDocumentPart.FooterParts).ToString
    Dim headRef As HeaderReference = New HeaderReference With {.Id = headid, .Type = HeaderFooterValues.First}
    Dim footRef As FooterReference = New FooterReference With {.Id = footid, .Type = HeaderFooterValues.First}
    Dim title As TitlePage = New TitlePage()
    Dim paraProp As ParagraphProperties = New ParagraphProperties
    Dim sectionProperty As SectionProperties = New SectionProperties
    sectionProperty.Append(headRef)
    sectionProperty.Append(footRef)
    sectionProperty.Append(title)
    paraProp.Append(sectionProperty)
    para.Append(paraProp)

    What am i missing?

  43. Hi Chris,

    First thing – I modified DocumentBuilder a while ago so that it works just fine with streams and in-memory documents.  I’m not quite clear why you can’t use it.

    Regarding Rsid elements and attributes, you really don’t need to add those.  Those are only used for a fairly obscure scenario where I pass a single document to two people, who separately edit it, and then the results are merged back into a single document.  If you are programmatically assembling a document, then almost by definition, you don’t care about Rsid elements and attributes.  You can discard those in the generated document.
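    As an illustration of discarding revision identifiers, here is a minimal sketch that is not part of the original reply. It assumes you already have the WordprocessingML for a part loaded as an XElement, and it uses only LINQ to XML. The class and method names (RsidStripper, RemoveRsidAttributes) are hypothetical, and the sketch only handles w:rsid* attributes, not the w:rsids list in the settings part.

    ```csharp
    using System;
    using System.Linq;
    using System.Xml.Linq;

    public static class RsidStripper
    {
        private static readonly XNamespace W =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

        // Removes every w:rsid* attribute (w:rsidR, w:rsidRPr, w:rsidP, ...)
        // on root and its descendants. Returns the number of attributes removed.
        public static int RemoveRsidAttributes(XElement root)
        {
            var rsidAttributes = root
                .DescendantsAndSelf()
                .Attributes()
                .Where(a => a.Name.Namespace == W &&
                            a.Name.LocalName.StartsWith("rsid", StringComparison.Ordinal))
                .ToList(); // materialize before mutating the tree

            foreach (XAttribute attribute in rsidAttributes)
                attribute.Remove();

            return rsidAttributes.Count;
        }

        public static void Main()
        {
            // Hypothetical sample paragraph carrying three rsid attributes.
            XElement paragraph = XElement.Parse(
                "<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' " +
                "w:rsidR='00275992' w:rsidRDefault='00275992'>" +
                "<w:r w:rsidRPr='00EC68D3'><w:t>Hello</w:t></w:r></w:p>");

            int removed = RemoveRsidAttributes(paragraph);
            Console.WriteLine("Removed " + removed + " rsid attributes.");
            Console.WriteLine(paragraph);
        }
    }
    ```

    The namespace declaration (xmlns:w) is left alone because it lives in the xmlns namespace rather than the WordprocessingML one.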

    Regarding your example, it is not clear to me what is missing.  In general, I take the approach of creating the resulting document exactly as I want it using Word, and then looking at the resulting markup.
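    One way to look at the markup Word produces is sketched below. This is not code from the original reply: it assumes the main document part is stored at word/document.xml (the usual location, although the package relationships are what formally define it), and the class and method names (MarkupViewer, ReadMainDocumentXml) are hypothetical. It uses only System.IO.Compression and LINQ to XML.

    ```csharp
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Xml.Linq;

    public static class MarkupViewer
    {
        // Reads word/document.xml from a .docx package and returns it as indented XML.
        public static string ReadMainDocumentXml(Stream docxStream)
        {
            using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
            {
                // Assumption: the main part sits at the conventional path.
                ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
                if (entry == null)
                    throw new InvalidDataException("Package has no word/document.xml part.");

                using (Stream partStream = entry.Open())
                {
                    return XDocument.Load(partStream).ToString();
                }
            }
        }

        public static void Main(string[] args)
        {
            // args[0]: path to a document saved from Word (hypothetical input).
            using (FileStream file = File.OpenRead(args[0]))
            {
                Console.WriteLine(ReadMainDocumentXml(file));
            }
        }
    }
    ```

    Running this against a document built by hand in Word and against the generated document makes it easier to compare the two sets of markup side by side.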

    -Eric

  44. Chris says:

    Hi Eric,

    Sorry, I’ve just realised this is the case. I misread the error we received and gave up on it a little too quickly. I’ve managed to get DocumentBuilder working as we like it; however, it only works with DocumentFormat.openxml.dll v:1.0.1825.0.

    We need to use v:2.0.5022.0 to make use of some other features that we use. If we use the v2 with the DocumentBuilder then we get a null value for part here:

    public static XDocument GetXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();

    This same code works with v:1.0.18

    Stack trace as follows:

    at OpenXml.PowerTools.DocumentExtensions.GetXDocument(OpenXmlPart part) in C:\Users\cbertrand\Desktop\Open_XML_PowerTools\Classes\DocumentExtensions.cs:line 36
    at OpenXml.PowerTools.Source..ctor(WordprocessingDocument source, Boolean keepSections) in C:\Users\cbertrand\Desktop\Open_XML_PowerTools\Classes\DocumentBuilder.cs:line 39
    at DocBuildTest._Default.mergedocs() in C:\Projects\DocBuildTest\DocBuildTest\Default.aspx.vb:line 36

    Is there a resolution for this?

    Thanks for all your help so far

    Chris

  45. Chris says:

    Hi Eric,

    Just noticed I haven’t stated the exact error:

    Object reference not set to an instance of an object.

    In addition, it is the Imports DocumentFormat.OpenXml.Wordprocessing statement that is not available with the previous DLL, which means docx.MainDocumentPart.Document cannot be found and type SimpleField is not declared.

    We use these as we have mergefields in each document which get populated upon creation.

    Chris

  46. Bilel says:

    Hi Eric,

    After merging multiple docx files into a single document, how can I update the source docx files if the user modifies the content of the assembled document?

  47. Jeff says:

    Hi Eric,

    I'm merging HTML documents into Word documents via altChunk, but I'd like the style from the Word documents to be applied to the HTML documents. I've tried putting the altChunk inside of a paragraph and a run, and have even done it without the surrounding sdt tags, but still can't get it to work. Do you have any suggestions? Here is some sample markup:

    <w:sdt>
      <w:sdtPr>
        <w:alias w:val="description" />
        <w:tag w:val="description" />
      </w:sdtPr>
      <w:sdtContent>
        <w:p w:rsidR="00275992" w:rsidRPr="00275992" w:rsidRDefault="00275992" w:rsidP="00EC68D3">
          <w:pPr>
            <w:pStyle w:val="LineItemTable" />
          </w:pPr>
          <w:r>
            <w:altChunk r:id="raac4be36-f977-4735-9ffc-a5cbf35dd6d5">
              <w:altChunkPr>
                <w:matchSrc w:val="false" />
              </w:altChunkPr>
            </w:altChunk>
          </w:r>
        </w:p>
      </w:sdtContent>
    </w:sdt>

  48. Surya says:

    Hi,

    I am trying to merge Word documents in a SharePoint document library. Some pages in the documents are in portrait and some are in landscape. After merging the documents, all the pages are displayed in portrait mode. How can I retain page orientation programmatically?

    I think we can do it by inserting section properties after each page or each document.

    Here is my code. I appreciate your help.

    foreach (SPFile item in listitem.Folder.Files)
    {
        //  SPFile inputFile = item.File;
        SPFile inputFile = item;
        string altChunkId = "AltChunkId" + id;
        id++;
        byte[] byteArray = inputFile.OpenBinary();
        AlternativeFormatImportPart chunk = outputDoc.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML,
            altChunkId);
        using (MemoryStream mem = new MemoryStream())
        {
            mem.Write(byteArray, 0, (int)byteArray.Length);
            mem.Seek(0, SeekOrigin.Begin);
            chunk.FeedData(mem);
        }
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        outputDoc.MainDocumentPart.Document.Body.InsertAfter(altChunk,
            outputDoc.MainDocumentPart.Document.Body.Elements<Paragraph>().Last());
        outputDoc.MainDocumentPart.Document.Save();
    }
    outputDoc.Close();
    memOut.Seek(0, SeekOrigin.Begin);
    ClientContext clientContext = new ClientContext(SPContext.Current.Site.Url);
    ClientOM.File.SaveBinaryDirect(clientContext, outputPath, memOut, true);
    // Conversion

  49. John J says:

    Hi Eric,

    How about support for altChunk in Office 2003 with the Compatibility Pack installed? I created a very simple Word document using the Open XML SDK and added an altChunk to the body with a stream of simple HTML content. It fails to open in Office 2003.

    Any thoughts?

    John

  50. Hi John,

    Yes, you are right, altChunk is not supported in Office 2003.  There are other features of Open XML not supported in 2003, such as content controls.  This has to do with the actual functionality in that version of Office.  There is no code to do the conversion and import for altChunk, nor to handle content controls, therefore those are not supported.

    -Eric

  51. Hi Eric,

    I am converting HTML content to Word using altChunk. The problem is that spacing is being added between the lines of a paragraph. In the HTML there is no space between the lines, but in Word space is automatically added between each line.

    I used the HTML below.

    <html>
    <head/>
    <body>
    <div >
    <div>sdsdsd</div>
    <div><strong>sdsdsdsdsd sdsdsdsdsd sdsdsdsdsd</strong></div>
    <div><strong>erterterttr</strong></div>
    <div>Sample <em>Text</em></div>
    <div><font color="#ff0000">ACCCC</font></div>
    <div><font color="#ff0000">sdsdsd</font></div>
    <div><font color="#ff9900">Test Doc</font></div>
    <div><a href="http://sgehmoss01:9005//sites/Conversion/default.aspx">Default</a></div>
    <div> </div>
    <div><a href="http://www.google.com/">All Items</a></div>
    <div><font color="#ff0000"></font></div>
    <div>AAA</div>
    <div><img alt="Home Page" src="http://sgehmoss01:9005/Sites/Conversion/_layouts/images/homepage.gif"></div>
    <div> </div>
    <div><img alt="Second One" src="http://sgehmoss01:9005/Sites/Conversion/_layouts/images/homepage.gif"></div>
    <div> </div>
    <div>Saasasas</div>
    <div>asas</div>
    <div>as</div>
    <div> </div>
    <div>asas</div></div>
    </body>
    </html>

  52. Matt Mackay says:

    Hi Eric, I sure hope you will respond to me. I'm really stuck. I've been adding altChunks, and sometimes I get a corrupt file and other times I do not. I haven't been able to find any pattern. I've unzipped the docx file and I'm manually playing with the document.xml file, and I can move

    <w:altChunk r:id="somechunkId" />

    to different locations within the root of the body. Some of them work; others produce the error

    Unspecified error: location part:/word/document.xml line 176, column 0

    It's very frustrating. I need to put HTML into a word template at certain points. Word asks if I would like to try and recover and it's fine but I can't automate that on our server.

    You seem to be the expert on altchunk, any thoughts? anything would be helpful.. thank you

    -Matt

  53. @Matt,

    Yes, you are correct, there are certain places where you can put altChunk, and other places where you can't.  altChunk imports block content (i.e. siblings of paragraphs and tables) so the altChunk element needs to go there, not within a paragraph.  Beyond that, I'm not sure.
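    To make the placement rule concrete, here is a small sketch (not from the original reply) that finds altChunk elements nested inside a paragraph and moves each one so that it becomes a sibling of that paragraph. It works on the raw markup with LINQ to XML; the names AltChunkPlacement, FindInsideParagraphs and MoveAfterParagraph are hypothetical, and the sketch deliberately ignores special cases such as text boxes, where block content can legitimately have a paragraph ancestor.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    public static class AltChunkPlacement
    {
        private static readonly XNamespace W =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

        // Returns every w:altChunk that has a w:p ancestor, i.e. one that was
        // placed inside a paragraph instead of beside it.
        public static List<XElement> FindInsideParagraphs(XElement root)
        {
            return root.Descendants(W + "altChunk")
                       .Where(chunk => chunk.Ancestors(W + "p").Any())
                       .ToList(); // snapshot, so callers can safely mutate the tree
        }

        // Detaches the altChunk and re-inserts it directly after its nearest
        // enclosing paragraph, where block-level content is expected.
        public static void MoveAfterParagraph(XElement chunk)
        {
            XElement paragraph = chunk.Ancestors(W + "p").First();
            chunk.Remove();
            paragraph.AddAfterSelf(chunk);
        }

        public static void Main()
        {
            // Hypothetical body with an altChunk wrongly nested in a run.
            XElement body = XElement.Parse(
                "<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' " +
                "xmlns:r='http://schemas.openxmlformats.org/officeDocument/2006/relationships'>" +
                "<w:p><w:r><w:altChunk r:id='AltChunkId1'/></w:r></w:p></w:body>");

            foreach (XElement chunk in FindInsideParagraphs(body))
                MoveAfterParagraph(chunk);

            Console.WriteLine(body);
        }
    }
    ```

    The paragraph that held the altChunk is left in place with an empty run; whether to keep or delete it depends on the template.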

    I'd be happy to take a look at one of your corrupted docs and I can probably tell you what is wrong.  If you would be good enough to submit the question on the forums at OpenXmlDeveloper.org, it would be super easy for me to respond.  Also, by answering the question there, others can take advantage of the answer.

    Cheers, Eric