Using XHTML in a WordprocessingML document


This question has come up a few times, most recently over on the OpenXMLDeveloper site (http://openxmldeveloper.org/forums/477/ShowThread.aspx#477).


The challenge a lot of folks have is that they want to generate a WordprocessingML document from pre-existing content. Often that content is in another format, such as HTML. The same is true if you have folks entering rich content in a web form or some other type of HTML control, and you then want to use that content to generate a WordprocessingML document. While there are tools out there that will transform HTML into WordprocessingML, this is also easily achievable using the altChunk element.


You can place one or more XHTML files as separate parts in the ZIP package and give each one the proper content type. Then create a relationship to each part from the document.xml part. Once you’ve done that, you can place the altChunk element (which is a block-level element) into the content of the document and reference the relationship ID that points at the XHTML part. You also have the option to specify whether you want the styles to be merged with the document or whether the content should maintain its source formatting.


So, for example, you could have the following:


<document>
  <body>
    <p><r><t>Here is some WordprocessingML followed by some XHTML:</t></r></p>
    <altChunk r:id="rel7"/>
    <p><r><t>Here is some more WordprocessingML</t></r></p>
  </body>
</document>


The relationship type is: http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk


The content type for html is: application/html
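To make the packaging steps concrete, here is a minimal Python sketch that writes such a package using only the standard library. Treat it as an illustration under stated assumptions rather than a reference implementation: the function name build_docx, the part name word/chunk.xhtml, and the keep_source_formatting flag are made up for this example. It spells the relationship type aFChunk as in the published ECMA-376 standard, and it registers the XHTML part as application/xhtml+xml (the registered XHTML media type) rather than the value quoted above, which is also an assumption. The optional altChunkPr/matchSrc child is how the standard expresses "keep source formatting"; how a given consumer honors it may vary.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'

# Content types for every part in the package (the .xhtml default is an assumption).
CONTENT_TYPES = XML_DECL + (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xhtml" ContentType="application/xhtml+xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Package-level relationship pointing at the main document part.
ROOT_RELS = XML_DECL + (
    f'<Relationships xmlns="{PKG_RELS_NS}">'
    f'<Relationship Id="rId1" Type="{R_NS}/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)

# Relationship from document.xml to the XHTML part (hypothetical part name).
DOCUMENT_RELS = XML_DECL + (
    f'<Relationships xmlns="{PKG_RELS_NS}">'
    f'<Relationship Id="rel7" Type="{R_NS}/aFChunk" Target="chunk.xhtml"/>'
    '</Relationships>'
)


def paragraph(text):
    """Return a single-run WordprocessingML paragraph with escaped text."""
    return f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>"


def build_docx(xhtml, keep_source_formatting=False):
    """Return the bytes of a .docx whose body imports `xhtml` via altChunk."""
    if keep_source_formatting:
        chunk = ('<w:altChunk r:id="rel7"><w:altChunkPr>'
                 '<w:matchSrc w:val="true"/></w:altChunkPr></w:altChunk>')
    else:
        chunk = '<w:altChunk r:id="rel7"/>'
    document = XML_DECL + (
        f'<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}"><w:body>'
        + paragraph("Here is some WordprocessingML followed by some XHTML:")
        + chunk
        + paragraph("Here is some more WordprocessingML")
        + "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", document)
        package.writestr("word/_rels/document.xml.rels", DOCUMENT_RELS)
        package.writestr("word/chunk.xhtml", xhtml)
    return buffer.getvalue()
```

Writing the returned bytes to a file with a .docx extension gives a package to try in a consuming application. Note that the real markup needs the w: and r: namespace prefixes, which the simplified example above leaves out.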


With the example above, the content of the HTML file that was referenced by the altChunk tag would show up directly inline after the first paragraph. Now, you should note that this is an import-only feature. Once the file is opened, the XHTML content is merged with the rest of the file, and when you save, it will be represented with WordprocessingML rather than XHTML.
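One practical consequence of the import-only behavior is that a package only carries altChunk relationships until a consuming application opens and re-saves it. The following Python sketch lists those relationships so you can tell which state a file is in. The function name alt_chunk_targets is hypothetical, and the sketch assumes the main part lives at word/document.xml, which is conventional but not guaranteed. It compares the relationship type case-insensitively so that both the afChunk and aFChunk spellings match.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

PKG_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
AF_CHUNK_TYPE = ("http://schemas.openxmlformats.org/officeDocument/"
                 "2006/relationships/aFChunk")
# Assumed location of the main document's relationship part.
DOCUMENT_RELS_PART = "word/_rels/document.xml.rels"


def alt_chunk_targets(docx_bytes):
    """Return the targets of all altChunk relationships of document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        if DOCUMENT_RELS_PART not in package.namelist():
            return []
        root = ET.fromstring(package.read(DOCUMENT_RELS_PART))
    return [
        rel.get("Target")
        for rel in root.findall(f"{{{PKG_RELS_NS}}}Relationship")
        if rel.get("Type", "").lower() == AF_CHUNK_TYPE.lower()
    ]
```

An empty list suggests the chunks have already been merged (or were never there); a non-empty list means the file still depends on the consumer supporting altChunk.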


This was something I really wanted us to support with the 2003 XML formats when we did the cfChunk work. The cfChunk is extremely useful, and the altChunk builds off of it.


-Brian

Comments (8)

  1. Mike says:

    I’d be surprised if you can pass around a .docx document that uses such a chunk technique so that even someone using an older version of Office can open the file using the "downlevel converter".

    This would imply the converter understands and performs the merge. Unless I am mistaken, the chunking technique is an Office 2003 feature, while the converter is expected to be provided to Office 2000 and Office XP users. Something does not sound right here.

  2. Mike says:

    But at least you are talking about file formats. 🙂

  3. BrianJones says:

    Hi Mike,

    The support for altChunks is actually only in Office 2007. Note that the altChunk feature isn’t so much a feature of the file format as of the consuming application. The older versions (including 2003) using the converter don’t understand how to do the import and merge of the chunks.

    Also note that this is something that Office would never produce, only consume.

    So if you think that the altChunk support is useful in a solution you’re building, you should make sure that the people viewing those files have Office 2007.

    -Brian

  4. Mike says:

    Is there any roadmap to add the capability to the converter? This would enable the following scenario: the developer generates .docx documents by using the chunking technique extensively, and then passes the file around without worrying.

    I have to admit I need to spend time and figure out what the merge really does, i.e. whether it’s actually doable by hand. If that’s fairly simple, then the developer should be able to do that as well.

  5. Alex says:

    application/html? Where did you invent that from?

    HTML is usually text/html, XHTML is usually application/xhtml+xml (although Microsoft web browsers don’t understand that one yet).

    Personally, I don’t even think this functionality should be in the file format – two different OXML applications could convert the same HTML in two different ways. But, whatever.

  6. Uncle Sam says:

    Sorry, perhaps you could help me with this: I saved a file with Office 12 beta 1 a while back. Now when I open the same file with Office 12 beta 2, I get an error saying elements and attributes in restricted namespaces are not allowed.

    Thanks for any suggestion.
