Leveraging content in other formats

There is a really cool feature that we added into the WordprocessingML format that allows you to pass a file off to a consumer using alternative formats embedded within the WordprocessingML if you know that the consumer supports that alternate format. We had a lot of customers asking for this type of functionality, so we added the alternate content anchors into the WordprocessingML format to help with document assembly scenarios.

Scenario

I'm building a document generation tool that will allow users to fill out a form on a web site and automatically generate a rich Wordprocessing document based on the values they filled out in the form (think of a tool like a contract generator). I get some rich content back from the user that they filled out in a web form, so it's formatted as XHTML. Rather than having to do a translation from XHTML into WordprocessingML, I can just include the XHTML in the file as well, as long as I know that the user is going to open the file in an application that knows how to consume XHTML. If I don't know this, then of course I'll need to transform it into WordprocessingML.

This was a scenario we saw a lot of folks hitting with the earlier version of WordprocessingML from Office 2003. People had content repositories with HTML, and they didn't want to have to build an HTML to WordprocessingML translator just to get those chunks into the Word document.

Types of content allowed

This is completely up to the consuming application. The Ecma spec just defines where you put the alternate content, and how you identify it. There are no limitations on what kind of content you can place, and there are no rules on what type of content you must support. If you look at the definition in the spec, it says the type of content allowed is:

Any content, support for which is application-defined.

[Note: Some examples of formats which might be supported include:

  • Text = application/txt
  • RTF = application/rtf
  • HTML = application/html
  • XML = application/xml

end note]

So you can see that there are a few examples of the types of content you might want to support, but there are no limits or requirements.

Creating alternate content

In order to make it clear that there isn't an additional burden of supporting the various types of content that may occur, we said in the spec that a conformant producer is not able to create the alternate content chunks. This way, you know when you write a consuming application that you aren't required to support these alternate chunks. It's only something you can optionally decide to support. A producer should only create alternate chunks if they have a knowledge of what the consumer understands. There is no guarantee that anyone else will support XHTML within the files for example.

Guidelines in the spec

The IBM folks are clearly spending some resources scouring the Open XML spec looking for ways in which they can try to block the ISO approval (we've already discussed the huge financial bet they've made in ODF being the only standard). It looks like Rob Wier of IBM found a rather poorly worded description of alternate chunks in Part 1 of the spec. I think we did a poor job of explaining that there is no requirement to consume alternate chunks. The spec was trying to call out that any content type can be used if you want to, but it's not a conformant document because consumers don't have to support that content. We also wanted to be clear that if a consumer does understand the alternate content, they should translate it into WordprocessingML to match the rest of the file, so that on save, it's all the same format.

This was a good catch by Rob. I agree that it could be a bit clearer in what is required of both consumers and producers. I really wish IBM had spent more energy trying to improve the spec earlier on (they are members of Ecma and could have joined the Open XML TC). This is something we easily could have cleared up the wording on. As part of the ISO fast track process though, we have a chance to gather comments from the various national bodies and make any fixes required before finalizing. This is definitely something we can look into clearing up.

-Brian