Images in Open XML documents


Images are one of the basic elements of a document, and the use of images in documents continues to grow. Just a few years ago, it was relatively uncommon to have an image in a word-processing document, and downright rare to see one in a spreadsheet. Now images are commonplace in all types of documents, and they’ve become an expected component of professional-looking business documents.


As Brian Jones has explained on his blog, it’s pretty easy to embed an image in a WordprocessingML document that you’re generating from your own code. You just insert some markup in the document body where you want the image to appear, add a relationship to the image part, define a content type for that part, and you have an image in the document.
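To make those steps concrete, here's a minimal sketch in Python (standard library only) that writes a one-image DOCX from scratch. It is not Brian's code. The part names (word/document.xml, word/_rels/document.xml.rels, word/internal.jpg) are assumptions that follow the layout Word itself usually writes. The image markup copies the VML used in the sample document later in this post. I haven't verified how every consumer renders this bare-bones package, so treat it as a starting point.

```python
import io
import zipfile

# Step 3: content types. Default maps by extension, Override by part name.
# (Assumed minimal set: .rels files, the main document part, and .jpg images.)
CONTENT_TYPES = (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="jpg" ContentType="image/jpeg"/>'
    '<Override PartName="/word/document.xml" ContentType='
    '"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Package-level relationship that identifies the start part.
PACKAGE_RELS = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Target="word/document.xml" Type='
    '"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>'
    '</Relationships>'
)

# Step 2: relationship from document.xml to the image part in the same folder.
DOCUMENT_RELS = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Target="internal.jpg" Type='
    '"http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"/>'
    '</Relationships>'
)

# Step 1: body markup that refers to the image only by relationship id.
DOCUMENT = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
    ' xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"'
    ' xmlns:v="urn:schemas-microsoft-com:vml">'
    '<w:body><w:p><w:r><w:pict>'
    '<v:shape id="myShape1" type="#_x0000_t75" style="width:400; height:240">'
    '<v:imagedata r:id="rId1"/></v:shape>'
    '</w:pict></w:r></w:p></w:body></w:document>'
)


def build_docx_bytes(jpeg_bytes: bytes) -> bytes:
    """Return the bytes of a one-paragraph DOCX that embeds `jpeg_bytes`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", PACKAGE_RELS)
        z.writestr("word/document.xml", DOCUMENT)
        z.writestr("word/_rels/document.xml.rels", DOCUMENT_RELS)
        z.writestr("word/internal.jpg", jpeg_bytes)  # the image part itself
    return buf.getvalue()
```

Usage: `Path("minimal.docx").write_bytes(build_docx_bytes(Path("photo.jpg").read_bytes()))`, where photo.jpg is any JPEG you have on hand.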


The relationship-based structure of Open XML documents allows for a lot of flexibility. In the case of an embedded image, the relationship can point to an image stored inside the document package itself (as in Brian’s example), to an external image on your local hard drive, or to an external image on a web server.


Sample document: Images.docx


The attached sample document shows how these three approaches can be implemented in a programmatically generated WordprocessingML document. Unzip the attached Images.zip file into a folder, and you’ll see two files: a WordprocessingML document (Images.docx) and an image (external-local.jpg). When you open the DOCX in Word 2007, you’ll see something like the screenshot shown here. If you don’t have internet connectivity, the third image won’t appear; more on that in a minute. (Frankly, if you don’t have internet connectivity, I’m not sure how you’re reading this.)


If you rename the Images.docx file to a .ZIP and drill down into it, you’ll see that its structure is pretty simple. There’s a document.xml “start part,” an embedded image (that first one, internal.jpg), a content-types item, and two relationship parts. Let’s look at the key details in the content types, document body, and relationships.
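If you'd rather poke around from code, you don't need to rename anything: a ZIP library ignores the file extension. Here's a small Python sketch (the function name `describe_package` is my own, not part of any API) that lists every entry in the package along with its uncompressed size. Note that ZIP entry names have no leading slash, whereas OPC part names do (for example, the ZIP entry `word/document.xml` is the part `/word/document.xml`).

```python
import io
import zipfile


def describe_package(docx) -> list[tuple[str, int]]:
    """List (entry name, uncompressed size) for every entry in a DOCX.

    `docx` may be a path, a binary file object, or raw bytes.
    """
    if isinstance(docx, (bytes, bytearray)):
        docx = io.BytesIO(docx)
    with zipfile.ZipFile(docx) as z:
        return [(info.filename, info.file_size) for info in z.infolist()]
```

For the sample, something like `for name, size in describe_package("Images.docx"): print(f"{size:>8}  {name}")` should show the start part, internal.jpg, [Content_Types].xml, and the two .rels parts.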


Content Types


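The content types definition is very simple: a single Default entry maps the jpg extension to the image/jpeg content type. Strictly speaking, [Content_Types].xml only describes parts stored inside the package, so this entry is what applies to internal.jpg; the two external images aren’t parts of the package and don’t need entries of their own. Here’s the content-type definition from [Content_Types].xml for the jpg extension: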


<Default Extension="jpg" ContentType="image/jpeg" />
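To show how a consumer might use this file, here's a Python sketch of the lookup rule as I read the Open Packaging Conventions: an Override whose PartName matches the part wins; otherwise the Default for the part's file extension applies, with both comparisons case-insensitive. The function name `content_type_for` is hypothetical, and real consumers may apply extra checks this sketch skips.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def content_type_for(content_types_xml: str, part_name: str) -> Optional[str]:
    """Resolve a part's content type: Override by part name, else Default by extension."""
    root = ET.fromstring(content_types_xml)
    for override in root.iter(CT_NS + "Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    # Take the extension from the last path segment only.
    segment = part_name.rsplit("/", 1)[-1]
    ext = segment.rsplit(".", 1)[1].lower() if "." in segment else ""
    for default in root.iter(CT_NS + "Default"):
        if default.get("Extension", "").lower() == ext:
            return default.get("ContentType")
    return None  # no content type: not a valid part
```

With only the jpg Default shown above, any internal part ending in .jpg (or .JPG) resolves to image/jpeg.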

Main Document Body


In the main document body, document.xml, you’ll see that the markup for each image is identical apart from its shape id and its r:id value. It’s the same markup to embed an internal image, an external/local image, or an external/web image; only the relationship it points to differs:


<w:body>
  <w:p>
    <w:r>
      <w:t>Internal image stored inside Images.docx:</w:t>
    </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:pict>
        <v:shape id="myShape1" type="#_x0000_t75" style="width:400; height:240">
          <v:imagedata r:id="rId1"/>
        </v:shape>
      </w:pict>
    </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:t>External image stored in the local file system:</w:t>
    </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:pict>
        <v:shape id="myShape2" type="#_x0000_t75" style="width:400; height:240">
          <v:imagedata r:id="rId2"/>
        </v:shape>
      </w:pict>
    </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:t>External image on a web server:</w:t>
    </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:pict>
        <v:shape id="myShape3" type="#_x0000_t75" style="width:400; height:240">
          <v:imagedata r:id="rId3"/>
        </v:shape>
      </w:pict>
    </w:r>
  </w:p>
</w:body>
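Because the three image paragraphs differ only in their ids, code that needs to find the images in a body like this just has to collect the r:id values. Here's a Python sketch (the name `vml_image_refs` is hypothetical) that does that for the VML `w:pict` markup used in this sample. It deliberately ignores DrawingML (`w:drawing`), which is what Word 2007 usually writes itself; there the reference sits on `a:blip` as `r:embed` or `r:link` instead.

```python
import xml.etree.ElementTree as ET

V_NS = "urn:schemas-microsoft-com:vml"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def vml_image_refs(document_xml: str) -> list[tuple[str, str]]:
    """Return (shape id, relationship id) for every v:imagedata in the document."""
    root = ET.fromstring(document_xml)
    refs = []
    for shape in root.iter(f"{{{V_NS}}}shape"):
        for data in shape.iter(f"{{{V_NS}}}imagedata"):
            refs.append((shape.get("id"), data.get(f"{{{R_NS}}}id")))
    return refs
```

Run against the sample's document.xml, this should return the pairs myShape1/rId1, myShape2/rId2, and myShape3/rId3; the relationships part then says where each rId actually lives.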


Document Relationships


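There are only three document-level relationships defined (in document.xml.rels), one for each image. Note the TargetMode attribute: TargetMode="External" marks an image stored outside the document package, and when the attribute is omitted (as for rId1) the target is a part inside the package. The Target attribute gives the location of the image: a part name relative to document.xml for the internal image, a relative file path for the local image, and a full URL for the web image: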


<Relationship Id="rId1"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
    Target="internal.jpg"/>
<Relationship Id="rId2"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
    Target="external-local.jpg"
    TargetMode="External"/>
<Relationship Id="rId3"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
    Target="http://www.mahugh.com/samples/external-http.jpg"
    TargetMode="External"/>
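Here's a Python sketch (the name `image_targets` is hypothetical) that reads a relationships part like this one and reports, for each image relationship, whether it is internal or external and where it points. Internal targets are resolved relative to the folder of the source part. The `source_part` argument is an assumption you supply: I don't know whether the sample keeps document.xml at the package root or in a word/ folder, so pass whichever applies.

```python
import posixpath
import xml.etree.ElementTree as ET

PKG_REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IMAGE_REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"


def image_targets(rels_xml: str, source_part: str) -> dict[str, tuple[str, str]]:
    """Map each image relationship Id to (TargetMode, location)."""
    root = ET.fromstring(rels_xml)
    base = posixpath.dirname(source_part)  # e.g. "/word" for "/word/document.xml"
    result = {}
    for rel in root.iter(PKG_REL_NS + "Relationship"):
        if rel.get("Type") != IMAGE_REL_TYPE:
            continue
        mode = rel.get("TargetMode", "Internal")  # omitted means Internal
        target = rel.get("Target")
        if mode == "Internal":
            # Internal targets are part names relative to the source part's folder.
            target = posixpath.normpath(posixpath.join(base, target))
        result[rel.get("Id")] = (mode, target)
    return result
```

External targets are returned untouched, since resolving a relative file path like external-local.jpg or fetching a URL is up to the consuming application.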

The benefit of this approach: flexibility


This structure allows for many creative development scenarios:



  • You can embed images in the document when appropriate: for example, static images that should be tightly bound to the content of the document.

  • You can store images on the local hard drive when appropriate. For example, perhaps you’re modifying or generating those images from another process that doesn’t know how to put them in an OPC container. Or maybe you’d like to include an image in the document that you know is on users’ hard drives already, without bloating the document by inserting it.

  • You can store images on a web server when appropriate. For example, the image may be highly dynamic, and you want to be able to update any number of distributed documents from a centralized location at any time. Or maybe you want to email a report to a colleague on Friday afternoon, and generate the linked chart image over the weekend, before they look at the document on Monday morning. (Hey, it’s just an example!)

Another important benefit of the relationship-based approach is that the location of these images can be changed without touching the document body at all. The main document body is typically the largest part in an Open XML document, while the relationships part is usually much smaller and simpler. Because the relationships part can be modified independently of the document body, you can retarget images on any platform, not just .NET: open the DOCX with any ZIP library and edit the Target attributes in the relationships part.
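As an illustration of that non-.NET route, here's a Python sketch using only the standard library. The function name `retarget` and its arguments are my own. Python's zipfile can't rewrite a single entry in place, so the sketch copies every entry into a new archive and edits only the chosen .rels entry; the document body is copied unchanged. The entry path in the usage line is an assumption about where the sample's relationships part lives.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Serialize the relationships namespace as the default (unprefixed) namespace.
# Note: register_namespace changes ElementTree's global prefix map.
ET.register_namespace("", PKG_REL_NS)


def retarget(docx_bytes: bytes, rels_entry: str, rel_id: str, new_target: str) -> bytes:
    """Return a copy of the package where relationship `rel_id` in the ZIP entry
    `rels_entry` points at `new_target`; other entries keep their content."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src:
        if rels_entry not in src.namelist():
            raise KeyError(f"{rels_entry} not found in package")
        out = io.BytesIO()
        # zipfile cannot rewrite an entry in place, so build a new archive.
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in src.namelist():
                data = src.read(name)
                if name == rels_entry:
                    root = ET.fromstring(data)
                    for rel in root.iter(f"{{{PKG_REL_NS}}}Relationship"):
                        if rel.get("Id") == rel_id:
                            rel.set("Target", new_target)
                            break
                    else:
                        raise KeyError(f"{rel_id} not found in {rels_entry}")
                    data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
                dst.writestr(name, data)
    return out.getvalue()
```

For example, `retarget(Path("Images.docx").read_bytes(), "_rels/document.xml.rels", "rId3", "http://example.com/other.jpg")` would point the web image somewhere else; adjust the entry name if document.xml sits in a subfolder.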


Image handling is a very flexible aspect of the Open XML file formats. You can learn more by experimenting with the attached sample document. Here are a few things you can do to see how easy it is to put images in documents:



  • Replace internal.jpg in the package with another image of your choosing and then re-open Images.docx in Word.

  • Copy your own image over external-local.jpg and re-open Images.docx.

  • Modify the target of rId3 to point to a JPG image on your favorite web server.

P.S. The sample images in this document are from series of pictures on my personal blog from various business trips this year. For those who are interested, here are links to more shots in each series: Paris, France; Munich, Germany (Oktoberfest); and Sao Paulo, Brazil (Carnaval).

Images.zip

Comments

  1. Chris Nokleberg says:

    Hi Doug,

    Is there anything in the spec about which image formats are supported? I posted a question in the openxmldeveloper forums recently:

    http://openxmldeveloper.org/forums/944/ShowThread.aspx#944

    Thanks!

  2. Doug Mahugh says:

    Hi Chris,

    I saw your question on OpenXmlDeveloper, but hadn’t responded yet because I’m not sure I know the answer to this.

    You may have noticed that in Section 15.2.13 of "Part 1 – Fundamentals" the spec says that an image part may be of "any supported content type" and then lists gif, png, tiff, pict and jpeg as "some example content types."

    And in Section 8.1.2 of "Part 2 – Open Packaging Conventions" it says "Package implementers shall only create and only recognize parts with a content type; format designers shall specify a content type for each part included in the format."

    Then later in the OPC section, in Annex F, "Standard Namespaces and Content Types," there is a list of the required package-specific content types, and no image formats are on the list.

    So the way I read that is that there is no defined list of supported image formats (i.e., content types) and each consumer may choose to support/render any arbitrary set of image content types.  But I’d like to get confirmation of that.  I’ll let you know what I find out.

    – Doug

  3. Chris Nokleberg says:

    Thanks Doug.

    I think it is a potential compatibility issue if both producers and consumers can decide to use or support any image format willy nilly (especially when you consider formats like EMF).
