OPC implementation test documents


In my presentation at TechEd SEA yesterday, I explained the importance of writing code against the relationship structure of an OPC package instead of hard-coding URIs that point to specific part names in specific folders. That’s a key concept for building flexible Open XML implementations that interoperate with other implementations: follow the relationships, and ignore the part names and locations, which may vary between different implementations.


The attached ZIP file contains three sample documents that demonstrate this principle. I’ve included a DOCX, XLSX, and PPTX that have start parts named whatever.xml, and don’t include the word/xl/ppt folders that Office uses for storing the start part. So the content of these documents is physically different from what Office would generate, but they’re valid Open XML documents that conform to the spec.
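
As a rough illustration, here is how a package with this shape could be assembled using Python’s standard zipfile module. This is a sketch, not the actual contents of TestDocuments.zip: the XML bodies, the "Hello" text, and the function name build_minimal_docx are all assumptions made for the example. The only things carried over from the description above are the start part name whatever.xml and the absence of a word folder.

```python
import io
import zipfile

# Content types manifest: must be named [Content_Types].xml in the package root.
CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/whatever.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# Package relationships: the officeDocument relationship points at the start part.
PACKAGE_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="whatever.xml"/>
</Relationships>"""

# Start part (illustrative body): a one-paragraph word-processing document.
START_PART = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>
</w:document>"""


def build_minimal_docx() -> bytes:
    """Return the bytes of a DOCX whose start part is /whatever.xml in the root."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", PACKAGE_RELS)
        zf.writestr("whatever.xml", START_PART)
    return buf.getvalue()
```

Writing the returned bytes to a file with a .docx extension gives you a package with no word folder at all; the start part is reachable only through the relationship.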


Here’s the root folder of the ZIP package for each of these sample documents:



If your Open XML implementation is written against the spec, it will open these documents just fine, because it only looks for the officeDocument relationship type to find the start part; the name and location of that part don’t matter at all. But if an implementation is written against something other than the spec (oh, say, Office’s implementation), then it won’t open these documents.
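
To make that concrete, here is a minimal sketch of locating the start part by relationship type rather than by name. It uses only Python’s standard library; the function name find_start_part is an assumption for this example, and the sketch ignores details a real consumer would need to handle, such as percent-encoded or case-variant ZIP item names.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)


def find_start_part(package: zipfile.ZipFile) -> str:
    """Return the ZIP item name of the start part.

    The part is found by following the package-level officeDocument
    relationship, never by assuming a folder or file name.
    """
    root = ET.fromstring(package.read("_rels/.rels"))
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Type") != OFFICE_DOCUMENT:
            continue
        if rel.get("TargetMode", "Internal") != "Internal":
            continue
        # Package relationship targets are resolved against the package root.
        target = posixpath.normpath(posixpath.join("/", rel.get("Target")))
        return target.lstrip("/")
    raise ValueError("package has no officeDocument relationship")
```

Calling find_start_part(zipfile.ZipFile("some.docx")) on a typical Office-generated document should give word/document.xml, and on a document like the samples here it should give whatever.xml; the calling code doesn’t need to care which.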


You can use these documents to test an Open XML implementation for conformance to the rule that it’s relationships and not URIs that define the structure of an OPC package. And in case you’re wondering, yes, Word 2007, Excel 2007, and PowerPoint 2007 open these files just fine:



If you find another implementation that doesn’t work with these documents, you might want to let the developer know. When you’re just working with documents created in Office, these details don’t matter, but the pace of delivery of new Open XML implementations is accelerating, and we all need to follow the spec to deliver on the interoperability that it enables.

TestDocuments.zip

Comments (11)

  1. Interesting post.  This sort of overview information is really helpful.  Does this mean that the minimum requirement is both a "_rels/.rels" file and a "[Content_Types].xml", or is the latter optional if you don’t scramble the parts?

  2. dmahugh says:

    Sorry for the delay in moderating comments here — connectivity has been poor for me the last 24 hours …

    Yes, those two are always required, as well as an officeDocument part, or "start part."  For a word-processing document, that’s all you need, and for a spreadsheet document you need a worksheet part in addition to the workbook start part.  For a presentation, you need several parts because the slide is based on a slide layout, which is based on a slide master, and you need a notes master and a few other things as well.

    The content types part is always required, and its name and location is fixed: it must be [Content_Types].xml, and it must be in the root of the package.  That’s so that a consumer always knows where to look to see the manifest of content types in the package, which is something a consumer usually wants to do before anything else.

    Relationships are always in a _rels folder that is at the same level (in the folder hierarchy) as the part that is the source of the relationship, and the relationships part’s name is formed by adding ".rels" to the source part’s name.  So, for example, relationships from whatever.xml are in the _rels folder in the same location as whatever.xml, and they’re stored in a part named whatever.xml.rels.

    The "package relationships" are a special case: relationships that are from the package itself to something inside it.  Since the source doesn’t have a name in this case, the relationships are stored in a part named ".rels", with an extension but no filename, in the _rels folder at the root of the package.  Package relationships include the officeDocument relationship to the start part, and also relationships to things like metadata or digital signatures.

    Hey, here’s a question for anyone reading this: is there a way to manually create a .rels file from scratch in Windows?  That is, create a file with an extension but no filename?  When I create one from scratch manually, I always do it with a filename, then go to a DOS prompt and rename it to .rels, but I’ve not found a way to do that in Windows.  (Perhaps this is due to my age — I rather miss command-line interfaces, frankly. :-))

    Glad you found this helpful, and I’m planning to do a few more posts on these sorts of topics soon.

  3. hAl says:

    @Doug

    With Notepad

    Use "Save as"

    Choose "All files" as the file type to save as

    Add the preferred extension to the name.

  4. dmahugh says:

    Hey thanks, hAl!  That works like a champ.  I thought I tried that, but I guess not.

  5. hAl says:

    It is always amusing explaining to someone working at Microsoft how their own products work

    😉

  6. John Hensley says:

    Hey Doug, do you remember when Scientific American was the best in its class?

    Look what it publishes now:

    http://blog.sciam.com/index.php?title=micro_oft_gets_spanked_twice_in_one_week&more=1&c=1&tb=1&pb=1&more=1#comments

    HAHAHAH MICRO$OFT GET IT?

  7. dmahugh says:

    Yes I do, John.  Scientific American linking to … Slashdot?  My father, who built shelves in my closet to hold 10 years of Scientific American (1958-1968), just rolled over in his grave.

    Although I sort of liked the cartoon.  Perhaps we can convince the fanatics to all use cartoons going forward.  It eliminates a lot of the tedium of reading their arguments.

  8. Doug Mahugh says:

    Like many people, I thought we’d know the official outcome of the DIS 29500 process today, but it looks

  11. Doug Mahugh says:

    In this blog post, I’m going to cover some of the details of how we approached the challenges of testing
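
The naming rule for relationships parts described in the comments above (a _rels folder beside the source part, with ".rels" appended to the source part’s name, and a bare ".rels" for package relationships) can be sketched in a few lines of Python. The function name rels_part_name is an assumption for illustration, and the sketch works on ZIP item names without a leading slash.

```python
import posixpath


def rels_part_name(source_part: str = "") -> str:
    """Return the ZIP item name of the relationships part for source_part.

    An empty source (or "/") stands for the package itself, whose
    relationships are stored in _rels/.rels at the package root.
    """
    source = source_part.strip("/")
    if not source:
        return "_rels/.rels"
    folder, name = posixpath.split(source)
    # The _rels folder sits beside the source part; ".rels" is appended to its name.
    return posixpath.join(folder, "_rels", name + ".rels")


print(rels_part_name())                     # _rels/.rels
print(rels_part_name("whatever.xml"))       # _rels/whatever.xml.rels
print(rels_part_name("word/document.xml"))  # word/_rels/document.xml.rels
```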