Custom schemas revisited


I’ve seen some signs of confusion about custom schema support lately. For example, I’ve seen a vendor claim that Open XML’s support for custom schemas is “essentially inherent in XML itself” and that “there is nothing that OOXML supports via custom schemas that ODF 1.0 does not already support.”


I suppose that if your goal is to prevent your customers from considering formats that your software doesn’t support, then this tactic probably makes sense. After all, if you can convince people that Open XML offers nothing new that other formats don’t already offer, they’d probably be inclined to conclude that there’s no need for an Open XML standard, and they’d be content with software that doesn’t support the kinds of simple interoperability that Open XML enables.


But for organizations that aren’t part of the “my word processor is better than your word processor” debate, Open XML’s unique approach to custom XML support is very important. It enables what I call vertical interoperability, and in our developer workshops this is usually what developers find most compelling about Open XML: the ability to integrate other types of systems and data with Open XML documents, while maintaining a clean, simple separation of presentation (Open XML markup) and data (custom schemas and instances thereof).


For those who may be interested in understanding why developers get excited about Open XML’s custom schema support, I’d like to offer a simple walk-through of what that means and how it works.


Custom schema support: what’s not new


Open XML, like many document formats, allows for extensible metadata. You can add metadata to help describe your document, and Word 2007 even has a nice feature where your custom metadata can appear in a custom “document information panel” above the document you’re editing. But metadata extensibility is nothing new, and it has a pretty narrow range of applicability because the elements you add are simply describing the document itself.


Open XML also allows for custom XML markup within the body of a document. I’ve covered this concept before, and it’s a handy way to allow users to tag their content for interoperability with other types of software such as a custom LOB system.


Custom markup isn’t a new concept: many formats allow for this, and Word 2003 offered support for custom markup that has been widely used in custom systems for manufacturing, legislation, and other applications. The specific way that Open XML has implemented custom markup, using attributes to encode semantics (much like microformats) and therefore allowing any schema to be used, is extremely flexible compared to other approaches, but the core concept of custom XML markup isn’t new.


What is new


The place where Open XML breaks new ground is in its support for custom XML parts within the OPC package. You can put any existing XML file inside an Open XML document, and that XML file can be exposed to the user or not, whatever you’d like. You don’t need to change anything in your XML file, you don’t need to change anything in your document’s markup, and you don’t need to intermingle your custom markup and the document’s markup in the same part or file (as is done in custom-markup scenarios).


Open XML lets you use the XML messages you use today — the same ones that are generated by your custom systems, the same ones being passed around between your web services — right inside documents, as they are, without changing a thing. You don’t have to write code to pull business data out of document markup, because the two never get mixed together.


That enables simple, powerful interoperability. And that capability is only available in a format like Open XML that uses a standardized, flexible packaging convention like OPC to allow for the addition of any type of content to a document in a way that doesn’t interfere with the document architecture itself. (These concepts aren’t limited to XML data, as I’ve described before, but for this discussion we’re going to focus on custom XML parts.)
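Here’s a minimal Python sketch of what “no code to pull business data out of document markup” looks like in practice. It assumes the custom XML part is stored at customXml/item1.xml (the name used in the walkthrough later in this post); a real application would more likely locate the part through the package relationships than hard-code its name, and the function name is just for illustration. The point is that the standard ZIP and XML libraries are all you need, and no WordprocessingML is parsed at all.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_custom_xml_part(docx, part_name="customXml/item1.xml"):
    """Return the root element of a custom XML part inside a .docx package.

    `docx` may be a path or a binary file-like object. The default part name
    is an assumption; Word usually names these parts customXml/itemN.xml.
    """
    with zipfile.ZipFile(docx) as package:
        # The part is an ordinary ZIP entry holding the original XML instance,
        # so it is parsed directly, with no document markup to strip out.
        return ET.fromstring(package.read(part_name))
```

For example, `read_custom_xml_part("sample.docx").tag` would give the namespace-qualified name of the business data’s root element.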


How it works


Let’s take a look at how custom XML parts work, from the ground up. I’ll download a typical XML instance off a public page on the web, put it in an Open XML document, and show you some of the things that this architecture enables.


First, we need some XML data to play with. I did a search for “XML instance sample” and found this page: http://exchangenetwork.net/schema/WQX/1/WQX_XMLExample_v1.0.xml


It’s a sample XML document used by the Exchange Network, an organization that helps states, tribes, and the U.S. Environmental Protection Agency exchange environmental information. Sounds good, let’s use that for our sample data.


One of the key concepts of Open XML’s custom schema support is that you can use any schema, and any existing XML instance — so the source of this XML, and the way it’s structured, don’t matter at all. What I’m about to show you applies to any well-formed XML file, with no restrictions. I picked this sample because it’s bigger and more complicated than most of the samples I found, and therefore a better example of a real-world XML document. Here’s a glimpse of what it contains:



So let’s save that XML in a file named item1.xml, which we’re going to embed in an Open XML document as a custom XML part and then bind to some content controls.
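Since any well-formed XML will do, the only check worth making before embedding item1.xml is that it actually parses. A minimal sketch using Python’s standard-library parser (the function name is my own, for illustration); note that no schema is consulted:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if xml_bytes parses as well-formed XML.

    Only well-formedness is checked; no schema is required or used.
    """
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True
```

To check the downloaded file, read item1.xml in binary mode and pass its contents to `is_well_formed`.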


Adding a custom XML part to a document


First we need to put the data inside an Open XML document as a custom XML part. This will require a bit of editing, which I’ll do with Notepad and the ZIP support built into Windows Vista. If you’re running a different environment, you can use any text editor and any ZIP program to follow along.


Here are the steps:


  1. Create an Open XML word-processing document. I did this by typing some text into Word 2007, and saving in the default format, as sample.docx.

  2. Rename this document to .ZIP, and put a copy of the XML data (item1.xml) in a folder named customXml inside the ZIP package.

  3. Add the following line to the relationships part (document.xml.rels) in the word/_rels folder, just before the closing </Relationships> tag: <Relationship Id="myCustomXmlRelationship" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml" Target="../customXml/item1.xml" />

  4. Rename the document back to .DOCX.
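If you’d rather script steps 2 and 3 than do them by hand, here’s a rough Python equivalent. It assumes the relationships part uses an unprefixed closing </Relationships> tag (as Word writes it) and that the package doesn’t already contain a part with the same name; the function name is hypothetical. Python’s zipfile can’t modify an archive in place, so the sketch writes a new copy of the package.

```python
import zipfile

RELS_PART = "word/_rels/document.xml.rels"
CUSTOM_XML_REL_TYPE = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                       "relationships/customXml")


def add_custom_xml_part(src_docx, dst_docx, xml_bytes,
                        part_name="customXml/item1.xml",
                        rel_id="myCustomXmlRelationship"):
    """Copy src_docx to dst_docx, adding xml_bytes as a custom XML part.

    Mirrors steps 2 and 3: the XML is stored unchanged, and a relationship
    to it is added to the main document's relationships part.
    """
    relationship = (f'<Relationship Id="{rel_id}" Type="{CUSTOM_XML_REL_TYPE}" '
                    f'Target="../{part_name}"/>')
    with zipfile.ZipFile(src_docx) as src, \
            zipfile.ZipFile(dst_docx, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == RELS_PART:
                # Naive text edit: assumes an unprefixed closing tag.
                text = data.decode("utf-8").replace(
                    "</Relationships>", relationship + "</Relationships>", 1)
                data = text.encode("utf-8")
            dst.writestr(item, data)
        # The custom XML part itself is stored byte for byte.
        dst.writestr(part_name, xml_bytes)
```

Like the manual steps, this leaves the remaining bookkeeping for Word to fill in the next time the document is saved.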

We now have a document with an embedded custom XML part. I left out a couple of things, but that’s because I’m lazy and I want to keep this example simple. So I’ll let Word tidy up the things I left out.


Open that document (sample.docx) in Word 2007, type a change, and save it. The custom XML part is in the document, but you don’t see it in the Word user interface because we’ve not yet chosen to expose any of the business data. But Word follows the rules of the Open XML specification whenever it saves a document, so it has now cleaned up a few things for me:


  • It created a “custom XML properties part” for my custom XML item, named itemProps1.xml in the customXml folder.

  • It added a schema reference to that custom XML properties part, showing the schema for my piece of business data.

  • It assigned a GUID to my custom XML part, which is also stored in the custom XML properties part. This is very important, because the GUID will be used to uniquely identify my custom XML part when we bind presentation elements to nodes in the business data.

  • Word also likes to enumerate relationships as rId1, rId2, and so on, so it renamed myCustomXmlRelationship to rId1 or something like that. Relationship IDs don’t matter here, since this is an implicit relationship — keep in mind, there’s still not anything in the document body that refers to this custom XML part. So far, it’s just along for the ride.

Here’s the contents of itemProps1.xml in the sample document I’ve created:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ds:datastoreItem ds:itemID="{548E2E51-45D3-44E4-8845-1542F6638E2D}"
    xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml">
    <ds:schemaRefs>
        <ds:schemaRef ds:uri="http://www.exchangenetwork.net/schema/wqx/1"/>
    </ds:schemaRefs>
</ds:datastoreItem>
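If you want to generate a properties part like this yourself instead of letting Word do it, something along these lines should work. It’s a sketch: the function name is hypothetical, and it assumes the braced, upper-case GUID format shown above. Word also links each custom XML part to its properties part through a relationship (under customXml/_rels), which this sketch doesn’t create.

```python
import uuid
import xml.etree.ElementTree as ET

DS_NS = "http://schemas.openxmlformats.org/officeDocument/2006/customXml"


def make_item_props(schema_uris=(), item_id=None):
    """Build a custom XML properties part similar to itemProps1.xml above.

    Returns (item_id, xml_bytes). A new GUID is generated when item_id is None.
    """
    if item_id is None:
        # Braced, upper-case form, matching the itemID shown above.
        item_id = "{" + str(uuid.uuid4()).upper() + "}"
    ET.register_namespace("ds", DS_NS)
    root = ET.Element(f"{{{DS_NS}}}datastoreItem", {f"{{{DS_NS}}}itemID": item_id})
    refs = ET.SubElement(root, f"{{{DS_NS}}}schemaRefs")
    for uri in schema_uris:  # schema references are optional
        ET.SubElement(refs, f"{{{DS_NS}}}schemaRef", {f"{{{DS_NS}}}uri": uri})
    return item_id, ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```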


Our custom XML part is now embedded in the document, and all the necessary details are in place to ensure that this part will remain intact through editing sessions. That’s useful for many scenarios — for example, Mindjet uses this technique to embed their own proprietary schemas in an Open XML document to enable round-trip interoperability with Word. An Open XML consumer (Word, for example) can just ignore these custom XML parts, and there’s nothing in the body of the document that refers to them, so they have no impact on rendering or presentation logic.
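Because the GUID lives in the properties part rather than in the custom XML part itself, code that works with these parts needs a way to map GUIDs to part names. Here’s a Python sketch that does so by following each custom XML part’s relationships to its properties part. The customXmlProps relationship type URI and the customXml/_rels location reflect how Word 2007 lays out the package, as I understand it; treat both as assumptions to verify against your own documents. The function name is hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
DS_NS = "http://schemas.openxmlformats.org/officeDocument/2006/customXml"
# Assumed relationship type linking a custom XML part to its properties part.
PROPS_REL_TYPE = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                  "relationships/customXmlProps")


def custom_xml_parts_by_id(package):
    """Map each custom XML part's itemID (GUID) to its part name.

    `package` is an open zipfile.ZipFile for a .docx.
    """
    result = {}
    for name in package.namelist():
        # Relationships for customXml/itemN.xml are assumed to live in
        # customXml/_rels/itemN.xml.rels.
        if not (name.startswith("customXml/_rels/") and name.endswith(".rels")):
            continue
        part = "customXml/" + posixpath.basename(name)[:-len(".rels")]
        rels = ET.fromstring(package.read(name))
        for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
            if rel.get("Type") != PROPS_REL_TYPE:
                continue
            # Targets are relative to the folder holding the source part.
            props_name = posixpath.normpath(
                posixpath.join("customXml", rel.get("Target")))
            props = ET.fromstring(package.read(props_name))
            result[props.get(f"{{{DS_NS}}}itemID")] = part
    return result
```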


Adding structured document tags and data binding


Next, let’s have some fun with exposing that business data through the document itself. This is where it really gets cool: we can expose nodes of our business data, using 2-way binding that lets the user edit the business data through the document itself.


The next step is to add some structured document tags. They’re on the Developer tab (which you have to enable under Word Options if you don’t have it on the ribbon, since it’s disabled by default). Here I’ve created a table, entered some text in the left column (describing a few XML nodes I selected at random from our sample data), and then inserted a simple text content control (i.e., structured document tag) in the right column on every row:



The next step is to save that document and add the data-binding element that connects each structured document tag to a node in the custom XML part. If you’re into those details, you can take a look at the markup in the attached document — the element is called dataBinding, and it’s in the sdt properties section. Here, for example, is the element I inserted to bind the first content control to a node in the custom XML part:

<w:dataBinding w:prefixMappings="xmlns:ns0='http://www.exchangenetwork.net/schema/wqx/1'" w:xpath="/ns0:WQX[1]/ns0:Organization[1]/ns0:OrganizationDescription[1]/ns0:OrganizationFormalName" w:storeItemID="{548E2E51-45D3-44E4-8845-1542F6638E2D}" />


Note that this element specifies three things:


  1. The namespace prefix mapping that ties the ns0 prefix used in the XPath expression to the namespace of my custom XML part’s schema.

  2. The XPath expression that points to the node I’m binding to in my custom XML part.

  3. The GUID that identifies the custom XML part. (Note that there could be many custom XML parts in a single document if desired.)
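To make those three pieces concrete, here’s a Python sketch that resolves a binding against a custom XML part and returns the bound node’s text. It’s deliberately limited: ElementTree isn’t a full XPath engine, so this handles only simple absolute paths of the /p:a[1]/p:b[1]/p:c form that the binding above uses, and it leaves out the GUID lookup (finding which part a storeItemID refers to). Both function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def parse_prefix_mappings(prefix_mappings):
    """Turn "xmlns:ns0='uri0' xmlns:ns1='uri1'" into {"ns0": "uri0", "ns1": "uri1"}."""
    return dict(re.findall(r"""xmlns:(\w+)=['"]([^'"]*)['"]""", prefix_mappings))


def resolve_binding(custom_xml, xpath, prefix_mappings):
    """Return the text of the node a w:dataBinding points at, or None if absent.

    Supports only simple absolute paths such as /p:a[1]/p:b[1]/p:c.
    """
    namespaces = parse_prefix_mappings(prefix_mappings)
    root = ET.fromstring(custom_xml)
    first, *rest = xpath.strip("/").split("/")
    # ElementTree can't evaluate the leading root step, so check it by hand.
    step = first.split("[")[0]
    prefix, colon, local = step.partition(":")
    expected = f"{{{namespaces[prefix]}}}{local}" if colon else step
    if root.tag != expected:
        return None
    # The remaining steps, including [n] position predicates, are valid
    # ElementTree path syntax when given the prefix-to-namespace mapping.
    node = root.find("/".join(rest), namespaces) if rest else root
    return None if node is None else node.text
```

Passing the contents of item1.xml along with the w:xpath and w:prefixMappings values from the dataBinding element above should return the organization’s formal name from the sample data.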

Typically, you’d write these dataBinding elements into the document from the code in your custom system. (In this sample, I added them with Notepad, but that’s a bit tedious.) Another good option for adding these bindings is to use the Content Control Toolkit, which is available as a free download on CodePlex.
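For the “from the code in your custom system” route, building the element itself is straightforward. Here’s a minimal Python sketch; the helper name is hypothetical, and it builds only the w:dataBinding element, leaving it to the caller to insert it into the right content control’s w:sdtPr in the correct position among that element’s children.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_data_binding(store_item_id, xpath, prefix_mappings=None):
    """Build a w:dataBinding element to place inside a content control's w:sdtPr."""
    ET.register_namespace("w", W_NS)
    attrs = {}
    if prefix_mappings:  # only needed when the XPath uses namespace prefixes
        attrs[f"{{{W_NS}}}prefixMappings"] = prefix_mappings
    attrs[f"{{{W_NS}}}xpath"] = xpath
    attrs[f"{{{W_NS}}}storeItemID"] = store_item_id
    return ET.Element(f"{{{W_NS}}}dataBinding", attrs)
```

`ET.tostring(element, encoding="unicode")` serializes the result with the w prefix, in the same attribute order as the example above.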



So now we have a fully data-bound document, with 2-way binding between presentation elements in the document (the content controls) and nodes in my custom XML part. This document provides a “front end” to the exposed business data, and when a user changes a value in a content control the corresponding node in the business data is updated. Conversely, developers can easily replace the custom XML part (with any environment that supports ZIP packages — no Open XML markup involved) and thereby re-populate the document with new data.
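Re-populating a document really is just a matter of swapping one ZIP entry. Here’s a Python sketch, assuming the data lives at customXml/item1.xml; every other part is copied unchanged, and the function name is hypothetical.

```python
import zipfile


def replace_part(src_docx, dst_docx, part_name, new_bytes):
    """Copy a package, substituting new_bytes for the content of one part.

    Every other part, including all of the WordprocessingML, is copied unchanged.
    """
    with zipfile.ZipFile(src_docx) as src:
        if part_name not in src.namelist():
            raise KeyError(f"{part_name} not found in package")
        with zipfile.ZipFile(dst_docx, "w") as dst:
            for item in src.infolist():
                if item.filename == part_name:
                    data = new_bytes
                else:
                    data = src.read(item.filename)
                dst.writestr(item, data)
```

One caveat: the text cached inside each content control in document.xml still reflects the old values until an application that honors the bindings, such as Word 2007, refreshes it, so a consumer that ignores dataBinding elements would see the stale text.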


The user can modify the look and feel of the document all they want, independent of the data-binding architecture, because this document has true separation of presentation and data: there is no Open XML markup in the custom XML part, and no non-Open XML markup in the document body. And you can do all of the above with any XML file, from any source, using any schema. (Or no schema at all, actually — the schema is optional, and not necessary to make any of this work.)



Conclusions


That’s a quick look at the power of custom XML parts in Open XML documents.  I used a sample I found on the web, just as I used another sample I found on the web when I covered custom XML parts in a previous post.  I’ve done this to demonstrate that Open XML works with any schema, including the ones you’re already using, and the ones you’ll be creating in the future as your business needs change.


As developers start to understand the creative possibilities of this approach, they’re starting to use custom XML parts to enable interoperability that would have seemed impossible just a few years ago. Mindjet’s interoperability with Microsoft Word is a great example: it allows lossless round-trip collaboration between their 2D graphics application and a word-processing program.  And when I was in Slovenia a few weeks ago, I met two developers who are embedding custom XML parts in Open XML documents to provide a simple reporting mechanism for a client who wants customizable reports. The users can customize the reports in any way they’d like, and the data — in a custom XML part generated by a custom application — appears in their new format the next time they run the report. This is another good example of the clean separation of presentation and data that Open XML (and only Open XML) allows.


If you’d like to learn more, sign up for the upcoming Open XML workshop at Mindjet in San Francisco, an event hosted by a software company that has benefited directly from its decision to embrace custom schema support. Here’s the registration link, and there are still a few spots open.

sampledocs.zip

Comments

  1. Doug had a great post last week discussing the importance of custom defined schemas. Check it out: http://blogs.msdn.com/dmahugh/archive/2007/05/19/custom-schemas-revisited.asp

  2. Wu MingShi says:

    How is the example different from embedded XForm?

  3. Doug Mahugh says:

    I’ve not worked with XForms myself, but it’s my understanding that the XForms output control is analogous to structured document tags.

    XForms and custom XML parts are solutions to different business problems.  XForms are about putting forms in web pages (or other XML documents), and custom XML parts are about putting arbitrary XML instances inside documents.  Those custom XML parts may be treated as form data or they may be used in other ways, depending on the needs of the user and the application.

  4. Doug’s right, this goes far beyond basic forms. It’s about structuring your document semantically. You could of course just add some basic structures which would make it look more like a form, or you could structure the entire document.

    -Brian

  5. Wu MingShi says:

    Dear Mahugh, Jones,

    That was a bait question. 😉

    The answer, as far as I can tell, is about whether some form of processing is needed to extract the data. With XForm, it looks like some form of processing will be needed to convert the XForm data into the custom schema, while according to your description, one can get the custom schema directly.

  6. 🙂

    Yes, that’s correct. The custom schema is stored as a separate part, so you don’t need to do any additional processing. You can just program directly against that part, as it will contain all of your data.

    -Brian

  7. Doug Johnson says:

    I’m looking to combine pretty much everything I’ve been learning here.

    I have the situation of needing to combine a data extract from within our application with a Word document that is customizable by our users.

    Data binding looked like the answer, but I find that while the compatibility tool allows users of Word 2003 and earlier to open and/or save OpenXML docs, it will not do the data binding.  And, while I can mandate one copy of Word 2007 to create the original documents, I cannot force them to upgrade ALL of their Word installs, and that is what will be called after creating the updated document.

    Databinding in Word 2000-2003 on opening an OpenXML document would be a preferred solution, but I couldn’t find a way to do that.

    Unless you have a better idea, my current thought is to include the schema (which can be different at each location) in the base document that is used and let the user do in-document markup with that schema.

    Is there a way to store the schema in the document and reference it so it will show up in the developer’s side panel without making the user go through that step?  I am thinking of something to the effect of <w:attachedSchema="http://mystuff.com/MyStuff"/> and then referencing the internal XSD, but I am not seeing how that would be done.
