Integrating with business data: Store custom XML in the Office XML formats

As I've already talked about a number of times on this blog, there are two core pieces to the XML support in Office. The first piece is what we call our "reference schemas": the schemas we've defined in Office to represent our files as XML, which include WordprocessingML, SpreadsheetML, and PresentationML. The reference schemas are extremely helpful for consuming and generating Office documents, but they are only one piece of the XML story. The other piece is support for custom-defined schemas, and it's that support that allows you to truly integrate your documents with business processes and business data. Most organizations don't really want another group to define their business data for them. That's why we took the XML 1.0 standard together with the XSD standard and built in native support for both. You can define your data using XML Schema syntax and then use that data in your Office documents. By opening up our formats with our reference schemas and supporting your custom-defined schemas, you get true interoperability of your documents. Sorry if this is sounding more like a marketing pitch, but I wanted to reiterate our vision for XML support in Office documents, and hopefully that will help you see the power that we see.

In Office 2003, Word and Excel both introduced support for marking up content in the files with custom-defined schemas, but one of the big things we saw from folks building solutions on top of our XML support in Office 2003 was the need to store their own XML data in the document. We had support for marking up a document with your own schema, but if you had data you didn't want to show to the user, there weren't a lot of options. One example is a workflow scenario, where you track a lot of information about a document to determine how to route it. Some of that data might appear in the document itself, but much of it is just extra metadata.

XML Data Store

In Office 12, we've introduced a new feature to the formats that we're currently calling the XML data store, and the way it works is really simple. As you should all know by now, the new format is a ZIP file containing a bunch of XML parts (files). Up until now we've talked about the parts that we in Office have defined to create our documents, but as a developer you can also add your own parts. You can take any XML file and put it inside the ZIP package; then all you need to do is create a relationship from the main document part to your XML part, and the Office applications will roundtrip your XML with the file. That gives you:

Roundtripping your data: Being able to put your XML in the ZIP package means you now have a place to store any data your solution may need. The data travels with the document, but it is always stored as a separate XML part in the ZIP package, so it's easy to get at and modify without touching any of the application's data.

Accessing your data while the file is loaded: Whenever we load these files, we grab all the XML parts in the data store and load them into memory. We then give you programmatic access to this data so you can read and write it while the user is editing the file. There is also a full eventing model around this data, so you are notified if other processes change it. This gives you a lot of power when you are building a solution, because you have a place to store all your information as XML, and you have full access to it both while the file is loaded and after the file is saved to disk (just crack open the ZIP and go grab your XML part).

Separating data from the document: Because the information is kept in the data store rather than in the document content, the user can't edit your data directly by editing the document (so they can't accidentally delete part of it, since it's stored separately).
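To make the "add a part, add a relationship, crack open the ZIP later" idea concrete, here is a minimal sketch in Python using only the standard library. It copies a WordprocessingML package, stores your XML as its own part, adds a relationship to it from the main document part, and later finds the part again by following relationships. The helper names (add_custom_xml_part, read_custom_xml_parts), the part name customXml/item1.xml, and the <workflow> data are hypothetical. The relationship-type URIs are the ones published in the final ECMA-376 Open XML specification, and the Office 12 builds described in this post may use different ones. Files saved by later versions of Word usually also carry a small properties part next to each custom XML part, which this sketch leaves out. It also covers only the file on disk; the in-memory programmatic access and eventing model aren't shown.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Assumption: URIs as published in ECMA-376; Office 12 beta builds may differ.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
CUSTOM_XML_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
ET.register_namespace("", REL_NS)  # write .rels files with a default namespace


def rels_path(part: str) -> str:
    """Relationships for dir/name.xml live in dir/_rels/name.xml.rels ("" = package root)."""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def targets(zf: zipfile.ZipFile, source: str, rel_type: str):
    """Yield ZIP entry names that `source` points at via relationships of rel_type."""
    root = ET.fromstring(zf.read(rels_path(source)))
    for rel in root.iter(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") != rel_type:
            continue
        target = rel.get("Target")
        if target.startswith("/"):  # absolute part name
            yield target.lstrip("/")
        else:  # relative to the source part's folder
            yield posixpath.normpath(posixpath.join(posixpath.dirname(source), target))


def add_custom_xml_part(package: bytes, xml_data: bytes, part_name: str) -> bytes:
    """Hypothetical helper: return a copy of the package with xml_data stored as
    part_name and a customXml relationship to it from the main document part.
    Assumes part_name is not already in the ZIP, the main part already has a
    .rels file, and [Content_Types].xml already maps the .xml extension."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as out:
        main_part = next(targets(src, "", OFFICE_DOC_REL))
        main_rels = rels_path(main_part)
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == main_rels:
                root = ET.fromstring(data)
                used = {rel.get("Id") for rel in root}
                n = 1
                while f"rId{n}" in used:  # pick the first unused relationship Id
                    n += 1
                ET.SubElement(root, f"{{{REL_NS}}}Relationship", Id=f"rId{n}",
                              Type=CUSTOM_XML_REL, Target="/" + part_name)
                data = ET.tostring(root, xml_declaration=True, encoding="UTF-8")
            out.writestr(item.filename, data)  # other parts: contents unchanged
        out.writestr(part_name, xml_data)  # your data, as its own part
    return out_buf.getvalue()


def read_custom_xml_parts(package: bytes) -> dict:
    """Hypothetical helper: map each customXml part reachable from the main
    document part to its parsed root element."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        main_part = next(targets(zf, "", OFFICE_DOC_REL))
        return {name: ET.fromstring(zf.read(name))
                for name in targets(zf, main_part, CUSTOM_XML_REL)}
```

Copying every other entry with its contents unchanged is the point of the design: your data lives beside Office's own parts, so storing or reading it never requires parsing or rewriting WordprocessingML.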

There are a number of really cool features we've built on top of this functionality that I'll talk about in future posts.

-Brian