It looks like another group is planning to take advantage of the Open XML formats coming in Office “12”. Corel has stated that it will support the new XML formats in WordPerfect once we release Office “12”. We’ve already seen other applications, such as OpenOffice and Apple’s TextEdit, support the XML formats we built in Office 2003. As we publish more documentation on the new formats and move through Ecma, we’ll see more and more people come on board and support them. Here is a quote from Jason Larock of Corel about the formats they plan to support in coming versions (http://labs.pcw.co.uk/2006/01/new_wordperfect_1.html):
Larock said no product could match WordPerfect’s support for a wide variety of formats and that Corel would include OpenXML when Office 12 is released. “We work with Microsoft now and we will continue to work with Microsoft, which owns 90 percent of the market. We would basically cut ourselves off if you didn’t support the format.”
But he admitted that X3 does not support the Open Document Format (ODF), which is being proposed as a rival standard, “because no customer that we are currently dealing with has asked us to do so.”
X3 does, however, allow the import and export of portable document format (PDF) files, something Microsoft has promised for Office 12.
I mention this article because I want to stress again that even our competitors will now have clear documentation that allows them to read and write our formats. That isn’t as big a deal, though, as the fact that any solution provider can do the same. It means that documents can still be easily accessed 100 years from now, and that they can play a more meaningful role in business processes.
Back in the summer I wrote a bit about why we’ve made this move to open formats, and I think it’s worth repeating some of that:
In Office 2003, we really started to gain a lot of momentum around XML. We had heard from a number of big customers that they needed XML support for their Word documents. People were trying all kinds of hacks on top of the Object Models to produce XML they could work with. We had Wall Street firms that needed to integrate with XML far more deeply than we had imagined, so that they could do structured authoring with repurposable data. We had law firms trying to build solutions that could automatically generate legal documents based on data about who was involved in the case, as well as business logic about which pieces of content were required for that case. We were also getting a lot of demand for supporting other people’s existing internal schemas. Not only did people want the Word document itself represented in XML, they also wanted to add their own XML markup to the files. Let’s take a government office as an example. Imagine it has a template that people fill out and submit to apply for a permit. While it’s nice that the formatting information can be represented in XML, the office doesn’t care much about what’s bold, what’s numbered, or any other kind of formatting. What it does care about is the name of the person who submitted the permit application, that person’s address, and the type of work they are seeking a permit for. Those things can all be labeled using custom XML support.
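To make the permit example concrete, here is a small Python sketch of what a back-end system could do once those fields are labeled with the office’s own XML. The namespace `urn:example:permit` and the element names `applicantName`, `address`, and `workType` are hypothetical stand-ins for whatever schema the office defines, and the sketch assumes the tagged data has already been pulled out of the document as a standalone XML fragment. Notice that nothing in it looks at formatting.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the namespace and element names are illustrative only.
PERMIT_NS = "urn:example:permit"

SAMPLE = """<permit xmlns="urn:example:permit">
  <applicantName>Jane Doe</applicantName>
  <address>123 Main St</address>
  <workType>Deck addition</workType>
</permit>"""


def read_permit(xml_text):
    """Return the business fields from the permit XML, ignoring all formatting."""
    root = ET.fromstring(xml_text)
    ns = {"p": PERMIT_NS}
    return {
        "name": root.findtext("p:applicantName", namespaces=ns),
        "address": root.findtext("p:address", namespaces=ns),
        "work": root.findtext("p:workType", namespaces=ns),
    }


if __name__ == "__main__":
    print(read_permit(SAMPLE))
```

The point of the custom schema is that the receiving system queries by meaning (`applicantName`) rather than by position or style in the document.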
It was this support for both reference schemas (SpreadsheetML and WordprocessingML) and customer-defined schemas (your own XML) that finally made it possible for the content of Office documents to play a role in business processes. We had moved from a world where the Office document was a black box with only a small collection of metadata scrawled on top, to one where it is an open, interoperable, extensible, and extremely valuable part of business processes.
At the same time, there are zillions of documents out there in older binary formats. We had to ask ourselves, “Who is going to make sure those older documents have a path forward? Who is going to do the hard work of preserving fidelity between the new and the old?” We’re doing that. We’re making a deep investment in this compatibility to make sure our customers have a very good experience.
Now we move to Office “12”. We are still building on the momentum we started over 6 years ago. Not only are we improving the XML formats so that they can represent every Word, PowerPoint, and Excel document out there, but we are also making them the default format. We viewed this as something we absolutely had to do in this version. Office documents are far more important as elements of business processes than we had initially given them credit for. You may have seen that we now talk about Office as a system. This is because it’s no longer just about the document’s behavior in the application; it’s about the entire document lifecycle. We have helped ourselves in all kinds of ways that no one has really thought about (or at least written about) yet. We can build smarts into Windows SharePoint Services so that the server can look into a document, make decisions based on its content, and write data back into it, all without running application code. We now live in a world where customers need to track and audit parts of documents in ways they never had to before.
We have customers in equity research who can’t wait for these new formats and the custom XML support. The speed with which they will be able to publish their documents, while still meeting increasing regulatory requirements, is amazing. All the information within each research report is available to them. The process used to consist of printing out each report and having people read through it, verifying the financial figures and making sure all the necessary disclosures were there. Now that can simply be an easily automated piece of the larger workflow.
There is a customer (a bank) we’ve been meeting with that generates documents on demand for all its loans. They are currently running Office 2000. These documents are built from smaller document fragments, and the logic for which fragments are used depends on the details of the particular loan. The data is then inserted by using the Word Object Model to find bookmarks and fill them with the relevant values. The whole process is automated: they currently have over 70 servers, each with Word 2000 installed, turning out thousands of these documents a year. Running Word unattended isn’t a supported scenario, but they’ve decided to do it anyway (they didn’t really have a choice). With the new XML formats and the support for custom-defined schemas, generating these documents will be a snap. It wouldn’t even take up one full machine’s resources. It will consist of a small bit of code to handle the business logic, and the code to build the document itself will only be a few lines.
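As a rough illustration of how small the document-building code can be, here is a Python sketch that produces a new loan document from a template package without running Word. It assumes the template keeps its loan data in a custom XML part named `customXml/item1.xml` (a conventional name; real code should locate the part through the package relationships) and that the visible content is wired to that data, for example through content controls bound to the part. The `urn:example:loan` schema and the `generate` and `build_loan_xml` helpers are hypothetical, and the business logic for choosing fragments is left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical schema for the bank's loan data (illustrative only).
LOAN_NS = "urn:example:loan"
# Assumed location of the data part inside the template package.
DATA_PART = "customXml/item1.xml"


def build_loan_xml(loan):
    """Serialize a dict of loan fields as <loan><field>value</field>...</loan>."""
    root = ET.Element(f"{{{LOAN_NS}}}loan")
    for field, value in loan.items():
        ET.SubElement(root, f"{{{LOAN_NS}}}{field}").text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True,
                       default_namespace=LOAN_NS)


def generate(template_bytes, loan):
    """Copy every part of the template package, replacing only the loan data part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == DATA_PART:
                data = build_loan_xml(loan)
            dst.writestr(name, data)
    return out.getvalue()
```

Because the package is just a ZIP file of XML parts, each call is ordinary file I/O, which is why this kind of generation doesn’t need a copy of Word per server.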
The last example I have is one that benefits us in Office. Today, we have a couple thousand specifications that we’ve written for the Office “12” project. Each spec has a number of required sections that people need to fill out, based on the different processes we have for our design. The folks driving those processes need to be able to make sure that everyone has filled out the proper sections. When the files were all binary documents, we had to automate Word to do this check. The automation had Word open each file, find the range of text for a specific section, and see whether it was filled in. Running the check across those few thousand documents took about 8 hours. Because of this, we only ran it every couple of weeks; it had to be kicked off at night as people were leaving and checked in the morning. Often the check would fail, so we’d wait until the next night and run it again. At PDC the other week, I showed a similar collection of documents (actually only about 300), but these were all stored in the new format. I wrote a small amount of VB.NET (30 lines of code) that iterated over all of those documents and returned the author, the number of paragraphs, and the number of comments. That solution (which was already more complex than what we were trying to do internally) took about 1 to 2 seconds to run. So if I had increased the collection to 3000, it would have taken at most 20 seconds (compared to 8 hours)!
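The PDC demo was written in VB.NET; the Python sketch below does the same three things for a folder of `.docx` files (author, paragraph count, comment count) using only the standard library. It assumes the namespace URIs published in ECMA-376 (files saved by pre-release builds may use different URIs) and the conventional part names `docProps/core.xml`, `word/document.xml`, and `word/comments.xml`; a more careful tool would find the parts through the package relationships instead of hard-coding their names.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace URIs as published in ECMA-376 (assumed; pre-release files may differ).
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Conventional part names; a robust tool would resolve them through the
# package relationships (_rels/.rels and word/_rels/document.xml.rels).
CORE_PART = "docProps/core.xml"
BODY_PART = "word/document.xml"
COMMENTS_PART = "word/comments.xml"


def read_part(package, name):
    """Parse one XML part, or return None if the package doesn't contain it."""
    try:
        return ET.fromstring(package.read(name))
    except KeyError:
        return None


def summarize(path):
    """Return the author, paragraph count, and comment count for one .docx file."""
    with zipfile.ZipFile(path) as package:
        core = read_part(package, CORE_PART)
        body = read_part(package, BODY_PART)
        comments = read_part(package, COMMENTS_PART)
    return {
        "author": core.findtext("dc:creator", namespaces=NS) if core is not None else None,
        # Every w:p in the body, including paragraphs nested in tables.
        "paragraphs": len(body.findall(".//w:p", NS)) if body is not None else 0,
        "comments": len(comments.findall("w:comment", NS)) if comments is not None else 0,
    }


if __name__ == "__main__":
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for doc in sorted(folder.glob("*.docx")):
        print(doc.name, summarize(doc))
```

Saved as a script, it takes the folder to scan as its only argument. Nothing here launches an application; each file is just unzipped and parsed, which is where the speedup over automating Word comes from.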
We knew a long time ago that customers and the development community would ask what they could do with the new Office XML formats since they are specifically designed to address scenarios that go beyond the desktop. That is why we decided to take an open and royalty-free approach almost two years ago when we launched Office 2003. There has been a lot of back and forth in this blog on whether we went far enough and whether our motives are pure. It is sort of fun to question motives and pick apart licenses (personally, I’d rather be talking about the design of the formats), but I can tell you that our intent is to make the formats useful to customers and the development community. If we wanted to create a bunch of “gotchas” to trip people up, I think we could have done a better job.
A side benefit of this move is that, because we are creating a new format, we can do many of the other things our customers have wanted from the binary formats over the past few releases (which we couldn’t do without breaking compatibility). Improved robustness, smaller files, and new features are all added benefits. I already mentioned how Excel is now able to raise its limits on the number of rows and columns, along with other limitations it had when confined to the existing binary formats. We’ve also found that using ZIP and XML leads to a significantly more robust file. I’ve given demos where I delete whole blocks of bits from the files and we’re still able to recover the remainder of the content. We see so many benefits in this new format that we often forget to mention all of the best parts.
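The robustness point follows from the container: every part in a ZIP package is compressed and checksummed separately, and the central directory at the end of the file records where each part starts. The Python sketch below (an illustration of that property, not Office’s actual recovery logic) reads each part on its own and sets aside the ones that fail their checks. It assumes the central directory survived; deleting bytes, as in the demos, also shifts the offsets the directory records, so a real recovery tool would need to scan for local file headers instead. The `recover_parts` helper is hypothetical.

```python
import io
import zipfile
import zlib


def recover_parts(data):
    """Read every part that still decodes and passes its CRC check.

    Returns (recovered, damaged): a dict of part name -> bytes for the parts
    that survived, and a list of names for the parts that did not.
    """
    recovered, damaged = {}, []
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for info in package.infolist():
            try:
                recovered[info.filename] = package.read(info.filename)
            except (zipfile.BadZipFile, zlib.error, EOFError):
                # Bad CRC, bad header, or undecodable compressed data:
                # lose this part only and keep going with the rest.
                damaged.append(info.filename)
    return recovered, damaged
```

Damage stays local to the part it lands in, which is the property that lets the rest of a document come back.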
We’ve been fortunate to get a lot of great support from the public sector for our work. We’ve been working with governments for many years now to understand their needs around XML, and they understand what we’ve been doing and our commitment to being open.
Hope everyone has a great weekend. I’m going to the Seahawks game, so hopefully I’ll have an extremely good weekend. One more Seahawks win and we’re in the Super Bowl!