Comments from Tim Bray on OpenDocument

I'm sorry to post about this subject when there's so much other stuff I've been promising to cover, but I just read this and wanted to share my thoughts. I was reading Tim Bray's post the other day about the OpenOffice conference (https://tbray.org/ongoing/When/200x/2005/10/01/Open-Office-Conference). He brought up a point that I've been asked about separately a few times, related to the lack of formula support in the OpenDocument standard:

Bad Formula Trouble I learned, to my dismay, that the ODF specification is silent on spreadsheet formulas, they’re just strings. This is obviously a problem; much discussion on what to do ensued. I lean to the idea, much bally-hooed by Novell, of simply figuring out what Excel does, writing that down, and building it into ODF v.Next. Mind you, anyone who’s really been to the mat with Excel, in terms of Math & Macros, knows that it isn’t a pretty picture, there are real coherency problems. But it’s good enough and the world has learned how to make it work.

There's also this article on NewsForge titled "OpenDocument office suites lack formula compatibility", where the following was stated:

The OASIS Technical Committee in charge of this standard explicitly said last January that "while ... interoperability on that level would be of great benefit to users, we do not believe that this is in the scope of the current specification. Especially since it is not specifically related to the actual XML format the specification describes."

Even outside the Committee there is the same opinion: OpenDocument must only be about structure and how to represent content.

Someone asked me in one of my previous posts what my thoughts were on this article, and here's a bit of what I said:

From reading the article, it sounds like the thought was that they would standardize around the presentation aspect of the formats only. It's a bit unfortunate since the result of a formula does affect the ultimate display. In fact, formula results are often the most important part of the spreadsheet.

Did the original StarOffice format have formulas defined in its schema? Did they decide to push only some of the schema through OASIS?

If this is an area folks are interested in, let me know. I can post some examples of Excel's schema for formulas...

As I've said before, it appears that there are a number of very similar goals between the Office XML formats and what Sun did with the StarOffice format. In Office, we have the additional responsibility of supporting everyone's existing documents, which means there are a huge number of features we need to support (all of them). The issue here around formulas happens to be just one example of a feature of that type that is really important. I'm sure the StarOffice format did have this support; it just didn't make it into the OpenDocument spec.

This is one of those cases where it's important to understand the nuances of someone's design. It appears that as they moved the StarOffice format through OASIS to create the OpenDocument format, one of the primary goals was the display of content. For whatever reasons (time, effort, design goal, etc.), they made the decision that some application information (like formulas, or customer schemas) was not something they wanted to work into the standard. This is an example of where our minimum requirements have to be different.

Presentation-centered formats

In its current state, the OpenDocument format appears to be focused primarily on presentation of information. I think in that way it has somewhat similar goals to those of PDF. The sections of the StarOffice file format that they decided to take through OASIS were the ones that affected display of the files. You could argue that formulas affect the display, but as long as you make sure all the formulas are calculated before you save into their format, you are fine.
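To make the "calculated before you save" point concrete, here is a rough sketch in Python. It assumes a spreadsheet cell stored the way I understand OpenDocument to store it: the formula text in a table:formula attribute and the last calculated result in office:value. The sample cell and the formula syntax inside it are hand-written for illustration, not taken from a real file, and read_cell is just a name I picked.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as published in the OpenDocument 1.0 schema.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical cell, written by hand for this example. The formula
# syntax is whatever the saving application chose to write.
SAMPLE_CELL = f"""<table:table-cell
    xmlns:office="{NS['office']}"
    xmlns:table="{NS['table']}"
    xmlns:text="{NS['text']}"
    table:formula="=SUM([.A1:.A2])"
    office:value-type="float"
    office:value="3">
  <text:p>3</text:p>
</table:table-cell>"""


def read_cell(xml_fragment):
    """Return the formula string, cached value, and display text of a cell."""
    cell = ET.fromstring(xml_fragment)
    paragraph = cell.find("text:p", NS)
    return {
        # Just a string as far as the format is concerned.
        "formula": cell.get(f"{{{NS['table']}}}formula"),
        # The result the saving application calculated.
        "cached_value": cell.get(f"{{{NS['office']}}}value"),
        # What a viewer would show.
        "display_text": paragraph.text if paragraph is not None else None,
    }


if __name__ == "__main__":
    for key, value in read_cell(SAMPLE_CELL).items():
        print(f"{key}: {value}")
```

A viewer only needs cached_value and display_text, which is why the display still works. An application that wants to recalculate has to understand the formula string, and that is the part the spec leaves open.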

Application interoperability

I'm curious what the applications that are going to use OpenDocument as their primary format have decided to do about missing features like formula support. I know a number of them support spreadsheets. Formulas are such a key part of a spreadsheet that I'm assuming they have to create their own extensions to the format to support them. I'd be curious to know how the applications standardizing around OpenDocument (KOffice and OpenOffice, for example) are planning to exchange spreadsheets. I would assume they will take some approach (like transformation) to ensure the interoperability they are going after. That's one of the great benefits of an XML format; as long as it's well documented, you can take advantage of it. We've had people get pretty upset at us, though, when we've had to extend an existing format because there was additional functionality we wanted to store that the format didn't support (it's referred to as "embrace and extend"). I think in this case there isn't really a choice. You can't have a spreadsheet without formula support.

I'm sure that the long-term goals of OpenDocument do include full roundtripping of all user data, and if that's the case, I'm sure they are going to work on a proposal for missing pieces of the spec like formulas at some point. Once they do decide on a way to add formula support to OpenDocument, they'll also need to go back to all the files that were created under the current standard and update them from the proprietary extensions to match the agreed-upon standard.

Full fidelity formats

I've talked before about how full fidelity formats are really important to us because we want to ensure that all features you want to use can be fully represented. Formulas are an extremely important part of any spreadsheet. In fact, one could argue that formula support is the primary reason for using an application like Excel. The Microsoft Office Open XML formats are specifically designed as an XML representation of our full file formats. Everything you can do in our default format is represented as XML. Our formats are primarily designed around viewing, editing and integrating the files with data, formulas, and other application behavior. Collaboration is extremely important to us as well, and it would really be lame if you could collaborate on only a subset of your files rather than every aspect of them. This is another example of why we had no choice but to create our own XML file formats if we really wanted to move to XML formats as the default. Otherwise we would have been stuck with something that didn't fully persist all of our users’ features. The key is that we fully document that XML and provide the schemas to anyone who wants to use them. This way, as we continue to innovate based on customer needs and demands, we can also incorporate that functionality in the file format and expose it to anyone who would want to leverage it.

-Brian