Guiding principles for Office’s ODF implementation

This blog post covers the main presentation from our ODF workshop that took place in Redmond last week: Peter Amstein’s explanation of the guiding principles behind our support of ODF in Office 2007 SP2. I’ve added explanations of some of the details that were covered verbally in the workshop, but if anything’s not clear here, please let me know.

Why ODF 1.1?

We’re implementing ODF 1.1 in our initial release of ODF support. We chose this version because it is the most current approved ODF specification, and because it is the version of ODF that current release versions of most other applications such as OpenOffice also support. We will support ODF in Word, Excel and PowerPoint, using the file extensions .odt, .ods, and .odp. The exact release date for Office 2007 SP2 has not been announced yet, but we expect ODF support to be available sometime in the first half of 2009.

Guiding Principles

As we set out to build in support for ODF, we developed a set of principles to guide our implementation team. Those principles are:

  • Adhere to the ODF 1.1 Standard
  • Be Predictable
  • Preserve User Intent
  • Preserve Editability
  • Preserve Visual Fidelity

Let’s take a look at each of these principles in more detail and with some examples.

Adhere to the ODF 1.1 Standard

Where the specification is clear and mapping between OOXML features and ODF features is straightforward, this is of course no problem. For example, OOXML’s italics property maps neatly to ODF’s italics property.

When we found the specification to be ambiguous, we decided to follow common practice as long as it adheres to the standard. For features that Office and OOXML support but that are not in ODF at all, we did not create extensions. For example, Office supports multi-stop gradient fills for shapes, but ODF has no such concept. So we chose not to write multi-stop gradient values when saving to ODF.

Extending the ODF spec might have been a pragmatic approach to addressing gaps in the spec in the short term. But we felt that it would not be good for the ODF ecosystem in the long term, since other applications wouldn’t be able to read those extensions (unless those products also implemented the same extensions we did), and we don’t see that approach as promoting interoperability or the best experience for ODF users. We also don’t want to be accused of “co-opting” ODF and “polluting” cyberspace with many ODF files that don’t adhere to the standard. We think it is better to evolve ODF with the community in the OASIS Technical Committee and/or the appropriate SC34 Working Group.

On the flip side, Office does not have support for Gantt charts, but ODF does allow them. When we load an ODF file that contains a Gantt chart we leave the chart area blank rather than try to map it to some other type of chart. But we preserve the chart data so that the user can pick another chart type from the Excel UI if desired.
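To make the Gantt chart behavior concrete, here is a minimal sketch in Python. It is not Office’s actual code: the names LoadedChart, load_chart and SUPPORTED_CHART_CLASSES are hypothetical, and the set of supported chart classes is an assumption made up for the example. The point is only that an unsupported chart type is dropped while the data survives.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical set of chart classes the loading application can render.
SUPPORTED_CHART_CLASSES = {"bar", "line", "circle", "scatter", "area"}


@dataclass
class LoadedChart:
    # None means "leave the chart area blank".
    chart_class: Optional[str]
    # The chart data is always kept, whatever the chart type.
    series: list = field(default_factory=list)


def load_chart(odf_chart_class: str, series: list) -> LoadedChart:
    """Keep the data; drop only a chart type that cannot be rendered."""
    if odf_chart_class in SUPPORTED_CHART_CLASSES:
        return LoadedChart(odf_chart_class, list(series))
    # Unsupported type (for example Gantt): do not guess a substitute type.
    return LoadedChart(None, list(series))


chart = load_chart("gantt", [[1, 3], [2, 5]])
print(chart.chart_class, chart.series)
```

Because the series are still there, the user can pick another chart type later without retyping anything.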

Be Predictable

The principle here is that we want to do what an informed user would likely expect.

Where ODF is a superset of OOXML, we can either ignore the ODF-only constructs, or map them to an OOXML construct where there is a logical way to do so.

When OOXML is a superset of ODF, we usually map the OOXML-only constructs to a default ODF value. For example, ODF does not support OOXML’s doubleWave border style, so when we save as ODF we map that style to the default border style.
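The “map to a default” rule can be sketched as a lookup with a fallback. This is only an illustration in Python: the table entries, the function name to_odf_border and the choice of "solid" as the default are assumptions for the example, not the mapping Office actually ships. Only the doubleWave case comes from the text above.

```python
# Hypothetical mapping from OOXML border style names to ODF border styles.
# These entries are illustrative, not an authoritative table.
OOXML_TO_ODF_BORDER = {
    "single": "solid",
    "double": "double",
}

# Assumed default; the post does not say which ODF style is the default.
DEFAULT_ODF_BORDER = "solid"


def to_odf_border(ooxml_style: str) -> str:
    """Return the mapped ODF style, or the default for OOXML-only styles."""
    return OOXML_TO_ODF_BORDER.get(ooxml_style, DEFAULT_ODF_BORDER)


print(to_odf_border("double"))
print(to_odf_border("doubleWave"))
```

A single fallback keeps the result predictable: every OOXML-only style ends up the same way, rather than each one being approximated differently.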

Preserve User Intent

In simple cases, it isn’t a problem for Word to preserve document structure and semantics when saving an ODF file. For example, a document heading can be saved with a heading style that has an associated outline level.

In more complex cases we preferred a neutral approach when saving to ODF rather than implying semantics that the user did not intend. For example, in Word one can color code the bullets in a bulleted list by applying a color attribute to the paragraph character for the list item. Word can persist that attribute when saving to OOXML, but ODF does not have the concept of paragraph characters with attributes.

If we were to apply the color attribute to the paragraph style, the entire list item would take on the color, and this might imply more than the user meant. So we chose to drop the bullet color rather than color the whole list item.
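Here is a rough sketch of that trade-off in Python. The ListItem model and the to_odf_paragraph_props function are hypothetical and much simpler than Word’s real document model; they only show the decision to leave the bullet color out rather than promote it to the whole paragraph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListItem:
    text: str
    # Color the user applied to the item's text, if any.
    text_color: Optional[str] = None
    # Color the user applied to the bullet only (hypothetical field).
    bullet_color: Optional[str] = None


def to_odf_paragraph_props(item: ListItem) -> dict:
    """Build the paragraph-level properties to write for one list item."""
    props = {}
    if item.text_color is not None:
        props["color"] = item.text_color
    # bullet_color is deliberately not written. Promoting it to the
    # paragraph would color the whole item, which the user did not ask for.
    return props


red_bullet = ListItem("Buy milk", bullet_color="#FF0000")
print(to_odf_paragraph_props(red_bullet))
```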

Preserve Editability

We want to preserve the user’s ability to edit the contents of their document even if they have used a feature that can’t be saved to ODF, so that what the user sees in the document and how the user interacts with the document will not be changed until the user saves and closes the file.

For example, if you insert a table in a PowerPoint slide and save as ODF, you still have a table in your open presentation with all of the normal table editing behaviors: you can easily add a row or insert a column. The table becomes a group of shapes only after you close and reopen the file. As another example, you can open an ODS spreadsheet with Excel and use the conditional formatting features to analyze trends in the data, but the conditional formatting will not be preserved when you save and close the file.

Preserve Visual Fidelity

Wherever possible we write the ODF in such a way as to preserve visual fidelity when the document is opened in another application.

Chart gap width (i.e., the space between bars in a bar chart) is a good example. If the gap width of a chart is not specified in the file, OpenOffice applies a different default than Microsoft Office does and renders the gaps differently. So in this case, Office writes the chart gap width even when it is the default value, which is a case where we traditionally wouldn’t write it.
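The gap width rule amounts to “write the value explicitly, even when it is the default.” Here is a small Python sketch of that idea. The function name gap_width_attrs, the always_write flag, the attribute key and the default of 150 are all assumptions for illustration, not Office’s real serializer or constants.

```python
# Assumed application default, as a percentage; illustrative only.
OFFICE_DEFAULT_GAP_WIDTH = 150


def gap_width_attrs(gap_width=None, always_write=True):
    """Return the attributes to serialize for a bar chart's gap width."""
    effective = OFFICE_DEFAULT_GAP_WIDTH if gap_width is None else gap_width
    if not always_write and effective == OFFICE_DEFAULT_GAP_WIDTH:
        # Traditional behavior: omit values that equal the default.
        return {}
    # ODF behavior described above: always state the value, so a reader
    # with a different default still renders the same gaps.
    return {"chart:gap-width": str(effective)}


print(gap_width_attrs())
print(gap_width_attrs(always_write=False))
```

Omitting defaults is safe only when the reader and the writer agree on what the default is, which is not the case here.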

High Level Architectural View

Word, Excel and PowerPoint have a Model-View-Controller design. The in-memory representation of the document, or Model, is designed to facilitate document revision and display functions and includes concepts which are never saved to the file, such as the insertion point and the selection.

The persistence code converts this in-memory representation to and from a file-based representation on disk. Office 2007 already had code to support a number of angle-brackety persistence formats including HTML and OOXML. When we built in support for ODF, we added it in that area of our code.
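As a toy illustration of that separation (not Office’s architecture in any detail), here is a Python sketch in which one in-memory model is handed to interchangeable writers. DocumentModel, WRITERS, save and the "toy-xml" format are hypothetical names invented for the example. Note that the insertion point and selection live in the model but never reach the output.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class DocumentModel:
    paragraphs: List[str] = field(default_factory=list)
    # Editing state: part of the model, never saved to the file.
    insertion_point: int = 0
    selection: Optional[Tuple[int, int]] = None


def save_as_html(doc: DocumentModel) -> str:
    return "".join(f"<p>{escape(p)}</p>" for p in doc.paragraphs)


def save_as_toy_xml(doc: DocumentModel) -> str:
    body = "".join(f"<para>{escape(p)}</para>" for p in doc.paragraphs)
    return f"<doc>{body}</doc>"


# Adding a new format means adding a writer here; the model is untouched.
WRITERS: Dict[str, Callable[[DocumentModel], str]] = {
    "html": save_as_html,
    "toy-xml": save_as_toy_xml,
}


def save(doc: DocumentModel, fmt: str) -> str:
    """Serialize the model in the requested format without modifying it."""
    return WRITERS[fmt](doc)


doc = DocumentModel(["Hello"], insertion_point=3, selection=(0, 2))
print(save(doc, "html"))
print(save(doc, "toy-xml"))
```

Since saving only reads the model, the open document keeps its full editing behavior after a save, which is also what the editability principle relies on.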

That’s a general overview of how we’ve approached ODF support in Office 2007 SP2. These topics were also the foundation of the roundtable discussions we had at the workshop; for a variety of perspectives on those discussions, see the blog posts by Dennis Hamilton, John Head, and Jesper Lund Stocholm.