Tracked Changes

When I blogged about the release of SP2 with ODF support two weeks ago, I mentioned that I was planning to blog about a few of the tough decisions we faced in our SP2 implementation of ODF, such as the decision not to support tracked changes.  I’ve spent some time since then covering our approach to formulas in ODF, and now I’d like to move on to answering the question of why we aren’t supporting ODF tracked changes.

For those who just want the summary, here’s a high-level recap of what I’ll cover in more detail below:

  • Tracked changes is a very complex aspect of document format functionality; for example, the ECMA-376 specification devotes over 100 pages to describing tracked changes
  • Microsoft Word has a long history of supporting tracked changes, and this functionality is used by a large number of Word users
  • Due to its role in collaborative processes, tracked changes is often used for documents with legal, financial or technical implications that are reviewed and edited by multiple people; in such scenarios, accuracy and reliability are critical
  • ODF 1.1 has a very limited description of tracked changes, covered in only 4 pages of the specification.  ODF 1.1 does not explain how to implement change tracking for many of Word’s commonly used features, and in some cases it is not even clear whether the ODF mechanism makes it possible at all.
  • As a result of these differences, we found that it is not possible to implement robust and reliable tracked changes with ODF; even very simple concepts, such as deleting a row from a table, are not supported by any existing ODF implementation of tracked changes
  • There is almost no interoperability among the various non-Microsoft implementations of ODF when it comes to tracked changes.
  • To protect our customers from losing data when using tracked changes, and to avoid making an interoperability promise that would turn out to be hollow, we made the difficult decision to not support tracked changes at all in ODF

The rest of this post will cover the details of the points summarized above.  This is a long post, and it gets a little technical in places, because change tracking is inherently a complex topic.

State of Tracked Changes Interoperability

SP2 is a new implementation of ODF, but there are many existing implementations of ODF that are already in wide use.  I’ve done an informal review of them to try to understand existing practices around the use of tracked changes in ODF documents.

Here’s what I’ve found:

If anyone knows of additional information on these implementations, or any other ODF implementation that supports tracked changes, especially if you know of one which is not derived from the OpenOffice.org source code, please let me know and I’ll update that list.

To test interoperability between current ODF implementations of tracked changes, I created a simple document with some tracked changes, saved it in ODF, and then looked at what happened when I opened that document in other ODF implementations.

So the first step is to create a test document.  Using Symphony 1.2, I followed these steps:

  • Click on “Create a new Document”
  • Insert a table (Create/Table), and put some text in each cell to identify the rows
  • Add a paragraph of text, below the table, containing two sentences
  • Add a numbered list of four items, below the paragraph

The starting point for my document looks like this:

[Screenshot: the starting test document in Symphony 1.2]

Then I added some change-tracking, as follows:

  • Turn on change tracking (Edit/Revisions/Record)
  • Delete the second row from the table (right-click, Row/Delete)
  • Highlight the last sentence of the paragraph and the first two items of the numbered list, up through the (DELETE) on the second item, and delete that region

My document now looks like this in Symphony:

[Screenshot: the test document in Symphony 1.2 after the tracked deletions]

One thing you’ll notice here is that the row I deleted from the table is simply gone, with no tracked change recorded.  This is due to an inherent limitation in ODF’s approach to change tracking, which does not allow table changes to be tracked in a standardized manner.

More on that later, but first let’s see what happens when I save this document as ODF 1.1.  After I click Save, here’s what I see:

[Screenshot: the test document in Symphony 1.2 after saving as ODF 1.1]

Take a close look at the numbering of the list items, and you’ll see that the second list item is no longer numbered.  Very strange.  And if I reject all changes in the document, the numbering of that item doesn’t come back – it disappeared somehow, the instant I saved my document as ODF 1.1.

I suppose some people might be tempted to suggest that I should use the latest OpenOffice.org release for this test, which came out a couple of weeks ago.  I tried that, and I got similar – but not identical – strange behavior by following the steps above.

Speaking of OpenOffice.org 3.1, let’s open this saved document in that implementation of ODF.  When I do, here’s what I see:

[Screenshot: the Symphony-created document opened in OpenOffice.org 3.1]

At first glance, it looks like all of the changes were accepted.  But in fact, the changes are still in the document, and you must go into Edit/Changes/Show to make the tracked changes appear.

In Google Docs, we see essentially the same thing that OpenOffice.org displayed by default:

[Screenshot: the Symphony-created document opened in Google Docs]

Google Docs automatically accepts tracked changes in ODF documents, and then uses its own entirely different approach for managing change tracking.  Google Docs uses a Revision History feature to track changes to documents; for example, here’s what I see when I click on Tools/Revision History while viewing this document in Google Docs:

[Screenshot: the Google Docs Revision History view of the test document]

It appears that Google Docs is pretty committed to this approach to change tracking, based on this recent exchange on the Google Docs Help Center site:

Jcuesta: We need Track Changes. When?

Gill (Google Docs Guru): Who knows? Given that we already have Revisions, quite possibly never.

Moving on to another ODF 1.1 implementation, AbiWord 2.6.8 (which does not support tracked changes), here’s how my test document appears:

[Screenshot: the test document opened in AbiWord 2.6.8]

AbiWord doesn’t support tracked changes, so I would have expected to either see the document with no changes at all, or with all changes accepted.  Instead, I see what appears to be a random re-arrangement of the document content.  On closer inspection, I think this is due to ODF’s approach to handling deletions, which requires that deleted content be stored at a location separate from where it was deleted.  I’ll explain that in more detail below.

So far, we have two applications that seem to agree on how to display this document (OpenOffice.org 3.1 and Google Docs), and two others that each have a different way of displaying the document.  Sounds messy, but it gets even worse if you start varying which application creates the document in the first place.

For example, I followed the same steps outlined above, but started from OpenOffice.org 3.1 instead of Symphony 1.2.  Here’s the result:

[Screenshot: the test document created and edited in OpenOffice.org 3.1]

But if I load this OO.o-created document in Google Docs, I see something quite different from what I saw when I loaded the Symphony-created document in Google Docs.  Instead of all tracked changes being accepted, and the deleted text gone, now I see all tracked changes being ignored, and the deleted text (except for the deleted table row) is present, although the list numbering skips over the second item:

[Screenshot: the OpenOffice.org-created document opened in Google Docs]

So we’ve seen that none of these implementations track changes to tables, and the behavior when loading tracked-changes documents into applications other than OpenOffice.org or Symphony varies among several possibilities, including accepting changes, ignoring changes, and restoring deleted content to a different position in the document.  Furthermore, this is only a simple test that includes nothing but deletions.  If you start combining deletions and insertions in the ways that people typically do while collaborating on documents, you’ll find even more surprising behavior when those documents are opened in applications other than the one that created them.  This is the state of ODF tracked-changes interoperability today.

The Cause of the Problem

The problems above are not just caused by bugs in these implementations.  Rather, they are the result of inadequate specification of change-tracking functionality in ODF 1.1, combined with a peculiar design decision in ODF’s approach to tracking deletions.

To get a feel for how thoroughly ODF specifies change tracking, it’s instructive to compare the size of the relevant sections of the ODF 1.1 and ECMA-376 specifications.  ECMA-376, which supports 100% of the change-tracking functionality that Word uses, devotes 121 pages to change tracking (Part 4, Section 2.13.5).  ODF 1.1, by comparison, devotes only 4 pages to it, in Section 4.6.

There are many areas where we found that ODF 1.1’s approach to tracked changes couldn’t provide the functionality and reliability that our customers have come to expect.

Where to put deleted content?

When you delete content with tracked changes on, the content remains in the document, marked as deleted by a particular user on a particular date/time.  But where in the document?  The answer is different for Open XML and ODF.

Let’s look at a simple example, and see how the two formats handle the deleted text.  Here’s the example we’ll use, a single sentence with a word deleted from it:

[Screenshot: a single sentence with one word deleted while change tracking is on]

First let’s look at how Open XML handles this deletion.  Here’s the ECMA-376 markup that Word 2007 writes out for this sentence:

[Screenshot: the ECMA-376 markup that Word 2007 writes for this sentence]

You can see that the deleted text is inline, right where it was before it was deleted, surrounded by a delText tag.
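To make the contrast concrete, here is a minimal Python sketch of why inline storage is easy to process. The paragraph is a hypothetical, hand-trimmed WordprocessingML fragment (the sentence, author, and date are illustrative, not the markup from the screenshot), and `paragraph_text` is a hypothetical helper: accepting the deletion drops the `w:del` element, while rejecting it restores the `w:delText` content in place, with no lookup step.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical, trimmed paragraph: sentence, author, and date are illustrative.
PARAGRAPH = f"""<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">This sentence has a </w:t></w:r>
  <w:del w:id="0" w:author="Example Author" w:date="2009-05-01T00:00:00Z">
    <w:r><w:delText xml:space="preserve">deleted </w:delText></w:r>
  </w:del>
  <w:r><w:t>word.</w:t></w:r>
</w:p>"""


def paragraph_text(xml: str, accept: bool) -> str:
    """Return the paragraph text with its tracked deletions accepted or rejected."""
    p = ET.fromstring(xml)
    parts = []
    # Children of w:p are visited in document order, so the deleted run is
    # met exactly where the deletion happened; no lookup is needed.
    for child in p:
        if child.tag == f"{{{W}}}r":
            parts.extend(t.text or "" for t in child.findall("w:t", NS))
        elif child.tag == f"{{{W}}}del" and not accept:
            # Rejecting the deletion restores the w:delText content in place.
            parts.extend(t.text or "" for t in child.findall("w:r/w:delText", NS))
    return "".join(parts)


print(paragraph_text(PARAGRAPH, accept=True))   # This sentence has a word.
print(paragraph_text(PARAGRAPH, accept=False))  # This sentence has a deleted word.
```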

Now let’s look at the ODF markup that OpenOffice.org 3.1 writes for this deletion:

[Screenshot: the ODF markup that OpenOffice.org 3.1 writes for the same deletion]

In this case, the deleted word does not appear inline.  Rather, there is a text:change element inline, with an ID of ct205721376.  Within the text:tracked-changes element (which occurs earlier in the body of the document), you can see where ID ct205721376 is defined as a deletion by Doug Mahugh, containing the word “deletion” inside a text:p element.
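Here is a minimal Python sketch of what a reader has to do with the ODF approach, assuming a hand-written `office:text` fragment modeled on the structure just described (the surrounding sentence and the date are hypothetical; the change ID, author, and deleted word come from the example). The inline `text:change` mark carries only an ID, so the hypothetical `deletion_info` helper has to find the matching `text:changed-region` elsewhere in the body to learn who deleted what.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"text": TEXT, "office": OFFICE, "dc": DC}

# Hypothetical body; the sentence and the date are illustrative.
BODY = f"""<office:text xmlns:office="{OFFICE}" xmlns:text="{TEXT}" xmlns:dc="{DC}">
  <text:tracked-changes>
    <text:changed-region text:id="ct205721376">
      <text:deletion>
        <office:change-info>
          <dc:creator>Doug Mahugh</dc:creator>
          <dc:date>2009-05-01T00:00:00</dc:date>
        </office:change-info>
        <text:p>deletion</text:p>
      </text:deletion>
    </text:changed-region>
  </text:tracked-changes>
  <text:p>The <text:change text:change-id="ct205721376"/> was tracked.</text:p>
</office:text>"""


def deletion_info(xml: str, change_id: str) -> dict:
    """Resolve an inline text:change mark to the deletion record it points at."""
    root = ET.fromstring(xml)
    for region in root.iterfind("text:tracked-changes/text:changed-region", NS):
        if region.get(f"{{{TEXT}}}id") != change_id:
            continue
        deletion = region.find("text:deletion", NS)
        return {
            "author": deletion.findtext("office:change-info/dc:creator", namespaces=NS),
            "date": deletion.findtext("office:change-info/dc:date", namespaces=NS),
            # The deleted word is stored here, wrapped in its own text:p.
            "content": ["".join(p.itertext()) for p in deletion.findall("text:p", NS)],
        }
    raise KeyError(change_id)


# The inline mark carries only an ID; the content lives elsewhere in the body.
mark = ET.fromstring(BODY).find(".//text:change", NS)
info = deletion_info(BODY, mark.get(f"{{{TEXT}}}change-id"))
print(info["author"], info["content"])  # Doug Mahugh ['deletion']
```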

There are two problems with this approach: one problem for implementations that don’t support tracked changes, and one problem for implementations that do support tracked changes.

To see the problem for implementations that don’t support tracked changes, refer back to the AbiWord screenshot above.  AbiWord doesn’t know about tracked changes, but it does know about paragraphs (text:p elements), so it displays every paragraph it finds in the document, in the order that it finds them.  Since the deleted “paragraphs” appear first in the markup, they appear first in the displayed document.
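The AbiWord behavior can be approximated with a few lines of Python. This is not AbiWord’s code; it is a hypothetical reader that, like any consumer unaware of tracked changes, simply collects every `text:p` element in document order from an illustrative fragment (the sentence and date are made up; the ID, author, and deleted word follow the example above).

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical body; the sentence and the date are illustrative.
BODY = f"""<office:text xmlns:office="{OFFICE}" xmlns:text="{TEXT}" xmlns:dc="{DC}">
  <text:tracked-changes>
    <text:changed-region text:id="ct205721376">
      <text:deletion>
        <office:change-info>
          <dc:creator>Doug Mahugh</dc:creator>
          <dc:date>2009-05-01T00:00:00</dc:date>
        </office:change-info>
        <text:p>deletion</text:p>
      </text:deletion>
    </text:changed-region>
  </text:tracked-changes>
  <text:p>The <text:change text:change-id="ct205721376"/> was tracked.</text:p>
</office:text>"""


def naive_paragraphs(xml: str) -> list[str]:
    """Mimic a reader that knows text:p but not tracked changes."""
    root = ET.fromstring(xml)
    # iter() walks the whole tree in document order, so the deleted content
    # stored inside text:tracked-changes is reached before the real text.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT}}}p")]


# The deleted word is printed first, as if it were its own paragraph.
for text in naive_paragraphs(BODY):
    print(repr(text))
```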

I put paragraphs in quotes there for a reason: in the simple example we’re looking at here, I did not delete a paragraph, I deleted a word from inside a paragraph.  So why is the deleted text wrapped inside a paragraph element?

The answer is that the ODF spec requires deleted content (as contained in a text:deletion element) to be schema-compliant, regardless of whether the deleted region was a well-formed element or (as in this case) merely a fragment within some other structure, such as a word within a paragraph.

This is the source of the problem I alluded to above, for implementers who choose to support ODF tracked changes.  Each implementer must decide how to synthesize markup to make each piece of deleted content into well-formed XML, and then later – when it comes time to accept or reject the change – each implementer must make decisions about how to distinguish between the synthesized packaging and the deleted content itself.

Unfortunately, the ODF specification doesn’t provide much guidance on this complex topic.  Here’s the guidance provided in ODF 1.1 (Section 4.6.4 Deletion):

To reconstruct the text before the deletion took place, do:

  • If the change mark is inside a paragraph, insert the text content of the <text:deletion> element as if the beginning <text:p> and final </text:p> tags were missing.
  • If the change mark is inside a header, proceed as above, except adapt the end tags to match their new counterparts.
  • Otherwise, simply copy the text content of the <text:deletion> element in place of the change mark.
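Here is a minimal Python sketch of the first rule quoted above, run against a hypothetical fragment (the sentence and date are illustrative). `reject_deletion_in_paragraph` is an illustrative name, not part of any product; it handles only the simplest case, a change mark inside a paragraph whose deletion holds a single `text:p` wrapper, and drops that wrapper’s tags as the spec describes.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"text": TEXT, "office": OFFICE, "dc": DC}

# Hypothetical body; the sentence and the date are illustrative.
BODY = f"""<office:text xmlns:office="{OFFICE}" xmlns:text="{TEXT}" xmlns:dc="{DC}">
  <text:tracked-changes>
    <text:changed-region text:id="ct205721376">
      <text:deletion>
        <office:change-info>
          <dc:creator>Doug Mahugh</dc:creator>
          <dc:date>2009-05-01T00:00:00</dc:date>
        </office:change-info>
        <text:p>deletion</text:p>
      </text:deletion>
    </text:changed-region>
  </text:tracked-changes>
  <text:p>The <text:change text:change-id="ct205721376"/> was tracked.</text:p>
</office:text>"""


def reject_deletion_in_paragraph(xml: str, change_id: str) -> str:
    """Rebuild a paragraph's text as it was before one tracked deletion.

    Covers only the first case in ODF 1.1 section 4.6.4: a change mark
    inside a paragraph, with the deleted text held in one text:p wrapper.
    """
    root = ET.fromstring(xml)
    region = next(
        r for r in root.iterfind("text:tracked-changes/text:changed-region", NS)
        if r.get(f"{{{TEXT}}}id") == change_id
    )
    # "as if the beginning <text:p> and final </text:p> tags were missing":
    # keep the wrapper's text content, drop the wrapper itself.
    wrapper = region.find("text:deletion/text:p", NS)
    restored = "".join(wrapper.itertext())

    for para in root.findall("text:p", NS):
        pieces, found = [para.text or ""], False
        for child in para:
            if (child.tag == f"{{{TEXT}}}change"
                    and child.get(f"{{{TEXT}}}change-id") == change_id):
                pieces.append(restored)  # splice the deleted text in at the mark
                found = True
            else:
                pieces.append("".join(child.itertext()))
            pieces.append(child.tail or "")
        if found:
            return "".join(pieces)
    raise KeyError(change_id)


print(reject_deletion_in_paragraph(BODY, "ct205721376"))  # The deletion was tracked.
```

Even this simple case relies on knowing which tags in the deletion are synthetic packaging. A deletion that spans a paragraph boundary, or that removes part of a table, needs further rules the spec does not give, and that is where implementations diverge.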

This guidance works for very simple cases, but it does not cover more complex situations such as deleting part of a table, as described below.  A specific implementer may come up with an approach that works within their own application, but since the spec doesn’t say how to synthesize the wrapper markup (the “shim”) around deleted content, what shows up as a deletion in one application might show up as a different deletion, or not as a deletion at all, in another application.

The approach used by ECMA-376, as shown in the example above, keeps the deleted text inline where it was deleted, thus eliminating all of these issues.  There is no extra synthesized markup added when a deletion is saved, and therefore implementers don’t need to make decisions about how or whether to remove that markup when it comes time to accept or reject the changes.

Changes to Tables

The ODF 1.1 specification says (in section 8.11) that “Change tracking of tables is not supported for text documents.”

And indeed, no existing ODF implementation that I’m aware of attempts to track changes to tables, such as adding or deleting rows or cells, modifying table properties or grid layout, and so on.  Looking at Section 4.6, it’s easy to see why this is so: there is no information provided about how to track table changes, and it’s not at all obvious how one would do so within the current mechanism.

Deleted sections of tables would be especially problematic in ODF, because of the need to create a shim to make the relocated deleted content schema-valid.  The ODF spec provides some guidance on how to revert deleted paragraph content (as quoted above), but for tables, there is no such guidance.

So if a row of a table is deleted, what should an implementer do?  Store in <text:tracked-changes> a table with one row inside the deleted-content section?  And how would another implementation know whether that indicates a deleted row of a table, or a deleted one-row table?

In the ECMA-376 specification, on the other hand, there are defined mechanisms for tracking changes to tables.  As one example, consider the simple act of deleting a row from a table while change tracking is turned on.  In ODF, that row is simply gone, and reverting your tracked changes later will not recover the row.  But in Open XML, the <del> element can be applied to a table row, and as stated in Section 2.13.15.4, “This element specifies that the parent table row shall be treated as a deleted row whose deletion has been tracked as a revision. This setting shall not imply any revision state about the table cells in this row or their contents (which must be revision marked independently), and shall only affect the table row itself.”
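For comparison, a minimal Python sketch of the Open XML row deletion described in the quoted section. The table is a hypothetical, trimmed fragment: a real table would also carry `w:tblGrid` and cell properties, and Word would normally revision-mark the cell contents too, which the quoted text says are tracked independently. Because the `w:del` marker sits in the row’s own `w:trPr`, the hypothetical `visible_rows` helper can accept or reject the row deletion without guessing.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical three-row table; the second row carries a tracked deletion
# (w:del inside w:trPr). Author and date are illustrative.
TABLE = f"""<w:tbl xmlns:w="{W}">
  <w:tr><w:tc><w:p><w:r><w:t>Row 1</w:t></w:r></w:p></w:tc></w:tr>
  <w:tr>
    <w:trPr><w:del w:id="1" w:author="Example Author" w:date="2009-05-01T00:00:00Z"/></w:trPr>
    <w:tc><w:p><w:r><w:t>Row 2</w:t></w:r></w:p></w:tc>
  </w:tr>
  <w:tr><w:tc><w:p><w:r><w:t>Row 3</w:t></w:r></w:p></w:tc></w:tr>
</w:tbl>"""


def visible_rows(xml: str, accept: bool) -> list[str]:
    """Return the text of each row after accepting or rejecting row deletions."""
    tbl = ET.fromstring(xml)
    rows = []
    for tr in tbl.findall("w:tr", NS):
        deleted = tr.find("w:trPr/w:del", NS) is not None
        if deleted and accept:
            continue  # accepting the revision removes the row itself
        rows.append("".join(t.text or "" for t in tr.iter(f"{{{W}}}t")))
    return rows


print(visible_rows(TABLE, accept=True))   # ['Row 1', 'Row 3']
print(visible_rows(TABLE, accept=False))  # ['Row 1', 'Row 2', 'Row 3']
```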

Format Changes

Tracking changes also entails tracking changes to document formatting properties.

ECMA-376 has many elements dedicated to tracking formatting changes, including pPrChange, rPrChange, sectPrChange, tblPrChange, tblPrExChange, tcPrChange, and trPrChange.  These elements are described across 17 pages (pages 1015-1032 of Part 4).

ODF 1.1, on the other hand, has a single format-change element, which is documented as follows in Section 4.6.5, Format Change:

A format change element represents any change in formatting attributes. The region where the change took place is marked by a change start and a change end element.

Note: A format change element does not contain the actual changes that took place.

Much was made during the ISO/IEC 29500 standards process of the difference in the size of the ODF and Open XML specifications.  This is a good example of where that difference comes from: in this case, a concept glossed over in three vague sentences of the ODF spec gets 17 pages of documentation in the Open XML spec.
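A short Python sketch of the practical difference, using two hypothetical fragments: an Open XML run that was made bold with change tracking on, and an ODF `text:changed-region` holding a format change. The `w:rPrChange` element stores the run’s previous properties, so rejecting the change is possible; the ODF `text:format-change` element, as the spec notes, carries only the change metadata. Authors, dates, IDs, and helper names are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical Open XML run: now bold; w:rPrChange holds the old (empty) properties.
OOXML_RUN = f"""<w:r xmlns:w="{NS['w']}">
  <w:rPr>
    <w:b/>
    <w:rPrChange w:id="2" w:author="Example Author" w:date="2009-05-01T00:00:00Z">
      <w:rPr/>
    </w:rPrChange>
  </w:rPr>
  <w:t>Now bold</w:t>
</w:r>"""

# Hypothetical ODF format change: metadata only, no record of what changed.
ODF_REGION = f"""<text:changed-region xmlns:text="{NS['text']}"
    xmlns:office="{NS['office']}" xmlns:dc="{NS['dc']}" text:id="ct2">
  <text:format-change>
    <office:change-info>
      <dc:creator>Example Author</dc:creator>
      <dc:date>2009-05-01T00:00:00</dc:date>
    </office:change-info>
  </text:format-change>
</text:changed-region>"""


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[1]


def ooxml_props(xml: str) -> tuple[list[str], list[str]]:
    """Return (current, previous) run property names from a tracked run."""
    rpr = ET.fromstring(xml).find("w:rPr", NS)
    current = [local(c.tag) for c in rpr if local(c.tag) != "rPrChange"]
    previous = [local(c.tag) for c in rpr.find("w:rPrChange/w:rPr", NS)]
    return current, previous


def odf_format_change_contents(xml: str) -> list[str]:
    """List what an ODF format change actually records."""
    change = ET.fromstring(xml).find("text:format-change", NS)
    return [local(c.tag) for c in change]


print(ooxml_props(OOXML_RUN))                  # (['b'], [])
print(odf_format_change_contents(ODF_REGION))  # ['change-info']
```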

Summary

This has been a long blog post, but I wanted to make sure that people understand why we made the difficult decision to not support tracked changes in our Office 2007 SP2 implementation of ODF.

When you load an ODF document containing tracked changes into Word 2007 SP2, all existing changes will be accepted, and you will not be able to save any further tracked changes in the document unless you save as DOCX.  This is an inconvenience, but a necessary one to protect users from unexpected surprises in the various scenarios outlined above.  Keep in mind that you can still use Word’s document compare feature to compare a previous version of an ODT file to a newer version, in order to see what changed.

Finally, there are a few questions that I anticipate some people may ask, so I’d like to address those here …

Couldn’t you have at least supported tracked changes for simple cases, as OpenOffice.org does?

Change tracking that handles “some” or even “most” of the changes a user makes would be extremely risky to use, because the user may be surprised to discover later that certain types of changes were not being tracked.  We’ve learned through clear feedback from our customers that a feature which works “most of the time” can be worse than no feature at all.  Users count on accurate, reliable change tracking for managing updates to their critical business documents.

We really wanted to make change tracking work for our ODF implementation in Office 2007 SP2. I’ve spoken to some of the developers on the Word team, who wrote a lot of code for this and really tried to solve the problems. But ultimately our test team pointed out that the feature was just not “ship quality,” and there was no good way to make it better without extending ODF, which our first principle, “Adhere to the ODF 1.1 standard,” told us not to do.

Will change tracking be improved in ODF 1.2?

Unfortunately, it doesn’t look like it.  The current draft of ODF 1.2 contains no additions to Section 4.6 of ODF 1.1 (which is Section 4.5 in ODF 1.2 due to renumbering).  The only change is that the examples have been removed from the section.

Why didn’t Microsoft work to get this fixed in the ODF TC?

We joined the OASIS ODF TC last June, and we started slowly because some people had expressed concerns about Microsoft having too much influence on ODF’s direction.  The first proposal we made was a very simple proposal to add two optional attributes to indicate maximum grid size for spreadsheet applications, which would have addressed a specific real-world interoperability problem we encountered with a major ODF implementation.  Other TC members argued against this proposal, and after several such exchanges we decided not to push the matter.

We then continued submitting proposed solutions to specific interoperability issues, and by the time proposals for ODF 1.2 were cut off in December, we had submitted 15 proposals for consideration.  The TC voted on what to include in version 1.2, and none of the proposals we had submitted made it into ODF 1.2.

We look forward to contributing more to the ODF TC in the future, and we would welcome the opportunity to work with other TC members to improve ODF’s ability to handle tracked changes.