Tracked Changes


When I blogged about the release of SP2 with ODF support two weeks ago, I mentioned that I was planning to blog about a few of the tough decisions we faced in our SP2 implementation of ODF, such as the decision not to support tracked changes.  I’ve spent some time since then covering our approach to formulas in ODF, and now I’d like to move on to answering the question of why we aren’t supporting ODF tracked changes.


For those who just want the summary, here’s a high-level recap of what I’ll cover in more detail below:



  • Tracked changes is a very complex aspect of document format functionality; for example, the ECMA-376 specification devotes over 100 pages to describing tracked changes

  • Microsoft Word has a long history of supporting tracked changes, and this functionality is used by a large number of Word users

  • Due to its role in collaborative processes, tracked changes is often used for documents with legal, financial or technical implications that are reviewed and edited by multiple people; in such scenarios, accuracy and reliability are critical

  • ODF 1.1 has a very limited description of tracked changes, covered in only 4 pages of the specification.  ODF 1.1 does not explain how to implement change tracking for many of Word’s commonly used features, and in some cases it is not even clear if the ODF mechanism makes it possible at all.

  • As a result of these differences, we found that it is not possible to implement robust and reliable tracked changes with ODF; even very simple concepts, such as deleting a row from a table, are not supported by any existing ODF implementation of tracked changes

  • There is almost no interoperability among the various non-Microsoft implementations of ODF when it comes to tracked changes.

  • To protect our customers from losing data when using tracked changes, and to avoid making an interoperability promise that would turn out to be hollow, we made the difficult decision to not support tracked changes at all in ODF

The rest of this post will cover the details of the points summarized above.  This is a long post, and it gets a little technical in places, because change tracking is inherently a complex topic.


State of Tracked Changes Interoperability


SP2 is a new implementation of ODF, but there are many existing implementations of ODF that are already in wide use.  I’ve done an informal review of them to try to understand existing practices around the use of tracked changes in ODF documents.


Here’s what I’ve found:



If anyone knows of additional information on these implementations, or any other ODF implementation that supports tracked changes, especially if you know of one which is not derived from the OpenOffice.org source code, please let me know and I’ll update that list.


To test interoperability between current ODF implementations of tracked changes, I created a simple document with some tracked changes, saved it in ODF, and then looked at what happened when I opened that document in other ODF implementations.


So the first step is to create a test document.  Using Symphony 1.2, I followed these steps:



  • Click on “Create a new Document”

  • Insert a table (Create/Table), and put some text in each cell to identify the rows

  • Add a paragraph of text, below the table, containing two sentences

  • Add a numbered list of four items, below the paragraph

The starting point for my document looks like this:


image


Then I added some change-tracking, as follows:



  • Turn on change tracking (Edit/Revisions/Record)

  • Delete the second row from the table (right-click, Row/Delete)

  • Highlight the last sentence of the paragraph and the first two items of the numbered list, up through the (DELETE) on the second item, and delete that region

My document now looks like this in Symphony:


image


One thing you’ll notice here is that the row I deleted from the table is simply gone, with no change tracking recorded.  This is due to an inherent limitation in ODF’s approach to change tracking, which does not allow table changes to be tracked in a standardized manner.


More on that later, but first let’s see what happens when I save this document as ODF 1.1.  After I click Save, here’s what I see:


image


Take a close look at the numbering of the list items, and you’ll see that the second list item has no numbering any longer.  Very strange.  And if I reject all changes in the document, the numbering of that item doesn’t come back – it disappeared somehow, the instant I saved my document as ODF 1.1.


I suppose some people might be tempted to suggest that I should use the latest OpenOffice.org release for this test, which came out a couple weeks ago.  I tried that, and I get similar – but not identical – strange behavior by following the steps above.


Speaking of OpenOffice.org 3.1, let’s open this saved document in that implementation of ODF.  When I do, here’s what I see:


image


At first glance, it looks like all of the changes were accepted.  But in fact, the changes are still in the document, and you must go into Edit/Changes/Show to make the tracked changes appear.


In Google Docs, we see essentially the same thing that OpenOffice.org displayed by default:


image


Google Docs automatically accepts tracked changes in ODF documents, and then uses its own entirely different approach for managing change tracking.  Google Docs uses a Revision History feature to track changes to documents; for example, here’s what I see when I click on Tools, Revision History when viewing this document in Google Docs:


image


It appears that Google Docs is pretty committed to this approach to change tracking, based on this recent exchange on the Google Docs Help Center site:



Jcuesta: We need Track Changes.  When?


Gill (Google Docs Guru): Who knows?  Given that we already have Revisions, quite possibly never.


Moving on to another ODF 1.1 implementation, AbiWord 2.6.8 (which does not support tracked changes), here’s how my test document appears:


image


AbiWord doesn’t support tracked changes, so I would have expected to either see the document with no changes at all, or with all changes accepted.  Instead, I see what appears to be a random re-arrangement of the document content.  On closer inspection, I think this is due to ODF’s approach to handling deletions, which requires that deleted content be stored at a location separate from where it was deleted.  I’ll explain that in more detail below.


So far, we have two applications that seem to agree on how to display this document (OpenOffice.org 3.1 and Google Docs), and two others that each have a different way of displaying the document.  Sounds messy, but it gets even worse if you start varying which application creates the document in the first place.


For example, I followed the same steps outlined above, but started from OpenOffice.org 3.1 instead of Symphony 1.2.  Here’s the result:


image


But if I load this OO.o-created document in Google Docs, I see something quite different from what I saw when I loaded the Symphony-created document in Google Docs.  Instead of all tracked changes being accepted, and the deleted text gone, now I see all tracked changes being ignored, and the deleted text (except for the deleted table row) is present, although the list numbering skips over the second item:


image 


So we’ve seen that none of these implementations track changes to tables, and the behavior when loading tracked-changes documents into applications other than OpenOffice.org or Symphony varies between several possibilities, including accepting changes, ignoring changes, and restoring deleted content to a different position in the document.  Furthermore, this is only a simple test that includes nothing but deletions.  If you start combining deletions and insertions in the ways that people typically do while collaborating on documents, you’ll find even more surprising behavior when those documents are opened in applications other than the one that created them.  This is the state of ODF tracked-changes interoperability today.


The Cause of the Problem


The problems above are not just caused by bugs in these implementations.  Rather, they are the result of inadequate specification of change-tracking functionality in ODF 1.1, combined with a peculiar design decision in ODF’s approach to tracking deletions.


To get a feel for how thoroughly ODF specifies change tracking, it’s instructive to compare the size of the relevant sections of the ODF 1.1 and ECMA-376 specifications.  ECMA-376, which supports 100% of the change-tracking functionality that Word uses, devotes 121 pages to change tracking in Part 4, Section 2.13.5.  ODF 1.1, by comparison, devotes only 4 pages to change tracking, in Section 4.6.


There are many areas where we found that ODF 1.1’s approach to tracked changes couldn’t provide the functionality and reliability that our customers have come to expect.


Where to put deleted content?


When you delete content with tracked changes on, the content remains in the document, marked as deleted by a particular user on a particular date/time.  But where in the document?  The answer is different for Open XML and ODF.


Let’s look at a simple example, and see how the two formats handle the deleted text.  Here’s the example we’ll use, a single sentence with a word deleted from it:


image


First let’s look at how Open XML handles this deletion.  Here’s the ECMA-376 markup that Word 2007 writes out for this sentence:


image


You can see that the deleted text is inline, right where it was before it was deleted, surrounded by a delText tag.
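To make the inline approach concrete, here’s a minimal Python sketch.  The fragment is hand-written for illustration: the sentence, author, date and ID are made up, and it is not the literal markup from the screenshot above.  It does use the real WordprocessingML del and delText elements.  Because the deleted text sits exactly where it was removed, accepting or rejecting the change comes down to which text nodes you keep.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written fragment for illustration (hypothetical sentence, author, date, ID).
PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">This sentence has a </w:t></w:r>
  <w:del w:id="0" w:author="Example Author" w:date="2009-05-13T10:00:00Z">
    <w:r><w:delText xml:space="preserve">deletion </w:delText></w:r>
  </w:del>
  <w:r><w:t>in it.</w:t></w:r>
</w:p>
"""

def render(paragraph, accept_changes):
    """Return the paragraph text with tracked deletions accepted or rejected."""
    pieces = []
    for node in paragraph.iter():
        if node.tag == f"{{{W}}}t":
            pieces.append(node.text or "")
        elif node.tag == f"{{{W}}}delText" and not accept_changes:
            # Rejecting the deletion: the deleted text is still inline, so keep it.
            pieces.append(node.text or "")
    return "".join(pieces)

p = ET.fromstring(PARAGRAPH)
accepted = render(p, accept_changes=True)
rejected = render(p, accept_changes=False)
print(accepted)  # This sentence has a in it.
print(rejected)  # This sentence has a deletion in it.
```

Note that a consumer with no knowledge of change tracking that reads only the ordinary text elements ends up with the “accepted” view, and nothing appears out of place.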


Now let’s look at the ODF markup that OpenOffice.org 3.1 writes for this deletion:


image


In this case, the deleted word does not appear inline.  Rather, there is a text:change element inline, with an ID of ct205721376.  Within the text:tracked-changes element (which occurs earlier in the body of the document), you can see where ID ct205721376 is defined as being a deletion by Doug Mahugh, containing the word deletion inside a text:p element.


There are two problems with this approach: one problem for implementations that don’t support tracked changes, and one problem for implementations that do support tracked changes.


To see the problem for implementations that don’t support tracked changes, refer above to the AbiWord screen shot.  AbiWord doesn’t know about tracked changes, but it does know about paragraphs (text:p elements), so it displays every paragraph it finds in the document, in the order that it finds them.  Since the deleted “paragraphs” appear first in the markup, they appear first in the displayed document.
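Here’s a rough Python sketch of that behavior.  It is not AbiWord’s actual code; it’s a hypothetical reader that knows about text:p but not about change tracking, run against a hand-written ODF fragment (the sentence and ID are made up, and the office:change-info metadata is omitted for brevity).

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hand-written fragment: the deleted word lives in text:tracked-changes,
# ahead of the paragraph it was deleted from.
DOC = f"""
<office:text xmlns:office="{OFFICE}" xmlns:text="{TEXT}">
  <text:tracked-changes>
    <text:changed-region text:id="ct1">
      <text:deletion>
        <text:p>deletion </text:p>
      </text:deletion>
    </text:changed-region>
  </text:tracked-changes>
  <text:p>This sentence has a <text:change text:change-id="ct1"/>in it.</text:p>
</office:text>
"""

def naive_paragraphs(root):
    """Collect the text of every text:p in document order, ignoring change tracking."""
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT}}}p")]

paragraphs = naive_paragraphs(ET.fromstring(DOC))
print(paragraphs)  # ['deletion ', 'This sentence has a in it.']
```

The deleted word shows up as its own “paragraph” ahead of the sentence it came from, which matches the kind of re-arrangement visible in the AbiWord screen shot.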


I put “paragraphs” in quotes above for a reason: in the simple example we’re looking at here, I did not delete a paragraph, I deleted a word from inside a paragraph.  So why is the deleted text wrapped inside a paragraph element?


The answer is that the ODF spec requires deleted content (as contained in a text:deletion element) to be schema-compliant, regardless of whether the deleted region was a well-formed element or (as in this case) merely a fragment within some other structure, such as a word within a paragraph.


This is the source of the problem I alluded to above, for implementers who choose to support ODF tracked changes.  Each implementer must decide how to synthesize markup to make each piece of deleted content into well-formed XML, and then later – when it comes time to accept or reject the change – each implementer must make decisions about how to distinguish between the synthesized packaging and the deleted content itself.


Unfortunately, the ODF specification doesn’t provide much guidance on this complex topic.  Here’s the guidance provided in ODF 1.1 (Section 4.6.4 Deletion):



To reconstruct the text before the deletion took place, do:



  • If the change mark is inside a paragraph, insert the text content of the <text:deletion> element as if the beginning <text:p> and final </text:p> tags were missing.

  • If the change mark is inside a header, proceed as above, except adapt the end tags to match their new counterparts.

  • Otherwise, simply copy the text content of the <text:deletion> element in place of the change mark.

This guidance works for very simple cases, but does not allow for complex situations such as deleting part of a table, as described below.  A specific implementer may come up with an approach that works within their application, but since the spec doesn’t say how to synthesize the markup for the shim, what shows up as a deletion in one application might show up as a different deletion, or not deleted at all, in a different application.
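For the simplest case, the first rule quoted above can be sketched in Python as follows.  This is only my reading of that rule, applied to a hand-written fragment (the sentence, author, date and ID are hypothetical); the helper names are my own, and it handles only a change mark inside a paragraph.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DC = "http://purl.org/dc/elements/1.1/"

# Hand-written ODF fragment (hypothetical sentence, author, date and ID).
DOC = f"""
<office:text xmlns:office="{OFFICE}" xmlns:text="{TEXT}" xmlns:dc="{DC}">
  <text:tracked-changes>
    <text:changed-region text:id="ct1">
      <text:deletion>
        <office:change-info>
          <dc:creator>Example Author</dc:creator>
          <dc:date>2009-05-13T10:00:00</dc:date>
        </office:change-info>
        <text:p>deletion </text:p>
      </text:deletion>
    </text:changed-region>
  </text:tracked-changes>
  <text:p>This sentence has a <text:change text:change-id="ct1"/>in it.</text:p>
</office:text>
"""

def q(name):
    """Qualify a local name with the ODF text namespace."""
    return f"{{{TEXT}}}{name}"

def deleted_text_by_id(root):
    """Map each changed-region ID to its deleted text, with the text:p wrapper stripped."""
    regions = {}
    for region in root.iter(q("changed-region")):
        deletion = region.find(q("deletion"))
        if deletion is not None:
            wrapped = deletion.findall(q("p"))
            regions[region.get(q("id"))] = "".join("".join(p.itertext()) for p in wrapped)
    return regions

def text_before_deletion(root):
    """Reject all deletions: splice the deleted text back in at each change mark."""
    regions = deleted_text_by_id(root)
    restored = []
    for p in root.findall(q("p")):  # body paragraphs only, not the tracked-changes block
        pieces = [p.text or ""]
        for child in p:
            if child.tag == q("change"):
                pieces.append(regions.get(child.get(q("change-id")), ""))
            else:
                pieces.append("".join(child.itertext()))
            pieces.append(child.tail or "")
        restored.append("".join(pieces))
    return restored

restored = text_before_deletion(ET.fromstring(DOC))
print(restored)  # ['This sentence has a deletion in it.']
```

Even this small sketch has to guess: it throws away the text:p wrapper on the assumption that it was synthesized, which would be wrong if the user had deleted whole paragraphs, and it has no answer for a deletion that spans a paragraph and a list or part of a table.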


The approach used by ECMA-376, as shown in the example above, keeps the deleted text inline where it was deleted, thus eliminating all of these issues.  There is no extra synthesized markup added when a deletion is saved, and therefore implementers don’t need to make decisions about how or whether to remove that markup when it comes time to accept or reject the changes.


Changes to Tables


The ODF 1.1 specification says (in section 8.11) that “Change tracking of tables is not supported for text documents.”


And indeed, no existing ODF implementation that I’m aware of attempts to track changes to tables, such as adding or deleting rows or cells, modifying table properties or grid layout, and so on.  Looking at Section 4.6, it’s easy to see why this is so: there is no information provided about how to track table changes, and it’s not at all obvious how one would do so within the current mechanism.


Deleted sections of tables would be especially problematic in ODF, because of the need to create a shim to make the relocated deleted content schema-valid.  The ODF spec provides some guidance on how to revert deleted paragraph content (as quoted above), but for tables, there is no such guidance.


So if a row of a table is deleted, what should an implementer do?  Store in <text:tracked-changes> a table with one row inside the deleted-content section?  And how would another implementation know whether that indicates a deleted row of a table, or a deleted one-row table?


In the ECMA-376 specification, on the other hand, there are defined mechanisms for tracking changes to tables.  As one example, consider the simple act of deleting a row from a table while change-tracking is turned on.  In ODF, that row is simply gone, and reverting your tracked changes later will not recover the row.  But in Open XML, the <del> element can be applied to a table row, and as stated in Section 2.13.15.4, “This element specifies that the parent table row shall be treated as a deleted row whose deletion has been tracked as a revision. This setting shall not imply any revision state about the table cells in this row or their contents (which must be revision marked independently), and shall only affect the table row itself.”
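As a rough illustration of the Open XML side, here’s a Python sketch over a hand-written three-row table.  The cell text, author, date and ID are made up, and to keep the fragment short I’ve left out the independent revision marks that the cell contents of a deleted row would also carry.  The row-level del element sits in the row’s properties (trPr), so a consumer can tell a tracked row deletion apart from any other change.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written table: the second row carries a tracked row deletion.
TABLE = f"""
<w:tbl xmlns:w="{W}">
  <w:tr><w:tc><w:p><w:r><w:t>Row 1</w:t></w:r></w:p></w:tc></w:tr>
  <w:tr>
    <w:trPr>
      <w:del w:id="1" w:author="Example Author" w:date="2009-05-13T10:00:00Z"/>
    </w:trPr>
    <w:tc><w:p><w:r><w:t>Row 2</w:t></w:r></w:p></w:tc>
  </w:tr>
  <w:tr><w:tc><w:p><w:r><w:t>Row 3</w:t></w:r></w:p></w:tc></w:tr>
</w:tbl>
"""

def row_is_tracked_deletion(row):
    """True if the row's properties contain a tracked-deletion mark."""
    return row.find(f"{{{W}}}trPr/{{{W}}}del") is not None

def rows(table, accept_changes):
    """Return the text of each row, dropping deleted rows when changes are accepted."""
    result = []
    for row in table.findall(f"{{{W}}}tr"):
        if accept_changes and row_is_tracked_deletion(row):
            continue
        result.append("".join(t.text or "" for t in row.iter(f"{{{W}}}t")))
    return result

table = ET.fromstring(TABLE)
accepted_rows = rows(table, accept_changes=True)
rejected_rows = rows(table, accept_changes=False)
print(accepted_rows)  # ['Row 1', 'Row 3']
print(rejected_rows)  # ['Row 1', 'Row 2', 'Row 3']
```

Rejecting the change brings the row back, which is exactly what the ODF documents in my test could not do.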


Format Changes


Tracking changes also entails tracking changes to document formatting properties.


ECMA-376 has many elements dedicated to tracking formatting changes, including pPrChange, rPrChange, sectPrChange, tblPrChange, tblPrExChange, tcPrChange, and trPrChange.  These elements are described over 17 pages (pages 1015-1032 of Part 4).


ODF 1.1, on the other hand, has a single format-change element, which is documented as follows in Section 4.6.5, Format Change:



A format change element represents any change in formatting attributes. The region where the change took place is marked by a change start and a change end element.


Note: A format change element does not contain the actual changes that took place.
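To show why that note matters, here’s a small Python sketch of the Open XML counterpart, using a hand-written run (the text, author, date and ID are hypothetical).  In WordprocessingML, the rPrChange element wraps a copy of the run properties as they were before the change, so a consumer can recover the old formatting.  An ODF format-change element, per the note above, records only that something changed in the marked region.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written run: currently bold, previously italic.
RUN = f"""
<w:r xmlns:w="{W}">
  <w:rPr>
    <w:b/>
    <w:rPrChange w:id="2" w:author="Example Author" w:date="2009-05-13T10:00:00Z">
      <w:rPr><w:i/></w:rPr>
    </w:rPrChange>
  </w:rPr>
  <w:t>formatted text</w:t>
</w:r>
"""

def property_names(rpr):
    """List the local names of the formatting properties in a run-properties element."""
    return sorted(
        child.tag.split("}")[1]
        for child in rpr
        if child.tag != f"{{{W}}}rPrChange"  # the revision record is not itself a property
    )

run = ET.fromstring(RUN)
current_rpr = run.find(f"{{{W}}}rPr")
previous_rpr = current_rpr.find(f"{{{W}}}rPrChange/{{{W}}}rPr")

current = property_names(current_rpr)
previous = property_names(previous_rpr)
print(current)   # ['b']
print(previous)  # ['i']
```

Rejecting the change means swapping the previous properties back in.  With ODF’s format-change element there is nothing comparable to swap back.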


Much was made during the IS29500 standards process of the difference in the size of the ODF and Open XML specifications.  This is a good example of where that difference comes from: in this case, a concept glossed over in three vague sentences of the ODF spec gets 17 pages of documentation in the Open XML spec.


Summary


This has been a long blog post, but I wanted to make sure that people understand why we made the difficult decision to not support tracked changes in our Office 2007 SP2 implementation of ODF.


When you load an ODF document containing tracked changes into Word 2007 SP2, all existing changes will be accepted, and you will not be able to save any further tracked changes in the document unless you save as DOCX.  This is an inconvenience, but a necessary one to protect users from unexpected surprises in the various scenarios outlined above.  Keep in mind that you can still use Word’s document compare feature to compare a previous version of an ODT file to a newer version, in order to see what changed.


Finally, there are a few questions that I anticipate some people may ask, so I’d like to address those here …


Couldn’t you have at least supported tracked changes for simple cases, as OpenOffice.org does?


Change tracking that handles “some” or even “most” of the changes a user makes would be extremely risky to use, because the user may be surprised to discover later that certain types of changes were not being tracked.  We’ve learned through clear feedback we get from our customers that a feature which works “most of the time” can be worse than no feature at all.  Users count on accurate, reliable change tracking for managing updates to their critical business documents.


We really wanted to make change tracking work for our ODF implementation in Office 2007 SP2. I’ve spoken to some of the developers on the Word team, who wrote a lot of code for this and really tried to solve the problems. But ultimately our test team pointed out that the feature was just not “ship quality” and there was no good way to make it better without extending ODF – which our first principle, “Adhere to the ODF 1.1 standard,” told us not to do.


Will change tracking be improved in ODF 1.2?


Unfortunately, it doesn’t look like it.  The current draft of ODF 1.2 contains no additions to Section 4.6 of ODF 1.1 (which is Section 4.5 in ODF 1.2 due to renumbering).  The only change is that the examples have been removed from the section.


Why didn’t Microsoft work to get this fixed in the ODF TC?


We joined the OASIS ODF TC last June, and we started slowly because some people have stated concerns about Microsoft having too much influence on ODF’s direction.  The first proposal we made was a very simple proposal to add two optional attributes to indicate maximum grid size for spreadsheet applications, which would have addressed a specific real-world interoperability problem we encountered with a major ODF implementation.  Other TC members argued against this proposal, and after several such exchanges we decided not to push the matter.


We then continued submitting proposed solutions to specific interoperability issues, and by the time proposals for ODF 1.2 were cut off in December, we had submitted 15 proposals for consideration.  The TC voted on what to include in version 1.2, and none of the proposals we had submitted made it into ODF 1.2.


We look forward to contributing more to the ODF TC in the future, and we would welcome the opportunity to work with other TC members to improve ODF’s ability to handle tracked changes.

Comments (33)

  1. hAl says:

    How does Microsoft handle this when saving to ODF?

    Do tracked changes get lost when ODF 1.1 saving is used in MS Office 2007 SP2?

    Could you show the difference in MS Office 2007 SP2 between tracking of changes using the different file formats?

  2. Hi Doug

    Nice detailed post.

    I am eagerly awaiting Rob Weir’s next post slamming Google and the other vendors for their unreasonable, anti-ODF behaviour. That should be good.

    As Rob shows, aping OOO behaviour is the only way to avoid the wrath of the arbiter(s) of ODF holiness.

    Canonical app interop is now the order of the day, it seems 😉

    I am waiting to receive the stone tablets from Rob on what SP2 should have done.  Might it be that it should do as OOO does? I wonder …  

    Gareth

  3. hAl says:

    As I understand it ODF does not do change tracking in math equations either as Jesper already showed in his comparisons of math use in OOXML and ODF

  4. Mitch 74 says:

    I see one advantage to the ODF method: the way Google does it.

    Should changes be tracked in the document’s flow? That’s the choice taken in OXML. It’s a very valid choice for a local or lightly distributed, heavy client office suite.

    However, how does that work when modifying a huge document shared among many parties, like Google Docs allows? Well, you then have to re-parse the whole document and rebuild its history from the modification. It also makes the document’s XML almost impossible to read.

    ODF, on the other hand, gives a very basic way to store and modify changes done to a document, but it has one definite advantage: changes are kept out of the "final" document’s tree. It could also be stored inside a database as an "edit history", or as a set of diff changes…

    In short, the document can be edited from outside any office suite and still be valid and rather easy to work with – that’s one advantage ODF’s method has over OXML’s.

  5. Doug Mahugh and a bunch of the standards crew (both in and out of Microsoft) have been having a great

  6. ODF is OpenOffice’s native format, and they have complained that Microsoft allegedly does not fully support ODF and that the open MS specification for the .docs format is too extensive. http://blogs.msdn.com/dmahugh/archive/2009/

  7. dmahugh says:

    That’s an interesting point, and it may have been what the ODF designers were thinking about when they chose that approach.  But as we see above, it turns out that it is very hard to get the implementation of the ODF approach right for anything except simple deletions.  It’s an interesting question whether change tracking information should be stored in the document or handled through a repository of all revisions of the document.  The repository approach, as used by Google Docs, eliminates some markup and complexity from the document, but it also ties the document to a specific platform (unless the entire set of revised versions is migrated, which could be a very big task).  That approach also doesn’t work as well in the sometimes-connected scenario, such as working on the document while on an airplane.

    In any event, the question of whether to store all revised versions of a document is mostly independent of the document format, and the benefit of that approach could be applied equally to OXML, ODF, or other document formats.

  8. ghomem says:

    The impact of losing tracked changes is way lower than the spreadsheet nonsense done on SP2. Please use your energy on fixing the formulas.

    Thank you

  9. Linux Usage Stats v GUIs v Word Processors v Document Standards

    Oasis first announced it had "formed a technical committee to advance an open, XML-based file format specification for office applications" on November 20, 2002. Back then, Linux had a -1% market share on the desktop.

    Four and a half years later, Oasis announced Open Document Format for Office Applications (OpenDocument) v1.0. OpenDocument to provide "a royalty-free, XML-based file format that covers features required by text, spreadsheets, charts, and graphical documents." Linux still had a -1% market share on the desktop.

    Six months after the announcement of OpenDocument version 1.0, the OpenOffice.org Project announced the release of OpenOffice.org 2.0. Linux had a -1% market share on the desktop.

    On February 13, 2007, Oasis announced OpenDocument Version 1.1. The press release indicated the following companies as supporters of the new version: IBM, Nokia, Novell, Red Hat and Sun Microsystems. Linux remained at -1% in the desktop market share.

    Eight months later, the OpenOffice.org Community announced the release of OpenOffice.org 3.0. This new version was the first to run natively on the Mac OS X platform. Linux continued at -1% in the client market share.

    A few weeks ago, a Linux advocate exposed the reasons on why Linux hasn’t been accepted as a desktop alternative by mainstream computer users, see http://www.krsaborio.net/research/2000s/09/0426.htm . The slides for the presentation are available at http://www.krsaborio.net/research/acrobat/2000s/090426_linux.pdf

    The open source community must dedicate most of its time to fixing the problems with the adoption of Linux on the desktop instead of wasting its time on the banalities of ODF v OOXML and similar agendas.

    I’ve gathered a timeline on all the above facts at http://anonymous-insider.blogspot.com/2009/05/linux-usage-stats-v-guis-v-word.html

  10. Stefan Gustavson says:

    Doug, your article is well written and relevant, but in the current context it amounts to grasping at straws. It could easily be interpreted as a futile attempt to save face if one were so inclined. Nobody has suggested that the SP2 implementation of ODF is terrible because tracked changes are not supported. As you point out, it is not a well supported feature in either the ODF standard or existing ODF implementations – yet few people are asking for it. Tracked changes are simply a minor part of MS Word functionality, important only to a small group.

    On the other hand, formula import and export seems to me a very fundamental core feature of Excel which is of utmost interest to everyone using the software. I really think that a lot more people would like to see you explain that mess better, instead of covering other relatively minor defects where you made wiser design decisions.

    The facts, which you have not disputed, are that Excel 2007 SP2 drops all formulas on ODF import from any other source than itself, and writes formulas which are unreadable to any other ODF implementation. The format is not even compliant to the standard. How does that count as interoperability?

  11. dmahugh says:

    Stefan, it sounds like your customers and ours have quite different requirements.  Tracked changes is a critical feature for many of our customers, and that’s why I explained the issues in some detail above.  I had been asked to explain our decision regarding tracked changes before the topic of ODF formulas was raised, and as I said when SP2 was released, I was already working on a blog post to cover the topic.

    Regarding your claim that “the format is not even compliant to the standard,” you may find it helpful to read what other ODF implementers have to say on the matter, or take a look at the conformance testing that has been done on the sample spreadsheets you’re referring to, in addition to the explanation of that matter which I posted last week.

  12. Mark Parity says:

    That’s just too bad.  It works for everyone else.  Keep thinking you "can’t" do it or the "non-400-page-spec doesn’t explain it."

    That wind that always blows IN your window is because microsoft sucks.

    You’ve sown the sucking wind.  Now reap the sucking whirlwind.

    M

  13. Ira Skygazer says:

    Tracking any change is as simple as implementing code management with Subversion. Why is the wheel being recreated? Simply use the internal management processes provided by subversion and add them to the ODF specification. Then again, would it be appropriate to provide change management outside of the ODF specification and recommend that application developers, implementing ODF v2, implement Subversion as a change management capability. Hmmm…

  14. Fiery Spirirted says:

    It is funny…but tracked changes in Microsoft Office is the one single feature of MS Office that I have heard concerns about. For instance one book publisher of Roleplaying books that I buy from had numerous difficulties with errors that kept appearing in the final book even though it had been fixed earlier.

    The source of the problem seemed to be that if more than one MS Office version is involved in the editing, then all bets were off about how the tracked changes could mess up. Their solution was to turn off tracked changes totally even though it in theory sounds like a good idea.

  15. ghomem says:

    @dmahugh

    It is exactly as Stefan Gustavson puts it. You will never be accused of dishonesty by not implementing tracked changes which is an advanced feature and apparently not perfect on ODF.

    Formulas, on the other hand, are a basic feature and you broke it.

    No matter how much important tracked changes are for your customers, I seriously doubt that it takes priority over err….. being able to exchange spreadsheets.

    Question is: what will you guys do about this?

  16. hAl says:

    It is very amusing that whilst Rob Weir tried to show lack of interoperability in MS Office he actually produced files from Symphony and openOffice that seem invalid ODF

    http://adjb.net/post/Notes-on-Document-Conformance-and-Portability-4.aspx

    It is actually MS Office and KOffice that produce the valid ODF files

  17. Mitch 74 says:

    @dmahugh: porting from a revision system (be it for a file format or a code repository) to another is, indeed, a problem by itself – which actually falls a bit off the path of an office file format.

    What I mean is: an office document is a ‘tangible’ thing, an object if you will, while change tracking is a feature:

    – it may be basic, like the implementation proposed by ODF is; it probably is a leftover from the StarOffice XML file format, kept around because it was non obtrusive (if an office suite can’t read it, it can discard the whole markup and keep the document still logical; current implementations try, but some fail, to implement it – like IE 6 tried to implement CSS positioning, a bad implementation may be worse than no implementation at all)

    – it may be advanced, like ECMA376’s: however, a complex solution entails complex implementation, which are easy to botch (a comment above seems to indicate that even from a MS Office version to another, it may not be glitch-free).

    Now, change tracking can be implemented in many ways: be it at

    – the document level (managed by the office suite),

    – the file level (managed by a revision tracker),

    – the file system level (managed by an history-based file system, like Time Machine on Mac OS X, or some other file systems in *NIX OSes).

    So, there WILL be a need to define interfaces to translate change tracking from a system to another – in which case, I’d say that ODF’s way is more forward-looking since it isn’t as intrusive upon the document as OXML is: if an office suite relies upon Time Machine to track document changes (and provides GUI tools for that), then it shouldn’t need to care for the OXML change tracking system.

    Or, if an office suite tracks changes using Subversion, or CVS or Git (it could!) or a mere collection of .diff files, all allowing distribution of patch collections and reimaging (which could be included along with the document), then there will be a need to translate from one to another (please note that there are already converters between these trackers), which WILL have to be implemented by the office suite anyway or ignored.

    However, all of these have the advantage of not caring about the document itself: you don’t need to parse the document in its entirety to get its revision history, or to remove it: you can still get the document.

    Corollary: if OXML and ODF had their changes tracked through, say, Subversion, then an office suite would need a single implementation of Subversion to implement the feature for both file formats. It would then fall upon the office suite to translate the content of the revision from one format to the other, but even that may not be necessary. See:

    A file is created with change tracking activated, under the OXML format. It is modified several times,  with changes tracked through Git, stored inside the document’s ZIP file as a Git tree.

    The document is converted to ODF, under the same office suite. The Git tree, still included inside the document file, thus contains a complete state of the document before conversion, and then keeps tracking changes to the document in ODF format.

    The file is sent to someone whose office suite implements a different tracker (say, CVS). The user is given the option of porting the Git tree to CVS, or to discard it (notice that since Git to CVS translators already exist and are documented, it wouldn’t be a heavy feature to add to an office suite).

    Conversion from one tracker to another is very complex, indeed; you may end up with an unusable history tree. However, with an external tracker, you can at least be sure that the final document (which is the important part) is never corrupted by stray "history".
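
A minimal sketch of the packaging idea described above, assuming a ZIP container in which the current document lives in `content.xml` and the tracker's data sits under a separate `history/` folder. The entry names and the three helper functions are hypothetical, chosen only for illustration; neither ODF nor OXML defines such a layout.

```python
import io
import zipfile


def build_package(content_xml, history):
    """Build a ZIP package: the document in content.xml, history under history/.

    `history` maps hypothetical entry names to raw bytes (diffs, tracker
    objects, whatever the external tracker produces).
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", content_xml)
        for name, blob in history.items():
            zf.writestr(f"history/{name}", blob)
    return buf.getvalue()


def read_document(package):
    """Return the current document without reading any history entry."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read("content.xml").decode("utf-8")


def strip_history(package):
    """Copy the package, dropping everything stored under history/."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if not info.filename.startswith("history/"):
                dst.writestr(info, src.read(info.filename))
    return out.getvalue()


package = build_package("<doc>final text</doc>", {"rev1.diff": b"-old\n+new\n"})
print(read_document(package))
print(zipfile.ZipFile(io.BytesIO(strip_history(package))).namelist())
```

Under this assumption, a viewer or a suite with a different tracker can read or discard the history without ever parsing it, and the final document stays untouched either way.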

  18. HeadMasterT says:

    Being a technical engineer myself, I must agree with Mitch 74. You limit your flexibility to separately select formatting and version control solutions to fit your requirements when those functions are tied together.

    I would rather have ODF focus on document formatting and leave change/revision control solutions to tools specifically designed for that function. Just because OXML wishes to bundle the support for disparate functions into their format, I don’t agree that ODF should do likewise.

  19. MS will fix ODF OOXML interoperability sooner than the open source community will fix the following:

    http://montanalinux.org/files/lfnw2009-linux-sucks.pdf (slides)

    http://montanalinux.org/lfnw-2009-lunduke.html (video)

    ROTFL

  20. A Nonymous says:

    Nice wookie. Now, what about the spreadsheet formula interoperability issue?

    Name withheld.

  21. @Name withheld, let’s make a bet.

    Microsoft will fix ODF OOXML interoperability issues before Linux distributions fix problems with audio and video.

  22. Mitch 74 says:

    @Anonymous Insider: I really fail to see what Linux distributions’ alleged problems with audio and video have to do with an office document format.

    Now, since you displaced the matter at hand, two questions:

    – isn’t the guy having "video problem" using an Nvidia driver? I’ve had troubles with them too: the non-Nvidia drivers detected and configured my screen properly, the Nvidia blob would give an ‘out of range’ error every friggin’ time. Luckily, I could force it to behave using a config file. If this happens under Windows, your only hope is to get a different card.

    – sound problems, right now, happen mainly when using PulseAudio – a layer that allows you to specify sound volume for any application running on your system independently of the others (like DirectSound does in Vista). You can bypass/disable it though. Want to do that in Windows so as to get glitch-free sound from those 96 kHz, 32-bit floating point sound samples and without using up a complete CPU core to do the automatic downsampling to 44 kHz 16-bit integer that Vista does? I’ll answer that one: write your own player that hits upon OpenAL. If your current sound card doesn’t support OpenAL, see question 1.

    I never had any Wi-fi problem under Linux; at least, none that I couldn’t permanently solve with some config file kung-fu; yet, I still have no cure for the "Vista dropped wi-fi connection and requires a reboot" problem I’ve had on several separate machines.

    Sound is a problem if you let it be one: not happy with the PulseAudio abstraction layer? Disable it! You’ll still have hardware-based mixing available under ALSA.

    The video and sound problems are solved. OOXML’s? Not exactly. So, you’ve lost your bet.

    Back to our schedule: document revision and office XML-based file format. Current question is: should document revision history be kept inside the document like OXML does, or should it be moved to whatever change tracker the office suite or file system already implements? As you may have guessed, I myself prefer the ODF+database Google designed. The nice thing about it is, it could be implemented in OXML as-is, without even touching the current spec!

  23. A Linux Feature or a Huge Linux Bug

    Skype for Linux Download Web Page:

    http://www.skype.com/download/skype/linux/choose/

    Opera for Linux Download Web Page:

    http://www.opera.com/download/index.dml?platform=linux

    So far I think I’m winning my bet!

    Now, you and Rob Weir seem to be quite impatient about ODF OOXML interoperability. Why not take a deep breath and relax?

    It helps to remember what the open source community has done about office applications:

    2000-07-19 Sun Open Sources StarOffice Technology

    2000-10-16 Sun Announces StarOffice Source Code on OpenOffice.org

    2002-04-30 OpenOffice.org 1.0

    2002-11-20 OASIS to Advance Open XML Format for Office Apps

    2005-05-23 OpenDocument Version 1.0

    2005-10-20 OpenOffice.org 2.0

    2006-05-08 ISO and IEC Approve OpenDocument OASIS Standard

    2007-02-13 OpenDocument Version 1.1

    2008-10-13 OpenOffice.org 3.0

    Why, all of a sudden, do you and Rob want 100% interoperability? Relax, it will happen soon.

  24. Mitch 74 says:

    @AI: the fact is that we are, here, discussing how a feature may be implemented: existing solutions have problems (OXML’s requires parsing the whole document to see edits, ODF’s is incomplete); moreover, since both formats share a similar file structure (a collection of XML files, zipped together), existing technologies would allow all office suites to combine the advantages of both (OXML’s exhaustive support for edits, and ODF’s ability to leave the document alone) without having to rewrite specs (if the revision system is kept outside of the document, it can be implemented in many ways; the document would however remain readable, whether or not the office suite supports revisions!)

    I mean, keeping revisions inside a document’s flow requires a document reader to implement support for them even though it has no need for them when creating a thumbnail, parsing keywords for indexing, or even converting the document’s present state to HTML or whatever!

    Revisions can be extensive; you may end up with several dozen times the actual content of the ‘final’ version stored all over the place. You would thus require an indexing engine to go through all edits, discard them as being ‘dead’, until it reaches ‘active’ data! If, on the other hand, revisions are kept apart from the ‘final’ document, indexing is a snap. This would also allow the document to be loaded faster, as history would need to be loaded only if the user starts editing it.

    And I repeat, this would NOT require rewriting existing specs, be it for ODF or OXML, it would NOT compromise compatibility, as the ‘final’ document would be untouched (history may be discarded, but as we could see, be it in ODF or OXML, glitches exist in current implementations – and a bad implementation of a feature is often worse than no implementation at all), it could even be kept across document conversions from one format to another, and it would even allow for faster editing, indexing and viewing!
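
To make the indexing point concrete, here is a small sketch using toy markup. The `<del>` and `<ins>` element names are simplified stand-ins invented for this example, not the real OXML or ODF vocabulary. With revisions kept inline, the indexer needs to know which elements are ‘dead’; with a clean document, it can take every piece of text as it comes.

```python
import xml.etree.ElementTree as ET

# Toy documents: element names are hypothetical, for illustration only.
INLINE = "<doc><p>Hello <del>cruel </del><ins>kind </ins>world</p></doc>"
CLEAN = "<doc><p>Hello kind world</p></doc>"


def live_text_inline(xml_text):
    """Collect indexable text, skipping deleted runs kept in the flow."""
    def walk(node):
        if node.tag == "del":
            return ""  # 'dead' content: ignore it, the parent adds the tail
        parts = [node.text or ""]
        for child in node:
            parts.append(walk(child))
            parts.append(child.tail or "")
        return "".join(parts)
    return walk(ET.fromstring(xml_text))


def live_text_clean(xml_text):
    """Collect indexable text when the document holds no revision markup."""
    return "".join(ET.fromstring(xml_text).itertext())


print(live_text_inline(INLINE))
print(live_text_clean(CLEAN))
```

Both calls yield the same text, but only the first one needs revision-aware code, which is the extra burden on thumbnailers, indexers and converters.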

    So, one thing that could be proposed through TC-34 (I think that’s the body in charge of both formats now) is to keep current systems as "legacy" specifications, and to start working on giving both formats an export method to a content versioning system common to both formats – be it Subversion, CVS, Git, whatever, or even a mere interface to these systems.

    Heck, if Office 2010 used, say, Subversion for OXML, and OpenOffice 3.2 used Git for ODF, while Koffice used CVS for both, and Mac Office used Time Machine, as far as I know there already are converters between at least some of  these systems! Talk about interoperability, a proposed feature already has a working and extensively tested implementation! Who loses? Nobody, who wins? Everybody!

    So, in that case 100% interoperability is actually almost an afterthought: this is reimplementing a feature in a more useful, more flexible way that could be common to both formats while still separating document data from office suite feature.

  25. dmahugh says:

    Interesting thoughts, Mitch.  I happen to be a fan of having the revisions tracked in the document itself, but you make a good case for the benefits of an alternative approach.

    When you say that existing solutions have problems and "OXML’s requires parsing the whole document to see edits," I’d say that this isn’t a differentiator between the OXML and ODF approaches.  The ODF approach moves deleted content to a separate location in the document, whereas the OXML approach leaves deleted content in its original location, so as a practical matter the ODF approach requires a little more processing to retrieve and assemble the content.

    One note on SC34 — it’s not in charge of both formats at the present time.  SC34 handles maintenance of IS29500 (in WG4), and maintenance of ODF is handled by the OASIS ODF TC.

  26. The Illuminati is at it again:

    http://www.groklaw.net/article.php?story=2009051922175320

    One of the best comments from the Illuminati:

    "This time the world will adopt ODF and Microsoft will be left with a product no one wants. This time Microsoft is going to lock itself out of it’s own market. The funny part is that MS won’t figure it out till it’s too late and everyone has already switched. Microsoft will become a penny stock."

    Didn’t Linux break the 1% market share on the client recently?

    So I wonder how the Illuminati will get Microsoft "to lock itself out of it’s own market".

  27. Some background on Groklaw’s editor:

    1. He/she uses a Mac

    2. He/she is against iPhone jailbreaking

    See http://tinyurl.com/p6ycue

    Am I looking at double standards or is it just my imagination?

  29. Mitch 74 says:

    @Anonymous Insider: will you please stop bringing Linux to the table?! What the heck does an OS have to do with an office suite? An OS runs an office suite, and is otherwise IRRELEVANT!

    @dmahugh: I did mention that the approach I liked best was Google’s ODF+database (not ODF alone); even then, the legacy StarOffice method, being stored in a different ‘tree branch’ in the XML document’s tree than that of the main document, makes it so that once the revisions’ tree branch is ignored, the rest of the document can be parsed – or repaired – fast.

    Many people started using OOo because it was able to repair MS Office corrupted binary files, so "repairs damaged documents" is a feature that can’t be ignored :p

    ODF’s change tracker’s main problem, as you pointed out, is that modifications made to tables and other nested elements are not tracked at all, and reverting back to a previous state will more easily fail due to more radical DOM manipulation: implementing the feature is thus hard on the developer for only small results. However, in case of a damaged file, repairs can be done more easily since the document’s final tree branch is kept cleaner. And in cases where the feature isn’t required (indexing, viewing, printing), parsing is still a bit simpler. So, in my opinion, in that matter ODF is more portable.

    In OXML, since modifications are nodes kept ‘in place’, the parser has to go over each of them one after the other to reconstruct the final document, but showing revisions is then only a matter of styling them – implementing this feature is thus easy on the developers, and very powerful. However, a damaged file is then hell to repair, manually or otherwise. I thus stand by my previous point that OXML’s system is more geared towards heavy office suites and ill adapted for viewers, content indexing or online editing.

    But with current file system technologies (database-based file systems, change trackers, revision systems…), keeping revisions outside the document, in a reasonably format-independent manner, may reduce redundancy and increase robustness – and solve the problem for both formats at the same time.
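
The two storage models discussed in this exchange can be sketched with toy XML. The element names below (`tracked-changes`, `changed-region`, `change`, `del`) are simplified, un-namespaced stand-ins loosely modelled on the descriptions above; they are assumptions for illustration, not the actual ODF or OXML schemas.

```python
import xml.etree.ElementTree as ET

# Deleted content moved to a separate branch; the body keeps only a marker.
SEPARATE = (
    "<doc>"
    "<tracked-changes>"
    "<changed-region id='c1'><deletion>cruel </deletion></changed-region>"
    "</tracked-changes>"
    "<body><p>Hello <change ref='c1'/>world</p></body>"
    "</doc>"
)
# Deleted content left where it was, wrapped in a revision element.
IN_PLACE = "<doc><body><p>Hello <del>cruel </del>world</p></body></doc>"


def before_edit_separate(xml_text):
    """Rebuild the pre-deletion text by looking up each marker's region."""
    root = ET.fromstring(xml_text)
    regions = {
        region.get("id"): "".join(region.itertext())
        for region in root.iter("changed-region")
    }

    def walk(node):
        if node.tag == "change":
            return regions[node.get("ref")]  # splice the moved content back
        parts = [node.text or ""]
        for child in node:
            parts.append(walk(child))
            parts.append(child.tail or "")
        return "".join(parts)

    return walk(root.find("body"))


def before_edit_in_place(xml_text):
    """Pre-deletion text is already in reading order: just collect it."""
    return "".join(ET.fromstring(xml_text).find("body").itertext())


def after_edit_separate(xml_text):
    """Final text: the body can be read while ignoring the revision branch."""
    return "".join(ET.fromstring(xml_text).find("body").itertext())


print(before_edit_separate(SEPARATE))
print(before_edit_in_place(IN_PLACE))
print(after_edit_separate(SEPARATE))
```

In this toy model the trade-off shows up directly: the separate-branch layout needs a lookup step to show or revert edits, while the in-place layout needs revision-aware parsing just to read the final text.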

  30. Google announced a reseller program for Google Apps in January 2009.

    According to a press release, Google "announced a program enabling technology solution providers to sell Google Apps to businesses around the world. Authorized resellers will be able to sell, customize and support Google Apps Premier Edition for customers of all sizes, creating new revenue opportunities for partners and easier access to Google’s popular cloud services for more businesses."

    The Google Apps Premier Edition suite of communication and collaboration tools included Gmail, Google Calendar, Google Docs, Google Sites, Google Talk, and Google Video for business.

    Are Google Apps resellers behind the latest crusade for 100% document interoperability?

    You may view documents related to Google Docs at http://tinyurl.com/reyw3r

  31. Anonymous says:

    Hahah Microsoft man cries because his number ‘2’ disappears. Live with it buddy! just like all us poor plebs have lived with MS Word’s quirks over the years.

    But seriously, good points and a good read. People want features that work all the time, and the ODF standards should better support Tracked changes.

    I also prefer the Google style revisions, but that’s because I’m a software engineer and I love to merge and diff my files. I’m having trouble getting my MD to sign up to OOo without good tracking of changes, simply because it’s so visible and easy in MS Word.

  32. Here are the latest posts about Open XML and the implementation of ODF in Office 2007 SP2. Some very