Comments from Tim Bray on OpenDocument


I’m sorry to post about this subject with all the other stuff I’ve been promosing to cover but I just read this and wanted to share my thoughts. I was reading Tim Bray’s post the other day about the OpenOffice conference (http://tbray.org/ongoing/When/200x/2005/10/01/Open-Office-Conference). He brought up a point that I’ve been asked about seperately a few times related to the lack of formula support in the OpenDocument standard:



Bad Formula Trouble I learned, to my dismay, that the ODF specification is silent on spreadsheet formulas, they’re just strings. This is obviously a problem; much discussion on what to do ensued. I lean to the idea, much bally-hooed by Novell, of simply figuring out what Excel does, writing that down, and building it into ODF v.Next. Mind you, anyone who’s really been to the mat with Excel, in terms of Math & Macros, knows that it isn’t a pretty picture, there are real coherency problems. But it’s good enough and the world has learned how to make it work.


There’s also this article on NewsForge titled OpenDocument office suites lack formula compatibility where the following was stated:



The OASIS Technical Committee in charge of this standard explicitly said last January that “while … interoperability on that level would be of great benefit to users, we do not believe that this is in the scope of the current specification. Especially since it is not specifically related to the actual XML format the specification describes.”


Even outside the Committee there is the same opinion: OpenDocument must only be about structure and how to represent content.


Someone asked me in one of my previous posts what my thoughts were on this article, and here’s a bit of what I said:



From reading the article, it sounds like the thought was that they would standardize around the presentation aspect of the formats only. It’s a bit unfortunate since the result of a formula does affect the ultimate display. In fact, formula results are often the most important part of the spreadsheet.

Did the original StarOffice format have formulas defined in their schema? Did they decide only push some of the schema through OASIS?

If this is an area folks are interested in, let me know. I can post some examples of Excel’s schema for formulas…


As I’ve said before, it appears that there are a number of very similar goals between the Office XML formats and what Sun did with the StarOffice format. In Office, we have the additional responsibility of supporting everyone’s existing documents, which means there are a huge number of features we need to support (all of them). The issue here around equations happens to be just one example of one of those types of features that is really important. I’m sure the StarOffice format did have this support, it just didn’t make it into the OpenDocument spec.


This is one of those cases where it’s important to understand the nuances of someone’s design. It appears that as they moved the StarOffice format through OASIS to create the OpenDocument format one of the primarily goals was around display of content. For whatever reasons (time, effort, design goal, etc.), they made the decision that some application information (like formulas, or customer schemas) was not something they wanted to work into the standard. This is an example of where our minimum requirements have to be different.


Presentation-centered formats


In it’s current state, the OpenDocument format appears to be focused primarily on presentation of information. I think in that way it has somewhat similar goals to those of PDF. The sections of the StarOffice file format that they decided to take through OASIS were the ones that affected display of the files. You could argue that formulas affect the display, but as long as you make sure all the formulas are calculated before you save into their format you are fine.


Application interoperability


I’m curious what applications that are going to use OpenDocument as their primary format have decided to do about these missing features like formula support. I know a number of them have support for spreadsheets. Formulas are such a key part of a spreadsheet I’m assuming they have to create their own extensions to the format to support this. I’d be curious to know how the applications standardizing around OpenDocument (KOffice and OpenOffice for example) are planning to exchange spreadsheets. I would assume they will take some approach (like transformation) to ensure the interoperability they are going after. That’s one of the great benefits of an XML format; as long as it’s well documented you can take advantage of it. We’ve had people get pretty upset at us though when we’ve had to extend an existing format when there is additional functionality we want to store that the format doesn’t support (it’s referred to as “embrace and extend”). I think in this case there isn’t really a choice. You can’t have a spreadsheet without formula support.


I’m sure that the long term goals of OpenDocument do include full roundtripping of all user data and if that’s the case I’m sure they are going to work on a proposal for missing pieces of the spec like formulas at some point. Once they do decide on a way to add formula support to OpenDocument, then they’ll also need to go back to all the files that get created under the current standard and update them from the proprietary extensions to match the decided upon standard.


Full fidelity formats


I’ve talk before about how full fidelity formats are really important to us because we want to ensure that all features you want to use can be fully represented. Formulas are an extremely important part of any spreadsheet. In fact one could argue that formula support is the primary reason for using an application like Excel. The Microsoft Office Open XML formats are specifically designed as an XML representation of our full file formats. Everything you can do in our default format is represented as XML. Our formats are primarily designed around viewing, editing and integrating the files with data, formulas, and other application behavior. Collaboration is extremely important to us as well, and it would really be lame if you couldn’t collaborate on every aspect of your files (only a subset). This is another example of why we had no choice but to create our own XML file formats if we really wanted to move to XML formats as the default. Otherwise we would have been stuck with something that didn’t fully persist all of our users’ features. The key is that we fully document that XML and provide the schemas to anyone that wants to use them. This way as we continue to innovate based on customer needs and demands, we can also incorporate that functionality in the file format and expose it to anyone that would want to leverage it.


-Brian

Comments (41)

  1. anti-FUD says:

    So, what’s the problem with storing formulas as strings? That’s how they are naturally represented, and it makes perfect sense to store them that way.

    ODF is the way to go at the moment, because there is no other option. There is no other equivalent that that is free to be implemented by any vendor without restrictions.

  2. BrianJones says:

    While it would be nice if were structured and could be schematized, storing as a string can still definitely work. Is there an agreed upon syntax for those strings that the different applications are using?

    -Brian

  3. orcmid says:

    I loved the Tim Bray article, the whole post. What fun. Thanks.

    I agree that they should move to repair this very quickly and simply. I can’t believe it happened in the first place.

  4. Todd Knarr says:

    I think the whole issue of formulas is pretty much not a major problem for word-processing documents. Spreadsheets yes, but not for the average document like you’d prepare in Microsoft Word or OpenOffice Writer. It makes sense, then, to not let a spreadsheet-specific issue hold up the whole format.

    I don’t think anyone should be suprised at how the formula issue is progressing. It’s how RFCs have been developed for years, after all. OASIS isn’t sure what the right thing is when it comes to formulas that lots of applications will use. They’re probably trying to find a representation that’ll allow flexibility for non-spreadsheet purposes as well as fit the existing needs of spreadsheets. That’s non-trivial. Meanwhile, rather than codify an incomplete and inadequate initial standard, they’re leaving it undefined. Strings can be filled in with whatever a particular application needs, and the application can use XML to tag the document to identify it’s own format for later detection. When a standard for formulas is settled on, applications can then drop the app-specific tags and representation and use the standard one, with the app-specific tags allowing them to detect older versions of their own documents and upgrade them transparently. That particular trick isn’t even particularly advanced, in fact section 2.2 on meta-data in the OpenDocument spec defines how to create application-specific metadata tags for just this kind of thing. If you maintain any kind of coherence between the XML document tree and the internal representation, I can even see how to preserve formulas in the document even if the application can’t understand or display them. Same for meta-data tags.

    I’m a C++ hacker, not an XML guru. If I can figure this stuff out, I’m fairly certain the Microsoft Office developers can too. This whole hullabaloo smells like FUD to me.

  5. kathy says:

    MS is floated by money collected from the sales of office and OS. Now opendoc will undermine this in a relatively short time, given every countries (except USA) will adopt this standard very soon, no matter how many FUD is posted by the PR of MS. Google now join forces with Sun to provide SOA model of office. Only make it happen faster.

    Given everyone will switch to linux after vista is released, MS will soon lose most of her revence. All cost centers in MS are at the mercy of Xbos team to see how many workforce bill gate will decide to lose!

  6. SlashDotJunkie says:

    Todd Knarr: "Strings can be filled in with whatever a particular application needs, and the application can use XML to tag the document to identify it’s own format for later detection. When a standard for formulas is settled on, applications can then drop the app-specific tags and representation and use the standard one, with the app-specific tags allowing them to detect older versions of their own documents and upgrade them transparently. That particular trick isn’t even particularly advanced …"

    And then you save your spreadsheet to ODF in OOo/SO with formulae represented by OOo/SO-specific XML tags (in OOo/SO-specific namespace, I presume) and then load in KOffice or Gnumeric, who have their own app-specific tags and namespaces and thus ignore OOo/SO’s formulae. So much for a not particularly advance app-specific tags…

    Still love custom tags idea? Think of upgrade scenario. What would OOo/SO and others do when the common denominator formula standard is introduced, and alas it doesn’t cover all the features of their in-house file format? Ok, OOo/SO is probably safe, thanks to Sun. KOffice? Gnumeric? Yet-unknown-but-cute underdog spreadsheet app? Sucks to be them.

    I do have a question for you guys: when Office 12 schemas are published, how long till OOo/SO and others start supporting new Microsoft formats? They won’t have much of a say I figure. And why would anyone use ODF XML formats after Office 12 XML formats are integrated?

    Love, SDJ

  7. Todd Knarr says:

    SDJ: as I said, app-specific representations are a transitional matter only. As I noted, one of the reasons OASIS isn’t specifying a formula standard just yet is that they’re trying to avoid just the problem you describe. They want to be sure that the standard will cover everything and won’t leave either current or future apps out in the cold because what they need won’t fit into the standard. I consider it a feature of the standard that they can allow apps to do app-specific things without breaking the standard for other apps and without breaking future standardization. I’m still in a better position than I would be with a standard that wouldn’t ever support what my apps needed and I’ve got to go through yet another format change. By my estimation, the transition from app-specific to standardized formulas should be fairly uneventful: I’ve probably got the apps that wrote the app-specific data, so once the standard’s finalized I update them to a standards-compliant version and my files get updated to standards-compliant formulas as I open them. Meanwhile, why should I hold up a standardized word-processing format just because I’m not quite sure how to put one particular bit of spreadsheet-specific stuff (which word processors won’t care about) in?

    I doubt OOo or any of the others will be supporting the Office 12 XML schemas soon. For one, the developers aren’t likely to take on the patent-infringement risk laid out in MS’s license. For another, their customers are asking for OpenDocument formats, not Office 12, and OOo are listening to customers, not their competitors. Given that the majority of the world’s going to remain on pre-12 formats for the foreseeable future, I think the pressure will actually be to avoid Office 12 formats, not adopt them. I know where I work we’ll probably be rejecting Office 12 formats since nobody will be running software that supports them. Support for the format isn’t a security issue so the next possible upgrade timeslot it could go in is summer of 2006 (assuming the virus-scanning software has had it’s upgrade to support the format by then, and that an Office 12 format plug-in for Office XP is available by no later than the end of the year). We’ve had major upgrades to Windows (to XP), Office (to OfficeXP) and several other software packages this year, another big expenditure’s going to be a hard sell (even if the plug-in’s free, staff time and disruption to work is expensive and money for it’s got to be found in the budget).

  8. Yuki says:

    When you’re done grasping at straws trying to prove how superior your ClosedXML format is to ODF, can you please explain why you still haven’t made clear that all you said about IP problems with ODF involving Sun was a mix of FUD and lies?

  9. cheesybagel says:

    OpenDocument has support by several apps. The Wikipedia article on OpenDocument has more info:

    http://en.wikipedia.org/wiki/OpenDocument

    Support is not limited to OSS offerings. For example: TextMaker 2005 is a closed source application, currently in beta, with some OpenDocument support:

    http://www.softmaker.com/english/tmw05_en.htm

    Adobe, Corel and IBM were behind this standards effort as well. Not just Sun.

    If Microsoft has a problem with the specification, perhaps they should join the commitee. Microsoft is a member of OASIS already AFAIK.

    No one is asking Microsoft to dump Office XML. Just to support OpenDocument, as RTF or Wordperfect were supported.

  10. Craig Ringer says:

    Thanks for the informative post. I find your comments on extending the format interesting – as surely that’s exactly what XML is for?

    I personally would have no problem with an extended format, so long as features built into the format were used where they existed, instead of being replaced by custom extensions. It’s much like the support for XML blobs in PDF – sure, if you want to include your own extra metadata or other details in a PDF, why not? It doesn’t harm other readers (which can just ignore it), and adds functionality for users.

    Extensions, to my mind, only become a problem when a set of tools rely on them to the exclusion of support for the standard. For example, if some hypothetical PDF viewer shipping with a product would only open PDFs that had the vendor’s own XML extensions, or if it hopelessly butchered files that lacked them, that’d be an issue. Providing extra functionality with the extended files would not – especially if those extensions were clearly documented and reasonably licensed.

    Even zealots would have a hard time screaming about it if Microsoft chose to extend OpenDocument. They’d do it anyway, but they would be even less credible than usual. OpenOffice.org, after all, significantly extends the OpenDocument format. Save a .odt, unzip it, run content.xml through `tidy -xml -indent’ and have a look at it – you’ll see that it imports a number of OO.o specific schema in addition to the OpenDocument ones. That said, a fairly complex .odt I just picked up doesn’t seem to actually use any elements from the ooo namespaces.

    Extensions have to be done carefully, but they’re hardly evil in and of themsevles. Where would LDAP be if we couldn’t add to the schema, for example?

    I was personally rather disappointed that OpenDocument doesn’t require the undamaged preservation of unrecognised markup, let alone specify preservation options like Word supports. A few tests confirmed that current OpenOffice 2.x betas strip custom markup added to content.xml, and in fact don’t even appear to notice or complain about invalid (though well-formed) markup. Some more research indicated that this issue has been deferred for a later revision of the OpenDocument spec.

    While understandable – in terms of implementation difficulty if nothing else – it’s disappointing. Support for preserving, and in fact actively working with, customer schema extensions is one area where Word has me drooling. You guys will, alas, probably never release a lightweight version for UNIX/Linux thin clients, so for the time being I’ll only be able to use it on some of the systems at work.

    I must express agreement with cheesybagel on one point – it’d be very nice if Microsoft chose to ship OpenDocument import/export filters, even if only for Word at first. As he points out, much like RTF it’d only need to support what features the format could handle, plus the option of saving in that format by default. I personally see little wrong with simply warning a user that "some formatting and features may not be preserved when saving in this format". Alternately, it would seem reasonable to add schema extensions to support the round-trip of MS-specific features (though it seems other apps would clobber them). I suspect that this might, if nothing else, calm the seas of the current fuss about document formats in government.

    I would be very happy to have OpenDocument filters in Word. Not "native" format use, just filters. As a business that currently uses both Word and OpenOffice, it’d actually simplify our use of Word and make it more attractive, not less. If I wanted to transition all the users across to OpenOffice, I could do it just fine based on the Word 2000 format support, so I don’t see how it’d be a threat to business.

  11. Craig Ringer says:

    Yuki: That’s been answered here before. As someone who was keeping an eye on the discussion, I can assure you that at the time the questions were quite legitimate. Sun had a rather ambiguous declaration that linked to the W3C policy. That policy, by the way, doesn’t guarantee perpetual availability or sublicensing, or GPL compat if that’s what you’re worred about. The declaration didn’t say if they actually had any patents on the format.

    Sun has since moved, pleasantly quickly, to post an updated patent grant.

    I don’t really understand why so many people are getting worked up about this. It didn’t look like a big deal then, and it doesn’t now.

  12. Just a minor point, Brian, that in fact reflects on a larger critique I have of the FUD coming out of Redmond:

    You take a small point about missing formula specs and then make a grand claim that OD is "presentation-oriented", and so somehow more limited than MS Office XML. This is just wrong. The spec has a quite clean separation of presentation logic from content/structure.

    Is it as clean as, say, DocBook? No, but that’s no surprise. Does it do a better job than your format? Probably.

    BTW, David Wheeler has been working on the missing formula stuff (though I know nothing about it). It’s called OpenFormula. That’s the nice thing about an open format: people can actually contribute to improving it!

  13. BrianJones says:

    Thanks for the comments everyone.

    Bruce, I wasn’t trying to turn this into FUD. In reading through those articles it sounded like the OpenDocument committee had decided they wouldn’t focus as much on things like formulas and instead focus on things that affect the presentation of the files. I got that from the text in that article I pointed to. I could be completely mistaken though.

    I’m actually curious about what peoples thoughts were around formulas. One could make an argument that using strings for formulas is the right way to go, but in order to have a shared document format I’m assuming they still need to have everyone agree on a single type of syntax for those strings. Otherwise the formats aren’t interoperable.

    I’ll have to check out the work David is doing around the OpenFormula work. Thanks for pointing that out.

    -Brian

  14. Todd Knarr says:

    Brian: I think strings aren’t the final format for formulas to be represented in. I suspect they’ll come up with an XML-tags-based format in the end. For the moment, though, strings let applications stuff their own specific representation in until a standard form is settled on.

    And you weren’t mistaken about the committee’s focus. They know what they want from presentation, they aren’t sure yet what’s needed for formula representation, so they’re focusing on what they’ve got a firm handle on while leaving formulas simmer. This is typical of RFC practice (the canonical example of an open process): let everyone try different approaches, see what works and what doesn’t, give people time to try combining the best ideas from different approaches and see how those turn out, and then and only then codify into a standard what’s been proven to work best.

  15. Bruce D'Arcus says:

    Brian — I was objecting to the logic of this reasoning:

    <blockquote>In it’s current state, the OpenDocument format appears to be focused primarily on presentation of information. I think in that way it has somewhat similar goals to those of PDF. The sections of the StarOffice file format that they decided to take through OASIS were the ones that affected display of the files.</blockquote>

    I just don’t know how you can say that. It tells me you probably haven’t looked through the details of the spec. That’s fine; it’s just I object to you making conclusions like the above based on that limited knowledge.

    I’ve worked with the OD TC, and I’m just an end-user with a passion for one specific piece of the document pie: citations. When we designed the new coding for that, the focus was on really nice separation of content/structure from presentation. And when that makes it into the spec and starts to be implemented, it will mean standardized coding for this that will enable features I don’t think are possible with existing tools.

  16. SlashDotJunkie says:

    Todd Knarr: "This is typical of RFC practice (the canonical example of an open process): let everyone try different approaches, see what works and what doesn’t, give people time to try combining the best ideas from different approaches and see how those turn out, and then and only then codify into a standard what’s been proved to work best."

    Translation: "ODF standard is a work in progress, and as such is immature."

    You are missing quite a few points:

    1. Standardizing a big commercial office suite on such immature standard is very risky at best.

    2. It’s also pointless, since most features, starting with formulas, would require creating non-standard extensions and namespaces.

    3. There is no way to ensure full fidelity between old Microsoft Office file formats and ODF – they are simply too different.

    4. There are more applications in Microsoft Office than ODF is designed for, meaning again completely proprietary extensions and namespaces crammed in a general ODF container.

    5. ODF (or their OOo implementation, which I expect is not too shabby) are s-l-o-o-o-o-w, period. There you go again: http://blogs.zdnet.com/Ou/index.php?p=101

    Need more? Ok, here is the killer:

    6. Until ODF is comprehensive, there is no way to ensure interoperability of office suites. Some features and formatting would always be lost when moving documents between suites. There is simply no way successful commercial application can afford this.

  17. FARfetched says:

    Brian, you wrote in part: "We’ve had people get pretty upset at us though when we’ve had to extend an existing format when there is additional functionality we want to store that the format doesn’t support (it’s referred to as "embrace and extend")."

    The extension isn’t the problem. The problem is when Microsoft tries to lock up their extensions behind onerous licensing (e.g. Kerberos) instead of working with the people charged with maintaining those standards. It’s the same deal with locking up the XML file formats with licenses and patents: totally bogus, dude.

    I’m not a numbers jockey, so spreadsheet formulae aren’t that important to me: whatever OASIS settles on will be fine. What’s more important is that I be able to move into *and out of* Office as I please, taking advantage of the strengths of whatever tools I have at hand. I don’t require (or expect) that documents look exactly the same in every tool, just that I don’t lose important content (in the case of wordsmithing, that’s text and structure). Try competing on the *merits* of your software and file formats, instead of on who has the biggest set of locks, and you’ll see a lot of people change their attitude about Microsoft.

  18. Todd Knarr says:

    SDJ: counterpoints:

    1. ODF is hardly immature. It’s been in development for quite a while, and is based off of a pre-existing format which has had several years of real-world use and modifications in response to actual problems. In addition, a large number of actual users of documents have been involved in the development specifically to insure that the resulting format met their needs. Given that, it seems to me less risky than depending on a brand-new format, designed within the last year or two, with no real-world experience or direct input from real-world users. Your assertion that ODF is immature has been, as well, thoroughly discredited by any number of others, both involved with OASIS and not.

    2. Hardly pointless, since OASIS is working on those features. Applications that would need, for example, formulas already have an application-specific format that they could continue to use, and they can transparently upgrade to the standardized format when that’s finalized (as I described above).

    3. Customers aren’t looking for full fidelity with Office formats, they’re looking for a standard format that can represent various document formats accurately. They’re looking for a replacement for the Office formats, not something that can reproduce them, and fidelity is irrelevant when you’re not using the Office formats.

    4. Right now ODF is explicitly designed to eventually cover all the applications that exist in the Office suite. It doesn’t yet, but that’s not a major problem so far as I described above. The initial target is word processing, and frankly there’s more implementations of word processors using ODF than there are implementations of Microsoft Word.

    5. Slow? Practical experience indicates that OpenOffice, KOffice and AbiWord are at least as fast as Microsoft word. In any case, whether it takes 1 second or 1.1 seconds to write out a document that took 4 hours to write and must remain accurately readable for the next 200 years seems to me not particularly critical.

    6. That’s the argument "We can’t release anything until we’ve got everything perfect.". This is countered by every bit of software development methodology in the last 10 years, particularly open-source development which thrives on releasing early to get real-world feedback to guide further development. It’s also countered by the Internet protocols themselves, which follow the incremental-development model and have almost completely wiped from existence protocols developed on your "get it complete and perfect or it’s useless" model.

    I don’t have to call FUD on any of these arguments, it’s already been called on all of them by many others.

  19. BrianJones says:

    I’m going to have to disagree with you a bit there Todd. First off though, I’d like to ask that you please stop using the term FUD when we have disagreements on particular issues. It really doesn’t serve a purpose. Just say you disagree and make your point. The way you use the term FUD, it could really apply to anyone that says something or asks something you don’t like, and that really kind of takes away any meaning from the term.

    I don’t know who the customers are for these applications that are moving to support OpenDocument so I can’t comment directly on what they are ok putting up with, but I can say definitively that our customers would not be happy if we moved to a new file format that wasn’t even fully defined for all the existing functionality. Of course as you move forward and add new functionality you’ll have to adjust the formats, but existing functionality has to be represented.

    I really wish that cross version compatibility was as simple as you suggest, but it isn’t. Changing formats from version to version is a big pain. Let’s take the formula case here. If we were to go and use method "A" for representing formulas in version 1, we’d be ok for now. But once they come along and say that method "B" is the right way to do it, we’d have a difficult decision to make. When we move forward to version 2, do we output just method "B", or both "B" and "A"? If we don’t output method "A", then there is no way for anyone with version 1 to look at our formulas. That isn’t really acceptable because we haven’t even changed the way formulas work internally, just the way they are persisted. Maintaining cross version compatibility is extremely important. Not everyone is able to always upgrade to the latest version, so you need to take that into account. That’s the reason we’ve stuck with the same binary formats for so long. Only now are we moving to new default formats, and we’re actually doing a ton of work to provide updates to the past 3 versions of Office so they can also support this new format.

    You’re third point that "customers aren’t looking for full fidelity" really blows me away! Maybe you’re only talking specifically about customers that want to move to OpenDocument (which is a really small set)? Most customers don’t care what format their files are saved in as long as everything works. We think we can do better though and that’s why we build the XML formats. We think that Office documents can play a much larger role in business processes. Most customers aren’t there yet though.

    Your comments about the speed of OO are a bit surprising. I do know that we’ve had customers that would refuse to upgrade if the product slowed down, so it definitely is a big deal to a lot of people, even it if isn’t a big deal to you.

    Also, as a side note, for those of you folks that claim formulas aren’t that big of a deal because they only affect spreadsheets, I unfortunately think you’re letting your bias affect you a bit. Excel is one of the key applications in the Office system, and if we told our users that a core piece of functionality in that product wasn’t supported (or at least we weren’t guaranteeing cross version compatibility), we’d be in a lot of trouble. 🙂

    -Brian

  20. Mark says:

    Brian: "I don’t know who the customers are for these applications that are moving to support OpenDocument so I can’t comment directly on what they are ok putting up with, but I can say definitively that our customers would not be happy if we moved to a new file format that wasn’t even fully defined for all the existing functionality."

    Not sure if I missed something but could you point me to an example of where you are being asked to move to a new primary file format? I thought everyone was just asking for a way to save in this format (as in the Save As… option)?

    Thanks

    Mark.

    (oh and I see WorkPerfect is to support OpenDocument http://www.eweek.com/article2/0,1759,1864843,00.asp )

  21. Eduardo says:

    A bunch of new links about OpenDocument Format. Includes WordPerfect support, submission to ISO, adoption by Australian National Archive, and a FAQ from Massachusetts that addresses a lot of the questions that people are debating here:

    http://www.groklaw.net/article.php?story=20051006155004596

  22. BrianJones says:

    Mark, we’ve definitely had a lot of requests to move to open, documented, XML formats. The average end user doesn’t care, but we’ve had a large number of big companies ask for this support. The number of people asking for XML formats compared to the number of people asking for OpenDocument format is significantly different. And aside from people specifically wanting the new format, there are also feature requests we get that we will be able to solve because of the new format.

    We also have had a lot of partners request the format change to make it easier to build solutions on top of our files. And as I’ve mentioned before, even we benefit from it because it allows us to build more rich functionality into the Office System for handling Office documents in a rich and powerful way (workflow, document assembly, etc.)

    I know that it’s important for people to understand why we did this work in the first place. I talked about a number of the reasons for moving to the new format in this post: http://blogs.msdn.com/brian_jones/archive/2005/09/29/475340.aspx

    -Brian

  23. Todd Knarr says:

    Brian: one example of a customer asking to move to OpenDocument would be, obviously enough, Massachusetts.

    As for formula formats, it’s easy and I described how to do it earlier (at least in OpenDocument, which accomodates this). You use a user-defined meta-tag to indicate the formula representation used in the document. When you need to change over to a standard format because it’s been finalized, you start writing in the standard format when you save documents. After all, everyone else will be using that format too, so there’s no reason to write in a proprietary format. When reading, you look for your meta-tags. If you find yours, it tells you what format you need to read formulas in. If you don’t find one, you attempt to read the standard format. If you can’t read the standard format, you use the numeric representation in the tags instead as a last-ditch fallback. This causes loss of backwards compatibility (old versions can’t read the formula representation of newer versions, but can still get the numeric form), but this is traditional with every MS Office version upgrade so I don’t see where it can be considered unacceptable, and reader plug-ins can be added to older versions just as they sometimes are with older versions of MS Office. In summary, this appears to only be a problem if the software authors decide to make it a major problem.

    As for "full fidelity", I think what most customers are looking for isn’t what you’re saying they are. They want full fidelity of appearance and representation. That is, they want fonts, line spacing, page layout, embedded images, lists, tables and such to appear reasonably. Some details can differ (eg. the exact shape of dots in unordered lists, the exact appearance of borders on tables), but it needs to appear "close enough" to a human. I don’t think they want or need full fidelity with the way Word, for example, represents things internally. If I select 10-point Arial Bold for my text and I get 10-point Arial Bold when I look at it, I don’t care that the OpenDocument format loses some details about Windows’ and Word’s exact internal font selection. Similarly, when I’m distributing a document outside my organization I probably don’t want to make assumptions about whether the recipient can display streaming video and I likely won’t care that the format loses some information about things I’m going to avoid. And of course OpenDocument *will* be able to handle embedded multimedia content in the same way it handled embedded images, with at least as much capability as MS Office (modulo codecs and formats that aren’t available on the system being used to read the document, but that’s a problem with the Office formats as well).

    I’d also note that OpenDocument was, as has been noted repeatedly, developed in consultation with a wide variety of actual users, including people like Boeing with really complex and picky document requirements. All of them appear satisfied that OpenDocument can represent everything they need (or will be able to as areas of the standard are finalized). I have to ask: what parts of Microsoft Office documents can’t be represented in OpenDocument?

    And as Mark indicated earlier, to satisfy Massachusetts MS Office doesn’t need to use OpenDocument as a native format. All it needs to be able to do is save in it and be able to be set to default to saving in it. Office already can save in a large number of older formats that don’t support all the modern features. It seems to me that if it’s so hard to support OpenDocument this way, why does Office support older Office formats, ASCII text, RTF and the like which have the same problems?

  24. Todd Knarr says:

    Brian: there also seems to be a disconnect on the definition of "open XML format". The "XML format" part seems to be clearly understood. The "open" part, though, seems to be defined by Microsoft as "Publicly described as of the current version.". This is part of "open", but people like Massachusetts are asking for "open" in a more extensive sense:

    a) Must be implementable by anyone, on any platform, for any purpose without having to get permission from any entity (beyond the minimal "You can’t call it OpenDocument unless it actually conforms to the OpenDocument spec."). This includes creating documents, not merely reading them.

    b) Can’t be controlled by any one entity. No one person or company can change or add to the standard on their own, they have to get support from a wide base including users and competitors first. Equally important, no one person or company can *block* changes and additions to the standard that *do* enjoy a wide base of support.

    Massachusetts has pointed out where the Office XML formats fail on both of those points.

  25. SlashDotJunkie says:

    Todd Knarr: "As for formula formats, it’s easy and I described how to do it earlier (at least in OpenDocument, which accomodates this)."

    I read OpenFormula’s RFC attempt, my estimate is it’s about 3-4 years from getting to alpha quality. I’d expect that most ODF-based speadsheets would adopt OOo/SO format long before this RFC matures.

    Todd Knarr: "This causes loss of backwards compatibility (old versions can’t read the formula representation of newer versions, but can still get the numeric form), but this is traditional with every MS Office version upgrade so I don’t see where it can be considered unacceptable"

    And where exactly did you pull this load of [three-letter word that Brian doesn’t want in his blog] from? Most version upgrades of Excel maintained both backward AND forward compatibility on formula level. There is simply no other way if you want to keep your customers happy.

    Todd Knarr: "As for "full fidelity", I think what most customers are looking for isn’t what you’re saying they are. They want full fidelity of appearance and representation."

    Which application, word processor? Perhaps, although I’d rather have my text intact 😉 From your perspective PDF is the best format: even that it does not mandate persisting the actual text, it does maintain the "appearence and representation" of it.

    I’d expect most customers would want full fidelity in whatever the major function of particular application is. Who cares about appearence if their models do not travel correctly between OOo and KOffice spreadsheets?

    Todd Knarr: "I have to ask: what parts of Microsoft Office documents can’t be represented in OpenDocument?"

    What parts of ODF cannot be represented by a bunch of XML and binary files stuffed into ZIP container? Or by a generic blob of binary data in a single .tgz gile? Same answer – none 🙂 Which does not answer or prove anything about relative values of MSO, ODF and TAR+GZIP containers.

    Love, SDJ

  26. Eduardo says:

    OO’s Florian Reuter on various topics including ODF versus Office XML. Reuter thinks OO’s X-forms is superior to Microsoft smart forms. He also talks about some of the difficulty in reverse engineering Microsoft formats.

    http://madpenguin.org/cms/html/62/5297.html

  27. Mark Miller says:

    Regarding some of the other comments I see on here that suggest it would be easy for Microsoft to make ODF the default saving format for Office (or Word, at least), I think I can provide some clarification. I don’t work for Microsoft, and never have. I’m a developer of custom productivity applications for businesses. Some have argued that Microsoft should make Office configurable so that it can save files in ODF format by default, as MA specifies in their enterprise standard, but won’t, to protect its position in the marketplace. One suggested the reason this would be easy is that Office already saves in a multitude of other formats.

    I think this represents a kind of blind naivete about how most people use office applications, and a strict allegiance to a particular brand of open standards no matter the other consequences. As a developer, I take the user experience seriously. That’s the reason I’ve bothered to follow this story at all.

    Microsoft Office serves a very wide market, ranging from power users, to those who are just computer literate enough to turn on their computer, open up an application or two, and do a few simple things with it, but not much else.

    If I were in Tim’s shoes, my reticence towards making it possible to make a different format the default saving format would be twofold. One, making a different format the default effectively requires the application to lock the user out of certain features. In this scenario the application has to assume that since a less feature-rich format has been selected, that the application itself must only allow access to the features that the format can handle. Also, certain documentation items in Help, would either need to be made unavailable, or notated with "Only available if native Word format is used". It would make for a terrible user experience if the application went ahead and allowed the user access to all of the features it had, but then when the user hit "Save" (ie. save to the default format), the application would warn, "FYI, you’re going to lose some of your work since the default format doesn’t support some of what you did," or alternatively, if a feature was visible, but when selected, the application said, "The default format doesn’t support this feature."

    Second, given that certain features were locked out, it would create a misimpression in some customer’s minds after a while, that Office only supports the limited set of features that their selected default format supports. In other words, if asked, they would say that Office only supports features x, y, and z. They would think that, but only because the default format they selected also only supports those features. They would forget about, or not venture to explore other features that Office had that might help them do something easier. It would reduce Office’s marketability, and it would be something Microsoft would have to "fight against", I would think.

    I think the focus of Office, and productivity applications from other vendors, has been to help users get something done. The emphasis is on what the application makes possible, not the file format it’s saving data in. The file format is seen primarily as supporting the features of the application, not the other way around.

    The reason Office supports other formats for export is that many customers expressed a need to exchange files with other users who didn’t have Office, or had an older version of it. Sometimes the export formats are limited in capability, so the user is warned that some of their work may be lost in the conversion, but at least they have the opportunity to save the document in Word’s native format as well, to preserve their work.

    Conceivably, Office could support ODF as an export/import format, IMO, but I agree that it would work against the marketability of Office in general to make it possible to change the default save format. I think the reason Microsoft has not included ODF yet is they don’t see enough of a market for it to make it worth their while.

  28. Mark Miller says:

    Andrew Updegrove’s article (referred to by Eduardo’s 2nd link) is an interesting read. He reveals some facts about MA’s decision I hadn’t heard anywhere else. Everything I had read about it up to this point said "MA chooses ODF". What Updegrove says is that only the Executive Agency (I presume this means the executive branch of government) had made this decision, not the entire state government. But still, tens of thousands of computers are involved.

    I listened to the audio of the Sept. 16 meeting of the committee, and I recall one of the comments made by Jonathan Zuck of ACT was that the standards document looked like "a late paper" (ie. one that was hastily written). I kind of get this feeling from reading Updegrove’s article as well, not critiquing his writing, but rather his description of the decision. The enterprise standard clearly states that the Agency must commence migrating their systems to use applications that save to ODF by default, starting in Jan., 2007. Yet, in Updegrove’s article he says the committee’s FAQ on the subject says that users can continue using Microsoft Office after that date, and can even save in native Office format, but only need to convert to ODF when documents need to be archived. He also says that outside parties (contractors, lawfirms, etc.) can continue using whatever file formats they are now, to send documents to the state. There is no requirement that they change to using ODF in order to submit documents. He also says the standard states (it may have been changed to say this) that the Agency was not going to convert existing documents to ODF, but just leave them in their existing format. He says that only new documents, created in Jan., 2007 and after need to be saved in ODF for archiving. This sounds like a more reasonable policy, given their goals, and given the difficulty in migrating technologies. I just wonder why they didn’t say all this in their enterprise standard up front. I think there’s been a lot of confusion around the issue, and perhaps the committee that came up with this standard hasn’t communicated clearly exactly what their intentions are.

  29. Todd Knarr says:

    Mark: I think you’re missing one point: the customer has *asked* to have the features thus restricted. I think that if, as a business, I make the decision that I don’t want certain features in my documents, I *want* my office applications to tell users they can’t use those features.

    Also, regarding the apparent conflict in what formats are allowed, I don’t think it’s that confused. The Executive Agency has decided that all the documents it keeps and makes available have to be in an open format. Individual users can still use native Office formats for working copies, it’s only when the document leaves their computer headed elsewhere that it needs to be converted to a standard format. The same thing for incoming documents, it’s just the document implementation of the Internet rule "Be strict in what you generate and liberal in what yiou accept.".

  30. Mark Miller says:

    Hi Todd. I understand what they wanted. I think perhaps what hamstrings Microsoft in this particular situation is that I don’t believe they’ve ever made a custom version of their software for particular clients. I could be wrong about that, but I’ve never heard of them doing it. This is just my image of Office, but I’ve always seen it as a one-size-fits-all application suite. If they implement a feature, they implement it for *everyone* that could potentially buy the product. That’s the POV I was speaking from.

    To tell you the truth, MA’s decision reminds me of how government bureaucracy used to be described before President Clinton came to office. One of the initiatives the Clinton Admin. tried to put through, called "Reinventing Government", tried to do away with, as much as possible, custom designed products that were made just for the federal government. I remember Al Gore talked about an example of a floor cleaner that was made to the bureaucrats’ exact specifications. Like everyone’s been saying with this case, this was just the government saying what they wanted. It was up to manufacturers to bid and make it for them, whichever ones wanted to participate.

    The thing was they couldn’t buy the floor cleaner from any well-known manufacturer, because none of them wanted to bother with making a custom product for the government. They made more money making a commodity product that was sold to more customers. It was illegal for the government to buy any formula that didn’t fit the specification. So the federal government bought it from either one or a few manufacturers at very high cost, when compared to the retail equivalent. The "Reinventing Government" program tried to change that, and as I heard, partly succeeded, thus lowering the operating cost of government a bit.

    I think the Executive Agency is trying to kill two birds with one stone. They want the open archiving format, and they feel they’ll also see a significant savings by not having to pay as much in commercial licensing. The difference here is that a standards committee, OASIS, has come up with the format, so more customers than just the state of MA can potentially use it, maybe creating a greater economy of scale for products that support it.

    Ultimately, I wish MA success in whatever they do. I want IT to succeed, no matter how it’s implemented. I know I can’t judge their decision entirely from afar (as I am far away from the situation, literally), but from what I’ve read, I would take issue with this decision of theirs if I was working with them, particularly about the timing. I’d say, spread the word around that this is what you want, get producers interested in the concept, wait for the products that support what you want to come out, let them find a market, let the products mature, and let the market winnow out the weaker players, and then make the choice. What’s the rush?

  31. Todd Knarr says:

    Mark: I think you’re wrong about the timing. Firstly, MA doesn’t intend to change over tomorrow. As I understand it the transition isn’t going to happen over the course of weeks or months, it’s going to be phased in over the course of a couple of years at least. You have to start at some point or people will just keep putting it off, and I really don’t see where anyone remotely competent will have a problem meeting timeframes of several years to comply. As for waiting for products, why wait when the products already exist and are mature? The OpenDocument standard’s had several years already to be hashed out, with document users constantly using the spec and providing feedback about what in their documents isn’t supported by the spec. Boeing and IBM both have requirements a lot more complex than most users will ever have, and I doubt they’d’ve signed off on something for their own use that they knew didn’t support their existing documents. As for products, there’s at least 3 that either already support OpenDocument or will within the next couple-three months (all of them have been preparing for this support for a long time now and have had it working in CVS releases to make sure it gets tested). Saying to wait for products to come out with support now is like saying to wait for the train to arrive when the train’s not just sitting at the platform but blowing it’s whistle and getting ready to pull out.

    I think you’re right, though, about the commodity nature MA’s looking for. They explicitly said it, in fact: they’re looking for a format that can be implemented by any number of vendors so both they and anybody who has to deal with them can have a variety of choices *and* can interoperate without having to all make the same choice.

    As for the expensive government purchases, I agree that some of them are silly. On the other hand, I recall a comment by an Air Force maintenance officer: "Yeah, 5 grand for a coffee-maker’s a lot more than you’ll pay for one from WalMart. But the coffee-maker in your kitchen doesn’t have to make good coffee when the boiling point of water’s 20 degrees lower because of lower cabin pressure, and it doesn’t have to keep scalding-hot water from flying around the cabin when it’s inverted and there’s 5 g’s trying to pull the pot off of the warming plate.". Or contrast document readability concerns, where Microsoft can’t even read their own formats from 15 years ago (Word95, Word 6) correctly, while governments have survey and land-title deed documents from the 1700s that’re actively used by surveyors today. We won’t even get into NASA losing data that isn’t even as old as I am because the formats aren’t known and the hardware to read the media isn’t made anymore. Silly requirements are just that, silly, but government sometimes does really have requirements that’re just that different.

  32. As we move forward with the standardization of the Office Open XML formats, it’s interesting to look…

  33. OK, forgive the random Sneaker Pimps reference and I promise we will move off this topic of ODF politics…

  34. Andrew Sayers had a great suggestion that I should have a page set up that gives an overview of the blog