Everybody’s blogging about Massachusetts

I really don't have anything to do with Office XML formats so can't
contribute much of substance to the debate over Massachusetts' draft Enterprise Technical Reference Model v 3.5
which mandates the OASIS Open Document Format. This has generated a lot
of weblog posts, mostly from open source advocates or employees of
Microsoft competitors fulsomely praising it and hoping that this
political decision will give their preferred technologies more economic

A couple of more independent assessments raise issues that I find more interesting. For example, Stephen O'Grady
acknowledges that Microsoft (or third parties) could easily support the
ODF format if there is a business need to do so. That gets to
the question of whether Massachusetts' very real needs for
document interoperability and longevity are best met by the XML family
of technologies in general, or one specific format in particular.
MS Office and OpenOffice / StarOffice have taken different paths here.
Office 2003 can handle custom XML schemas and stylesheets quite
dynamically, whereas OO has been hard coded to handle a specific
schema, and the latest beta versions have evolved to support the OASIS
ODF.  It's not at all clear to me why supporting specific
schema(s), whether or not they are endorsed by a standards body, is
considered more "open" or "standards based" than support for the more
fundamental XML, XML Schema, and XSLT standards that Office 2003

One technical note - one can't simply configure Office 2003 to handle
OASIS ODF documents directly because the OASIS Technical Committee
chose to define that format using the RELAX NG schema spec (endorsed by
OASIS and ISO) rather than the W3C XML Schema spec, which is a W3C Recommendation
and which Office 2003 supports. There are some
plausible technical reasons for this in that RELAX NG is simpler, based
on a more solid formal underpinning, and somewhat better suited for
defining complex textual document formats than is W3C XML
Schema. Unfortunately, that advantage does not carry over into
tool support (few mainstream XML editors currently support RELAX NG
validation) or support for structured data within the text. For
example, tools that support the popular data-oriented XML programming
technique known as data binding (which allows XML to be parsed easily
into instances of application-level programming objects rather than
abstract node trees) almost all require W3C schemas as input.
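
For readers who haven't run into data binding, here is a rough sketch of the idea in Python. It is hand-written rather than generated: the Order and LineItem classes, the bind_order function, and the sample document are all made up for illustration, whereas a real data binding tool would typically generate the equivalent classes from a W3C schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


# Hypothetical application-level classes; a binding tool would
# normally generate these from a schema instead of a person typing them.
@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    customer: str
    items: List[LineItem]


def bind_order(xml_text: str) -> Order:
    """Turn an <order> document into an Order object (illustrative only)."""
    root = ET.fromstring(xml_text)
    items = [
        LineItem(sku=el.get("sku", ""), quantity=int(el.get("quantity", "0")))
        for el in root.findall("item")
    ]
    return Order(customer=root.findtext("customer", default=""), items=items)


# Made-up sample document, not taken from any real schema.
SAMPLE = """<order>
  <customer>Commonwealth of Massachusetts</customer>
  <item sku="A-100" quantity="2"/>
  <item sku="B-200" quantity="5"/>
</order>"""

order = bind_order(SAMPLE)
# Application code now works with typed objects, not node trees.
total_quantity = sum(item.quantity for item in order.items)
```

The point is that the calling code deals with order.customer and order.items rather than walking elements and attributes. Generating that mapping automatically is where the schema language matters.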

This gets to a second issue, raised by Joe Wilcox (and something I debated at length in previous O'Grady postings):

Considering the OpenDocument format is only truly supported by OpenOffice
2.0, which isn't even available yet, I'm at a loss to see how the XML-based
format meets the Commonwealth's goals for openness or backward compatibility.
Nobody's really using the format yet, right? How, uh, open is that?

My other problem is one of definition. Looks like the Commonwealth considers
Adobe's PDF as open, because the spec is openly published. OK, I'm scratching my
head, because if you download Corel's WordPerfect SDK the WPD specification is
right there. As for Microsoft, while I'm grumbly about the company's liberal use
of open, I have to say if PDF meets the Commonwealth's standard so should Office
formats; at the least the XML-based formats coming with Office 12. Microsoft
does publish its XML schemas and license them on a royalty-free basis.

There have been all sorts of inconclusive debates about the real
meaning of terms such as "standard" and "open", and I don't want to go
there.  It's important, however, to appreciate that there is a
very real distinction between a specification ratified by a committee
or standards organization, and a "standard" that enables real-world
interoperability.  MS Word's binary format or PDF are usually
considered "de facto standards" in the sense that one can reasonably
expect a random correspondent to be able to read a document in one of
those formats.  Obviously we can do better, and  almost all
concerned believe that moving to some sort of openly documented XML
format is a better way to achieve short term interoperability and long
term usability of documents.  In general, the more eyes that have
helped debug a spec and the more organizations have endorsed it, the
more of a real standard it will be. But there are plenty of examples of
standards organizations producing specifications that have not led to
real world interoperability of any significance. W3C XLink, OASIS WS-Reliability, and ISO HyTime are clear examples of this in the SGML/XML world.

The important thing to remember is that industries, not
standards committees, are the ones who produce industry
standards.  Knowing many of the people who helped produce OASIS
ODF, I expect it to be a carefully crafted and good-quality
specification, but it has to prove itself capable of solving real-world
problems before it can legitimately be called an industry
standard.  For example, its reliance on RELAX NG makes it pleasing
to XML geekdom, but greatly complicates the processing task for most
actual developers. Will  the mainstream be diverted by the need to
support ODF documents, or will ODF remain in a backwater?  Good
intentions don't often make for successful policies.

Bismarck supposedly said
"People who enjoy eating sausage and obey the law should not watch
either being made". That applies to industry standards, which we
enjoy once they've been "cooked", but get produced by a messy process
at best.  It's easy to sympathize with Massachusetts' desire to
buy its document standard sausage only from nice clean  kitchens
that use wholesome cruelty-free ingredients, but I've spent too much
time in the XML standards sausage factory to believe it until I taste

Tofu bratwurst for the Labor Day picnic, anyone? 

Comments (9)

  1. Noory says:

    This is hilarious.

    Of course it was a political decision!

    Very few people would dispute that Office XML would be a better ‘open’ format than OpenDocument on technical reasons, if for no other reason than maximum backcompat.

    But Office XML isn’t ‘open’:

    1) it is patent encumbered

    2) it might be royalty free but it requires each user to relicense

    3) it comes with no guarantees about future openness

    4) it is controlled by a company with a history of bad practices who must look after its shareholders before the people of MA

    5) Unlike GPL2/LGPL which have been around for over a decade the licensing implications aren’t clear to software developers, for both Free Software or non-Free developers.

    6) Despite being asked repeatedly Brian jones still hasn’t answered simple questions about licensing. IANAL is easy to say, but it doesn’t inspire confidence.

    7) Archival issues are very different to day-to-day document management issues, a lot of the points you bring up simply aren’t valid in an archival setting.

  2. Bruce says:

    Your point about RNG is a red herring and smacks of FUD. There is plenty of solid tool support for the language, and even if your tools don’t support it directly, it’s trivial in most cases to generate an XSD version of an RNG schema via Trang.

    Indeed, that was one of the reasons why the OD TC chose RNG. The same with DocBook and TEI, each of which are now authored in RNG but provide alternate representations via Trang (and each of which have more history as document formats than either WordML or OD).

    I do take your point that it’s a little strange to mandate specific formats, rather than to simply insist they be (fully) open.

  3. Mike, I’m sure you are aware of the tools available for easily transforming RELAX NG grammars into XML Schema. I’m also very curious what would happen if you put the XSD for any useful text document format into any of the data binding tools available. My bet is that not a single one will be able to process them.

  4. mikechampion’s weblog : Everybody’s blogging about Massachusetts Based on the following paragraph we get the quote of the day which immediattely follows it via the above link… — Bismarck supposedly said ""People who enjoy eating sausage and obey the law…

  5. Bilbo says:

    You all have to get over it. Open formats are more important to the customer than any feature microsoft can offer. Reading a document in 50 years is more important than inserting video in a transient document today.

    Your little schema patent and continual attempt to shut out competitors with changing interfaces put an end to any possibility that the large amount of work done on XML by microsoft would be usefull in the long term.

    The industry has moved on, it not "gee wiz" any more, its building infustructure for long term use.

    I like microsoft products, but if you can’t/won’t offer what I need then I can’t use them.


  6. tecosystems says:

    Although I promised you a summary of what I discussed with Microsoft last Friday, much of it’s already been told – if not discussed. Like CRN’s Paula Rooney, I got the chance to connect with Microsoft’s Alan Yates (GM of…

  7. Simon Phipps says:

    I’m not sure I like being dismissed so easily as wrong because I am a partisan competitor, Mike, I am making points that I actually believe in (like I hope you are) and anyway, like IBM, I thought we were partners 🙂 Anyway, I just left a monster reply to your monster comment over on my blog at http://blogs.sun.com/roller/comments/webmink/Weblog/coursey_is_wrong_on_massachusetts#comment4

  8. The war of words over Massachusetts’ proposal to standardize on the OASIS Open Document Format continues…
