Everybody's blogging about Massachusetts

I really don't have anything to do with Office XML formats so can't
contribute much of substance to the debate over Massachusetts' draft Enterprise Technical Reference Model v 3.5
which mandates the OASIS Open Document Format. This has generated a lot
of weblog posts, mostly from open source advocates or employees of
Microsoft competitors fulsomely praising it and hoping that this
political decision will give their preferred technologies more economic
clout.

A couple of more independent assessments raise issues that I find more interesting. For example, Stephen O'Grady
acknowledges that Microsoft (or third parties) could easily support the
ODF format if there is a business need to do so.  That gets to
the question of whether Massachusetts' very real needs for
document interoperability and longevity are best met by the XML family
of technologies in general, or one specific format in particular.
MS Office and OpenOffice / StarOffice have taken different paths here.
Office 2003 can handle custom XML schemas and stylesheets quite
dynamically, whereas OO has been hard coded to handle a specific
schema, and the latest beta versions have evolved to support the OASIS
ODF.  It's not at all clear to me why supporting specific
schema(s), whether or not they are endorsed by a standards body, is
considered more "open" or "standards based" than support for the more
fundamental XML, XML Schema, and XSLT standards that Office 2003
implements. 

One technical note: one can't simply configure Office 2003 to handle
OASIS ODF documents directly, because the OASIS Technical Committee
chose to define that format using the RELAX NG schema spec (endorsed by
OASIS and ISO) rather than the W3C XML Schema spec, which is a W3C
Recommendation and which Office 2003 supports.  There are some
plausible technical reasons for this in that RELAX NG is simpler, based
on a more solid formal underpinning, and somewhat better suited for
defining complex textual document formats than is W3C XML
Schema.  Unfortunately, that advantage does not carry over into
tool support (few mainstream XML editors currently support RELAX NG
validation) or support for structured data within the text.  For
example, tools that support the popular data-oriented XML programming
technique known as data binding (which allows XML to be parsed easily
into instances of application-level programming objects rather than
abstract node trees) almost all require W3C schemas as input.
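
To make the data binding idea concrete, here is a minimal hand-written sketch in Python. It is only an illustration: real data binding tools typically generate classes like this from a schema, whereas here the `Invoice` class, the `bind_invoice` helper, and the invoice element names are all hypothetical and written by hand.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical structured data embedded in a document; the element and
# attribute names are illustrative and not taken from any real schema.
XML = """<invoice number="42">
  <customer>Commonwealth of Massachusetts</customer>
  <total currency="USD">199.50</total>
</invoice>"""


@dataclass
class Invoice:
    """Application-level object that the XML is bound to."""
    number: int
    customer: str
    total: float
    currency: str


def bind_invoice(xml_text: str) -> Invoice:
    """Parse the XML once and return a typed object instead of a node tree."""
    root = ET.fromstring(xml_text)
    total = root.find("total")
    return Invoice(
        number=int(root.get("number")),
        customer=root.findtext("customer"),
        total=float(total.text),
        currency=total.get("currency"),
    )


invoice = bind_invoice(XML)
# Application code now works with typed fields rather than walking nodes.
print(invoice.customer, invoice.total, invoice.currency)
```

The point of the technique is that the mapping in `bind_invoice` is exactly what a binding tool derives automatically from a schema, which is why the choice of schema language matters to developers who rely on such tools.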

This gets to a second issue, raised by Joe Wilcox (and something I debated at length in previous O'Grady postings):

Considering the OpenDocument format is only truly supported by OpenOffice
2.0, which isn't even available yet, I'm at a loss to see how the XML-based
format meets the Commonwealth's goals for openness or backward compatibility.
Nobody's really using the format yet, right? How, uh, open is that?

My other problem is one of definition. Looks like the Commonwealth considers
Adobe's PDF as open, because the spec is openly published. OK, I'm scratching my
head, because if you download Corel's WordPerfect SDK the WPD specification is
right there. As for Microsoft, while I'm grumbly about the company's liberal use
of open, I have to say if PDF meets the Commonwealth's standard so should Office
formats; at the least the XML-based formats coming with Office 12. Microsoft
does publish its XML schemas and license them on a royalty-free basis.

There have been all sorts of inconclusive debates about the real
meaning of terms such as "standard" and "open", and I don't want to go
there.  It's important, however, to appreciate that there is a
very real distinction between a specification ratified by a committee
or standards organization, and a "standard" that enables real-world
interoperability.  MS Word's binary format or PDF are usually
considered "de facto standards" in the sense that one can reasonably
expect a random correspondent to be able to read a document in one of
those formats.  Obviously we can do better, and  almost all
concerned believe that moving to some sort of openly documented XML
format is a better way to achieve short term interoperability and long
term usability of documents.  In general, the more eyes that have
helped debug a spec and the more organizations that have endorsed it, the
more of a real standard it will be. But there are plenty of examples of
standards organizations producing specifications that have not led to
real-world interoperability of any significance.  W3C XLink, OASIS WS-Reliability, and ISO HyTime are clear examples of this in the SGML/XML world.

The important thing to remember is that industries, not industry
standards committees, are the ones who produce industry
standards.  Knowing many of the people who helped produce OASIS
ODF, I expect it to be a carefully crafted and good-quality
specification, but it has to prove itself capable of solving real-world
problems before it can legitimately be called an industry
standard.  For example, its reliance on RELAX NG makes it pleasing
to XML geekdom, but greatly complicates the processing task for most
actual developers. Will  the mainstream be diverted by the need to
support ODF documents, or will ODF remain in a backwater?  Good
intentions don't often make for successful policies.

Bismarck supposedly said
"People who enjoy eating sausage and obey the law should not watch
either being made".  That applies to industry standards, which we
enjoy once they've been "cooked", but which are produced by a messy process
at best.  It's easy to sympathize with Massachusetts' desire to
buy its document standard sausage only from nice clean  kitchens
that use wholesome cruelty-free ingredients, but I've spent too much
time in the XML standards sausage factory to believe it until I taste
it. 

Tofu bratwurst for the Labor Day picnic, anyone?