Abriendo Puertas con XML

When Win Office 2003 shipped, there was a great deal of debate as to the “openness” of Microsoft’s use of XML. The debate resurfaced with the announcement of the new Office 12 XML-based file formats, and it’s been further brought to the fore in recent days with Massachusetts’ decision regarding the adoption of “open” file formats.

On one side of the debate, you have those who would argue that, as long as it’s XML, it’s open. The other side of the debate would argue that mere use of XML isn’t sufficient, that the schemas need to be established or endorsed by an independent standards body. There are subtle shades on both sides of the debate, including schema publishing and licensing, but even with a published schema with a royalty-free license, there are those who would argue that a format isn’t open unless the schema itself has been approved by a standards body.

One of the most articulate proponents of the “standards body” side of the debate has been Joe Wilcox over at www.microsoftmonitor.com. Joe’s reasoning can be found here and here.

In reading Joe’s remarks, however, it’s difficult to find a coherent position. At one point, he bases his notion of “open” on the acceptance of a standard by an independent standards body. At another point, he defines “open” based on the extent to which independent software vendors have supported the format with a certain degree of fidelity. Thus, OASIS’s OpenDocument XML format is “open,” but so is Adobe’s PDF.

As a side note, back on September 1, Joe was scratching his head about Massachusetts’ inclusion of PDF in their definition of “open.” Apparently Joe forgot that he’d done exactly the same thing back in June. To be fair, Joe’s reasoning was subtly different from that invoked by the Commonwealth of Massachusetts, but neither line of reasoning is all that coherent in the exclusion of Microsoft’s use of XML in Office from the “open” rubric.

Joe’s second post from last June comes closest to articulating a coherent stance on the subject. In that post, he likens OASIS’s OpenDocument format to a simpler, more widely understood idiom than the one adopted by Microsoft Office, albeit within the same XML language. Joe writes:

But, even though the two people agreed on a common language, suppose one starts using geeky engineering jargon the other can’t understand. Tough to communicate, right? So the one gives the other a big, fat book of definitions for the jargon–kind of like Microsoft publishing its schemas–so that they can talk. But the other person would have to learn the jargon first. Sure all the jargon is in the book, but wouldn’t it just be better to communicate (e.g. be more “open”) by speaking the basic language previously agreed on?

I think Joe’s reasoning would be sound if Microsoft’s addition of “geeky engineering jargon” was merely gratuitous. His reasoning breaks down, however, when we note that the “geeky engineering jargon” in Office’s use of XML is necessary to adequately describe the features that are available in Office.

We can see this by altering Joe’s analogy. Suppose we aren’t talking about “geeky engineering jargon.” Suppose, rather, we’re talking about the jargon used within an academic field. The jargon in any academic field arises when various academics coin new terms to express various ideas within the field. Economists, for example, talk about IS-LM curves. Lay people haven’t a clue what that’s about, but, among economists, a great deal of information can be conveyed very succinctly by using the jargon of IS-LM curves.

One important point of the academic jargon analogy is that anybody can extend the vocabulary. No one sits around waiting for some standards body to approve each new term before they’re allowed to coin it in some academic paper. Academic jargon is open not only because anyone who is willing to engage in a study of the field is able to understand the lexicon. It’s also open because anyone who works in that academic field is able to extend the lexicon.

Moreover, extension of the lexicon is based entirely on voluntary adoption of that lexicon within the field. An academic can coin a new term, but that term won’t get adopted into regular usage unless other academics find enough value in the ideas that are expressed by the new terminology.

And, yes, there is a point where the analogy breaks down. Academic jargon doesn’t get the same copyright protection that XML schemas get, and no academic field is bifurcated into those who produce new studies and those who only read the new studies the way the software field is bifurcated into vendors and users.

But I think the difference between Joe’s analogy and mine is still instructive in terms of the underlying values that each analogy expresses. Joe’s analogy values user choice of equally adept vendors. My analogy values the ability of vendors to extend software to resolve new user problems. I would contend that both values are worth preserving for the benefit of people who use software.

Users do benefit from software commoditization in their ability to choose different vendors and in the ability of offerings from different vendors to interoperate. But this benefit comes at a sacrifice of product differentiation. Users benefit from product differentiation as vendors strive to solve user problems in new, and more effective, ways.

The ideal solution would be able to accommodate both aspects of “openness.” In the world of software, it might not be possible to come up with a solution that balances both values, but I have difficulty imagining one that does a better job of balancing both than the approach we’ve adopted with Office’s use of XML. The schemas are published with a royalty-free license. Anybody is free to use those schemas.

Moreover, the way XML support is implemented in Office, people can extend those schemas. Word 2003 supports custom schemas, and the number of solutions providers who are incorporating Office 2003 into solutions that make use of a number of XML standards relevant to particular vertical industries is growing at an impressive rate.

Lastly, XML, with the inclusion of XSLTs into the standard, provides a ready tool for translating one idiom into another. Through the use of XSLTs, for example, it’s possible to have Office support OASIS’ file format out of the box, albeit with a certain loss of information on the save side.
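To make the idea of translating one idiom into another a little more concrete, here is a minimal sketch. It is written in Python rather than XSLT, and every element name in it (`richdoc`, `para`, `run`, `revision`, and their simpler counterparts) is hypothetical, invented purely for illustration; none of them comes from the Office or OpenDocument schemas. The sketch maps a richer vocabulary onto a simpler one and records whatever has no counterpart, which is the kind of information loss you'd expect on the save side.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a feature-rich vocabulary to a simpler one.
# These names are made up for illustration; they are not real schema elements.
TAG_MAP = {
    "richdoc": "doc",
    "para": "p",
    "run": "span",
}


def translate(source, dropped):
    """Return a copy of `source` expressed in the simpler vocabulary.

    Child elements with no counterpart in TAG_MAP are skipped, and their
    tag names are appended to `dropped`. Attributes and tail text are
    ignored to keep the sketch short.
    """
    target = ET.Element(TAG_MAP[source.tag])
    target.text = source.text
    for child in source:
        if child.tag in TAG_MAP:
            target.append(translate(child, dropped))
        else:
            dropped.append(child.tag)
    return target


# A tiny document in the hypothetical rich idiom.
RICH = '<richdoc><para><run>Hello</run><revision author="rs"/></para></richdoc>'

dropped = []
simple = translate(ET.fromstring(RICH), dropped)
result = ET.tostring(simple, encoding="unicode")

print(result)   # the document in the simpler idiom
print(dropped)  # the features that did not survive the translation
```

The text comes through intact, but the `revision` element has nowhere to go in the simpler vocabulary, so it gets dropped. A real XSLT stylesheet would express the same mapping declaratively with templates, and would face the same trade-off.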

“Abriendo puertas” is Spanish for “opening doors.” In an ideal world, we would be opening doors for both vendors and customers, so that both can use common formats and extend them. That is at least what we’re trying to do with the new XML formats. The future will tell us how well we’ve succeeded.

I just hope that the future gets decided by the people who actually have to use the software rather than by either government fiat or pundits who have difficulty arriving at a coherent definition of the word “open”.



Currently playing in iTunes: Hablemos El Mismo Idioma by Gloria Estefan

Comments (8)

  1. I think the big issue I’m hearing is concern about using the file formats in open source applications. That is, things like Open Office and other applications being able to make use of these file formats. An additional note on this is making sure these are the default file formats for application use.

    Now, with that said, the bigger issue for me is getting XML support in things like Access or Outlook. I’ve got a lot of applications that will be able to write to an Excel spreadsheet easily now (although it wasn’t too hard with .csv files beforehand), but it’s near impossible to do automatic calendaring entries or database queries on Exchange/Access systems right now. I’m using Java for my development environment, and it’d be WONDERFUL to have a Java library that would let me connect to Exchange/Access (using a NATIVE Java library, not a JDBC-ODBC bridge type system, as these applications could run on any Java application server, not necessarily a Windows app server).

    SO, a definite step in the right direction – it’d just be nice to have a few other file formats in XML.

  2. Tomas says:

    Rick, I read this with interest. My definition of ‘open’ when it comes to document formats involves the following two conditions: (1) It is publicly documented, and (2) it does not restrict interoperability. The latter means not that it is simply royalty free, but that it is unconditionally royalty free, i.e., it does not require any kind of licence whatsoever. In this regard I consider PDF, RTF, and more or less also the Word 8 formats ‘open’ (although the latter is poorly documented), but I consider neither Office XML nor OpenDocument ‘open’, for both formats are encumbered by patents with restrictive licensing terms.

    Complexity of the design and its specificity to a particular application should not contribute to the determination of ‘openness’. One of the key factors in file format design is that it allows my application to work efficiently with it, and if my application works differently from yours, then my preferred file design is likely to be different. I think this whole idea of a ‘Swiss army file format’ to end all file formats, which is behind the OpenDocument, is deeply misguided. I care about interoperability, but interoperability is not uniformity.

    The central issue is the ownership of the data; the data that is stored in files such as word processing documents does not belong to the application designers, and the owner of that data must, therefore, be free to access it in any way she wishes, not just on the terms of the application designer. Consequently, I think a distinction needs to be made between reading data from a file and writing it to the file. I can see how an author of a particularly clever file format might want to prevent others from benefiting from her work in storing their users’ data in her clever way, but I see no way in which you could make a case that as a designer of that format you can restrict others in any way from parsing it to extract the data in it. In any case European Law goes as far as to allow reverse engineering for interoperability purposes, and I think that will be enough for European geeks to design open source importers for the Office XML file formats.

  3. Simon Phipps says:

    Tomas: OpenDocument is not the subject of any restrictive license that I’m aware of, and to make that doubly clear I’ve explained Sun’s new ‘Covenant’ in my blog http://blogs.sun.com/roller/page/webmink?entry=raising_the_bar_on_patents

  4. Rick, one of the problems with the MS licensing is that it’s not compatible with the GPL:

    "People have asked for a yes/no answer for compatibility with the GPL, and the bottom line is I think he is right that the Microsoft license for the Office XML reference schemas is not compatible with the GPL. The GPL says that there can’t be a requirement that you give credit to the author of the program (something called “attribution”). The GPL also says that you can’t put a limitation on sublicensing IP rights. As Craig says, the Microsoft license has both these requirements, so it is not compatible with the GPL. Now, it is really up to you to decide whether or not those conditions are important to you, but from my point of view mentioning that the schema came from us and not sublicensing the IP rights seems to be totally reasonable. Those are really the only two issues I’m hearing that make them incompatible. There are other licenses similar to the GPL that don’t have those added restrictions."

    (from Brian’s blog)

    Another issue in his blog:

    "I don’t really understand the point about it being changed at any time. If you accept the license, then you have a deal. Microsoft can’t come back later and say the deal is different. I don’t see any restrictions in this license on distribution of programs created under this license."

    Well, actually, software licenses and other licenses can be changed. So while everything up to that point may be under the old deal, everything from that point on is under the new deal. As well, what happens when you modify documents created under the old license with software that has the new terms? If the application deletes and recreates the document when you hit "save", is that still the same document?

    That’s a reasonable concern for people. Format changes are not something to be taken lightly (MS of all people should be hypersensitive to this after the Office 97 debacle).

    So Brian’s statement that "The files you save will be freely accessible forever." is still really a "maybe". I can use them under the old license as long as I never use anything that might convert them to the new license. What if there’s a critical service pack that does this? (Windows Media is horrid at this. They’ve pulled this stunt a few times.) I open the file, make a minor change, voila, new license. Or I have to keep a buggy, unpatched version of the software around to avoid this? Again, "maybe".

    But right now, the MS XML format is not something that GPL software can really use without some kind of "MS XML License condom" library around it. This is ironic, since the GPL tends to create the same problem.

    Now, I’m no fan of the GPL. I don’t consider it terribly open, and in its way, it’s as restrictive as anything else out there, despite cutesy-poo language to assure the masses that "it’s so different". (Copy LEFT? BARF). I like the BSDL, it’s truly open. I like Jordan Hubbard’s statement on the GPL:

    "The GPL is not something we really considered to be a license so much as a political manifesto, and speaking purely for myself, I prefer to keep my license agreements and my politics separate. I feel that code which isn’t being used in a situation where it COULD be used is code which isn’t achieving its full potential and the GPL scares a lot of potential users away, which is simply counter-productive in my opinion. I don’t care whether or not the users give their changes back to me, that’s just an added bonus if it happens and nothing I’d want to try and enforce at the point of a gun."

    However, the fact is that if you want people outside of the MS arena to be able to really use that format, it has to be GPL-friendly. I don’t like that this country hasn’t adopted the metric system, but if I want to function well in the U.S., I need to deal with the (formerly) English system. Opinion doesn’t trump the requirements of the situation. Microsoft can talk around the GPL issue however they want, that won’t change the reality of the situation.

    The Office Win team’s intransigence on even being able to directly deal with the OpenDocument format within Office is just silly, and detracts from the force of their argument, ESPECIALLY since this, along with the PDF capabilities in Office 12 Win, would solve the Massachusetts problem in toto.