XML 2006 Observations

12/08/2006

I could only attend half the conference due to a family health issue, but here are some thoughts on what I did see. The links are mainly to the conference program; I believe the entries will eventually link to the actual presentation slides and submitted papers.

Roger Bamford’s keynote spent much time showing how the high-level architecture of a typical enterprise application 30 years ago was not that much different from today's: 3270 pages vs HTML pages, CICS transaction monitors vs app servers, and a back-end DB that is a scarce resource the rest of the system tries to offload. There was a discussion of how “share nothing” approaches worked then and now to achieve scalability, but he noted that people who truly understand this were scarce then and remain so now.

He got to his main point by arguing that many of today’s real problems stem from a single DB having to support applications with highly customized views, implemented via side tables that define extended attributes which apply to a particular customization but are not in the core schema. He noted that XML is more extensible than SQL, and proposed that this core problem could be solved more easily by storing the customized data in an XML store. That implies XQuery as the query / middle-tier programming language … or actually “XML + XQuery(P) + Apache + REST == no middle tier”.
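To make the side-table problem concrete, here is a minimal Python sketch of the alternative Bamford described. It assumes sqlite3 as a stand-in relational store and a hypothetical `ext` column holding each row's extended attributes as an XML fragment, so a customization can add a new attribute (here `loyaltyTier`) without an ALTER TABLE or a new side table. In a real XML store the path query would presumably run inside the engine via XQuery, not in application code as it does here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Core schema stays fixed; customization-specific attributes live in the
# hypothetical "ext" column as a small XML fragment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, ext TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (1, "Acme", "<ext><region>EMEA</region></ext>"),
        # A later customization adds loyaltyTier without any schema change.
        (2, "Globex", "<ext><region>APAC</region><loyaltyTier>gold</loyaltyTier></ext>"),
        (3, "Initech", "<ext/>"),
    ],
)

def customers_where(conn, path, value):
    """Return names of customers whose extended XML has `value` at `path`."""
    names = []
    for name, ext in conn.execute("SELECT name, ext FROM customer ORDER BY id"):
        # findtext returns None when the element is absent, so rows that
        # lack this extension simply do not match.
        if ET.fromstring(ext).findtext(path) == value:
            names.append(name)
    return names

print(customers_where(conn, "loyaltyTier", "gold"))  # ['Globex']
print(customers_where(conn, "region", "EMEA"))       # ['Acme']
```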

His vision of how to bootstrap this seems to be:

- Bamford (personally?) has started something called the FLWOR Foundation to produce an open source XQueryP implementation called Zorba.

- Oracle’s main DB product will be completely interoperable with it at the XQuery code level.

- The people with low end requirements use Zorba, and those with enterprise-scale requirements use Oracle.

- I guess the rest of the world will ~~re-engineer to be code compatible with Zorba~~ "correctly" implement the XQueryP standard, and thus we get interoperability.

Very interesting, especially since I bought into this basic vision big time in 1999, which motivated me to take a job at Software AG because of their Tamino XML database. That hasn't worked out particularly well for Software AG, but obviously Oracle has immensely more resources, and maybe the time is ripe now even if it wasn't then. It's quite different from Microsoft's vision, however. We stress the interoperability of data more than attempts to create standards for portable code. We work hard to make our platform and tools the best in the industry, and the fact that many are proprietary is not a drag on interoperability since it is standard data -- which generally means XML these days -- flowing across applications and platforms. A distributed application may use Java and XOM on one node, Linux and XSLT on another, and Windows and LINQ on another ... and any of them may use XQuery to get data out of a database ... but so long as they speak XML to one another each node has no need to care how the others produce and consume the XML. Let the most productive tool win!
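A toy illustration of the “speak XML, use whatever tool you like” point, written in Python for convenience: one hypothetical node builds an order with `xml.etree.ElementTree`, and another reads it with the unrelated `xml.dom.minidom` API. The element and attribute names are made up; the only contract between the two functions is the XML text itself.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

def produce_order(order_id, items):
    # Node A (hypothetical): builds the XML with ElementTree.
    root = ET.Element("order", id=str(order_id))
    for sku, qty in items:
        ET.SubElement(root, "item", sku=sku, qty=str(qty))
    return ET.tostring(root, encoding="unicode")

def consume_order(xml_text):
    # Node B (hypothetical): reads the same XML through the DOM API and
    # never sees any ElementTree objects.
    doc = minidom.parseString(xml_text)
    order = doc.documentElement
    total_qty = sum(int(i.getAttribute("qty")) for i in order.getElementsByTagName("item"))
    return order.getAttribute("id"), total_qty

wire = produce_order(42, [("A-1", 2), ("B-7", 3)])
print(consume_order(wire))  # ('42', 5)
```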

I spent the rest of the first day chairing the Enterprise XML Computing sessions. First up, Dana Florescu and Ralf Lämmel shared a session that I introduced as “starting from different ends of the declarative / imperative continuum and ending up in about the same place”. That is, the XQuery people start with a declarative query language and are adding imperative features to make it more easily usable for scripting, etc., whereas the .NET people are taking imperative languages and adding declarative query features to make applications more scalable / streamable / parallelizable / etc. Dana talked about XQueryP and gave a great presentation, with concrete use cases, realistic assessments of the competition (including LINQ to XML!), etc. Ralf talked about how the LINQ to XML team is thinking about extending LINQ to XML to gracefully handle streaming use cases.

There were some good audience comments -- and Dana and Donald Kossmann met with me later to reinforce the point -- arguing that XQueryP can do all that stuff in less code (just one nice compact string rather than all those method calls), and will [someday] have an optimizer to figure out whether a particular query can be streamed or not. I think we agreed at the end that we are aiming at different audiences – XQueryP advocates seem to be going after the people who see XML as the center of their data world, like to learn new languages, and demand proper computer science solutions; whereas with LINQ to XML we go after people who use all sorts of data, want to change one thing at a time, would rather get a lot of help from Intellisense to build queries than type in an arcane string, and don't necessarily want to worry too much about the underlying theory. LINQ to XML is "only" an API and not a declarative language processor / optimizer, so we have no option but to ask the programmer to decide whether to write a query over a loaded tree or over a dynamic stream.
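The tree-versus-stream choice that LINQ to XML leaves to the programmer has a direct analogue in Python's standard library, shown here only as an illustration (this is not LINQ to XML). The same query is written once over a fully loaded tree and once over `ElementTree.iterparse`; the log format is invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Invented sample document for the example.
DOC = b"""<log>
  <entry level="info">started</entry>
  <entry level="error">disk full</entry>
  <entry level="info">retrying</entry>
  <entry level="error">gave up</entry>
</log>"""

def errors_from_tree(xml_bytes):
    # Load the whole document into memory, then query it.
    root = ET.fromstring(xml_bytes)
    return [e.text for e in root.iter("entry") if e.get("level") == "error"]

def errors_from_stream(xml_bytes):
    # Handle elements as their end tags are parsed; clear each finished
    # <entry> so its text and attributes can be released.
    found = []
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "entry":
            if elem.get("level") == "error":
                found.append(elem.text)
            elem.clear()
    return found

print(errors_from_tree(DOC))    # ['disk full', 'gave up']
print(errors_from_stream(DOC))  # ['disk full', 'gave up']
```

Here the programmer picks the strategy by picking the function; the XQueryP argument is that an optimizer could, in principle, make that decision for a declarative query.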

Next, Ralf and Igor Peshansky of IBM gave what amounted to compare/contrast talks on how strongly typed XML and web services features are being incorporated into the .NET and Java languages. If you read Ralf's blog you know what we are doing. See https://alphaworks.ibm.com/tech/xj for the Java side of things.

My summary (as chair, trying hard to be neutral) was:

Similarities:

- Both start from the position that current XML processing is too hard and the programming language – XML impedance mismatch needs to be bridged.

- Both provide a more OO-ish experience to their users than is possible with existing XML APIs.

- Both provide automated facilities to map XSD types into language types.

- Both can be thought of as competing with XQuery in the programming environment, but don’t undermine XQuery’s value as a query language. (Apparently the XJ folks get "I thought that's what XQuery does" questions a lot too!)

Differences:

- LINQ is being put into the core of .NET, whereas XJ essentially creates a domain-specific language for XML on top of Java.

- LINQ is the query language; XJ incorporates XPath as its query language.

- LINQ provides a common programming model for objects and XML; XJ is disconnected from ordinary Java objects, i.e. you can't query over object graphs or join across XML objects and ordinary Java objects.

- C# doesn’t incorporate XML syntax into the language; XJ (and VB9, of course) does.

- The LINQ framework doesn’t have any special support for web services; XJ does (that was the focus of Igor’s talk).
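The “common programming model for objects and XML” difference is easiest to see with a join. The sketch below uses a Python comprehension as a rough stand-in for a LINQ query (it is neither LINQ nor XJ): one source is a list of ordinary objects, the other is parsed XML, and a single query joins them. The `Customer` class and the order document are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Customer:
    # Ordinary in-memory object, not derived from any XML.
    id: str
    name: str

# Hypothetical order feed that arrives as XML.
ORDERS_XML = """<orders>
  <order customer="c1" total="25.00"/>
  <order customer="c2" total="40.00"/>
  <order customer="c1" total="10.00"/>
</orders>"""

customers = [Customer("c1", "Ada"), Customer("c2", "Grace")]
orders = ET.fromstring(ORDERS_XML)

# One query ranging over both XML elements and plain objects.
rows = [
    (c.name, float(o.get("total")))
    for o in orders.iter("order")
    for c in customers
    if c.id == o.get("customer")
]
print(rows)  # [('Ada', 25.0), ('Grace', 40.0), ('Ada', 10.0)]
```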

In the last session of the day, Paul Downey described the work of the Schema Patterns for Databinding WG and how it’s doing useful work (but don’t get no respect). One notable quote: “databinding tools don’t all suck, they just suck in different ways so they don’t interoperate.” I'm just not in a position to know much about or comment on Microsoft's thoughts on this ...
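One concrete way two databinding tools can “suck in different ways” is the choice of language type for a schema type. The sketch below assumes two hypothetical binders, one mapping `xs:decimal` to a binary float and one to an exact decimal; both read the same invoice, yet they write back different totals.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Invoice whose amount attributes are assumed to be declared as xs:decimal.
INVOICE = '<invoice><line amount="0.1"/><line amount="0.2"/></invoice>'

def total_via_float_binding(xml_text):
    # Hypothetical "tool A": binds xs:decimal to a binary float.
    amounts = [float(line.get("amount")) for line in ET.fromstring(xml_text).iter("line")]
    return str(sum(amounts))

def total_via_decimal_binding(xml_text):
    # Hypothetical "tool B": binds xs:decimal to decimal.Decimal.
    amounts = [Decimal(line.get("amount")) for line in ET.fromstring(xml_text).iter("line")]
    return str(sum(amounts))

print(total_via_float_binding(INVOICE))    # 0.30000000000000004
print(total_via_decimal_binding(INVOICE))  # 0.3
```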

Finally, there was a panel discussion of XML Pipeline Processing. Sam Page described the concept and some real-world use cases. Norm Walsh described XProc standardization. The canonical use case is applying the numerous inclusions, validations, and transformations needed to build the DocBook documentation, so that people who work on different platforms all get the same result. Norm did a good job of deflating some overheated expectations (e.g. that it is a minimalist alternative to BPEL). Still, having attempted to evangelize an XML pipeline product while at Software AG, I expect the principal response in the real world is going to be: “I can do that in 10 lines of C#/Java/Python/Perl/whatever, why on earth should I do it in 100 lines of XML?” Also, the WG has agreed to disagree on what the underlying data model might be, so an actual pipeline may consist of nodes that exchange XML text, DOM trees, PSVI serializations of some sort, or whatever. Thus, anyone wanting to write a pipeline component will have to write it differently for every supported implementation… although pipeline end users could write one “script” and have it produce the same result no matter what the implementation. Nevertheless, this looks like a model W3C effort: they're standardizing existing practice rather than doing a "design by committee", finding the intersection of what is known and needed rather than the union of everyone's hopes and dreams, and working in a fast and focused manner. It looks like it will do one thing well, and if you need that one thing done, this will be the spec for you.
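In the spirit of the “10 lines of Python” objection, here is a minimal sketch of an include / validate / transform pipeline written as ordinary Python functions. It is not XProc. The in-memory `SOURCES` dictionary, the title-required check standing in for schema validation, and the table-of-contents step standing in for XSLT are all assumptions for illustration; only `xml.etree.ElementInclude.include` does real XInclude processing.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" standing in for separately maintained chapters.
SOURCES = {
    "ch1.xml": "<chapter><title>Intro</title></chapter>",
    "ch2.xml": "<chapter><title>Usage</title></chapter>",
}

MASTER = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="ch1.xml"/>
  <xi:include href="ch2.xml"/>
</book>"""

def loader(href, parse, encoding=None):
    # Resolve xi:include hrefs from the dictionary instead of the file system.
    return ET.fromstring(SOURCES[href]) if parse == "xml" else SOURCES[href]

def step_include(root):
    ElementInclude.include(root, loader=loader)  # replaces xi:include in place
    return root

def step_validate(root):
    # Stand-in for real schema validation: every chapter needs a title.
    for chapter in root.iter("chapter"):
        if chapter.find("title") is None:
            raise ValueError("chapter without title")
    return root

def step_transform(root):
    # Stand-in for an XSLT step: build an HTML-style table of contents.
    toc = ET.Element("ul")
    for chapter in root.iter("chapter"):
        ET.SubElement(toc, "li").text = chapter.findtext("title")
    return toc

def run_pipeline(xml_text, steps):
    node = ET.fromstring(xml_text)
    for step in steps:
        node = step(node)
    return ET.tostring(node, encoding="unicode")

print(run_pipeline(MASTER, [step_include, step_validate, step_transform]))
# <ul><li>Intro</li><li>Usage</li></ul>
```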

The only presentation I saw on Wednesday was Douglas Crockford’s “JSON: The Fat-free alternative to XML” (Slides are already online at https://www.json.org/xml2006.ppt). A couple of things I took away from the talk:

- JSON took a very different approach from XML in a couple of areas: there is no version number because the spec is declared stable forever, and it takes a non-draconian “be liberal in what you accept and conservative in what you produce” philosophy to be friendly to supersets (such as YAML, www.yaml.org, which is big in the Ruby world).

- The JSON folks are agitating for the infrastructure vendors to support a JSONRequest API (https://www.json.org/JSONRequest.html) that addresses some limitations of XMLHttpRequest in AJAX environments.

- The JSON people have an interesting response to the question of why it doesn’t have namespaces, which echoes the feelings of many namespace-haters in the XML world:

Every object is a namespace. Its set of keys is independent of all other objects, even exclusive of nesting.
JSON uses context to avoid ambiguity, just as programming languages do.

- There is a JSONT spec that defines a way to declaratively transform JSON to HTML, XML, etc. (https://goessner.net/articles/jsont/).
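A small illustration of the “every object is a namespace” argument, using Python's standard `json` and `xml.etree.ElementTree` modules. In the JSON version the two `name` keys never collide because each lives in its own object; in the XML version two `name` elements from different vocabularies sit side by side, and namespaces are what tell them apart. The namespace URNs and the data are invented.

```python
import json
import xml.etree.ElementTree as ET

# JSON: keys are scoped to the object that contains them.
book_json = json.loads("""{
  "author":    {"name": "Ada"},
  "publisher": {"name": "Example Press"}
}""")
print(book_json["author"]["name"])     # Ada
print(book_json["publisher"]["name"])  # Example Press

# XML: two vocabularies that both define <name> are mixed as siblings,
# so namespaces (hypothetical URNs here) disambiguate them.
book_xml = ET.fromstring("""<book xmlns:a="urn:example:author"
      xmlns:p="urn:example:publisher">
  <a:name>Ada</a:name>
  <p:name>Example Press</p:name>
</book>""")
print(book_xml.findtext("{urn:example:author}name"))     # Ada
print(book_xml.findtext("{urn:example:publisher}name"))  # Example Press
```

Nesting the XML the way the JSON is nested (`<author><name>Ada</name></author>`) would also avoid the clash, which is essentially the context argument the JSON folks make.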

I'm not sure if Crockford was positioning JSON as a better-XML-than-XML for data interoperability across applications. I had thought of JSON as “the real X in AJAX”, i.e. for communicating between browser and server in a Web application, but not a serious contender for interoperability purposes. Will we see JSON substituting for XML in SOAP messages, RSS feeds, etc.? Will people store JSON persistently? That will require JSON-flavored APIs, schema/“data contract” languages, query/transformation languages, apps, etc. If this starts to gain momentum (I’m talking about momentum as a data interop and storage format, not the momentum it clearly has for “Web 2.0” server-browser communication), things could get interesting. I'm not even sure how I feel about that ... Maybe the time is ripe for the stuff we talked about on the sml-dev mailing list years ago. YAML evolved out of that list, and JSON is essentially a subset of YAML. I'm sure that lots of us in the MS Data Programmability team are pondering what we should think and do about JSON. More later.
