Typed XML programmer -- Where do you want to go tomorrow?

This post starts a series of blog posts on what I would like to call “Typed XML programming”. The overall goal of the series is to start a discussion of requirements, scenarios and priorities around typed XML programming. This first post covers some real basics, poses some questions, and hopefully whets your appetite for coming back to this thread.

What is typed XML programming anyway?

In elevator-pitch terms, I mean “XML programming in mainstream OO languages like C#, Java and VB while leveraging XML schemas as part of the XML programming model”. I am leaving XSLT, XQuery and other DSLs out of scope for the present series, if you don’t mind. Otherwise, I would like to go for a broad definition of XML programming, including scenarios such as (i) consuming XML as input for an application; (ii) producing XML as output of an application; (iii) operating on in-memory representations of XML; (iv) streaming over XML; (v) accessing XML in the database; and so on.

Let’s start with ‘untyped’ XML programming. Here is an archetypal C# function that takes an (in-memory) XML tree of purchase orders and calculates the total over the items of all orders with a given zip code (i.e., it sums up price times quantity for those items):

// Use your favorite XML API (such as DOM or … XLinq in my case).
// The XLinq version below needs: using System.Xml.Linq;
public static double GetTotalByZip(XElement os, int zip)
{
   double total = 0.0;
   // Visit only the orders whose zip attribute matches.
   foreach (XElement o in os.Elements("order"))
     if ((int)o.Attribute("zip") == zip)
       foreach (XElement i in o.Elements("item"))
         // The casts convert the element content to numbers.
         total += (double)i.Element("price")
               * (int)i.Element("quantity");
   return total;
}

It is somewhat unfair to label the above code as ‘untyped’, since the use of the XML API itself is still subject to static type checking; also, the look-up of elements and attributes is, in a sense, dynamically checked. Likewise, I would like to avoid restricting ‘typed’ XML programming to a narrow notion of static typing. Instead, XML types (aka XML schemas) may contribute to the XML programming model in various ways, such as validation protocols, precondition checking, exception handling, IntelliSense, tool tips and others. For now, let me just do the most obvious thing: assume a C# object model for the kinds of elements in the purchase-order example. (The object model may have been derived from an XML schema by a code generator like xsd.exe.) Based on such an object model, the above ‘untyped’ XLinq code is transcribed to a ‘typed’ C# function as follows:

// We presume object types for order collections, orders and order items.
public static double GetTotalByZip(orders os, int zip)
{
   double total = 0.0;
   foreach (order o in os.order)
     if (o.zip == zip)
       foreach (item i in o.item)
         total += i.price * i.quantity;
   return total;
}
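
The typed function presumes an object model without showing it. Here is a minimal hand-written sketch of what such a model could look like, assuming XmlSerializer is used for loading. The class and member names mirror the element and attribute names in the example; the wrapper class OrderModelDemo and its Load helper are hypothetical names of mine, and code actually generated by a tool like xsd.exe will look different in detail.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical object model; names mirror the XML element/attribute names.
[XmlRoot("orders")]
public class orders
{
    // Repeated <order> children, without a wrapper element.
    [XmlElement("order")]
    public order[] order { get; set; } = Array.Empty<order>();
}

public class order
{
    [XmlAttribute("zip")]
    public int zip { get; set; }

    // Repeated <item> children, without a wrapper element.
    [XmlElement("item")]
    public item[] item { get; set; } = Array.Empty<item>();
}

public class item
{
    [XmlElement("price")]
    public double price { get; set; }

    [XmlElement("quantity")]
    public int quantity { get; set; }
}

public static class OrderModelDemo
{
    // Hypothetical helper: deserialize an XML string into the object model.
    public static orders Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(orders));
        using (var reader = new StringReader(xml))
            return (orders)serializer.Deserialize(reader);
    }

    // The typed function from the post, repeated so the sketch is self-contained.
    public static double GetTotalByZip(orders os, int zip)
    {
        double total = 0.0;
        foreach (order o in os.order)
            if (o.zip == zip)
                foreach (item i in o.item)
                    total += i.price * i.quantity;
        return total;
    }
}
```

With this in place, OrderModelDemo.Load(xmlText) yields an orders object that can be passed straight to GetTotalByZip. The conversions from text to int and double happen once, during deserialization, rather than at every access.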

 

 

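For clarity, let’s show the diff between the untyped and typed versions.

Lines marked ‘-’ carry the ‘untyped slack’; lines marked ‘+’ are their typed replacements:

- public static double GetTotalByZip(XElement os, int zip)
+ public static double GetTotalByZip(orders os, int zip)
  {
     double total = 0.0;
-    foreach (XElement o in os.Elements("order"))
+    foreach (order o in os.order)
-      if ((int)o.Attribute("zip") == zip)
+      if (o.zip == zip)
-        foreach (XElement i in o.Elements("item"))
+        foreach (item i in o.item)
-          total += (double)i.Element("price")
-                 * (int)i.Element("quantity");
+          total += i.price * i.quantity;
     return total;
  }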

So in this instance of typed XML programming, we managed to get rid of all casts and all string-encoded element and attribute names, and we might have enjoyed IntelliSense and tool tips as we typed in the code. Furthermore, type checking protects us from several kinds of typos, and we had to type considerably less code anyway. Finally, we also have the object types at run time, which helps with debugging and dispatching. It sounds like typed XML programming is a good idea, but I am of course aware of contrary opinions (and I promise to get back to them later in the series). Let me say that typed XML programming gets a lot of attention. For instance, check out the sheer number of technologies for XML data binding and of research efforts on programming languages for typed XML programming (cf. Comega, XJ, Xtatic, etc.).
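
To make the point about typos concrete, here is a small sketch of what happens in the untyped style when an element name is misspelled. The class TypoDemo, its methods and the sample XML are hypothetical; the only XLinq behavior relied on is that Element returns null for a missing child and that the explicit conversion from a null XElement to double throws ArgumentNullException.

```csharp
using System;
using System.Xml.Linq;

public static class TypoDemo
{
    // Same shape as the untyped function, but "price" is misspelled as "prize".
    // The compiler accepts this; the string is just data to it.
    public static double GetTotalWithTypo(XElement os)
    {
        double total = 0.0;
        foreach (XElement o in os.Elements("order"))
            foreach (XElement i in o.Elements("item"))
                // Element("prize") returns null, so the cast to double throws.
                total += (double)i.Element("prize")
                       * (int)i.Element("quantity");
        return total;
    }

    // Returns true if the typo surfaces as an exception at run time.
    public static bool FailsAtRunTime()
    {
        XElement os = XElement.Parse(
            "<orders><order zip='98052'>" +
            "<item><price>2.5</price><quantity>2</quantity></item>" +
            "</order></orders>");
        try
        {
            GetTotalWithTypo(os);
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }
}
```

A typo in a collection look-up is arguably worse: Elements("ordr") simply yields an empty sequence, so the function would quietly return 0. In the typed version, the corresponding mistakes (i.prize, os.ordr) are rejected by the compiler.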

Requirements? Scenarios? Priorities?

I haven’t provided much context yet for a deep discussion, but let’s assume that readers of this blog have a certain understanding of “Typed XML programming -- today”. So what I would like to do now is pose some questions, which can be summarized as follows: “Typed XML programmer -- Where do you want to go tomorrow?”

  1. Do we expect OO developers to understand XML types?
  2. Is XML Schema the right basis for typed XML programming?
  3. What are the MoSCoW requirements for typed XML programming?
  4. What are the key weaknesses of current XML data-binding technologies?
  5. What are the expectations or reservations regarding XML/OO ‘language cocktails’?
  6. How much do we care about X/O mapping when compared to O/R mapping?
  7. How do we (programmatically or otherwise) mediate between given XML and OO types?
  8. What other questions should have been posed here?

 

I will get back to you in a few days.

My plan is to mumble a bit about “Typed XML programming -- today”.

 

Ralf Lämmel