XML's overhead will open wallets?

There's a new article on the overhead that XML creates on networks, and what can be done about it: "Eyes, wallets wide open to XML traffic woes in 2005". This is a topic near and dear to my heart: I've been involved in long-running threads on the xml-dev mailing list about it, and gave a paper / presentation on it at the XML 2004 conference. Let's look at the points raised in the searchwebservices article in some detail: I think it addresses a real challenge that people are having with XML, but it paints a somewhat misleading picture of the alternative solutions.


First, the article begins: "Enterprise affection for XML Web services may have C-level hearts fluttering over the immediate efficiency and productivity gains, but the other shoe is about to drop in this relationship." The obvious rejoinder is that in most organizations, human efficiency and productivity gains add vastly more to the bottom line than savings on hardware and wired network bandwidth, both of which get cheaper all the time.

Next, the claim that many people are starting to "realize en masse how taxing XML is on enterprise networks" is true, but only in a couple of fairly specific scenarios. As I put it in the XML 2004 paper:


It is quite clear from surveying the research in this area that XML really does impose a significant overhead on a significant set of real-world applications, especially those in enterprise-class transaction processing environments and those involving wireless communication. In both scenarios it is clear that developers, vendors, and customers desire the benefits of standards-based portability and interoperability, but are unable to use XML in its current form.

Furthermore, currently deployed technological fixes do not alleviate this pain for these two classes of users. As for reducing size, conventional text compression algorithms do not work at all on the short messages with little redundant text that are common in web services applications and preferred by wireless developers. Likewise, the studies noted above generally show that the processing cost of these algorithms often negates any perceived performance benefit from reducing the amount of bandwidth needed to send a message. Furthermore, "throwing hardware at the problem" is not a viable option for battery-powered mobile devices with intrinsically limited bandwidth and where every extra CPU cycle drains the battery all the sooner.
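To make the point about short messages concrete, here's a small Python sketch (the message format and names are invented for illustration) comparing a conventional compressor's results on a single short web-services-style message versus a large, highly redundant document:

```python
import zlib

# A short, low-redundancy message of the kind common in web services
# (element and attribute names here are hypothetical).
small = b'<quote sym="MSFT" bid="26.17" ask="26.19" ts="1104537600"/>'

# The same message repeated many times, standing in for a large,
# highly redundant document where dictionary compression shines.
large = small * 1000

small_ratio = len(zlib.compress(small)) / len(small)
large_ratio = len(zlib.compress(large)) / len(large)

print(f"short message:  {len(small)} -> {len(zlib.compress(small))} bytes, ratio {small_ratio:.2f}")
print(f"large document: {len(large)} -> {len(zlib.compress(large))} bytes, ratio {large_ratio:.2f}")
```

On a message this short there is almost no redundancy for the compressor to exploit (and the format adds its own header overhead), so the ratio hovers near 1.0 while the repetitive document compresses to a few percent of its size.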


Let's be clear, however -- this refers to a relatively small number of use cases in which XML could be valuable, but its size or processing overhead stands in the way of its widespread use today.  The article says "Users and experts expect 2005 to be the year companies realize en masse how taxing XML is on enterprise networks, sparking a spending spree on XML acceleration products and optimized appliances that offload this burden."  Time will tell, of course, but I would find these predictions more credible if the article itself didn't have some factual errors.


For example, the author asserts that "standards bodies like the World Wide Web Consortium (W3C) work in the shadows on the ratification of a single binary XML standard that could bring an about-face to the commitment companies have to the ASCII text encoding that is currently the foundation of XML 1.0." This is not even close to being true. Besides the technical point that XML 1.x defines a Unicode text encoding, not an ASCII encoding :-) the real objective of the W3C XML Binary Characterization Working Group is:


 gathering information about use cases where the overhead of generating, parsing, transmitting, storing, or accessing XML-based data may be deemed too great for a particular application, characterizing the properties that XML provides as well as those that are required by the use cases, and establishing objective, shared measurements to help judge whether XML 1.x and alternate (binary) encodings provide the required properties.


The W3C is explicitly not ratifying a single binary XML standard; it is investigating whether that is even worth attempting. The early indication seems to be that while specialized, proprietary binary formats are widespread across the XML industry, finding a generalized standard binary format will be somewhere between politically difficult and technically impossible.


Finally, the article quotes James Kobielus of the Burton Group:

Network managers are going to implement these XML acceleration appliances to offload the overhead of XML processing from application servers so [the app servers] can focus on their core competency, which is business logic.


I'm highly skeptical of this, although I am intrigued by the capabilities of the XML acceleration appliances. First, as it stands now, the acceleration appliances can only be used by network managers to offload processing of standalone operations such as XSLT transformations or processing WS-Security SOAP headers. Using them to offload the time-consuming aspects of XML processing from general-purpose hardware requires more involvement and investment from the industry as a whole. Examples would include software products that detect the presence of XML acceleration hardware and use APIs that exploit it, and standards for efficiently exchanging parsed XML Infosets across hardware components in a distributed system.


Where does that leave us?  As I see it (and stealing from my XML 2004 presentation):


  • We have to deal with the reality that XML really requires too much bandwidth for many wireless scenarios, and requires far more processing resources than equivalent formats in transaction processing scenarios. Moore's Law won't make these costs go away, because it doesn't apply to wireless bandwidth or batteries. The bare facts are not really in dispute; what is in dispute is how to reduce the costs without destroying the benefits of XML. There are numerous alternatives being researched, including XML-specific compression algorithms and improved XML text parsers, that would not require end-user eyes or wallets to be opened.
  • No known alternative offers anything resembling a silver bullet. There are probably plenty of alternative serializations of the XML Infoset that would be both smaller to transmit and faster to parse than XML text, but whether they offer enough value to justify putting them into the XML core is not at all clear. Likewise it is clear that dedicated hardware components can parse XML more efficiently than conventional parsers, but it is much less clear whether this translates into more cost-effective systems.
  • As with all software optimization, the first thing to do is to determine where the bottlenecks are and figure out how to address them. Many of the "XML is bloated and slow" complaints I hear could be alleviated by being more clever about how the technology is used: "Doctor, doctor, it costs me lots of money per megabyte of XML I download to my mobile device." Uhh, get a better mobile data plan? Or, "Doctor, doctor, it hurts when I try to process a 1-MB file to find the two attribute values I need!" Uhh, so don't DO that. Don't use expensive validation unless you really get value from it, restructure the XML so that a pull parser or SAX can find what you need quickly, use the right tool for the job, whatever it takes.
  • Use enterprise-class tools to do the heavy lifting: Leverage the support for XML in database products such as SQL Server 2005 to pull out small chunks of relevant XML rather than forcing the parser to do that job. Use the fastest XML technology available, even if it costs money.
  • Accept that premature standardization is the problem, not the solution. It is probably best to let individual industries such as wireless figure out serializations that meet their needs and then come to more global organizations such as the W3C for standardization. It may be that experimentation and evolution brings us to a single, optimal serialization format toward which we can all migrate, but it is a very good bet that design-by-committee and consortium politics will not. Yes, there will be a period of confusion and inefficiency as developers have to support multiple formats for different user bases, and it will probably be obvious what we should have done in 20/20 hindsight. But so long as the alternative formats are relatively simple, it should be no more difficult to handle diversity than it is to handle the multiple graphics formats that are in widespread use -- even mobile devices typically support JPEG, GIF, and PNG.
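To illustrate the "don't build a full tree to find two attribute values" advice above, here's a minimal Python sketch using streaming parsing (the document, element names, and attributes are all invented for the example):

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# A hypothetical document: many <record> elements, of which we only
# need the "status" attribute of two specific records.
doc = b"""<log>
  <record id="1" status="ok"/>
  <record id="2" status="ok"/>
  <record id="3" status="error"/>
  <record id="4" status="ok"/>
</log>"""

wanted = {"2", "3"}
found = {}

# iterparse streams the document and hands us each element as its end
# tag is seen; we grab what we need and discard the rest, so memory
# stays flat even for very large files.
for event, elem in ET.iterparse(BytesIO(doc), events=("end",)):
    if elem.tag == "record" and elem.get("id") in wanted:
        found[elem.get("id")] = elem.get("status")
        if len(found) == len(wanted):
            break  # stop parsing as soon as we have what we need
    elem.clear()  # release the element's children immediately

print(found)  # {'2': 'ok', '3': 'error'}
```

The same idea applies to SAX or any pull parser: bail out as soon as you have the values you came for, instead of paying to materialize the entire document.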