Get a Real XML Parser

Today's post is more observational than informational. Enjoy.

It's sometimes possible to write XML without an XML library. If your XML documents are sufficiently similar and templated, then you can craft well-formed XML through little more than string manipulation. The trivial case is where the string is a constant expression and the XML document is the same every time through. The more interesting case is where the string has many spots that you fill in with arbitrary content. Once that content has to satisfy constraints instead of being arbitrary, you'll almost always want to use an XML library, assuming that you want your emitted messages to be validated.
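As a rough illustration of the fill-in-the-blank case, here is a minimal Python sketch. The template, the element names, and the `render_order` helper are all hypothetical and not taken from any real message format. The one thing string manipulation cannot skip is escaping markup characters in the blanks, which the standard library's `xml.sax.saxutils` helpers handle.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Hypothetical template: the document shape is fixed, only the blanks vary.
TEMPLATE = "<order id={order_id}><customer>{customer}</customer></order>"


def render_order(order_id, customer):
    """Fill the blanks, escaping anything that would otherwise read as markup."""
    return TEMPLATE.format(
        order_id=quoteattr(order_id),  # adds the quotes and escapes & < > "
        customer=escape(customer),     # escapes & < > in element text
    )


doc = render_order("A&1", "Smith & Sons <wholesale>")

# Parsing the result is only a sanity check that the output is well-formed.
parsed = ET.fromstring(doc)
print(parsed.get("id"), "|", parsed.findtext("customer"))
```

Nothing here checks that the content is the right kind of content, only that it survives as text. That checking is the part where a library starts to earn its keep.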

It's pretty much impossible to read XML without an XML library. XML simply has too many rules about whitespace handling, processing instructions, character formats, and nested elements to realistically build hand-crafted parsers in an application. A hand-crafted parser likely supports exactly the message that you saw one day while looking at the wire, but contains numerous bugs that keep it from accepting all of the equivalent ways of writing the same message. Even seemingly unimportant changes to the network configuration can change the literal bytes of a message, such as the text encoding or the division of text into the text nodes expected by a DOM. If you want to understand XML, then you'll need to get a real XML parser. Using an XML parser instead of string manipulation is more expensive, but if you can't afford this cost, then you probably can't afford the benefits that XML provides either.
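To make the "equivalent ways of writing the same message" point concrete, here is a small Python sketch. The message shape and the `naive_body` and `parsed_body` helpers are made up for illustration. The naive reader is the kind of thing you write after seeing one message on the wire; the other uses the standard library's `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical message, written four equivalent ways.
VARIANTS = [
    "<msg><body>hello</body></msg>",                              # the one seen on the wire
    '<?xml version="1.0"?>\n<msg>\n  <body>hello</body>\n</msg>',  # declaration and whitespace
    "<msg><body><![CDATA[hello]]></body></msg>",                  # CDATA section
    "<msg><body >&#104;ello</body></msg>",                        # space in tag, character reference
]


def naive_body(doc):
    """Hand-crafted reader: matches the literal text of the first variant."""
    match = re.search(r"<body>(.*?)</body>", doc)
    return match.group(1) if match else None


def parsed_body(doc):
    """Reader built on a real parser."""
    return ET.fromstring(doc).findtext("body")


for doc in VARIANTS:
    print(repr(naive_body(doc)), "vs", repr(parsed_body(doc)))
```

The parser reports the same body text for every variant, while the regular expression returns the raw CDATA markup for the third and finds nothing at all in the fourth.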

Next time: Accessing the Query String