Article Idea: Processing XML in the Real World


Coincidentally, just as I finished reading a post by Tim Bray about Private Syndication, two bug reports were filed almost simultaneously about RSS Bandit’s support for secure RSS feeds. The first was "SSL challenge for non-root certs", in which the user complained that when there is a problem with an SSL certificate we simply fail instead of prompting the user the way browsers do. One could argue that failing is the right thing to do, especially when folks like Tim Bray are suggesting that bank transactions and medical records should flow through RSS. However, given the precedent set by web browsers, we’ll probably change our behavior. The second bug was that RSS Bandit doesn’t support cookies. Many services use cookies to track authenticated users and to provide views tailored to an individual user. Although a number of folks consider cookies a privacy issue, most popular websites use them and they are mostly harmless. I’ll likely fix this bug in the next couple of weeks.
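To make the two issues concrete, here is a minimal sketch in Python of a feed fetcher that remembers cookies between requests and reports certificate problems separately from other failures, so that a caller could prompt the user instead of silently giving up. This is not RSS Bandit’s code; the names `build_feed_opener`, `fetch_feed` and `CertificateProblem` are hypothetical and exist only for illustration.

```python
import ssl
import urllib.error
import urllib.request
from http.cookiejar import CookieJar


class CertificateProblem(Exception):
    """Hypothetical error type: the server's certificate failed verification."""

    def __init__(self, url, reason):
        super().__init__(f"certificate problem for {url}: {reason}")
        self.url = url
        self.reason = reason


def build_feed_opener(cookie_jar=None):
    """Return (opener, jar). The opener stores and resends cookies via the jar."""
    # An empty CookieJar is falsy, so compare against None explicitly.
    jar = cookie_jar if cookie_jar is not None else CookieJar()
    # The default context verifies certificates and host names.
    context = ssl.create_default_context()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar),
        urllib.request.HTTPSHandler(context=context),
    )
    return opener, jar


def fetch_feed(opener, url, timeout=30):
    """Fetch the raw bytes of a feed, surfacing certificate errors distinctly."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as exc:
        # urllib wraps the underlying SSL error in URLError.reason.
        if isinstance(exc.reason, ssl.SSLCertVerificationError):
            raise CertificateProblem(url, exc.reason) from exc
        raise
```

A caller could catch `CertificateProblem`, show the reason to the user, and only retry with relaxed verification if the user explicitly agrees. Reusing the same opener for later requests to a site is what lets a session cookie set by a login response accompany the feed request.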

These bug reports, in combination with a couple more issues I’ve had to deal with while writing code to process RSS feeds in RSS Bandit, have given me the inspiration for my next Extreme XML column. I suspect there’s a lot of mileage to be had from an article that dives deep into the various issues one deals with while processing XML on the Web (DTDs, proxies, cookies, SSL, unexpected markup, etc.), using RSS as the concrete example. Would any readers of my column care to comment on whether they’d like to see such an article and, if so, what they’d want to see covered?


Comments (3)

  1. Real World XML sounds good

    I’m not exactly a regular reader, but I liked the C-omega article and passed it round the office. C-omega: I definitely need something like that, but it’s a little hard to explain to other people why.

    Tenuously linked anecdote from my last project…

    I needed to take some input XML and transform it to pseudo-XML while interacting with a database.

    The input XML can have about 40 different structures, called "templates". It’s produced by string manipulation in code, so it’s not guaranteed to fit any DTD. Example: it splits bunches of text into tags called line1, line2, etc., depending on the length of the text, with no upper limit.

    It’s a great relief to work with this version of the input system, since it checks for well-formedness. In the previous version, if a data entry guy typed in a quote or something, the XML wasn’t even well-formed.

    The input system developers think I’m being totally anal even by insisting on well-formed XML. XML is just text, right?

    Ah, life in the real world…