Feeds and well-formed XML

Here in Windows, we’re working hard on Windows Vista Beta 2, and we’ve recently been doing some work on how we parse feeds.

Our years of experience with HTML in Internet Explorer have taught us the long-term pain that results from being too liberal with what you accept from others. Hence, we’ve adopted the following overriding principle for IE 7 and the RSS platform in Windows Vista: 

   We will only support feeds that are well-formed XML.

This principle allows us to build a more predictable feed parser. As a platform, it’s important that applications using the platform to consume feeds can rely on the fact that the platform will always be providing information in the way that the publisher intended (trying to guess what a publisher meant to do when there is an error in a feed can be tricky, at best). We also spoke to several people in the RSS and developer community at Gnomedex and at PDC, and they wholeheartedly supported this.

When viewing a feed that isn’t well-formed XML, IE7 will flag it (and highlight the error, just as we do today for malformed XML in general, so feed publishers can see what’s going on). When the platform downloads a feed with errors during regular updates, it will discard that update and try again at the next scheduled download (so feeds with temporary errors won’t be permanently affected).
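
The update rule described above can be sketched roughly as follows. This is not the platform’s actual code: `FeedState` and `apply_update` are hypothetical names made up for illustration, and Python’s standard `xml.etree.ElementTree` parser stands in for whatever XML parser the platform really uses.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedState:
    """Hypothetical record of what we know about one subscribed feed."""
    last_good_xml: Optional[str] = None  # most recent well-formed download
    last_error: Optional[str] = None     # why the latest download was rejected


def apply_update(state: FeedState, downloaded: str) -> bool:
    """Store the download only if it is well-formed XML.

    Returns True if the update was stored, False if it was discarded.
    A discarded update leaves the previous good copy untouched, so the
    next scheduled download can simply try again.
    """
    try:
        ET.fromstring(downloaded)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        state.last_error = f"line {line}, column {column}: {err}"
        return False
    state.last_good_xml = downloaded
    state.last_error = None
    return True
```

In this sketch a feed that is broken today and fixed tomorrow loses nothing: the stored copy only changes when a download parses cleanly, and the recorded error position is what a viewer could highlight for the publisher.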

That said, we do recognize that there is a great deal of variance in the actual content of RSS feeds, so we’ll be more liberal when it comes to what elements are required in a feed. We will post on exactly how we’re handling different feeds in a future post.

– Sean

Comments (28)

  1. This is the right thing to do, and I’m glad you’re doing it – thanks.

  2. Nicholas says:

    Definitely the right thing to do!

  3. Der Haken says:

    Yes, … we are amazed! Probably you are right, however … Good luck folks

  4. Reed says:

    So you have decided not to go by the old maxim "be liberal in what you accept, strict in what you produce"?

  5. Arken says:

    Why not well formed and well defined and just stick to Atom

  6. Danny says:

    Are you going to have any policy on the content markup – XHTML or tag soup? Singley, doubley or trebley escaped? Silent data loss?

  7. Yes, if only Microsoft had this philosophy with IE4, we wouldn’t have all the problems with IE we have today.

    Because Netscape would still be the only browser anyone used…

  8. Der Haken says:

    > Because Netscape would still be the only browser anyone used …

    Maybe, we’d all be better off with Netscape being the only browser anyone used?

    No, sorry, just kidding, but honestly, will your ‘well formed’ claim hold for the parsing methods of your folks search engine as well? Probably not, ’cause then in 2009 Google’ll still be the only searchengine anyone will use.

    Obeying Postel’s law is a question of having the appropriate marketshare, hence standardization power … If you haven’t enough you’d better obey.

    Anyway, I love standard ‘compliantnes’. So, go ahead.

  9. Does this mean that you will follow RFC 3023 (i.e. XML served over HTTP) to the letter as well?

  10. greatdevourer says:

    The problem with this is that I can see M$ creating their own standard of RSS, and everything else being "wrong"

  11. Mark Munz says:

    This is like Ford declaring that their cars will only run on roads that do not contain any flaws. Nice in theory, but totally unrealistic.

    For me, it would result in me moving immediately to something that is more focused on letting me (the customer) do my task rather than being "right".

    I think it is legitimate to flag feeds that are not well-formed, but it is completely user-UNfriendly to discard them.

    And ironically, this is coming from a company that plans to keep some long standing CSS bugs in IE7. If you plan to stick 100% to the spec, you need to be consistent and do it with EVERYTHING.

    The first step is to educate. Let folks know just how many broken feeds there are. Then… maybe… in Vista, you can consider an OPTION to ignore feeds that are not well-formed.

  12. Brian says:

    This isn’t like Ford declaring it will run only on a perfect road. It’s Ford declaring all cars are not ATVs.

    How many people actually write their own RSS feed?

    Most RSS feeds are generated, à la FeedBurner, so this should be a non-issue. If it is an issue, the content writer should be fixing it.

  13. Gordon Q says:

    I find it hilarious, that you take this stance now. (it is almost the right stance)

    That said, if you are going to post articles like this, could you at least pretend that you know what you are talking about, by posting messages in a standard format?

    < A > (spaced to avoid deletion only) is not a valid XHTML tag. The < BR > tag is also wrong, and not self closing. Ditto for < IMG > tags, (PS I didn’t find title attributes on them either) in fact, your whole RSS and ATOM Feeds for this darn blog, fail the guidelines you are trying to preach.

    OMG! What is this! (from your feed)

    "< /FONT >< /FONT >< /FONT >< /FONT >< /FONT >< /FONT >< /FONT >< /FONT >"

  14. Ross Rader says:

    How will you be dealing with instances where the spec is vague or inconclusive? For instance, feeds that have multiple enclosures per item?

  15. Zephris says:

    And I suppose being too liberal in your support of other’s work doesn’t include png files?

    You’ve got a lot of work to do for IE7 if you want to continue to hold market share. RSS feeds are only a small part. A step in the right direction, granted. I’ll be surprised if you can manage to pull it off. Tabbed browsing, close security issues, revamp the options to be easier to set, *proper* support for standards … if you can get those, you’ll be good. Just don’t stop now.

  16. bertboerland says:

    If only MSFT understood the most important quote from all RFC’s (793)

    The Internet Robustness Principle: "Be liberal in what you accept, and conservative in what you send."

    If only MSFT understood RFC’s…

    If only..

  17. I have heard a few bloggers complaining about this. I can only hope that Microsoft stays with W3C standards and continues IE/W3C compliance. Don’t get me wrong – I love what Microsoft has done and is doing in the internet community, I just hope that the whole world will be able to benefit from this related work.

  18. Jordan says:

    This is fantastic guys, a great move on Microsoft’s part. I hope the new IE7 will also follow other W3C Standards as well, it will make life a whole lot easier for all of us, not to mention be a big boost in how we typically think of Microsoft, this is definitely a good thing. Keep it up Guys, I’m really excited to see a Standards Compliant IE7.

  19. WC3 Supporter says:

    Microsoft, do you just like pissing people off? First you go off on not following browser standards to ensure ‘the best possible user experience’, aka proprietary hell. Now instead of full XML (and no ATOM whatsoever) support, you are going half-assed by only supporting ‘properly formated’ XML. You can’t have it both ways, either be fully WC3 compliant, or accept all comers. All you Microshaft jarheads get your heads outta the sand. Your Bill is not interested in an open, compatible web, but a Microsoft-proprietary revenue maximizing walled garden. Let the Open Source movement (with a little help from Google) turn Microsoft into what it deserves to be, technological roadkill.

  20. Nicholas says:

    WC3 Supporter, I’m guessing you meant to say "W3C Supporter"? Anyhow, you state that IE7 will not support ATOM, <a href="http://blogs.msdn.com/rssteam/archive/2005/08/03/446904.aspx">but it will</a>.

    Regarding IE7’s stance on bad XML; how is this "half-assing it", by only supporting properly formatted XML? Why should Microsoft have the responsibility of trying to decipher someone else’s malformed XML? Even if they did that, you guys would complain because it didn’t parse it like you were expecting, blah blah.

    It’s funny how quickly people forget that Microsoft essentially brought us AJAX years and years ago, as well as drove competition to create DHTML and other tricks we "take for granted" today. Do yourself a favor and really research the history of browsers for the past 5+ years.

  21. Matt says:

    As others have stated, I’m very concerned that Microsoft’s definition (even though a w3c.org page was referenced) will differ from the rest of the world. It’s been done too often in the past. Microsoft has never played nice with others. I sure hope you stick to your goals of standard compliance (at least in this one case)

  22. Matt and others. Note that this statement is only about WF, not about "valid" or conforming to any other spec. WF definition is pretty clear, and is the bare-bones minimum requirement for a document to be considered "XML". W3C is very clear on this: if it’s not WF, it’s not XML, period. So it’s analogous to saying, "the RSS platform in Vista requires that your XML feeds actually be XML". So it’s not as dramatic a statement as some people are assuming. W3C spec is extremely clear on this point, and all the major XML stacks honor the spec.

    Also, Bert, you quote Postel’s law, which doesn’t apply to XML. A web search for "postel and wellformed" should give a good historical perspective on why industry practice is draconian when it comes to WF. This is straight from W3C spec, which mandates that XML parsers MUST fail with FATAL error and NOT attempt to recover on a WF violation. Any parser which attempts to liberally recover from a WF violation is NOT an XML parser per W3C and is not considered to be parsing XML.
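
The difference between "well-formed" and "valid" described in the comment above can be seen with any conforming parser. A small sketch using Python’s standard library follows; the `is_well_formed` helper and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if a conforming XML parser accepts the text at all."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # A well-formedness violation is a fatal error: the parser stops
        # here and makes no attempt to guess what the author meant.
        return False
    return True


# Well-formed, although no feed format defines a <banana> element.
# Checking element names belongs to a separate validation step.
odd_but_well_formed = "<rss><banana>1</banana></rss>"

# Not well-formed: the <b> and <i> elements overlap instead of nesting.
overlapping_tags = "<p><b><i>text</b></i></p>"

# Not well-formed: a bare ampersand must be written as &amp;
bare_ampersand = "<title>Tom & Jerry</title>"

for sample in (odd_but_well_formed, overlapping_tags, bare_ampersand):
    print(is_well_formed(sample), sample)
```

Only the first sample parses. Whether it is a sensible feed is a different question from whether it is XML.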

  23. Peter Nixey says:

    Hurrah and well done. Postel’s law is a very good law but has been totally bastardised by lazy coders.

    XML is useful because it works with standardised tools and in a predictable manner. If it doesn’t do this then it’s not useful.

    If there are x producers of XML in the world and y parsers then for each distinct error on the producers’ part (Nx) there must be Yx fixes = Nx.Yx = a lot. If the producers mind their own shop then there are only Nx fixes required = a lot less = less time = less money/more features.

    Being free with what you accept is not a catch all clause. Requiring well formed XML is the least a parser should do. If not then where does it end – should an XML parser also include image recognition software in case <a href="http://neopoleon.com/blog/posts/434.aspx" title="Excel as a database">someone in marketing</a> is asked for an XML feed?

  24. Nick Katsivelos says:

    I just want to echo Nick Bradbury – why say more – he has all the cred anyone could ever need!

  25. Brian says:

    > We will only support feeds that are well-formed XML.

    Nice idea, product killer in practice. Your pain came from accommodating malformed input in the core code. Don’t do that. Write a layer that consumes all manner of mangled gibberish and emits a well formed document. Have the product code see only this well formed document.

  26. the1geek says:

    How about "valid" RSS and its various namespace elements?

    Case in point – all Microsoft blog RSS use invalid wfw:commentRss, which should really be wfw:commentRSS (remember XML is case-sensitive?)

    Please read the specs @ http://wellformedweb.org/news/wfw_namespace_elements/