MSXML XHTML DTDs - making the web better

The fix for MSXML's requests for XHTML DTD files from the W3C web server was recently released. Windows Update should offer it automatically, but you can also download and install it manually from the following links for the various MSXML versions:

https://support.microsoft.com/?kbid=973688 [MSXML4 SP2... if you haven't tried SP3 yet see below]
https://support.microsoft.com/?kbid=973685 [MSXML4 SP3]
https://support.microsoft.com/?kbid=973686 [MSXML6 Out-Of-Band, for XP SP2 and Win2K3]
https://support.microsoft.com/?kbid=973687 [MSXML3 and MSXML6 for all OSes where these components are in-band]

What exactly is this about? There are a number of ways to run into the problem, but one common scenario we've seen when people write AJAX web pages is requesting a page of information from the server, loading it into an XML document, and then extracting some information or merging it into the existing page.

The problem with this approach is that when you load the document into an MSXML document "by hand" (i.e., not through responseXML on the XHR object), DTD processing is enabled, and the DOCTYPE declaration directs MSXML to go look up the XHTML DTD so you can use entities like &nbsp;. That lookup is a network request to the W3C server.
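To make it concrete which resource the parser is being pointed at, here is a rough sketch that pulls the public and system identifiers out of a DOCTYPE declaration. The helper name readDoctype is made up for illustration, and the regular expression is a simplification that only handles the PUBLIC form; it is not how MSXML parses the declaration.

```javascript
// Hypothetical helper, for illustration only: extracts the identifiers from a
// DOCTYPE ... PUBLIC declaration. A regex is a simplification, not an XML parser.
function readDoctype(markup) {
  var match = /<!DOCTYPE\s+(\S+)\s+PUBLIC\s+(["'])(.*?)\2\s+(["'])(.*?)\4\s*>/.exec(markup);
  if (!match) {
    return null; // no PUBLIC doctype found
  }
  return { root: match[1], publicId: match[3], systemId: match[5] };
}

var sample =
  '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" ' +
  '"https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' +
  "<html><body><p>a simple&nbsp;paragraph</p></body></html>";

// doctype.systemId is the URL a DTD-processing parser would be directed to.
var doctype = readDoctype(sample);
```

The system identifier is the part that turns into a request when DTD processing is on, which is why a document that merely mentions the XHTML DTD can trigger network traffic.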

Multiply this by each web browser accessing a popular site, and you can imagine why no one is happy with this situation: web sites break when the resource is not available, the W3C servers get overloaded, and users of web sites lose functionality with odd scripting errors.

The fix caches the XHTML DTDs inside MSXML. These resources haven't changed for years, and anything new that comes along will most likely be published at a different URL. So now clients save at least one round-trip (possibly more) and always get the DTD support they need.
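The general idea of answering well-known DTD requests locally can be sketched roughly as below. This is a conceptual illustration only: how MSXML actually stores or keys its cached DTDs isn't described here, so the lookup by public identifier, the CACHED_DTDS table, and the resolveDtd and fakeFetch names are all assumptions made up for the example.

```javascript
// Conceptual sketch only; not MSXML's implementation.
// Hypothetical table of well-known DTDs that ship with the parser.
var CACHED_DTDS = {
  "-//W3C//DTD XHTML 1.0 Transitional//EN": "(local copy of xhtml1-transitional.dtd)"
};

// Hypothetical resolver: serve known DTDs locally, otherwise go to the network.
function resolveDtd(publicId, systemId, fetchFromNetwork) {
  if (Object.prototype.hasOwnProperty.call(CACHED_DTDS, publicId)) {
    return { source: "cache", text: CACHED_DTDS[publicId] };
  }
  return { source: "network", text: fetchFromNetwork(systemId) };
}

// Stand-in for a real download, so the number of round-trips can be counted.
var networkCalls = 0;
function fakeFetch(systemId) {
  networkCalls += 1;
  return "(downloaded from " + systemId + ")";
}

var hit = resolveDtd(
  "-//W3C//DTD XHTML 1.0 Transitional//EN",
  "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
  fakeFetch);
var miss = resolveDtd(
  "-//Example//DTD Unknown//EN",
  "https://example.invalid/unknown.dtd",
  fakeFetch);
```

In this sketch only the unknown DTD costs a round-trip; the XHTML one is answered without touching the network, which is the behavior the update is meant to give you.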

Try saving this bit of script in a .js file and running it with Windows Script Host (for example, cscript) to see the awesomeness in action.

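function pullXHtml() {
  var xml = new ActiveXObject("Msxml2.DOMDocument.3.0");
  // Load synchronously so parseError is populated when loadXML returns.
  xml.async = false;
  // The DOCTYPE makes MSXML resolve the XHTML DTD, which is what defines &nbsp;.
  xml.loadXML(
    "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">" +
    // The XHTML namespace URI uses http, and the DTD declares it as a fixed value.
    "<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'><head><title>simple document</title></head>" +
    "<body><p>a simple&nbsp;paragraph</p></body></html>");
  if (xml.parseError.errorCode != 0) {
    // Parsing failed, typically because the DTD could not be retrieved.
    var myErr = xml.parseError;
    WScript.Echo("You have error " + myErr.reason);
  } else {
    WScript.Echo("Yay!");
    WScript.Echo(xml.xml);
  }
}

pullXHtml();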

If you don't have the update, you might get an error such as the following.

You have error The server did not understand the request, or the request was invalid.
Error processing resource 'https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'.

If you do have the update, you'll see the parsed XML. Run this with Fiddler open and you'll "see" the absence of network activity. 

Enjoy!