Getting "DTD is prohibited" from MSXML?

MSXML 6 is our recommended version of MSXML for various reasons, one of which is that it has a number of security enhancements, as mentioned here. One example is a more consistent 'secure by default' policy, which reduces the attack surface by prohibiting DTDs by default.

Here is a JScript file you can use to look at some of the differences. It runs the same document through versions 3.0 and 6.0 of MSXML, either leaving the ProhibitDTD property at its default value or setting it explicitly to true or false.

// Save to xml.js, run with cscript //nologo xml.js

// Creates a DOMDocument of the given MSXML version and parses xmlText.
// ProhibitDTD is left at its default when prohibitDTD is null.
function createDocumentFromText(xmlText, version, prohibitDTD) {
    var result = new ActiveXObject("Msxml2.DOMDocument." + version);
    if (prohibitDTD != null) {
        result.setProperty("ProhibitDTD", prohibitDTD);
    }
    result.async = false;  // parse synchronously so parseError is ready on return
    result.loadXML(xmlText);
    return result;
}

// Reports whether the last parse succeeded, and why not if it failed.
function echoDocumentErrors(document) {
    if (document.parseError.errorCode == 0) {
        WScript.Echo("no errors found while parsing");
    } else {
        WScript.Echo("error: " + document.parseError.reason);
    }
}

// A small document with an internal DTD subset.
var xmlText =
    "<?xml version='1.0'?>" +
    "<!DOCTYPE a [ <!ELEMENT a (b+)> <!ELEMENT b (#PCDATA)> ]>" +
    "<a><b>1</b><b>2</b></a>";

var versions = [ "3.0", "6.0" ];
var booleans = [ null, true, false ];  // null means "use the default"

// Try every combination of version and ProhibitDTD setting.
for (var i = 0; i < versions.length; i++) {
    for (var j = 0; j < booleans.length; j++) {
        WScript.Echo("version [" + versions[i] + "] prohibit-dtd [" + booleans[j] + "]");
        var doc = createDocumentFromText(xmlText, versions[i], booleans[j]);
        echoDocumentErrors(doc);
        WScript.Echo("XML: **" + doc.xml + "**");
        WScript.Echo("");
    }
}

The magic happens in createDocumentFromText. When it is given a value, it sets the ProhibitDTD property by calling the setProperty method on the DOMDocument object before parsing.

Here is the output you'll get from running this script.

>cscript //nologo xml.js
version [3.0] prohibit-dtd [null]
no errors found while parsing
XML: **<?xml version="1.0"?>
<!DOCTYPE a [ <!ELEMENT a (b+)> <!ELEMENT b (#PCDATA)> ]>
<a><b>1</b><b>2</b></a>
**

version [3.0] prohibit-dtd [true]
error: Invalid at the top level of the document.

XML: ****

version [3.0] prohibit-dtd [false]
no errors found while parsing
XML: **<?xml version="1.0"?>
<!DOCTYPE a [ <!ELEMENT a (b+)> <!ELEMENT b (#PCDATA)> ]>
<a><b>1</b><b>2</b></a>
**

version [6.0] prohibit-dtd [null]
error: DTD is prohibited.

XML: ****

version [6.0] prohibit-dtd [true]
error: DTD is prohibited.

XML: ****

version [6.0] prohibit-dtd [false]
no errors found while parsing
XML: **<?xml version="1.0"?>
<!DOCTYPE a [ <!ELEMENT a (b+)> <!ELEMENT b (#PCDATA)> ]>
<a><b>1</b><b>2</b></a>
**

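Note that the document will fail to parse in these cases: version 6.0 with the default value (true), version 6.0 with the value explicitly set to true, and version 3.0 with the value explicitly set to true. The "gotcha" is for application authors who move from version 3.0 to 6.0 without setting this explicitly, as they'll get different behavior: DTDs are allowed by default in 3.0 and prohibited by default in 6.0. The error text differs as well, since version 3.0 reports "Invalid at the top level of the document." rather than "DTD is prohibited."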
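The rule can be read straight off the output above: an explicit value always wins, and otherwise the default depends on the version. Here is a minimal sketch that models this. The names DEFAULT_PROHIBIT_DTD and effectiveProhibitDtd are made up for illustration and are not part of MSXML, and the table covers only the two versions exercised by the script.

```javascript
// Model of the behavior observed in the output above; not an MSXML API.
// Only the two versions exercised by the script are listed.
var DEFAULT_PROHIBIT_DTD = { "3.0": false, "6.0": true };

// Returns the ProhibitDTD value that would be in effect for a given MSXML
// version when the caller passes explicitValue (null means "not set").
function effectiveProhibitDtd(version, explicitValue) {
    if (explicitValue != null) {
        return explicitValue;  // an explicit setting overrides the default
    }
    if (!DEFAULT_PROHIBIT_DTD.hasOwnProperty(version)) {
        throw new Error("no observed default for MSXML version " + version);
    }
    return DEFAULT_PROHIBIT_DTD[version];
}
```

With this model, effectiveProhibitDtd("3.0", null) returns false and effectiveProhibitDtd("6.0", null) returns true, which is exactly the difference that surprises people when they upgrade.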

Of course, there's a reason why this change was made. A malicious DTD can mess with your application, for example by declaring entities that expand into enormous amounts of text and exhaust memory, so you'll want to enable DTD processing only when you trust the source you're about to load.
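One way to follow that advice is to make the trust decision an explicit argument, so the result never depends on a version-specific default. This is only a sketch: loadXmlWithDtdPolicy, createDocument and sourceIsTrusted are hypothetical names, and the document factory is passed in so that the COM dependency stays in one place.

```javascript
// Sketch only: loadXmlWithDtdPolicy and its parameters are hypothetical names.
// createDocument is a caller-supplied factory; under Windows Script Host it
// would wrap new ActiveXObject("Msxml2.DOMDocument.6.0").
function loadXmlWithDtdPolicy(createDocument, xmlText, sourceIsTrusted) {
    var doc = createDocument();
    // Always set the property explicitly. DTDs are allowed only when the
    // caller says the source is trusted; anything else prohibits them.
    doc.setProperty("ProhibitDTD", !sourceIsTrusted);
    doc.async = false;
    doc.loadXML(xmlText);
    return doc;
}

// Usage under cscript; skipped when WScript is not available.
if (typeof WScript !== "undefined") {
    var createMsxml6 = function () {
        return new ActiveXObject("Msxml2.DOMDocument.6.0");
    };
    var untrusted = loadXmlWithDtdPolicy(createMsxml6, "<a><b>1</b></a>", false);
    WScript.Echo(untrusted.parseError.errorCode == 0
        ? "no errors found while parsing"
        : "error: " + untrusted.parseError.reason);
}
```

Because the flag is negated, omitting sourceIsTrusted leaves DTDs prohibited, so forgetting the argument fails safe.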

For more information, you can read the MSXML Security Overview.

Enjoy!