Validating an XML document against an XML Schema guarantees that the structure and content of the document conform to the types defined in the schema. Does this mean that we can automatically elevate the trust level of a document that has passed schema validation? Can we use schema validation as a security layer for our application?
The general recommendation is that validating an XML document does not remove the need for secure coding practices in the application that consumes the validated data. Even so, we know of applications that rely on length facets to ensure that an input parameter is no longer than the specified length, and on pattern facets to verify that the input poses no risk of SQL or command injection.
W3C XML Schema is a complicated specification that is open to a lot of interpretation, and we have not yet reached the stage where all schema processors are 100% compatible. Consider the case where the regular expression implementation in a particular schema processor differs from the one specified in the XSD specification. Suddenly, the pattern facet that is supposed to protect the application from injection attacks is no longer safe.
If you are among the people who answered yes to the questions at the beginning of this article, read on for ways to tighten the security of a validation episode using XmlSchemaValidationFlags in the System.Xml.Schema namespace of the .NET Framework 2.0.
XmlSchemaValidationFlags was introduced in .NET Framework 2.0 in order to mitigate security threats and improve interoperability while performing schema validation using the validating XmlReader or XmlSchemaValidator.
The enumeration has the following values:
None: Identity constraints, schema location hints, inline schemas and validation warnings will all be ignored.
ProcessIdentityConstraints: Perform validation for xs:ID, xs:IDREF, xs:key, xs:keyref and xs:unique.
ProcessInlineSchema: Load any inline schemas in the XML instance being validated and add them for validation of subsequent XML nodes.
ProcessSchemaLocation: Load schemas by following the location hints specified in the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes, and use those schemas for validation of subsequent XML nodes.
ReportValidationWarnings: Report any warnings encountered during the validation of the XML instance.
AllowXmlAttributes: Allow xml:* attributes even if they are not defined in the schema. The attributes will be validated based on their data type.
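Because the enumeration is a set of bit flags, values are combined and cleared with bitwise operators. The following is a minimal sketch of inspecting and adjusting the flags on an XmlReaderSettings instance; the class name FlagsDemo is only illustrative, and the snippet simply prints whatever the runtime reports rather than assuming a particular default.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

class FlagsDemo
{
    static void Main()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;

        // Show the flags a fresh XmlReaderSettings starts with.
        Console.WriteLine(settings.ValidationFlags);

        // Turn a flag on with bitwise OR.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;

        // Turn a flag off with AND NOT.
        settings.ValidationFlags &= ~XmlSchemaValidationFlags.ProcessInlineSchema;

        // Test whether a particular flag is set.
        bool reportsWarnings =
            (settings.ValidationFlags & XmlSchemaValidationFlags.ReportValidationWarnings) != 0;
        Console.WriteLine(reportsWarnings);
    }
}
```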
Security Implications of XmlSchemaValidationFlags
· DO TURN ON the ReportValidationWarnings flag
By default, this flag is not turned on when a validating XmlReader is created using XmlReaderSettings. (This was done so that users can perform partial validation of an XML instance without having to deal with a large number of warnings for the portions that don’t have a schema, for example when validating a WordML document with user content against the user’s schema. Another reason was performance, as every warning entails the creation of an exception object.)
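Consider the following example, excerpted from an order schema and an instance document, order.xml (only the schema's element declarations and the instance's address element are shown):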
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://tempuri/Orders.org" xmlns="http://tempuri/Orders.org" elementFormDefault="qualified">
<xsd:element name="orderid" type="xsd:int"/>
<xsd:element name="item" type="xsd:string" maxOccurs="unbounded"/>
<xsd:element name="address" type="xsd:string"/>
<address>1234 wallaby way</address>
The <orderid> element is invalid according to the schema (A100 is not a valid xsd:int), but validation using a validating XmlReader with the default settings returns without any errors. This happens because the namespace in the XML and the namespace in the schema do not match (http://tempuri/Order.org vs. http://tempuri/Orders.org). Strict schema validation occurs only after the processor finds a schema definition whose name AND namespace match those of the element; see Schema-Validity Assessment (Element) in the XSD specification.
In this case, it is only a warning that schema information could not be found due to the namespace mismatch and since warnings are turned OFF by default, the user sees no evidence that the validation did not happen. If the flag is turned ON, the user should see the following warning:
Could not find schema information for the element 'http://tempuri/Order.org:order'.
An Error Occurred at: file:///E:/bugrepro/order.xml, (1,2)
Note: Warnings are reported only when this flag is turned ON AND a validation event handler is hooked up (to XmlReaderSettings, XmlDocument or XmlSchemaValidator).
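The scenario above can be sketched roughly as follows. This is a reconstruction under assumptions, not the original repro: the wrapping order element, its complexType and sequence, and the item value "widget" are guesses added to make the sample self-contained, and the class and handler names are illustrative. The schema and instance are loaded from strings instead of files.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class WarningDemo
{
    // Assumed shape of the order schema: the wrapping "order" element,
    // complexType and sequence are reconstructed, not taken from the article.
    const string Schema = @"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'
        targetNamespace='http://tempuri/Orders.org' xmlns='http://tempuri/Orders.org'
        elementFormDefault='qualified'>
      <xsd:element name='order'>
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name='orderid' type='xsd:int'/>
            <xsd:element name='item' type='xsd:string' maxOccurs='unbounded'/>
            <xsd:element name='address' type='xsd:string'/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>";

    // Instance in the mismatched namespace (Order.org, not Orders.org).
    // The item value is hypothetical.
    const string Instance = @"<order xmlns='http://tempuri/Order.org'>
      <orderid>A100</orderid>
      <item>widget</item>
      <address>1234 wallaby way</address>
    </order>";

    static void Main()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;

        // Passing null uses the targetNamespace declared in the schema itself.
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Schema)));

        // Both steps are needed: set the flag AND hook up a handler.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += OnValidationEvent;

        using (XmlReader reader = XmlReader.Create(new StringReader(Instance), settings))
        {
            // Validation happens as the reader moves through the document.
            while (reader.Read()) { }
        }
    }

    static void OnValidationEvent(object sender, ValidationEventArgs e)
    {
        // e.Severity distinguishes XmlSeverityType.Warning from XmlSeverityType.Error.
        Console.WriteLine("{0}: {1}", e.Severity, e.Message);
    }
}
```

An application that relies on validation for safety may want to treat a "could not find schema information" warning on the root element as a failure rather than just logging it.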
· DO NOT TURN ON the ProcessSchemaLocation flag
If this flag is turned ON, the validation engine follows the schema location hints in the XML document (the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes) using the default XmlUrlResolver, unless the XmlResolver property is explicitly set to null or to a secure resolver, in which case that setting takes precedence.
The default resolver does not protect against cross-zone redirection. In addition, schemas that the instance document adds at validation time might change the validation outcome by adding new types, redefining existing types, and so on.
· DO NOT TURN ON the ProcessInlineSchema flag
In addition to allowing an inline schema in the XML document to add new types and redefine existing ones, this flag also poses a threat to users who depend on strict validation to map their XML into objects: any element can now contain a whole schema as its child node, which might cause unexpected errors in the XML-to-object (X-O) mapping.
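One way to apply the two DO NOT guidelines is to clear both flags explicitly and set the resolver to null, so that only schemas the application itself supplies take part in validation. The sketch below assumes a hypothetical helper named CreateSettings and a caller-provided XmlSchemaSet of trusted schemas; it is an illustration of the idea, not a complete hardening recipe.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

class SecureValidation
{
    // Hypothetical helper: builds reader settings that validate only against
    // schemas the application supplies.
    static XmlReaderSettings CreateSettings(XmlSchemaSet trustedSchemas)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = trustedSchemas;

        // Do not resolve external resources named by the instance document.
        settings.XmlResolver = null;

        // Make sure neither schema location hints nor inline schemas are processed.
        settings.ValidationFlags &= ~(XmlSchemaValidationFlags.ProcessSchemaLocation
                                      | XmlSchemaValidationFlags.ProcessInlineSchema);
        return settings;
    }

    static void Main()
    {
        // An empty schema set stands in for the application's trusted schemas.
        XmlReaderSettings settings = CreateSettings(new XmlSchemaSet());
        Console.WriteLine(settings.ValidationFlags);
    }
}
```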
· MAY TURN ON the AllowXmlAttributes flag
Turn this flag on only if allowing attributes from the xml namespace (on any or all elements in the instance, even though the schema does not specifically allow them) does not pose a risk to your application.
· MAY TURN ON the ProcessIdentityConstraints flag
Turn this flag on only if processing of xs:key, xs:keyref, xs:ID and xs:IDREF is important to your application and you have determined that the scope of the key/keyref is not so large that it could be used for a Denial of Service attack.
All of the above are merely guidelines for a secure validation episode using the System.Xml.Schema namespace. The flags you choose are greatly dependent on your application needs but keep in mind the security implications of validated data the next time you are tempted to completely trust such data.
Hope you enjoyed reading this and I greatly appreciate your feedback.