Recently I was researching an issue that Daniel Cazzulino submitted to the XML team about a problem he ran into using XInclude and XML Schema together. As you probably know, you can use XInclude in XML instances to bring XML into a document from different locations. After a piece of XML is XIncluded, an xml:base attribute is left on the included element to allow for round-tripping in edit scenarios. The problem arises when you attempt to validate the resulting XML instance. XML Schema treats xml attributes (attributes defined in the http://www.w3.org/XML/1998/namespace namespace) just like any other attribute, meaning that the xml:base that XInclude leaves behind must be declared in the content model of the element that was XIncluded. Thus an XML Schema author would need to anticipate every place in an instance that could be XIncluded and add an xml:base attribute there (or perhaps an anyAttribute with namespace="http://www.w3.org/XML/1998/namespace"). The result, at least from what I have learned so far, is that the interaction between XInclude and XML Schema is substantially hampered (is "broken" an overstatement?).
I’ve talked to various people about this issue and no one has been able to give me a very good justification for why the xml attributes (currently xml:lang, xml:space and xml:base, with xml:id coming) are not treated in a more lax way by XML Schema. As you probably know, the xml namespace is implicitly defined in an XML instance and the xml prefix is reserved for this namespace. Schema, however, has no special treatment of xml attributes. I posted about this to the W3C Schema Interest Group (a W3C member-only list) and Henry Thompson gave a reasonable response: this is a carryover from XML 1.0, where DTDs required that xml attributes be in the appropriate ATTLISTs. He also commented that, in hindsight, this issue is probably something to look into further.
My thinking on this now is that it would be best if XML Schema treated xml attributes laxly, meaning something like:
- If an xml attribute is encountered in an instance and it is NOT defined in the content model, then validate the data type only (e.g., validate that xml:base is anyURI) against the xml namespace schema. This data type validation would ensure that it really is an xml attribute and that the instance data at least conforms to the data type defined.
- If an xml attribute is encountered in an instance and it is defined in the content model of the schema, then validate the xml attribute against the content model definition as well as the data type.
If this approach is taken, there would also need to be an implicit import of the xml namespace into XML Schemas, similar to how the xml prefix is implicitly defined, since a schema author can't know in advance whether an instance will use an xml attribute.
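To make the two rules and the implicit import a bit more concrete, here is a rough Python sketch of the decision logic. It is not how any real validator works: `validate_attribute`, the `declared` mapping and the `XML_ATTRIBUTE_CHECKS` table are names I made up for illustration, and the type checks are deliberately simplified stand-ins for anyURI, xml:space and language.

```python
import re

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Stand-in for an implicitly imported xml namespace schema: one simplified
# type check per xml attribute. These are approximations, not the real types.
XML_ATTRIBUTE_CHECKS = {
    "base": lambda v: re.search(r"\s", v) is None,  # rough anyURI check
    "space": lambda v: v in ("default", "preserve"),
    "lang": lambda v: v == ""
    or re.fullmatch(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*", v) is not None,
}


def validate_attribute(qname, value, declared):
    """Return (is_valid, rule) for one attribute.

    qname is in Clark notation ("{namespace}local" or just "local").
    declared is a hypothetical content model: it maps the attribute names
    declared for the element to a predicate on the attribute value.
    """
    if qname.startswith("{"):
        ns, _, local = qname[1:].partition("}")
    else:
        ns, local = "", qname

    if ns == XML_NS:
        type_check = XML_ATTRIBUTE_CHECKS.get(local)
        if type_check is None:
            return (False, "unknown xml attribute")
        type_ok = type_check(value)
        if qname in declared:
            # Rule 2: declared, so check the content model and the data type.
            return (type_ok and declared[qname](value), "declared")
        # Rule 1: not declared, so check the data type only.
        return (type_ok, "lax")

    # Every other attribute keeps today's behavior: it must be declared.
    if qname in declared:
        return (declared[qname](value), "declared")
    return (False, "undeclared")
```

With an empty content model, `validate_attribute("{http://www.w3.org/XML/1998/namespace}base", "chapters/one.xml", {})` is accepted under the lax rule, while an undeclared ordinary attribute such as `id` is still rejected. That is the behavior that would let an XIncluded instance validate without the schema author having to sprinkle xml:base everywhere.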
Taking this approach for all XML attributes would solve the XInclude schema mismatch problem that Daniel pointed out but would also help deal with the fundamental issue for these xml attributes in other use cases.
In MS System.Xml.Schema validation in the Whidbey RTM we will most likely at least add a switch that lets instance validation accept undeclared xml attributes (as Daniel requested), but as I research this issue it seems more and more that this should be the default way of dealing with xml attribute validation …