Understanding XML Schema Sets in the XSD Designer

We recently blogged about the new XML Schema Designer and the various views over schemas that it offers. Here we look at schema sets, the central organizing concept behind what the designer shows for any Visual Studio buffer that has XML Schema information (an XSD file, an XML file, or a Visual Basic project with associated schemas). A schema set can be thought of as a collection of pairs: the first part of each pair is an XML namespace, and the second part is the location of an XSD file associated with that namespace in the set. A namespace can have multiple files associated with it in a particular set, and a file can be part of multiple namespaces (due to the XML Schema concept of chameleons, whereby a file with no targetNamespace attribute defined on its schema element automatically assumes the targetNamespace of any and all files that include it). Below we explore how schema sets are built and computed and how they are visualized in the XML Schema Designer.
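To make the "collection of pairs" idea concrete, here is a minimal Python sketch. It is only an illustration of the data shape, not how the designer stores a set internally; the namespaces and file names are hypothetical, and common.xsd plays the role of a chameleon file that two schemas include.

```python
from collections import defaultdict

# Hypothetical schema set: a set of (namespace, file location) pairs.
schema_set = {
    ("urn:example:orders", "orders.xsd"),
    ("urn:example:orders", "order-types.xsd"),  # second file, same namespace
    ("urn:example:orders", "common.xsd"),       # chameleon included by orders.xsd
    ("urn:example:billing", "billing.xsd"),
    ("urn:example:billing", "common.xsd"),      # same chameleon, other namespace
}

def files_by_namespace(pairs):
    """Group file locations under the namespace they contribute to."""
    grouped = defaultdict(set)
    for namespace, location in pairs:
        grouped[namespace].add(location)
    return grouped

def namespaces_by_file(pairs):
    """Group namespaces under each file; more than one means a chameleon."""
    grouped = defaultdict(set)
    for namespace, location in pairs:
        grouped[location].add(namespace)
    return grouped

by_ns = files_by_namespace(schema_set)
by_file = namespaces_by_file(schema_set)
print(sorted(by_ns["urn:example:orders"]))
print(sorted(by_file["common.xsd"]))
```

Modelling the set as pairs, rather than as a map from namespace to a single file, is what lets both many-files-per-namespace and many-namespaces-per-file fall out naturally.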

A schema set is built by walking the tree of external references from a particular set of root XSD files. An external reference is either an include element, which brings the schema at the specified schemaLocation into the same namespace as the current schema, or an import element, which names an XML namespace to import and, optionally, a schema location where the schema processor can find an XSD file for that namespace (XML Schema also defines a redefine element, but we can treat it as essentially equivalent to an include for this discussion). To illustrate these concepts and how they are applied, let’s see them in use in a sample industry schema (brainml.xsd, an XML Schema for neurological modeling defined at https://brainml.org ).

When we first open brainml.xsd in Visual Studio, that file becomes the root of the XML schema set to be constructed and displayed in the Schema Explorer, the tree view that helps visualize the hierarchy of a schema set. Analyzing the external references (which must always be declared at the top, before any globals are defined), we see the following two:

<xs:import namespace="http://www.w3.org/XML/1998/namespace" />

<xs:import namespace="urn:bml/brainml.org:internal/BrainMetaL/1" schemaLocation="citation.xsd"/>

This is interpreted as a request to bring in schemas for the two namespaces “http://www.w3.org/XML/1998/namespace” and “urn:bml/brainml.org:internal/BrainMetaL/1”. In the second case, we are also given a location hint for where to find an XSD file for this namespace (more on how the first one is resolved a bit later), so we bring the citation.xsd file into the set and, analyzing it, we see the following externals:

<xs:import namespace="http://www.w3.org/XML/1998/namespace" />

<xs:import namespace="http://www.w3.org/1999/xlink" schemaLocation="xlink.xsd"/>

<xs:include schemaLocation="brainmetal.xsd"/>

The first import is like the one we had previously seen for the XML namespace and will get resolved the same way. The second again brings in a new namespace into the set and a new file (xlink.xsd) for this namespace. The third includes another file (brainmetal.xsd) that has the same namespace (“urn:bml/brainml.org:internal/BrainMetaL/1”) as the targetNamespace of the current file. This process gets repeated for the rest of the unprocessed files in the set, though no new references are introduced in any of these, so finally we end up with the following tree view in our schema explorer of the namespaces and files in the set.

[Figure: Schema Explorer tree view of the namespaces and files in the brainml.xsd schema set]
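The walk described above can be sketched in a few lines of Python. This is a simplified illustration, not the designer's implementation: the three in-memory "files" and their namespaces are hypothetical, locations are treated as plain keys rather than resolved relative paths, and imports without a schemaLocation are simply skipped (they are resolved by other means).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical stand-ins for files on disk, keyed by location.
FILES = {
    "root.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:root">
        <xs:import namespace="urn:example:lib" schemaLocation="lib.xsd"/>
    </xs:schema>""",
    "lib.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:lib">
        <xs:include schemaLocation="lib-types.xsd"/>
    </xs:schema>""",
    # No targetNamespace: a chameleon that adopts its includer's namespace.
    "lib-types.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>""",
}

def build_schema_set(root_location):
    """Walk include/import/redefine references and collect (namespace, file) pairs."""
    pairs = set()
    pending = [(root_location, None)]  # (location, namespace inherited from includer)
    while pending:
        location, inherited_ns = pending.pop()
        root = ET.fromstring(FILES[location])
        namespace = root.get("targetNamespace", inherited_ns)
        if (namespace, location) in pairs:
            continue  # already processed under this namespace
        pairs.add((namespace, location))
        for child in root:
            ref = child.get("schemaLocation")
            if ref is None:
                continue  # not a reference, or a namespace-only import
            if child.tag == XS + "import":
                pending.append((ref, None))
            elif child.tag in (XS + "include", XS + "redefine"):
                pending.append((ref, namespace))
    return pairs

print(sorted(build_schema_set("root.xsd")))
```

Tracking visited (namespace, location) pairs rather than just locations is what allows a chameleon file to be recorded once per namespace that includes it, while still terminating on circular references.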

So how did the xml.xsd file get found and associated with the xml namespace in the set? The answer is that, aside from the schema location, Visual Studio has other places where it looks for schemas to resolve a namespace when no specific schema location is provided (remember, the schemaLocation is just a hint to the schema processor, which can apply its knowledge of the environment to figure out how to resolve a namespace). Visual Studio will look in the current project, the solution, and even other open schemas to resolve schema references, and it also comes preconfigured with a set of well-known schemas, such as the schema for the xml namespace that is being referenced here.
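As a rough sketch of this kind of fallback, the Python below tries an explicit location hint first and then a list of catalogs. The order in which the catalogs are consulted, and the catalog names themselves, are assumptions made for illustration; the text above does not state the precedence Visual Studio actually uses.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"

def resolve_namespace(namespace, location_hint, catalogs):
    """Return (location, source name), or (None, None) if nothing matches."""
    if location_hint is not None:
        return location_hint, "schemaLocation"
    for source_name, catalog in catalogs:
        if namespace in catalog:
            return catalog[namespace], source_name
    return None, None

# Hypothetical catalogs, searched in this (assumed) order.
catalogs = [
    ("project", {}),
    ("solution", {}),
    ("open schemas", {}),
    ("well-known", {XML_NS: "xml.xsd"}),
]

print(resolve_namespace(XML_NS, None, catalogs))
print(resolve_namespace("urn:example:lib", "lib.xsd", catalogs))
print(resolve_namespace("urn:example:unknown", None, catalogs))
```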

So how do we know which well-known namespaces and associations are available in a particular context? We can see them in the schema dialog. For example, in the case above, the property window for the brainml.xsd code buffer shows a “Schemas” property; clicking on it brings up the following dialog.

[Figure: schema dialog opened from the “Schemas” property of brainml.xsd]

This dialog shows a table view of the namespaces for which Visual Studio has a known association and the locations of the files that are known to provide schemas for those namespaces. The left-hand column, labeled “Use”, lets us control when or if these known associations are used. The default option is “Automatic”, which means the schema is used if it is needed to resolve an import (such as the current scenario of finding a schema for the well-known xml namespace). An option of “Use” says the schema is to be used in the current set. Note that this option is pre-selected for the files computed to be in the set; selecting it for a new file would essentially introduce a new root into the set computation described above. Finally, there is also a “do not use” option that lets us exclude a file from a set that would otherwise be included in one of the above scenarios.
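The effect of the three settings can be summarized with a small Python sketch. It is a simplification under stated assumptions: the enum labels mirror the options described above (the exact casing of the third label is assumed), the entries are hypothetical, and the real computation also walks each selected file's own references.

```python
from enum import Enum

class UseMode(Enum):
    AUTOMATIC = "Automatic"
    USE = "Use"
    DO_NOT_USE = "Do not use"  # label casing is an assumption

def select_files(entries, needed_namespaces):
    """Pick the files that take part in the set.

    entries: iterable of (namespace, location, UseMode).
    needed_namespaces: namespaces imported without a usable location.
    """
    selected = []
    for namespace, location, mode in entries:
        if mode is UseMode.USE:
            selected.append(location)  # always in the set; acts as a root
        elif mode is UseMode.AUTOMATIC and namespace in needed_namespaces:
            selected.append(location)  # pulled in only to satisfy an import
        # DO_NOT_USE: never selected, even if a namespace needs it
    return selected

entries = [
    ("urn:example:root", "root.xsd", UseMode.USE),
    ("urn:example:wellknown", "wellknown.xsd", UseMode.AUTOMATIC),
    ("urn:example:unused", "unused.xsd", UseMode.AUTOMATIC),
    ("urn:example:blocked", "blocked.xsd", UseMode.DO_NOT_USE),
]
needed = {"urn:example:wellknown", "urn:example:blocked"}
print(select_files(entries, needed))
```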

There are also two special values (localized strings that are not valid XML identifiers) used where a namespace name would normally appear in the schema explorer, to help visualize and distinguish two error conditions that can arise in building a schema set. One is the “Not Found or Invalid” name, which is used when the path specified in a schemaLocation is not found (for example, an include of a non-existent or non-readable file) or when the file is present but cannot be parsed as an XML schema (that is, it is either not valid XML or we do not find a schema root element). For example, if we edit brainmetal.xsd in the set above so that the root element reads “schemaInvalid”, our schema explorer view of the schema changes as in the following diagram.

[Figure: Schema Explorer showing brainmetal.xsd under the “Not Found or Invalid” name]
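The three failure cases (missing file, malformed XML, wrong root element) can be sketched with Python's standard XML parser. This is only an illustration of the checks described above, not the designer's code; the function name and the temporary files are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
NOT_FOUND_OR_INVALID = "Not Found or Invalid"

def namespace_label(path):
    """Return the file's targetNamespace, or the error label if it is unusable."""
    try:
        root = ET.parse(path).getroot()
    except (OSError, ET.ParseError):
        return NOT_FOUND_OR_INVALID  # unreadable file or not well-formed XML
    if root.tag != "{%s}schema" % XSD_NS:
        return NOT_FOUND_OR_INVALID  # well-formed XML, but no schema root
    return root.get("targetNamespace", "")

with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "good.xsd")
    bad = os.path.join(folder, "bad.xsd")
    with open(good, "w", encoding="utf-8") as handle:
        handle.write('<xs:schema xmlns:xs="%s" targetNamespace="urn:example:a"/>' % XSD_NS)
    with open(bad, "w", encoding="utf-8") as handle:
        handle.write('<xs:schemaInvalid xmlns:xs="%s"/>' % XSD_NS)
    labels = {
        "good": namespace_label(good),
        "bad": namespace_label(bad),
        "missing": namespace_label(os.path.join(folder, "missing.xsd")),
    }

print(labels)
```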

There is another special name that can appear in place of the namespace: the “Unauthorized Zone” name. The files that appear under this name are files that a schema file tried to import or include into the schema set from a different security zone, one that does not have permission to access the zone the target file is in. This is similar to the Internet Explorer policy whereby a web page cannot redirect to or read from a location on the user’s machine or intranet (i.e. a different security zone) unless the machine’s zone policy has been configured to allow it. The schema processor is essentially acting as a proxy for the remote site when it requests included or imported files on the site's behalf, and it therefore enforces the zone security policies that are in place. This prevents attacks in which a malicious external site uses the processing of schema externals either to force a file to be opened or, in combination with other exploits, to post back or glean information about the user's system.

For example, imagine there is a schema available on an external web site named “Remote_Import_Local.xsd” that attempts to import a local file from the path “d:\schemas\Security\Local.xsd”. Even if this path exists on your local machine and there is a valid schema file there, it will not be included in the schema set (and will in fact not even be opened as part of building the set), and you will instead get the following view in the schema explorer.

[Figure: Schema Explorer showing Local.xsd under the “Unauthorized Zone” name]

The inclusion of Local.xsd in the “Unauthorized Zone” and the warnings in the error pane about not being able to resolve the schema location are an indication to the end user that the schema they were visiting attempted to bring in a schema from a zone that it is not authorized to access.
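The gist of this check can be sketched as follows. This is a deliberately crude model: it assumes just two zones derived from the URL scheme and a single rule (an internet schema may not reference a local file), whereas real zone policy is richer and configurable. The remote URL is hypothetical.

```python
from urllib.parse import urlparse

UNAUTHORIZED_ZONE = "Unauthorized Zone"

def zone_of(location):
    """Assumed two-zone model: http(s) locations are 'internet', all else 'local'."""
    scheme = urlparse(location).scheme.lower()
    return "internet" if scheme in ("http", "https") else "local"

def may_reference(referrer, target):
    """Assumed policy: only internet -> local references are denied."""
    return not (zone_of(referrer) == "internet" and zone_of(target) == "local")

def explorer_label(referrer, target, namespace):
    """Name under which the target file would be listed in the tree."""
    return namespace if may_reference(referrer, target) else UNAUTHORIZED_ZONE

remote = "http://example.com/Remote_Import_Local.xsd"  # hypothetical URL
local = "d:\\schemas\\Security\\Local.xsd"

print(explorer_label(remote, local, "urn:example:local"))
print(explorer_label(local, remote, "urn:example:remote"))
```

The important property is that the decision is made before the target is opened, so a denied file is never read at all.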

-Fred Garcia

SDE, XML Tools Team