DataSet does not validate XML Schema (XSD)

If you're working with the DataSet class and loading and saving data in XML format, you may have run into a bit of a surprise. Even when you explicitly read an XSD schema into your DataSet, loading data into it doesn't seem to validate anything: you can still pass in data with extra fields, which may be silently ignored or lead to other surprising results.

The reason is that when you call ReadXmlSchema, for example, the DataSet only uses the schema to figure out which tables and columns to expect; it doesn't hold on to the schema to validate the data you load later. You can think of it as a structural thing rather than a validation thing. For example, the data types, the column names, and any maxLength or length XSD constraints are all used to set up the DataSet structure.

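Of course, you can still use the XSD to validate the read; it just takes a bit more work. Let's walk through how to make this work. If you want to follow along, just copy the code into the Main method of a console project to see how things go. The snippets need using directives for System, System.Data, System.IO, System.Xml, System.Xml.Linq, and System.Xml.Schema at the top of the file.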

First, let's start with a basic setup.

// Set up a sample DataSet.
DataSet ds = new DataSet();
DataTable table = ds.Tables.Add("Categories");
table.Columns.Add("CategoryID", typeof(int));
table.Columns.Add("CategoryName", typeof(string));

table.Rows.Add(new object[] { 1, "Beverages" });
table.Rows.Add(new object[] { 2, "Condiments" });

Now, let's get the schema that this DataSet would produce just to see what it looks like, and write it out to the console.

// Let's get the schema and the data separately and write them out.
StringWriter schemaWriter = new StringWriter();
StringWriter dataWriter = new StringWriter();
table.WriteXmlSchema(schemaWriter);
table.WriteXml(dataWriter, XmlWriteMode.IgnoreSchema);

Console.WriteLine();
Console.WriteLine("Schema for DataSet with Categories:");
Console.WriteLine(schemaWriter.ToString());

Console.WriteLine();
Console.WriteLine("Data for DataSet with Categories:");
Console.WriteLine(dataWriter.ToString());

If you run this code now, you'll see this output (formatted a bit to fit better in this post).

Schema for DataSet with Categories:
<?xml version="1.0" encoding="utf-16"?>
<xs:schema id="NewDataSet" xmlns=""
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
 <xs:element name="NewDataSet" msdata:IsDataSet="true"
     msdata:MainDataTable="Categories" msdata:UseCurrentLocale="true">
  <xs:complexType>
   <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element name="Categories">
     <xs:complexType>
      <xs:sequence>
       <xs:element name="CategoryID" type="xs:int" minOccurs="0" />
       <xs:element name="CategoryName" type="xs:string" minOccurs="0" />
      </xs:sequence>
     </xs:complexType>
    </xs:element>
   </xs:choice>
  </xs:complexType>
 </xs:element>
</xs:schema>

Data for DataSet with Categories:
<NewDataSet>
  <Categories>
    <CategoryID>1</CategoryID>
    <CategoryName>Beverages</CategoryName>
  </Categories>
  <Categories>
    <CategoryID>2</CategoryID>
    <CategoryName>Condiments</CategoryName>
  </Categories>
</NewDataSet>

Let's say that we're going to load the data back, but we're going to mess with the data a bit, and add a new record with a mis-named field.

// Now, let's create a version with a mis-named field.
XDocument doc = XDocument.Parse(dataWriter.ToString());
XElement broken = new XElement("Categories",
  new XElement("CategoryID", "3"),
  new XElement("CategoryNamez", "Confections"));
doc.Root.Add(broken);

Console.WriteLine();
Console.WriteLine("Document we've messed with:");
Console.WriteLine(doc.ToString());

If you run the code, this is the new output you'll see.

Document we've messed with:
<NewDataSet>
  <Categories>
    <CategoryID>1</CategoryID>
    <CategoryName>Beverages</CategoryName>
  </Categories>
  <Categories>
    <CategoryID>2</CategoryID>
    <CategoryName>Condiments</CategoryName>
  </Categories>
  <Categories>
    <CategoryID>3</CategoryID>
    <CategoryNamez>Confections</CategoryNamez>
  </Categories>
</NewDataSet>

First, we'll just load the data into the DataSet, using ReadXml.

// Load the slightly messed-up data in the DataSet.
ds = new DataSet();
ds.ReadXmlSchema(new StringReader(schemaWriter.ToString()));
ds.ReadXml(doc.CreateReader());

Console.WriteLine();
Console.WriteLine("DataSet loaded directly from messed data:");
ds.WriteXml(Console.Out, XmlWriteMode.IgnoreSchema);

See what happens - the mis-named field was silently ignored!

DataSet loaded directly from messed data:
<NewDataSet>
  <Categories>
    <CategoryID>1</CategoryID>
    <CategoryName>Beverages</CategoryName>
  </Categories>
  <Categories>
    <CategoryID>2</CategoryID>
    <CategoryName>Condiments</CategoryName>
  </Categories>
  <Categories>
    <CategoryID>3</CategoryID>
  </Categories>
</NewDataSet>

Now we'll create a validating XmlReader, using our original schema.

// Now, let's try putting a validating XmlReader in between...
XmlReaderSettings settings = new XmlReaderSettings();
XmlSchema schema = XmlSchema.Read(new StringReader(schemaWriter.ToString()), null);
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(schema);

XmlReader validatingReader = XmlReader.Create(doc.CreateReader(), settings);

// ... and read it into a DataSet.
ds = new DataSet();
ds.ReadXmlSchema(new StringReader(schemaWriter.ToString()));
ds.ReadXml(validatingReader);

// We'll never get here!
Console.WriteLine();
Console.WriteLine("DataSet loaded from validating reader:");
ds.WriteXml(Console.Out, XmlWriteMode.IgnoreSchema);

This time, when you run the code, you'll see an exception with the following message:

An unhandled exception of type 'System.Xml.Schema.XmlSchemaValidationException' occurred in System.Xml.dll
Additional information: The element 'Categories' has invalid child element 'CategoryNamez'. List of possible elements expected: 'CategoryName'.
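
If you'd rather collect the problems than stop at the first exception, you can attach a ValidationEventHandler to the reader settings. The following is a minimal sketch of that idea, not part of the original walkthrough: the ValidatingLoader class and the LoadAndCollectErrors method are hypothetical names made up for illustration, and the helper takes the schema and data as strings rather than using the XDocument from above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class ValidatingLoader
{
    // Hypothetical helper: loads dataXml into a new DataSet through a
    // validating reader and returns the validation messages it collected.
    public static List<string> LoadAndCollectErrors(
        string schemaXml, string dataXml, out DataSet ds)
    {
        var errors = new List<string>();

        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(XmlSchema.Read(new StringReader(schemaXml), null));

        // With a handler attached, validation errors are reported here
        // instead of being thrown as XmlSchemaValidationException.
        settings.ValidationEventHandler +=
            (sender, e) => errors.Add(e.Severity + ": " + e.Message);

        ds = new DataSet();
        ds.ReadXmlSchema(new StringReader(schemaXml));
        using (XmlReader reader = XmlReader.Create(new StringReader(dataXml), settings))
        {
            // The read still completes; unknown elements are dropped as before,
            // but now each problem is recorded in the errors list.
            ds.ReadXml(reader);
        }
        return errors;
    }
}
```

From the walkthrough's Main method you could call it as ValidatingLoader.LoadAndCollectErrors(schemaWriter.ToString(), doc.ToString(), out ds) and then decide whether to keep the loaded DataSet based on whether the returned list is empty.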

An interesting twist on this is covered, in VB.NET syntax, in KB 811107. The sample in that KB shows how to validate a DataSet that has already been manipulated in memory: save it out to XML, then scan through it with a validating reader.
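
For C# readers, here is a rough sketch of that save-then-scan idea, based only on the description above and not on the KB's actual code. The DataSetValidator class and its Validate method are hypothetical names, and the schema is assumed to be available as a string.

```csharp
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class DataSetValidator
{
    // Hypothetical helper: writes an in-memory DataSet out as XML, then
    // scans that XML with a validating reader and returns any messages.
    public static List<string> Validate(DataSet ds, string schemaXml)
    {
        var errors = new List<string>();

        // Step 1: save the current contents of the DataSet as XML (data only).
        var xml = new StringWriter();
        ds.WriteXml(xml, XmlWriteMode.IgnoreSchema);

        // Step 2: set up a validating reader over that XML.
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(XmlSchema.Read(new StringReader(schemaXml), null));
        settings.ValidationEventHandler +=
            (sender, e) => errors.Add(e.Message);

        // Step 3: read to the end; validation happens as the reader advances.
        using (XmlReader reader = XmlReader.Create(new StringReader(xml.ToString()), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }
}
```

An empty list means the DataSet's current contents still conform to the schema; anything else tells you what drifted, for example a column added in memory that the schema doesn't know about.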

Enjoy!