Avoid Using PSVI in Web Service Contracts

When you create web services, you should use contract-first design.  However, this comes with a caveat: don't assume that PSVI data will be present in the message.  By all means design your messages based on schema, but don't rely on PSVI data in web services unless validation is guaranteed.

Introduction to the PSVI

The Post-Schema-Validation Infoset (PSVI) is the XML Infoset representation of a document after it has been validated against a schema, augmented with the information that validation contributes.  The easiest way to understand this concept is to see it in action.  XML Schema 1.0 provides the means to define attributes using the xs:attribute element.  The production for xs:attribute is:

<attribute
  default = string
  fixed = string
  form = (qualified | unqualified)
  id = ID
  name = NCName
  ref = QName
  type = QName
  use = (optional | prohibited | required) : optional
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, (simpleType?))
</attribute>

Notice the first attribute that may be assigned to the definition of an attribute: default.  Assigning a value for this attribute means that you are defining a default value for the attribute, which is fairly self-explanatory at first glance.  The second part to notice is that the use attribute accepts three values: optional, prohibited, and required, with optional being the default.  A value of optional means that if the attribute is not present, the document is still valid according to the schema.  The problem occurs when you combine these two attributes.  If an optional attribute is not present in a document when the schema defines a default value, the default value should be applied for the attribute.  But when would you be able to read the value of this attribute?

I will spend a considerable amount of time explaining the issue from multiple aspects, but note that there is a workaround for optional default attributes mentioned at the end of this article.

Consider the following XML schema document.

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="Customer" targetNamespace="https://contoso.com/Customer.xsd" elementFormDefault="qualified"
   xmlns="https://contoso.com/Customer.xsd" xmlns:mstns="https://contoso.com/Customer.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:complexType name="CustomerType">
      <xs:attribute name="defaultString" type="xs:string" default="test" />
      <xs:attribute name="defaultInt" type="xs:int" default="1" />
      <xs:attribute name="defaultBoolean" type="xs:boolean" default="true" />
   </xs:complexType>
   <xs:element name="Customer" type="CustomerType" />
</xs:schema>

This schema defines a type, CustomerType, having three attributes: defaultString, defaultInt, and defaultBoolean.  We did not define a value for the use attribute, so each of these attributes is optional by default.  The following XML instance document is valid according to this schema document:

  <?xml version="1.0" encoding="utf-8" ?> <Customer xmlns="https://contoso.com/Customer.xsd" /> 

After the document is validated against the schema, its PSVI would be serialized as the following:

  <?xml version="1.0" encoding="utf-8" ?>
<Customer xmlns="https://contoso.com/Customer.xsd"
defaultString="test" defaultInt="1" defaultBoolean="true" />

We can prove this concept through code.  We can use the classes in the System.Xml and System.Xml.Schema namespaces to process an XML document that uses the namespace defined by this schema document.

 using System; 
using System.Xml; 
using System.Xml.Schema; 
namespace Contoso 
{ 
    class Tester 
    { 
        [STAThread] 
        static void Main(string[] args) 
        { 
            string instanceLocation = "https://localhost/webservice3/xmlfile1.xml"; 
            XmlUtils util = new XmlUtils(); 
            Console.WriteLine("=====Without Validation====="); 
            XmlTextReader reader = new XmlTextReader(instanceLocation); 
            util.WriteXmlToConsole(reader); 
            reader.Close(); 
            Console.WriteLine(); 
            Console.WriteLine("=====With Validation====="); 
            reader = new XmlTextReader(instanceLocation); 
            util.ValidateXml(reader,"https://contoso.com/Customer.xsd", "https://localhost/WebService3/Customer.xsd"); 
            reader.Close(); 
         } 
      } 

      public class XmlUtils 
      { 
         public void WriteXmlToConsole(XmlReader reader) 
         { 
             XmlTextWriter writer = new XmlTextWriter(Console.Out); 
             writer.Formatting = Formatting.Indented; 
             while(reader.Read()) 
             { 
                writer.WriteNode(reader,true); 
             } 
             writer.Flush(); 
             writer.Close(); 
             Console.WriteLine(); 
         } 
         
         public void ValidateXml(XmlReader reader, string targetNamespace, string schemaLocation) 
         { 
            XmlValidatingReader validator = new XmlValidatingReader(reader); 
            validator.ValidationType = ValidationType.Schema; 
            validator.ValidationEventHandler +=new ValidationEventHandler(ValidationCallBack); 
            validator.Schemas.Add(targetNamespace,schemaLocation); 
            WriteXmlToConsole(validator); 
            validator.Close(); 
         } 

         private void ValidationCallBack(object sender, ValidationEventArgs e) 
         { 
             System.Console.WriteLine(); 
             System.Console.WriteLine("***************{0}********************",e.Message);  
             System.Console.WriteLine();  
         } 
    } 
}

This code results in the following output:

=====Without Validation=====
<?xml version="1.0" encoding="utf-8" ?>
<Customer xmlns="https://contoso.com/Customer.xsd" />
=====With Validation=====
<?xml version="1.0" encoding="utf-8" ?>
<Customer xmlns="https://contoso.com/Customer.xsd" defaultString="test" defaultBoolean="true" defaultInt="1" />

In the first set of output, the document is written exactly as the source document represented it.  Without validation, we have no way of knowing whether a default value was specified for the attribute, let alone what the value was.  The only way to find out is to load the schema and validate the document against it, as shown in the second set of output, where the default attributes are present.  We are unable to obtain the values for defaulted attributes without first validating the document against the schema and obtaining the PSVI information.
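The same comparison can be sketched for .NET 2.0 and later, where XmlValidatingReader is marked obsolete in favor of XmlReader.Create with an XmlReaderSettings object.  This is only an illustrative sketch, not the code used to produce the output above: the class name DefaultAttributeDemo and the helper ReadDefaultString are hypothetical, and the schema and instance are inlined as strings (trimmed to a single attribute) so the example does not depend on the localhost URLs.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical demo class; names are illustrative only.
class DefaultAttributeDemo
{
    const string Ns = "https://contoso.com/Customer.xsd";

    // Trimmed copy of the Customer schema with one defaulted attribute.
    const string Schema =
        @"<xs:schema targetNamespace='https://contoso.com/Customer.xsd'
             xmlns='https://contoso.com/Customer.xsd'
             xmlns:xs='http://www.w3.org/2001/XMLSchema'
             elementFormDefault='qualified'>
            <xs:complexType name='CustomerType'>
              <xs:attribute name='defaultString' type='xs:string' default='test' />
            </xs:complexType>
            <xs:element name='Customer' type='CustomerType' />
          </xs:schema>";

    // Instance document that omits the optional attribute.
    const string Instance = "<Customer xmlns='https://contoso.com/Customer.xsd' />";

    // Reads the defaultString attribute of the root element,
    // with or without schema validation switched on.
    static string ReadDefaultString(bool validate)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        if (validate)
        {
            settings.ValidationType = ValidationType.Schema;
            settings.Schemas.Add(Ns, XmlReader.Create(new StringReader(Schema)));
        }

        using (XmlReader reader = XmlReader.Create(new StringReader(Instance), settings))
        {
            reader.MoveToContent();                      // position on <Customer>
            return reader.GetAttribute("defaultString"); // null when the attribute is absent
        }
    }

    static void Main()
    {
        Console.WriteLine("Without validation: {0}", ReadDefaultString(false) ?? "(absent)");
        Console.WriteLine("With validation:    {0}", ReadDefaultString(true) ?? "(absent)");
    }
}
```

Under these assumptions, the non-validating reader should report the attribute as absent, while the validating reader is expected to surface the schema default, mirroring the two sets of output above.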

Again, note that there is a workaround for optional default attributes mentioned at the end of this article.

PSVI and XML Serialization

Validation is typically skipped because it incurs a performance cost to parse both the XML schema document as well as the XML instance document and validate the instance document against the schema.  This is also the approach taken by ASMX in .NET: validation is not performed, so there is no PSVI information. And this is the source of a potential interop issue.

Let's run this schema through the .NET Framework utility xsd.exe, specifying the /classes option.

        xsd.exe /classes customer.xsd

The output of this operation is a class that can be included within our web service project.

  using System.Xml.Serialization; 

[System.Xml.Serialization.XmlTypeAttribute(Namespace="https://contoso.com/Customer.xsd")] 
[System.Xml.Serialization.XmlRootAttribute("Customer", Namespace="https://contoso.com/Customer.xsd", IsNullable=false)] 
public class CustomerType 
{ 
    [System.Xml.Serialization.XmlAttributeAttribute()]  
    [System.ComponentModel.DefaultValueAttribute("test")] 
    public string defaultString = "test"; 

    [System.Xml.Serialization.XmlAttributeAttribute()] 
    [System.ComponentModel.DefaultValueAttribute(1)] 
    public int defaultInt = 1; 

    [System.Xml.Serialization.XmlAttributeAttribute()] 
    [System.ComponentModel.DefaultValueAttribute(true)] 
    public bool defaultBoolean = true;
}

Before we look at web services, let's focus on how the XmlSerializer works. The XmlSerializer inspects the public properties and fields of a type and serializes them to elements unless otherwise specified.  The serialization characteristics are controlled using type metadata specified as attributes.  The above type maps to our schema definition, including the root element name "Customer" and the specification that the fields should be serialized as attributes. 

We will use the same XML instance document as above for de-serialization, and we will specify the same values as the defaults within the schema. 

 using System; 
using System.Xml; 
using System.Xml.Schema; 

namespace Contoso 
{ 
    class Tester 
    { 
        [STAThread] 
        static void Main(string[] args)  
        { 
           Tester t = new Tester(); 
           t.TestSerialization(); 
        } 

        private void TestSerialization() 
        { 
            //Serialize a CustomerType using default values to a MemoryStream 
            System.IO.MemoryStream mem = new System.IO.MemoryStream(); 
            System.Xml.Serialization.XmlSerializer ser = new System.Xml.Serialization.XmlSerializer(typeof(CustomerType)); 
            ser.Serialize(mem,GetCustomerType(true)); 

            //Read the MemoryStream to the Console 
            Console.WriteLine("========Contents after serializing type using defaults========="); 
            mem.Position = 0; 
            System.IO.StreamReader reader = new System.IO.StreamReader(mem);
            Console.WriteLine(reader.ReadToEnd());
            Console.WriteLine();
            
            //De-serialize the MemoryStream to a CustomerType 
            mem.Position = 0; 
            ser = new System.Xml.Serialization.XmlSerializer(typeof(CustomerType)); 
            CustomerType c = (CustomerType)ser.Deserialize(mem); 

            //Write the CustomerType contents to the Console 
            Console.WriteLine("========Contents after de-serializing type using defaults========="); 
            Console.WriteLine("{0}\n{1}\n{2}",c.defaultBoolean,c.defaultInt,c.defaultString);  
            //Close resources 

            mem.Close(); 
            reader.Close(); 
        } 

        public static CustomerType GetCustomerType(bool useDefaults) 
        { 
            CustomerType c = new CustomerType(); 
            if(useDefaults) 
            { 
                c.defaultBoolean = true; 
                c.defaultInt = 1; 
                c.defaultString = "test"; 
            } 
            else 
            { 
                c.defaultBoolean = false; 
                c.defaultInt = 5; 
                c.defaultString = "foo"; 
            } 
            return c; 
        } 
     } 
}

This code results in the following output to the console window:

========Contents after serializing type using defaults=========
<?xml version="1.0"?>
<CustomerType xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" />

========Contents after de-serializing type using defaults=========
True
1
test

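Even though our XML document did not specify the default attributes, the type created by xsd.exe includes definitions for them as public fields with pre-set values.  On serialization, the XmlSerializer omits any member whose value equals the one given in its DefaultValueAttribute, which is why the serialized document carries no attributes.  On de-serialization, the XmlSerializer finds no values for the missing attributes, so it does not touch those fields and they keep their initialized values.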
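To isolate the serializer behavior described here, the following is a minimal sketch that serializes the same type twice, once with the field left at its default and once with a different value.  The type CustomerSketch and the class DefaultValueDemo are hypothetical stand-ins for the generated CustomerType, trimmed to a single attribute.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Xml.Serialization;

// Hypothetical, trimmed stand-in for the generated CustomerType.
[XmlRoot("Customer", Namespace = "https://contoso.com/Customer.xsd")]
public class CustomerSketch
{
    [XmlAttribute]
    [DefaultValue(1)]
    public int defaultInt = 1;
}

class DefaultValueDemo
{
    // Serializes the instance to a string so the result can be inspected.
    static string Serialize(CustomerSketch c)
    {
        XmlSerializer ser = new XmlSerializer(typeof(CustomerSketch));
        using (StringWriter sw = new StringWriter())
        {
            ser.Serialize(sw, c);
            return sw.ToString();
        }
    }

    static void Main()
    {
        CustomerSketch usesDefault = new CustomerSketch(); // defaultInt stays 1

        CustomerSketch overrides = new CustomerSketch();
        overrides.defaultInt = 5;                          // differs from DefaultValue

        Console.WriteLine(Serialize(usesDefault));
        Console.WriteLine();
        Console.WriteLine(Serialize(overrides));
    }
}
```

The first document should come out without a defaultInt attribute, because the field equals the value in its DefaultValueAttribute, and the second should carry defaultInt="5".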

PSVI and Web Services

As stated before, the ASMX pipeline does not perform validation; it leverages the XmlSerializer to serialize CLR types to XML and vice versa.  To reiterate, this implies that PSVI information is not available, but the act of de-serializing to a CLR type with initialized field values provides some level of type checking and structure validation.  There is very little magic happening, as we just coded an example using the XmlSerializer and demonstrated the behavior.  For completeness, let's look at an example web service, where the same behavior manifests itself.

Once we include our class generated by the xsd.exe utility into our web service project, we can use it as a parameter or a return type for our web service methods. 

 using System; 
using System.Web; 
using System.Web.Services; 
namespace WebService3 
{ 
    public class Service1 : System.Web.Services.WebService 
    { 
        [WebMethod] 
        public CustomerType HelloWorld(bool useDefaults) 
        { 
           CustomerType c = new CustomerType(); 
           if (useDefaults) 
           { 
              c.defaultBoolean = true; 
              c.defaultInt = 1; 
              c.defaultString = "test"; 
           } 
           else 
           { 
              c.defaultBoolean = false; 
              c.defaultInt = 5; 
              c.defaultString = "foo"; 
           } 
           return c; 
        } 

        [WebMethod]
        public void HelloWorld2(CustomerType c)
        { 
        } 
    }
}

The first web method is an example of returning a CustomerType from the server, showing how the type is serialized onto the wire from the server side.  The second web method accepts a CustomerType, allowing a demonstration of serializing a CustomerType to the wire from the client.  Using Simon Fell's tcpTrace, we can inspect the SOAP as it appears on the wire.  We first look at the SOAP responses returned by two calls to the first web method: call 1 passes true for useDefaults, and call 2 passes false.

Call 1:

<?xml version="1.0" encoding="utf-8" ?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <soap:Body>
      <HelloWorldResponse xmlns="http://tempuri.org/">
         <Customer xmlns="https://contoso.com/Customer.xsd" />
      </HelloWorldResponse>
   </soap:Body>
</soap:Envelope>

Call 2:

<?xml version="1.0" encoding="utf-8" ?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <soap:Body>
      <HelloWorldResponse xmlns="http://tempuri.org/">
         <Customer defaultString="foo" defaultInt="5" defaultBoolean="false" xmlns="https://contoso.com/Customer.xsd" />
      </HelloWorldResponse>
   </soap:Body>
</soap:Envelope>

There is a potential interop problem in that the default attribute values are not serialized on the wire when their value is equal to the default value. If a client application accesses the web service, obtaining a serialized CustomerType, is it expected to perform validation against the schema to obtain the PSVI?

For some reason, the problem seems to be easier to conceive when we are sending a SOAP message rather than receiving one (confirmed by trying to explain this issue several times already to others before writing this).  We look at calls 3 and 4 as examples of sending SOAP messages.

Call 3:

<?xml version="1.0" encoding="utf-8" ?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <soap:Body>
      <HelloWorld2 xmlns="http://tempuri.org/">
         <Customer xmlns="https://contoso.com/Customer.xsd" />
      </HelloWorld2>
   </soap:Body>
</soap:Envelope>

Call 4:

<?xml version="1.0" encoding="utf-8" ?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <soap:Body>
      <HelloWorld2 xmlns="http://tempuri.org/">
         <Customer xmlns="https://contoso.com/Customer.xsd"
            defaultBoolean="false" defaultInt="5" defaultString="foo" />
      </HelloWorld2>
   </soap:Body>
</soap:Envelope>

Changes to xsd.exe in Whidbey

I would be remiss if I did not address the public Whidbey Beta 1 and changes to serialization.  When you run the xsd.exe tool in Whidbey Beta 1, the following type is generated (note the creation of private fields accessed via public properties instead of public fields).  Comparing this to the type generated in "PSVI and XML Serialization" above, you can see that the access to property setters provides some flexibility in controlling your types based on the code generated.

[System.SerializableAttribute()]
[System.Xml.Serialization.XmlTypeAttribute(Namespace="https://contoso.com/Customer.xsd")]
public class CustomerType
{
    private string defaultStringField;
    private int defaultIntField;
    private bool defaultBooleanField;

    public CustomerType()
    {
        this.defaultStringField = "test";
        this.defaultIntField = 1;
        this.defaultBooleanField = true;
    }

    [System.Xml.Serialization.XmlAttributeAttribute()]
    [System.ComponentModel.DefaultValueAttribute("test")]
    public string defaultString
    {
        get
        {
            return this.defaultStringField;
        }
        set
        {
            this.defaultStringField = value;
        }
    }

    [System.Xml.Serialization.XmlAttributeAttribute()]
    [System.ComponentModel.DefaultValueAttribute(1)]
    public int defaultInt
    {
        get
        {
            return this.defaultIntField;
        }
        set
        {
            this.defaultIntField = value;
        }
    }

    [System.Xml.Serialization.XmlAttributeAttribute()]
    [System.ComponentModel.DefaultValueAttribute(true)]
    public bool defaultBoolean
    {
        get
        {
            return this.defaultBooleanField;
        }
        set
        {
            this.defaultBooleanField = value;
        }
    }
}

The Workaround

Now we come to the real meat of the matter: schema design.  The schema design itself relied on the inclusion of PSVI information.  Without the guarantee of a validation pre-processor before the SOAP envelope is de-serialized to a type, relying on the PSVI is simply a poor choice of design.  Creation of an XML document based on this XML schema automatically assumes access to PSVI data, which most frameworks simply won't provide due to the overhead that validation imposes.

Now that I have belabored the point of schema design and optional default attributes, note that there is a workaround.  Instead of relying on the PSVI data to supply missing optional attributes, it is better to mark those attributes as explicitly required and remove the default attribute.  Section 3.2.2 in XML Schema 1.0: Structures specifies that, if the default attribute is specified, then the use attribute must be optional or absent; in other words, the attribute cannot be both required and have a default.  The updated schema:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="Customer" xmlns="https://contoso.com/Customer.xsd"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:mstns="https://contoso.com/Customer.xsd"
   elementFormDefault="qualified"
   targetNamespace="https://contoso.com/Customer.xsd">
   <xs:complexType name="CustomerType">
      <xs:sequence />
      <xs:attribute type="xs:string" name="defaultString" use="required" />
      <xs:attribute type="xs:int" name="defaultInt" use="required" />
      <xs:attribute type="xs:boolean" name="defaultBoolean" use="required" />
   </xs:complexType>
   <xs:element type="CustomerType" name="Customer"></xs:element>
</xs:schema>

We re-run xsd.exe in .NET 1.x Framework, and the following type is generated. Note the missing System.ComponentModel.DefaultValueAttribute specifying the default value for the attribute:

using System.Xml.Serialization;

[System.Xml.Serialization.XmlTypeAttribute(Namespace="https://contoso.com/Customer.xsd")]
[System.Xml.Serialization.XmlRootAttribute("Customer", Namespace="https://contoso.com/Customer.xsd", IsNullable=false)]
public class CustomerType
{
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string defaultString;

    [System.Xml.Serialization.XmlAttributeAttribute()]
    public int defaultInt;

    [System.Xml.Serialization.XmlAttributeAttribute()]
    public bool defaultBoolean;
}

By requiring the attribute (through use="required"), the attribute is guaranteed to be serialized to the wire by the XmlSerializer, and neither side of the contract needs to accommodate PSVI information to determine the attribute's actual value.  You can get around this issue through a very small change: if you are sending messages to a web service whose XML schema defines optional default attributes, simply change them on your side to required.
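As a rough sketch of what this change buys you, the example below serializes a type shaped like the one generated from the required-attribute schema.  The names RequiredCustomer and RequiredAttributeDemo are hypothetical, and the field values are deliberately set to what used to be the schema defaults to show that they are no longer suppressed.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type mirroring the class generated from the updated schema:
// XmlAttribute members with no DefaultValueAttribute.
[XmlRoot("Customer", Namespace = "https://contoso.com/Customer.xsd")]
public class RequiredCustomer
{
    [XmlAttribute] public string defaultString;
    [XmlAttribute] public int defaultInt;
    [XmlAttribute] public bool defaultBoolean;
}

class RequiredAttributeDemo
{
    static void Main()
    {
        RequiredCustomer c = new RequiredCustomer();
        // Same values that the original schema declared as defaults.
        c.defaultString = "test";
        c.defaultInt = 1;
        c.defaultBoolean = true;

        XmlSerializer ser = new XmlSerializer(typeof(RequiredCustomer));
        using (StringWriter sw = new StringWriter())
        {
            ser.Serialize(sw, c);
            // With no DefaultValueAttribute to compare against, the serializer
            // should write all three attributes onto the Customer element.
            Console.WriteLine(sw.ToString());
        }
    }
}
```

One caveat worth keeping in mind: a string field left as null is still omitted by the XmlSerializer, so the sender has to assign a value for the attribute to appear.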

Let's look back at the very first example in this article.  If the attribute was not specified in the document, the PSVI includes the value.  The enforcement is that you are required to specify an attribute lest one be supplied for you.  If you supply the attribute, then you need not fall victim to the default of the PSVI.  There is nothing stopping you from marking default optional attributes as required on the client, because this only affects what you send: the server can still leverage PSVI information if it is equipped to do so.  The key is to make sure you know what you are capable of serializing and ensure that the information is actually serialized as expected.

Summary

The lack of available PSVI information in SOAP messages based on XML Schemas further supports the view that database concepts do not map well to XML.  Still, I agree with Aaron: RelaxNG won't supplant XSD, for the same reasons that Aaron cited.  Too many specs, companies, and individuals are wed to XML Schemas for RelaxNG to find a serious foothold.  At least, that's the current state of affairs; we can only hope that XML Schema finds a means to break ties from the relational world.

Despite its warts, I still see an inherent beauty in XML Schemas that is not present in either RelaxNG or XDR, but I also recognize that XML Schema contains overblown expressions that make aspects of it confusing beyond its utility.  The optional default attribute is one of those warts, as is the PSVI itself.  The common purpose for the default attribute is to provide a value in case the sender conveniently forgot to provide one:  Make them provide one.  If you are the sender, make sure that your version of the contract converts optional defaulted attributes to required attributes, ensuring that you serialize onto the wire correctly.