Validating XML characters in SOAP messages

I’ve written about using the SoapHttpClientProtocol subclasses generated by wsdl.exe several times over the last year, including handling authentication, HTTP response codes, and setting timeouts properly.  Today I needed to change the code in TFS to better handle characters that are not allowed in XML.

The problem is that if you have a method on your web service that takes a String parameter, someone may call that method with a string that contains characters that are not allowed in XML.  That input may come from a command line switch or a text box in a GUI.

The XmlWriter used by SoapHttpClientProtocol is XmlTextWriter.  XmlTextWriter doesn’t do any character validation.  If the string passed to WriteString() includes characters that are not valid for XML, your XML output will include them too.  The XML 1.0 standard does not allow characters below 32 (except for tab, carriage return, and new line), the code points U+FFFE and U+FFFF, or unpaired surrogates.
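
To make the rule concrete, here is a minimal sketch of a check based on the Char production in the XML 1.0 specification.  The class and method names (XmlCharRules, IndexOfInvalidXmlChar) are made up for illustration and are not part of the framework or of TFS.

```csharp
using System;

static class XmlCharRules
{
    // Returns the index of the first character that XML 1.0 does not allow,
    // or -1 if every character in the string is allowed.
    public static int IndexOfInvalidXmlChar(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            if (char.IsHighSurrogate(c))
            {
                // A high surrogate is only valid when a low surrogate follows.
                if (i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
                {
                    i++; // skip the second half of the pair
                    continue;
                }
                return i;
            }

            // A low surrogate that was not consumed above is unpaired.
            if (char.IsLowSurrogate(c))
            {
                return i;
            }

            // Tab, new line, carriage return, and U+0020..U+FFFD are allowed.
            bool allowed = c == '\t' || c == '\n' || c == '\r'
                           || (c >= 0x20 && c <= 0xFFFD);
            if (!allowed)
            {
                return i;
            }
        }
        return -1;
    }
}
```

A caller could use something like this to point the user at the offending character in a text box or command line argument before the request is ever built.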

The XmlReader used by the ASP.NET web services does do character validation.  If it finds an invalid XML character, the web service will respond with HTTP 400 Bad Request.  That doesn’t help the user figure out what’s going on.

Elena Kharitidi suggested overriding the GetWriterForMessage() method from SoapHttpClientProtocol in the subclass that was generated by wsdl.exe and providing a character-validating XmlWriter.

The documentation shows an example of creating a subclass of XmlTextWriter to check the characters.  However, it would be better to use a framework class than to roll our own.  Fortunately, there is such a class in the framework.

With a little poking around, I found the XmlCharCheckingWriter class that is internal to the framework.  Now we just need to get the framework to give us an instance of that class.  A little more poking around and experimentation resulted in the piece of code shown below.

If you run it under the debugger and put a breakpoint on the call to base.GetWriterForMessage(), you’ll see that the base SoapHttpClientProtocol method returns an XmlTextWriter.  If you step down to the XmlWriter.Create() call, you’ll see that the framework gives us the XmlCharCheckingWriter instance that we want in response to the CheckCharacters setting being true.

Now, if you add the following code to your wsdl.exe-generated subclass of SoapHttpClientProtocol, you’ll get an ArgumentException on the client when trying to write the invalid XML in the SOAP message.  The exception message will state that there is an invalid character.  The result is a significant improvement over getting a generic HTTP 400 Bad Request from the web service.

// Requires: using System.Text; using System.Xml;
//           using System.Web.Services.Protocols;

// Override this method in order to put in character validation.
protected override XmlWriter GetWriterForMessage(SoapClientMessage message,
                                                 int bufferSize)
{
    XmlWriter writer = base.GetWriterForMessage(message, bufferSize);

    // Choose the encoding the same way the framework code does.
    Encoding encoding = RequestEncoding != null ? RequestEncoding :
                                                  new UTF8Encoding(false);

    // We want the character validation to be done on the client side
    // rather than getting an obscure HTTP 400 Bad Request message
    // from the server (the XmlReader used by the web services does
    // character validation, while the writer used in the base class
    // does not).
    // We create this second XmlWriter to get an XmlCharCheckingWriter
    // instance. The Create(XmlWriter, XmlWriterSettings) code path
    // does that (we don’t need the overhead of an XmlWellFormedWriter).
    XmlWriterSettings xws = new XmlWriterSettings();
    xws.Encoding = encoding;
    xws.Indent = false;
    xws.NewLineHandling = NewLineHandling.None;
    xws.CheckCharacters = true; // make sure char is valid for XML
    writer = XmlWriter.Create(writer, xws);

    return writer;
}
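
If you want to see the effect without making a SOAP call, the rough sketch below wraps a plain XmlTextWriter with the same settings the override uses.  The names CharCheckDemo and Rejects are hypothetical, and the sketch assumes the wrapped writer reports an invalid character with an ArgumentException, as described above.

```csharp
using System;
using System.IO;
using System.Xml;

static class CharCheckDemo
{
    // Writes <root>text</root> to an in-memory writer and reports whether
    // the writer rejected the text with an ArgumentException.
    public static bool Rejects(string text, bool checkCharacters)
    {
        // Stand-in for the XmlTextWriter that the base class returns.
        XmlWriter writer = new XmlTextWriter(new StringWriter());

        if (checkCharacters)
        {
            // Same wrapping step as in the GetWriterForMessage() override.
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.NewLineHandling = NewLineHandling.None;
            settings.CheckCharacters = true;
            writer = XmlWriter.Create(writer, settings);
        }

        try
        {
            writer.WriteStartElement("root");
            writer.WriteString(text);
            writer.WriteEndElement();
            writer.Flush();
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

Calling Rejects("a\u0001b", false) exercises the unwrapped writer, while Rejects("a\u0001b", true) exercises the character-checking wrapper, so comparing the two shows what the override changes.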

Comments (1)

  1. Cool, where was this post a few months ago?

    We ran into a similar issue when testing our web services. I guess performance trumps sometimes, but it’d be nice to see a note or similar in MSDN. Anyways, we’ll incorporate this into our system if you don’t mind ;).
