Using automatic format selection with a compression encoder in self-hosted scenarios

Although I haven't worked with WCF in over four years, every now and then a question (from the forums, or from someone contacting me directly) catches my attention and I try to see if I can help. Last week I got one of those, and I think it was interesting enough to get me back to writing about it.

Basically, a customer had a WCF REST service (I know, if you want to be pedantic you'd say it's a WCF service with a Web endpoint, but let's not go there), and they were using the automatic formatting feature for the response. This way, if the client wants a JSON response, it sends Accept: application/json and receives JSON; if it wants XML, it sends Accept: text/xml and off it goes into XML land. Things were working fine until they decided to start compressing the data to save bandwidth. To do that, they went to the canonical example of compression in WCF, the "Custom Message Encoder: Compression Encoder" sample, and replaced the web encoder in their binding with it. And just like that, the automatic format selection stopped working - all responses, regardless of the incoming Accept header, came back as XML.

So what's the story here? The recommended way to use compression (specifically, GZip) on a WCF service is to host the service in IIS and let IIS itself handle the compression (which it does beautifully). In self-hosted scenarios, the story has been to use an encoder that implements the compression itself (one could also implement it in a custom transport, but that's so complicated that it's a no-go for all but a handful of people). In most scenarios this works out fine, but in this case there's an implementation detail in WebHttpBehavior: it only applies the AutomaticFormatSelectionEnabled property if the encoder being used is the one provided by the WebMessageEncodingBindingElement. Change the encoder (for example, to the compression encoder) and the feature stops working (yes, it's a bug, but with WCF mostly in maintenance mode, especially in its Web/REST functionality, it's unlikely to be fixed anytime soon, so for all practical purposes this is the behavior we're stuck with).
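
To make the scenario concrete, here's roughly what such a self-hosted setup looks like. This is only a sketch: the service and contract names (CustomersService, ICustomersService) and the base address are made up, and it assumes the compression sample's GZipMessageEncodingBindingElement constructor that takes the inner encoding binding element.

    //using System.ServiceModel, System.ServiceModel.Channels, System.ServiceModel.Description
    //The sample's GZip encoding element wrapping the web message encoder
    var encoding = new GZipMessageEncodingBindingElement(new WebMessageEncodingBindingElement());
    //Web (REST) endpoints on a custom binding need manual addressing on the transport
    var transport = new HttpTransportBindingElement { ManualAddressing = true };
    var binding = new CustomBinding(encoding, transport);

    var host = new ServiceHost(typeof(CustomersService), new Uri("http://localhost:8000/"));
    var endpoint = host.AddServiceEndpoint(typeof(ICustomersService), binding, "");
    endpoint.Behaviors.Add(new WebHttpBehavior
    {
        //This is the property that is silently ignored once the encoder is replaced
        AutomaticFormatSelectionEnabled = true
    });
    host.Open();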

In this post I'll show a workaround that lets you have it both ways: automatic format selection and the compression encoder. As with any workaround, validate it for your own scenarios before deploying it to a production system: I tested it in a few scenarios and it worked, but I make no guarantees that it will work for all.

TL;DR: if all you want is the code, you can also find it on my WCFSamples GitHub repository, under MessageEncoder/GZipEncoderAndAutoFormatSelection.

Tweaking the compression encoder sample

First off, we'll need to change the encoder sample a little so that it behaves better in an HTTP-centric world. The sample defines a new content type for the encoder (application/x-gzip), which makes sense if we want to use the encoder in a transport-agnostic way (i.e., with the TCP or named pipes transports), but for HTTP, compression is signaled by the Accept-Encoding and Content-Encoding headers, not by the content type itself (whether the content is compressed is orthogonal to the type of the content). So we get rid of the new content type and delegate most of the calls to the inner encoder.

    public override string ContentType
    {
        get { return innerEncoder.ContentType; }
    }

    public override string MediaType
    {
        get { return innerEncoder.MediaType; }
    }

    public override bool IsContentTypeSupported(string contentType)
    {
        return innerEncoder.IsContentTypeSupported(contentType);
    }

    public override T GetProperty<T>()
    {
        return innerEncoder.GetProperty<T>();
    }

    public override MessageVersion MessageVersion
    {
        get { return innerEncoder.MessageVersion; }
    }

Next, when reading a message, we need to pass the content-type along to the inner encoder, since the web encoder uses it to determine how to deserialize the message (using JSON or XML).

    public override Message ReadMessage(ArraySegment<byte> buffer, BufferManager bufferManager, string contentType)
    {
        //Decompress the buffer
        ArraySegment<byte> decompressedBuffer = DecompressBuffer(buffer, bufferManager);
        //Use the inner encoder to decode the decompressed buffer
        Message returnMessage = innerEncoder.ReadMessage(decompressedBuffer, bufferManager, contentType);
        returnMessage.Properties.Encoder = this;
        return returnMessage;
    }

At this point, the encoder will work fine if you send compressed (gzipped) requests. But not all requests are compressed, so we need to update the encoder to decompress a request only when necessary. The correct way to do that would be to check for the presence of a Content-Encoding: gzip header on the request. However, that information isn't available to the encoder in WCF - that's one of the pitfalls of a transport-agnostic framework: you cannot count on having transport-level information. We can work around the limitation by checking the first bytes of the payload for the GZip magic header. Per the GZip specification, a gzipped stream starts with the bytes 0x1F and 0x8B, followed by 0x08 for the deflate compression method. Updating the ReadMessage method as follows enables the encoder to consume both compressed and uncompressed payloads:

    public override Message ReadMessage(ArraySegment<byte> buffer, BufferManager bufferManager, string contentType)
    {
        ArraySegment<byte> decompressedBuffer = buffer;

        if (buffer.Count >= 3 && buffer.Array[buffer.Offset] == 0x1F &&
            buffer.Array[buffer.Offset + 1] == 0x8B && buffer.Array[buffer.Offset + 2] == 0x08)
        {
            //Decompress the buffer
            decompressedBuffer = DecompressBuffer(buffer, bufferManager);
        }

        //Use the inner encoder to decode the decompressed buffer
        Message returnMessage = innerEncoder.ReadMessage(decompressedBuffer, bufferManager, contentType);
        returnMessage.Properties.Encoder = this;
        return returnMessage;
    }

And we now have an encoder that can receive both compressed and uncompressed data, but always writes out compressed data. We still have the problem of the auto format selection not working, and we'll take a look at solving it next.

Adding an inspector to implement auto format selection and selective output compression

Now we need to change the encoder's behavior depending on the client request, or more precisely on the presence (and value) of the Accept and Accept-Encoding HTTP headers. The easiest way to implement this correlation is via a message inspector, which looks at the incoming request properties and takes action based on them.

The easy part is getting the automatic format selection functionality back. Although the WebHttpBehavior property doesn't work with the GZip encoder, setting the OutgoingResponse.Format property on the WebOperationContext does, and we can do that in the inspector.

    public class CompressionAndFormatSelectionMessageInspector : IDispatchMessageInspector
    {
        static readonly Regex jsonContentTypes = new Regex(@"(application|text)/json");
        static readonly Regex xmlContentTypes = new Regex(@"(application|text)/xml");

        public object AfterReceiveRequest(ref Message request, IClientChannel channel, InstanceContext instanceContext)
        {
            object propObj;
            if (request.Properties.TryGetValue(HttpRequestMessageProperty.Name, out propObj))
            {
                var prop = (HttpRequestMessageProperty)propObj;
                var accept = prop.Headers[HttpRequestHeader.Accept];
                if (accept != null)
                {
                    if (jsonContentTypes.IsMatch(accept))
                    {
                        WebOperationContext.Current.OutgoingResponse.Format = WebMessageFormat.Json;
                    }
                    else if (xmlContentTypes.IsMatch(accept))
                    {
                        WebOperationContext.Current.OutgoingResponse.Format = WebMessageFormat.Xml;
                    }
                }
            }

            return null;
        }

        public void BeforeSendReply(ref Message reply, object correlationState)
        {
        }
    }

To handle the Accept-Encoding header, we'll use the correlation object that is passed from the inspector's AfterReceiveRequest method to its BeforeSendReply method. If the request contains an Accept-Encoding: gzip header, we'll pass that information along so the reply can be compressed.

    public object AfterReceiveRequest(ref Message request, IClientChannel channel, InstanceContext instanceContext)
    {
        bool shouldCompressResponse = false;

        object propObj;
        if (request.Properties.TryGetValue(HttpRequestMessageProperty.Name, out propObj))
        {
            var prop = (HttpRequestMessageProperty)propObj;
            var accept = prop.Headers[HttpRequestHeader.Accept];
            if (accept != null)
            {
                if (jsonContentTypes.IsMatch(accept))
                {
                    WebOperationContext.Current.OutgoingResponse.Format = WebMessageFormat.Json;
                }
                else if (xmlContentTypes.IsMatch(accept))
                {
                    WebOperationContext.Current.OutgoingResponse.Format = WebMessageFormat.Xml;
                }
            }

            var acceptEncoding = prop.Headers[HttpRequestHeader.AcceptEncoding];
            if (acceptEncoding != null && acceptEncoding.Contains("gzip"))
            {
                shouldCompressResponse = true;
            }
        }

        return shouldCompressResponse;
    }

In BeforeSendReply, we now use that information to add the appropriate header (Content-Encoding: gzip) to the reply.

    public void BeforeSendReply(ref Message reply, object correlationState)
    {
        var useGzip = (bool)correlationState;
        if (useGzip)
        {
            // Add property to be used by encoder
            HttpResponseMessageProperty resp;
            object respObj;
            if (!reply.Properties.TryGetValue(HttpResponseMessageProperty.Name, out respObj))
            {
                resp = new HttpResponseMessageProperty();
                reply.Properties.Add(HttpResponseMessageProperty.Name, resp);
            }
            else
            {
                resp = (HttpResponseMessageProperty)respObj;
            }

            resp.Headers[HttpResponseHeader.ContentEncoding] = "gzip";
        }
    }
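
One thing the snippets above don't show is how the inspector gets attached to the endpoint. A minimal endpoint behavior along the lines below does the job (the class name here is mine, and the GitHub sample may wire it up differently):

    //using System.ServiceModel.Channels, System.ServiceModel.Description, System.ServiceModel.Dispatcher
    public class CompressionAndFormatSelectionBehavior : IEndpointBehavior
    {
        public void AddBindingParameters(ServiceEndpoint endpoint, BindingParameterCollection bindingParameters) { }

        public void ApplyClientBehavior(ServiceEndpoint endpoint, ClientRuntime clientRuntime) { }

        public void ApplyDispatchBehavior(ServiceEndpoint endpoint, EndpointDispatcher endpointDispatcher)
        {
            //Register the inspector on the dispatch (server) side so it sees every request and reply
            endpointDispatcher.DispatchRuntime.MessageInspectors.Add(
                new CompressionAndFormatSelectionMessageInspector());
        }

        public void Validate(ServiceEndpoint endpoint) { }
    }

Add it to the endpoint alongside the WebHttpBehavior, for example endpoint.Behaviors.Add(new CompressionAndFormatSelectionBehavior()).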

Finally, in the encoder's WriteMessage method, we can look at the message properties to see whether that header is present; only then is the response compressed.

    public override ArraySegment<byte> WriteMessage(Message message, int maxMessageSize, BufferManager bufferManager, int messageOffset)
    {
        //Use the inner encoder to encode a Message into a buffered byte array
        ArraySegment<byte> buffer = innerEncoder.WriteMessage(message, maxMessageSize, bufferManager, 0);

        object respObj;
        if (message.Properties.TryGetValue(HttpResponseMessageProperty.Name, out respObj))
        {
            var resp = (HttpResponseMessageProperty)respObj;
            if (resp.Headers[HttpResponseHeader.ContentEncoding] == "gzip")
            {
                // Need to compress the message
                buffer = CompressBuffer(buffer, bufferManager, messageOffset);
            }
        }

        return buffer;
    }

And that's it. With those changes, the encoder only compresses the responses that the client asked (via Accept-Encoding: gzip) to be compressed.
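
To sanity-check it, a small client like the sketch below (the address and resource path are placeholders) can exercise the combinations; setting AutomaticDecompression on the HttpClientHandler makes it send Accept-Encoding: gzip and transparently inflate the compressed response:

    //using System.Net, System.Net.Http, System.Net.Http.Headers
    //Asks for JSON and advertises gzip support; expects a compressed JSON response
    var handler = new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip };
    using (var client = new HttpClient(handler))
    {
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        Console.WriteLine(client.GetStringAsync("http://localhost:8000/customers").Result);
    }

    //No Accept-Encoding header, Accept: text/xml; expects an uncompressed XML response
    using (var client = new HttpClient())
    {
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("text/xml"));
        Console.WriteLine(client.GetStringAsync("http://localhost:8000/customers").Result);
    }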

Wrapping up

This scenario shows a limitation of the WCF architecture - by trying to be transport-agnostic, it loses the ability to take advantage of some transport features, especially at the encoder level (in other extensibility points one can find the HTTP headers via the message properties). So if your implementation is tied to a particular transport, especially HTTP, WCF can usually be coaxed into doing what you want, but a framework built for that transport (such as ASP.NET Web API) will likely make it easier.