[Intermission] Auf der Mauer, auf der Lauer sitzt 'ne kleine Wa! - or: When REST isn't REST - or: Why and How I Care About Standards-Compliance

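seht Euch mal die Wa an, wie die Wa ta kann. Auf der Mauer, auf der Lauer sitzt ‘ne kleine Wa! (Sung in full, in English: “Look at the bedbug, how the bedbug can dance. On the wall, lying in wait, sits a little bedbug!”)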

It’s a German children’s song. The song starts out with “… sitzt ‘ne kleine Wanze” (bedbug) and with each verse you leave off a letter: Wanz, Wan, Wa, W, – silence.

I’ll do the same here, but not with a bedbug:

Let’s sing:

<soap:Envelope xmlns:soap="" xmlns:wsaddr="" xmlns:wsrm="" xmlns:wsu="" xmlns:app="">
  <soap:Header>
    <wsaddr:Action>https://tempuri.org/1.0/Status.set</wsaddr:Action>
    <wsrm:Sequence>
      <wsrm:Identifier>urn:session-id</wsrm:Identifier>
      <wsrm:MessageNumber>5</wsrm:MessageNumber>
    </wsrm:Sequence>
    <wsse:Security xmlns:wsse="…">
      <wsse:BinarySecurityToken ValueType="https://tempuri.org#CustomToken"
                                EncodingType="...#Base64Binary" wsu:Id="MyID">
        FHUIORv...
      </wsse:BinarySecurityToken>
      <ds:Signature>
        <ds:SignedInfo>
          <ds:CanonicalizationMethod Algorithm="https://www.w3.org/2001/10/xml-exc-c14n#"/>
          <ds:SignatureMethod Algorithm="https://www.w3.org/2000/09/xmldsig#md5"/>
          <ds:Reference URI="#MsgBody">
            <ds:DigestMethod Algorithm="https://www.w3.org/2000/09/xmldsig#md5"/>
            <ds:DigestValue>LyLsF0Pi4wPU...</ds:DigestValue>
          </ds:Reference>
        </ds:SignedInfo>
        <ds:SignatureValue>DJbchm5gK...</ds:SignatureValue>
        <ds:KeyInfo>
          <wsse:SecurityTokenReference>
            <wsse:Reference URI="#MyID"/>
          </wsse:SecurityTokenReference>
        </ds:KeyInfo>
      </ds:Signature>
    </wsse:Security>
    <app:ResponseFormat>Xml</app:ResponseFormat>
    <app:Key wsu:Id="AppKey">27729912882….</app:Key>
  </soap:Header>
  <soap:Body wsu:Id="MsgBody">
    <app:status>Hello, I’m good</app:status>
  </soap:Body>
</soap:Envelope>

Not a very pretty song, I’ll admit. Let’s drop some stuff. Let’s assume that we don’t need to tell the other party that we’re giving it an MD5 signature; let’s say that’s implied, and so is the canonicalization algorithm. Let’s also assume that the other side already knows the security token and the key. Since we only have a single digest here and yield a single signature, we can just collapse all of this to the signature value. Heck, you may not even know what all of that means. Verse 2:

<soap:Envelope xmlns:soap="" xmlns:wsaddr="" xmlns:wsrm="" xmlns:wsu="" xmlns:app="">
  <soap:Header>
    <wsaddr:Action>https://tempuri.org/1.0/Status.set</wsaddr:Action>
    <wsrm:Sequence>
      <wsrm:Identifier>urn:session-id</wsrm:Identifier>
      <wsrm:MessageNumber>5</wsrm:MessageNumber>
    </wsrm:Sequence>
    <wsse:Security xmlns:wsse="…">
      <ds:Signature>
        <ds:SignatureValue>DJbchm5gK...</ds:SignatureValue>
      </ds:Signature>
    </wsse:Security>
    <app:ResponseFormat>Xml</app:ResponseFormat>
    <app:Key wsu:Id="AppKey">27729912882….</app:Key>
  </soap:Header>
  <soap:Body wsu:Id="MsgBody">
    <app:status>Hello, I’m good</app:status>
  </soap:Body>
</soap:Envelope>
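
To make “implied” a little more concrete: if both sides agree on the algorithm up front and share the key out of band, a single value is all that has to travel. Here is a minimal Python sketch of that idea. The scheme (MD5 over the sorted key=value pairs with a shared secret appended), the secret itself, and the `sign`/`verify` names are assumptions made up for illustration; nothing in the envelope above prescribes them.

```python
import hashlib
import hmac

# Hypothetical secret that both parties already know (assumption of this sketch).
SHARED_SECRET = "hypothetical-shared-secret"


def sign(params, secret=SHARED_SECRET):
    """Return one hex signature value for a flat dict of string parameters.

    Sorting the keys plays the role of the implied canonicalization step,
    and MD5 is the implied algorithm, as in the verses.
    """
    canonical = "".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.md5((canonical + secret).encode("utf-8")).hexdigest()


def verify(params, sig, secret=SHARED_SECRET):
    """Recompute the signature on the receiving side and compare in constant time."""
    return hmac.compare_digest(sign(params, secret), sig)


message = {"status": "Hello, I'm good", "call_id": "5"}
sig = sign(message)
print(verify(message, sig))                          # the untouched message checks out
print(verify({**message, "status": "tampered"}, sig))  # a changed message does not
```

The receiver needs no algorithm identifiers and no key reference on the wire, which is exactly the information verse 2 leaves out.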

Better. Now let’s strip all these extra XML namespace decorations since there aren’t any name collisions as far as I can see. We’ll also collapse the rest of the security elements into one element since there’s no need for three levels of nesting with a single signature. Verse 3:

<Envelope>
  <Header>
    <Action>https://tempuri.org/1.0/Status.set</Action>
    <Sequence>
      <Identifier>urn:session-id</Identifier>
      <MessageNumber>5</MessageNumber>
    </Sequence>
    <SignatureValue>DJbchm5gK...</SignatureValue>
    <ResponseFormat>Xml</ResponseFormat>
    <Key>27729912882….</Key>
  </Header>
  <Body>
    <status>Hello, I’m good</status>
  </Body>
</Envelope>

Much better. The whole angle-bracket stuff and the nesting seem semi-gratuitous and repetitive here, too. Let’s make that a bit simpler. Verse 4:

Action=https://tempuri.org/1.0/Status.set
Sequence-Identifier=urn:session-id
Sequence-MessageNumber=5
SignatureValue=DJbchm5gK...
ResponseFormat=Xml
Key=27729912882….
status=Hello, I’m good
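
The step from verse 3 to verse 4 is mechanical enough to sketch in a few lines of Python. This is only an illustration: the `flatten` helper and the decision to treat `Envelope`, `Header` and `Body` as pure framing are my assumptions, and the sample writes the header's closing tag out and uses plain ASCII punctuation so that it parses cleanly.

```python
import xml.etree.ElementTree as ET

# Verse 3, with plain ASCII quotes and dots.
VERSE_3 = """
<Envelope>
  <Header>
    <Action>https://tempuri.org/1.0/Status.set</Action>
    <Sequence>
      <Identifier>urn:session-id</Identifier>
      <MessageNumber>5</MessageNumber>
    </Sequence>
    <SignatureValue>DJbchm5gK...</SignatureValue>
    <ResponseFormat>Xml</ResponseFormat>
    <Key>27729912882....</Key>
  </Header>
  <Body>
    <status>Hello, I'm good</status>
  </Body>
</Envelope>
"""

# Elements that only frame the message and contribute nothing to a key name
# (an assumption of this sketch).
FRAMING = {"Envelope", "Header", "Body"}


def flatten(element, path=()):
    """Yield (key, value) for every leaf element; nested names are joined with '-'."""
    if element.tag not in FRAMING:
        path = path + (element.tag,)
    children = list(element)
    if not children:
        yield "-".join(path), (element.text or "").strip()
    for child in children:
        yield from flatten(child, path)


pairs = list(flatten(ET.fromstring(VERSE_3.strip())))
for key, value in pairs:
    print(f"{key}={value}")
```

Run against the verse 3 envelope, this prints the same seven key=value lines as verse 4, in the same order.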

Much, much better. Now let’s get rid of that weird URI up there and split up the action and the version info, make some of these keys a little more terse, and turn that into a format that’s easily transmittable over HTTP. Given what we have here, application/x-www-form-urlencoded would probably be best. Verse 5:

method=Status.set
&v=1.0
&session_key=929872172..
&call_id=5
&sig=DJbchm5gK...
&format=Xml
&api_key=27729912882….
&status=Hello,%20I’m%20good
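
For what it’s worth, producing that last verse takes almost no code, which is a good part of its appeal. A minimal Python sketch using only the standard library follows; the parameter values are the truncated placeholders from the verse (with ASCII dots), not real keys, and nothing is actually sent anywhere.

```python
from urllib.parse import parse_qsl, quote, urlencode

# Placeholder values copied from verse 5; none of these are real credentials.
params = {
    "method": "Status.set",
    "v": "1.0",
    "session_key": "929872172..",
    "call_id": "5",
    "sig": "DJbchm5gK...",
    "format": "Xml",
    "api_key": "27729912882....",
    "status": "Hello, I'm good",
}

# quote_via=quote encodes spaces as %20 (the default, quote_plus, would use '+').
body = urlencode(params, quote_via=quote)
print(body)

# The receiving side gets the original values back by decoding the same body.
decoded = dict(parse_qsl(body))
print(decoded["status"])
```

Note that `urlencode` is a bit stricter than the verse: it also percent-encodes the comma and the apostrophe in the status text. The body would go out with a `Content-Type: application/x-www-form-urlencoded` header.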

Oops. Facebook’s Status.set API. How did that happen? I thought that was REST?

Now play the song backwards. The “new thing” is largely analogous to where we started before the WS-* Web Services stack and its CORBA/DCE/DCOM predecessors came around, and there are, believe it or not, good reasons for having all of that additional “overhead”: a common way to frame message content and the related control data, a common way to express complex data structures and distinguish between data domains, a common way to deal with addressing in multi-hop or store-and-forward messaging scenarios, an agreed notion of sessions and message sequencing, and a solid mechanism for protecting the integrity of messages and parts of messages. This isn’t all just stupid.

It’s well worth discussing whether messages need to be expressed as XML 1.0 text on the wire at all times. I don’t think they need to and there are alternatives that aren’t as heavy. JSON is fine and encodings like the .NET Binary Encoding or Fast Infoset are viable alternatives as well. It’s also well worth discussing whether WS-Security and the myriad of related standards that were clearly built by security geniuses for security geniuses really need to be that complicated or whether we could all live with a handful of simple profiles and just cut out 80% of the options and knobs and parameters in that land.

I find it very sad that the discussion isn’t happening. Instead, people use the “REST” moniker as the escape hatch to conveniently ignore any existing open standard for tunnel-through-HTTP messaging and completely avoid the discussion.

It’s not only sad, it’s actually a bit frustrating. As one of the people responsible for the protocol surface of the .NET Service Bus, I am absolutely not at liberty to ignore what exists in the standards space. And this isn’t a mandate handed down to me, but something I do because I believe it’s the right thing to live with the constraints of the standards frameworks that exist.

When we sit down and talk about a REST API, we’re designing a set of resources (which may result in splitting a thing like a queue into two resources, head and tail), and then we put RFC 2616 on the table and try to be very precise in picking the appropriate predefined HTTP method for a given semantic and in deciding how the HTTP 2xx, 3xx, 4xx, and 5xx status codes map to success and error conditions. We’re also trying to avoid inventing new ways to express things for which standards exist. There’s a standard for how to express and manage lists with Atom and APP, and hence we use that as a foundation. We use the designed extension points to add data to those lists whenever necessary.
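
To give a feel for what that exercise looks like, here is a small Python sketch of a queue split into a tail and a head resource, with each operation mapped to a predefined HTTP method and status code. The particular mapping (POST on the tail to enqueue, GET on the head to peek, DELETE on the head to dequeue, 204 for an empty queue) and the `QueueResources` name are assumptions chosen for illustration; this is not the actual .NET Service Bus protocol.

```python
from collections import deque
from http import HTTPStatus


class QueueResources:
    """One in-memory queue exposed as two resources: 'tail' and 'head'."""

    def __init__(self):
        self._messages = deque()

    def handle(self, method, resource, body=None):
        """Return (status, payload) for a request against 'tail' or 'head'."""
        if resource == "tail":
            if method != "POST":
                return HTTPStatus.METHOD_NOT_ALLOWED, None   # 405
            self._messages.append(body)
            return HTTPStatus.CREATED, None                  # 201: message enqueued
        if resource == "head":
            if method not in ("GET", "DELETE"):
                return HTTPStatus.METHOD_NOT_ALLOWED, None   # 405
            if not self._messages:
                return HTTPStatus.NO_CONTENT, None           # 204: queue is empty
            if method == "GET":
                return HTTPStatus.OK, self._messages[0]      # peek, nothing removed
            return HTTPStatus.OK, self._messages.popleft()   # destructive read
        return HTTPStatus.NOT_FOUND, None                    # 404: no such resource


queue = QueueResources()
print(queue.handle("POST", "tail", "Hello, I'm good"))
print(queue.handle("GET", "head"))
print(queue.handle("DELETE", "head"))
print(queue.handle("DELETE", "head"))
```

Every outcome is expressed with a method or status code that HTTP already defines, so a client needs no out-of-band documentation to tell success from failure.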

When we’re designing an RPC SOAP API, we’re intentionally trying to avoid inventing new protocol surface and will try to leverage as much from the existing and standardized stack as we possibly can. At a minimum we’ll stick with established patterns such as the Create/GetInfo/Renew/Delete pattern for endpoint factories with renewal (which is used in several standards). I’ll add that we are, ironically, a bit backlogged on the protocol documentation for our SOAP endpoints and have more info on the REST endpoint in the latest SDK, but we’ll make that up in the near future.
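
For readers who haven’t run into the Create/GetInfo/Renew/Delete pattern: the factory hands out endpoints on a lease, and an endpoint that nobody renews simply goes away. A rough Python sketch of the shape follows. The class and method names, the lease length, and the injectable clock are all hypothetical; the real thing is a set of SOAP operations, not an in-process class.

```python
import itertools
import time


class EndpointFactory:
    """Hands out endpoints that expire unless they are renewed in time."""

    def __init__(self, lease_seconds=60.0, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock          # injectable so the sketch is easy to test
        self._expires = {}           # endpoint id -> expiry timestamp
        self._ids = itertools.count(1)

    def _purge(self):
        """Drop every endpoint whose lease has run out."""
        now = self._clock()
        for endpoint_id in [e for e, t in self._expires.items() if t <= now]:
            del self._expires[endpoint_id]

    def create(self):
        self._purge()
        endpoint_id = f"endpoint-{next(self._ids)}"
        self._expires[endpoint_id] = self._clock() + self._lease
        return endpoint_id

    def get_info(self, endpoint_id):
        self._purge()
        if endpoint_id not in self._expires:
            raise KeyError(endpoint_id)
        return {"id": endpoint_id,
                "expires_in": self._expires[endpoint_id] - self._clock()}

    def renew(self, endpoint_id):
        self._purge()
        if endpoint_id not in self._expires:
            raise KeyError(endpoint_id)
        self._expires[endpoint_id] = self._clock() + self._lease

    def delete(self, endpoint_id):
        self._purge()
        self._expires.pop(endpoint_id, None)


factory = EndpointFactory(lease_seconds=60.0)
endpoint = factory.create()
factory.renew(endpoint)
print(factory.get_info(endpoint)["id"])
factory.delete(endpoint)
```

The point of reusing the pattern is that clients already know the lifecycle; only the thing being created is new.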

So - can I build “REST” (mind the quotes) protocols that are as reduced as Facebook, Twitter, Flickr, etc.? Absolutely. There wouldn’t be much new work. It’s just a matter of how we put messages on and pluck messages off the wire. It’s really mostly a matter of formatting, and we have a lot of the necessary building blocks in the shipping WCF bits today. I would just omit a bunch of decoration as things go out and make a bunch of assumptions on things that come in.
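
The “assumptions on things that come in” part might look roughly like this in Python: decode the form body, fill in whatever the sender left out, and insist only on the bits that can’t be guessed. The `IMPLIED` defaults, the `REQUIRED` list and the `read_request` name are hypothetical choices for this sketch, not WCF behavior.

```python
from urllib.parse import parse_qsl

# Values the receiver assumes when the sender omits them (hypothetical defaults).
IMPLIED = {"format": "Xml", "v": "1.0"}

# Parameters that cannot be guessed and therefore must be present (also an assumption).
REQUIRED = ("method", "sig")


def read_request(body):
    """Decode a form-encoded body and fill in the implied parameters."""
    params = dict(parse_qsl(body, keep_blank_values=True))
    message = {**IMPLIED, **params}   # explicit values win over implied ones
    missing = [name for name in REQUIRED if name not in message]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return message


incoming = "method=Status.set&sig=DJbchm5gK...&status=Hello%2C%20I%27m%20good"
print(read_request(incoming))
```

Everything the receiver fills in here is something the full envelope from verse 1 would have stated explicitly.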

I just have a sense that I’d be hung upside down from a tree by the press and the blogging, twittering, facebooking community if I, as someone at Microsoft, didn’t follow the existing open and agreed standards, or at least use protocols that we’ve published under the OSP, and instead just started to do my own interpretative dance - even if that looked strikingly similar to what the folks down in the Valley are doing. At the very least, someone would call it a rip-off.

What do you think? What should I/we do?