Binary Encoding, Part 4

Article
09/22/2009

Past parts in the series:

Now that you’ve gotten an introduction to the principles and capabilities of the binary encoding format, let’s jump into looking at some examples of messages to see how it works.
Here’s a very short but inefficiently encoded binary message. We’ll see later how to make it quite a lot shorter.

0x41, 0x01, 0x73, 0x08, 0x45, 0x6E, 0x76, 0x65, 0x6C, 0x6F, 0x70, 0x65, 0x09, 0x01, 0x73, 0x27, 0x68, 0x74, 0x74, 0x70, 0x3A, 0x2F, 0x2F, 0x77, 0x77, 0x77, 0x2E, 0x77, 0x33, 0x2E, 0x6F, 0x72, 0x67, 0x2F, 0x32, 0x30, 0x30, 0x33, 0x2F, 0x30, 0x35, 0x2F, 0x73, 0x6F, 0x61, 0x70, 0x2D, 0x65, 0x6E, 0x76, 0x65, 0x6C, 0x6F, 0x70, 0x65, 0x01

This is admittedly unenlightening so let’s start parsing out the pieces of the message. Here’s the procedure I described earlier in part 2:

Each record starts with a one byte record type value. The record type byte is then followed by binary content of variable format and size based on the type. Each record in the stream of records translates into a document fragment. By concatenating all of the fragments produced from the record stream together we can obtain a document based on the original XML infoset.

The first byte in the message, 0x41, is the record type for an XML element. The format of an element record is the length of the prefix string, the bytes for the prefix, the length of the element name, and the bytes for the element name.

The next byte in the message, 0x01, is the length of the prefix, which means that 0x73 are the bytes for the prefix. If you go to your standard UTF-8 or ASCII character table, you’ll see that 0x73 is the byte sequence for the letter "s".

Similarly, the next byte in the message, 0x08, is the length of the element name, which means that 0x45, 0x6E, 0x76, 0x65, 0x6C, 0x6F, 0x70, 0x65 are the bytes for the element name. Translating that byte sequence into characters we get the letters "Envelope".

That means we’ve just processed a record that gives us the XML fragment "<s:Envelope". We don't know yet whether we've seen the entire XML element yet. For example, the following records could be attributes or just as easily could be the content within the element. We'll find out the context when we get there.

The next byte in the message, 0x09, is the record type for an XML namespace declaration. Recall that we used the prefix "s" with the envelope element so we do owe the reader a definition of that namespace. The format of a namespace record is the length of the prefix string followed by the bytes for the prefix.

The next byte in the message, 0x01, is the length of the prefix, which means that 0x73 are the bytes for the prefix and we indeed are now defining that prefix we saw earlier. Then, the next byte in the message is 0x27 indicating that we have 39 bytes of character data coming. That character data forms the string "www.w3.org/2003/05/soap-envelope".

The XML fragment for this record is "xmlns:s="www.w3.org/2003/05/soap-envelope"".

Finally, the next and final byte in the message, 0x01, is the record type for closing an element. There's no ambiguity about the element to close because the binary encoding only supports well-formed XML and we always know the most recently started element. Therefore, the end element record doesn't have any data.

The XML fragment for this record is "</s:Envelope>".

That was the last record in the message so we can put the fragments together to see that the message spells out a very common portion of a SOAP message:

<s:Envelope xmlns:s="www.w3.org/2003/05/soap-envelope"></s:Envelope>

It would be nice to not have to carry all of that character data because the same strings are going to be in virtually every SOAP message that is sent. There was virtually no size advantage to using the binary format in this case as compared to a text encoding. Next time we'll look at how to squeeze down the size of the message.

Binary Encoding, Part 4

Additional resources