Binary Encoding, Part 3

09/17/2009

Past parts in the series:

Today I’ll talk about the XML features that are and aren’t supported by the binary encoding format we use in WCF.

Since the binary format was designed for a specific purpose (round-tripping, essentially, the XML infoset as it is manipulated in memory rather than round-tripping rendered XML documents), it omits several features that are only relevant at the level of a rendered document. Similarly, features that differ significantly from other features only in a rendered document are omitted to keep the representation canonical.

Here’s the general list of XML features that are not supported:

- Documents that look a lot like XML but aren't syntactically correct (many web pages don't strictly follow the rules of XML because web browsers are generally very forgiving)
- Processing instructions (including the XML declaration at the start of a document, which looks like a processing instruction and specifies the character encoding)
- DTDs
- Character references and entity expansions
- The compact self-closing form for elements without content, such as <a/>, instead of a separate closing tag (since we only support legal XML and human readability is not a goal, we can already encode the end tag as a single-byte token every time)
- CDATA sections
- Preservation of significant whitespace
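To see why several of these features can be dropped without losing information, here is a small sketch using Python's standard xml.etree.ElementTree module (my choice for illustration; WCF itself is .NET). It parses pairs of documents that differ only in rendering and compares the re-serialized results. Comparing ElementTree serializations is only a rough stand-in for comparing infosets, and the helper name same_infoset is hypothetical.

```python
import xml.etree.ElementTree as ET

def same_infoset(a, b):
    """Rough check: do two XML strings parse to the same ElementTree serialization?
    (A proxy for infoset equality, not a full comparison.)"""
    return ET.tostring(ET.fromstring(a)) == ET.tostring(ET.fromstring(b))

# A self-closing tag and an explicit empty element parse identically.
print(same_infoset("<a/>", "<a></a>"))
# A CDATA section and escaped text yield the same character data.
print(same_infoset("<a><![CDATA[x<y]]></a>", "<a>x&lt;y</a>"))
# A character reference is indistinguishable from the literal character after parsing.
print(same_infoset("<a>&#65;</a>", "<a>A</a>"))
```

Each comparison prints True: once the document is parsed, the distinction exists only in the rendered text, so a format that round-trips the in-memory data has no reason to preserve it.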

That leaves almost every other XML feature you might think of as supported by one record type or another. The list includes structural features, such as elements, attributes, namespace declarations, and comments. It also includes content features, such as booleans, integers, floating-point numbers, fixed-point numbers, strings, dates, time spans, byte arrays, GUIDs, unique identifiers, and qualified names.
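As a sketch of why having typed content records helps, the following picks the smallest record for a boolean or integer instead of writing the value as text. The tag constants are hypothetical values made up for illustration; they are not the record-type bytes WCF actually uses, and the little-endian fixed-width payloads are an assumption of this sketch.

```python
import struct

# Hypothetical one-byte record tags, for illustration only.
# These are NOT the actual record-type values of the WCF binary encoding.
TAG_FALSE = 0x01
TAG_TRUE = 0x02
TAG_ZERO = 0x03
TAG_ONE = 0x04
TAG_INT8 = 0x05
TAG_INT16 = 0x06
TAG_INT32 = 0x07
TAG_INT64 = 0x08

def encode_typed(value):
    """Encode a bool or int using the smallest hypothetical record that fits."""
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return bytes([TAG_TRUE if value else TAG_FALSE])
    # Very common values get a record with no payload at all.
    if value == 0:
        return bytes([TAG_ZERO])
    if value == 1:
        return bytes([TAG_ONE])
    # Otherwise use the narrowest signed little-endian width that holds the value.
    for tag, fmt, lo, hi in (
        (TAG_INT8, "<b", -2**7, 2**7 - 1),
        (TAG_INT16, "<h", -2**15, 2**15 - 1),
        (TAG_INT32, "<i", -2**31, 2**31 - 1),
        (TAG_INT64, "<q", -2**63, 2**63 - 1),
    ):
        if lo <= value <= hi:
            return bytes([tag]) + struct.pack(fmt, value)
    raise ValueError("integer out of 64-bit range")

print(encode_typed(True).hex())  # one byte
print(encode_typed(300).hex())   # tag plus a two-byte payload
```

With records like these, a value such as 300 takes three bytes, and true or 1 takes a single byte, with no need to parse text on the receiving side.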

The binary format gets most of its compactness from three techniques: the choice of supported record types, variable-sized integers (since most of the values that need encoding are small), and numeric references to interned strings rather than repeating a string's contents each time it is used. Going over some example records next time should illustrate these common techniques.
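Here is a minimal sketch of the last two techniques, under stated assumptions. The integer encoding stores 7 bits per byte, lowest-order group first, with the high bit set when more bytes follow; this is a common scheme that I'm assuming for illustration rather than quoting the WCF wire format. StringTable is a hypothetical helper showing how interning lets a repeated name be written as a small number.

```python
def encode_varint(n):
    """Encode a non-negative int 7 bits per byte, low-order group first.
    The high bit of each byte is set when another byte follows."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)         # final group
            return bytes(out)

def decode_varint(data, pos=0):
    """Decode a varint starting at pos; return (value, position after it)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

class StringTable:
    """Hypothetical interning table: the first use of a string assigns it an id,
    and later uses refer to that id instead of repeating the characters."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def intern(self, s):
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def lookup(self, i):
        return self._strings[i]

table = StringTable()
ids = [table.intern(name) for name in ["Envelope", "Header", "Body", "Envelope"]]
print(ids, [encode_varint(i).hex() for i in ids])
```

Values below 128 fit in one byte, so the second "Envelope" costs a single byte instead of eight characters, and larger values such as 300 still need only two bytes.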
