XmlWriter, encodings and BOM

Today I want to talk about XmlWriter and the generation of a Byte Order Mark (BOM).

XmlWriter provides an API that generates, unsurprisingly, XML. This XML will typically end up as a managed string of characters or possibly a sequence of bytes. Of course, text transformed into bytes implies an encoding, as previously discussed.
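As a quick reminder of why the encoding matters, here is a minimal sketch that encodes the same one-character string three ways. The class name EncodingBytesDemo and the helper Hex are made up for this illustration; the encodings are the standard System.Text ones.

```csharp
using System;
using System.Text;

public static class EncodingBytesDemo
{
    // Encodes the text with the given encoding and formats the bytes as hex.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string text = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE
        Console.WriteLine("UTF-8:     " + Hex(Encoding.UTF8, text));             // C3-A9
        Console.WriteLine("UTF-16 LE: " + Hex(Encoding.Unicode, text));          // E9-00
        Console.WriteLine("UTF-16 BE: " + Hex(Encoding.BigEndianUnicode, text)); // 00-E9
    }
}
```

Note that GetBytes never includes a byte order mark; it only encodes the characters it is given.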

Now XML has its own ways of determining a document's encoding: a parser can peek at the first bytes that make up the opening <?xml declaration or, more explicitly, read the encoding named in that declaration.
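To make the "peeking" idea concrete, here is a rough sketch of how the first four bytes of a BOM-less document that starts with <?xml give away the encoding family. The byte patterns follow the autodetection appendix of the XML 1.0 specification; the class XmlEncodingSniffer and its methods are hypothetical names for this illustration, not part of the .NET Framework, and a real parser handles many more cases.

```csharp
using System;
using System.Text;

public static class XmlEncodingSniffer
{
    // Guesses the encoding family from the first four bytes of a document
    // that begins with "<?xml" and carries no byte order mark.
    public static string GuessFamily(byte[] head)
    {
        if (head == null || head.Length < 4) return "unknown";
        if (Matches(head, 0x3C, 0x3F, 0x78, 0x6D)) return "ASCII-compatible";
        if (Matches(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16 little-endian";
        if (Matches(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16 big-endian";
        return "unknown";
    }

    // Compares the first four bytes of head against the expected pattern.
    private static bool Matches(byte[] head, byte b0, byte b1, byte b2, byte b3)
    {
        return head[0] == b0 && head[1] == b1 && head[2] == b2 && head[3] == b3;
    }

    public static void Main()
    {
        string declaration = "<?xml version=\"1.0\"?>";
        Console.WriteLine(GuessFamily(Encoding.UTF8.GetBytes(declaration)));
        Console.WriteLine(GuessFamily(Encoding.Unicode.GetBytes(declaration)));
        Console.WriteLine(GuessFamily(Encoding.BigEndianUnicode.GetBytes(declaration)));
    }
}
```

For the ASCII-compatible family, the parser still has to read the encoding from the declaration to know exactly which encoding it is dealing with.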

Unicode is used for all sorts of purposes, not just XML encoding, so it has its own mechanism, the byte order mark, to distinguish between little-endian and big-endian encodings, which determine which byte comes first in UTF-16 and UTF-32. A BOM is also allowed for UTF-8, for that matter, where it acts as a signature rather than an indicator of byte order.
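In .NET, the BOM an encoding would emit is available through Encoding.GetPreamble. The following sketch simply prints the preamble for a few encodings; the class name PreambleDemo and the helper PreambleHex are made up for this illustration.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Returns the encoding's preamble (its BOM, if any) as a hex string.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        Console.WriteLine("UTF-8:          " + PreambleHex(Encoding.UTF8));             // EF-BB-BF
        Console.WriteLine("UTF-8 (no BOM): " + PreambleHex(new UTF8Encoding(false)));   // empty
        Console.WriteLine("UTF-16 LE:      " + PreambleHex(Encoding.Unicode));          // FF-FE
        Console.WriteLine("UTF-16 BE:      " + PreambleHex(Encoding.BigEndianUnicode)); // FE-FF
        Console.WriteLine("UTF-32 LE:      " + PreambleHex(Encoding.UTF32));            // FF-FE-00-00
    }
}
```

Whether a given writer actually emits this preamble is a separate question, which is what the rest of this post pokes at.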

How do these mechanisms interact when using the .NET Framework classes? Let's write some code!

First, we'll write a short helper method to display the contents of a byte array.

private static void ShowBuffer(string linePrefix, byte[] bytes, long length) {
    int bytesOnLine = 0;
    for (long i = 0; i < length; i++) {
        if (bytesOnLine == 0) {
            Console.Write(linePrefix);
        }

        Console.Write("{0:X2} ", bytes[i]);
        bytesOnLine++;
        if (bytesOnLine > 16) {
            Console.WriteLine();
            bytesOnLine = 0;
        }
    }
    Console.WriteLine();
}

Next, let's write a method to write out some short XML.

private static void WriteXml(XmlWriter xmlWriter) {
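The body of WriteXml is not shown in this listing. Any small document will do for the experiment, so here is a minimal stand-in, assuming a single element with some text. The element name "greeting", the text "hello", and the WriteToString helper are placeholders invented for this sketch, not the original code.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WriteXmlSketch
{
    // Hypothetical body: the element name and text are placeholders.
    public static void WriteXml(XmlWriter xmlWriter)
    {
        xmlWriter.WriteStartDocument();
        xmlWriter.WriteStartElement("greeting");
        xmlWriter.WriteString("hello");
        xmlWriter.WriteEndElement();
        xmlWriter.WriteEndDocument();
    }

    // Runs WriteXml against an in-memory string and returns the markup.
    public static string WriteToString()
    {
        StringWriter text = new StringWriter();
        using (XmlWriter xmlWriter = XmlWriter.Create(text))
        {
            WriteXml(xmlWriter);
        }
        return text.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteToString());
    }
}
```

Keeping the document this small means nearly every byte in the dumps comes from the declaration and any BOM, which is the part we care about.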

We'll try different combinations of layering an XmlWriter with some encoding over a StreamWriter with a different encoding (or directly over a stream) to see what happens. These two methods will help us out.

private static long WriteEncodedXml(
    Encoding streamEncoding,
    Encoding xmlEncoding,
    Stream stream) {
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Encoding = xmlEncoding;
    settings.Indent = false;

    if (streamEncoding != null) {
        using (StreamWriter writer = new StreamWriter(stream, streamEncoding))
        using (XmlWriter xmlWriter = XmlWriter.Create(writer, settings)) {
            WriteXml(xmlWriter);
            // Flush both layers so stream.Length counts every byte written.
            xmlWriter.Flush();
            writer.Flush();
            return stream.Length;
        }
    } else {
        using (XmlWriter xmlWriter = XmlWriter.Create(stream, settings)) {
            WriteXml(xmlWriter);
            xmlWriter.Flush();
            return stream.Length;
        }
    }
}

private static void ShowXmlEncoding(
    Encoding streamEncoding,
    Encoding xmlEncoding) {

    Console.WriteLine("Stream Encoding: " +
        ((streamEncoding == null) ?
        "(no stream)" : streamEncoding.EncodingName));
    Console.WriteLine(" XML Encoding: " + xmlEncoding.EncodingName);

    MemoryStream stream = new MemoryStream();
    long length = WriteEncodedXml(streamEncoding, xmlEncoding, stream);
    // GetBuffer returns the whole backing array, so only the first
    // 'length' bytes are meaningful.
    byte[] bytes = stream.GetBuffer();
    ShowBuffer(" ", bytes, length);
}

Finally, here is the method to drive it all.

public static void Main(string[] args) {
    // First encoding is for stream writer, second is XML writer.
    ShowXmlEncoding(null, Encoding.UTF8);
    ShowXmlEncoding(null,
        new UTF8Encoding(/* encoderShouldEmitUTF8Identifier */false));
    ShowXmlEncoding(null, Encoding.Unicode);
    ShowXmlEncoding(null, Encoding.BigEndianUnicode);

    ShowXmlEncoding(Encoding.ASCII, Encoding.Unicode);

    // Muhaha.
    Encoding muhaha = Encoding.GetEncoding(
        new EncoderExceptionFallback(),
        new DecoderExceptionFallback());
    ShowXmlEncoding(null, muhaha);
}

You can run this now and see what comes up. Tomorrow, a short analysis of some interesting results.

