What’s my Encoding Called?

There is a bit of confusion about the System.Text.Encoding names, primarily “Which name do I use for my Encoding?”

The Encoding class has three name properties: BodyName, WebName and HeaderName, and the EncodingInfo objects returned by Encoding.GetEncodings have an additional Name property.  The examples in the MSDN documentation print a table like this one:

EncodingInfo   Encoding
Name           CodePage  BodyName      HeaderName     WebName        EncodingName
-------------  --------  ------------  -------------  -------------  ------------
shift_jis      932       iso-2022-jp   iso-2022-jp    shift_jis      Japanese (Shift-JIS)
windows-1250   1250      iso-8859-2    windows-1250   windows-1250   Central European (Windows)
windows-1251   1251      koi8-r        windows-1251   windows-1251   Cyrillic (Windows)
Windows-1252   1252      iso-8859-1    Windows-1252   Windows-1252   Western European (Windows)
windows-1253   1253      iso-8859-7    windows-1253   windows-1253   Greek (Windows)
windows-1254   1254      iso-8859-9    windows-1254   windows-1254   Turkish (Windows)
csISO2022JP    50221     iso-2022-jp   iso-2022-jp    csISO2022JP    Japanese (JIS-Allow 1 byte Kana)
iso-2022-kr    50225     iso-2022-kr   euc-kr         iso-2022-kr    Korean (ISO)
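
A table like the one above can be produced with a few lines of C#.  This is only a sketch, not the MSDN sample itself: the class name EncodingNames, the Row helper and the column widths are my own choices.  It also assumes a recent .NET version, where the code page encodings have to be registered through CodePagesEncodingProvider before they show up; on the .NET Framework they are built in and the RegisterProvider line should be removed.  The exact names printed can differ between framework versions.

```csharp
using System;
using System.Text;

public static class EncodingNames
{
    // Column widths chosen to match the table above (hypothetical layout).
    const string Layout = "{0,-15}{1,-10}{2,-14}{3,-15}{4,-15}{5}";

    // Formats one table row: the EncodingInfo name followed by the Encoding's own names.
    public static string Row(string infoName, Encoding enc)
    {
        return string.Format(Layout, infoName, enc.CodePage, enc.BodyName,
                             enc.HeaderName, enc.WebName, enc.EncodingName);
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5 or later, where the
        // code page encodings must be registered explicitly.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Console.WriteLine(Layout, "Name", "CodePage", "BodyName",
                          "HeaderName", "WebName", "EncodingName");

        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            Console.WriteLine(Row(info.Name, info.GetEncoding()));
        }
    }
}
```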

The short answer is that WebName round-trips: if you call Encoding.GetEncoding(myEncoding.WebName), you’ll end up with the same encoding you started with.  (WebName is a property, so there are no parentheses after it.)  WebName is also the same name that EncodingInfo.Name reports.  So WebName is the name you should use if you need to recreate the same encoding later (fallbacks and other optional flags would be lost, but otherwise the behavior would be the same).
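
As a minimal sketch of that round trip, assuming you store only the name as a string (in a config file, say) and rebuild the encoding later: the class name EncodingRoundTrip and the FromStoredName helper are made up for illustration, and UTF-8 is used only because it is available everywhere without extra setup.

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Recreates an encoding from nothing but its stored WebName.
    // Custom fallbacks and other optional flags are not preserved.
    public static Encoding FromStoredName(string webName)
    {
        return Encoding.GetEncoding(webName);
    }

    public static void Main()
    {
        Encoding original = Encoding.UTF8;

        // This string is the only thing that gets persisted.
        string stored = original.WebName;

        Encoding restored = FromStoredName(stored);

        // Compare by code page; the restored object need not be the same instance.
        Console.WriteLine("{0} -> code page {1}", stored, restored.CodePage);
        Console.WriteLine(restored.CodePage == original.CodePage);
    }
}
```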

So if you’re “supposed” to use WebName, what are BodyName and HeaderName for?

The idea behind the Header and Body names is to support e-mail applications.  Not all encodings behave well in the body or header of an e-mail, so the encodings these names point to should be used there instead.  For example, if you have the Encoding from Encoding.GetEncoding("iso-2022-kr"), the last row of the table, then WebName lets you round-trip that name.  However, if you have data that you want to encode in the header of an e-mail, you should call Encoding.GetEncoding(myEncoding.HeaderName) and use that encoding for the header data.
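
Here is a rough sketch of that lookup.  The MailEncodings class and its ForHeader and ForBody helpers are hypothetical names, and the sample subject text is my own.  As before, the RegisterProvider call assumes a recent .NET version and should be removed on the .NET Framework.  Going by the table above, I would expect the header encoding for iso-2022-kr to come back as euc-kr, but check what your framework version actually reports.

```csharp
using System;
using System.Text;

public static class MailEncodings
{
    // Returns the encoding the framework suggests for e-mail header data.
    public static Encoding ForHeader(Encoding source)
    {
        return Encoding.GetEncoding(source.HeaderName);
    }

    // Returns the encoding the framework suggests for e-mail body data.
    public static Encoding ForBody(Encoding source)
    {
        return Encoding.GetEncoding(source.BodyName);
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5 or later, where the
        // code page encodings must be registered explicitly.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding source = Encoding.GetEncoding("iso-2022-kr");
        Encoding header = ForHeader(source);
        Encoding body = ForBody(source);

        Console.WriteLine("source: {0}", source.WebName);
        Console.WriteLine("header: {0}", header.WebName);
        Console.WriteLine("body:   {0}", body.WebName);

        // Encode header text with the header encoding, not the source encoding.
        byte[] subjectBytes = header.GetBytes("제목");
        Console.WriteLine("subject encodes to {0} bytes", subjectBytes.Length);
    }
}
```

Note that this only picks the character encoding; a real mail client would still need to wrap header bytes in a MIME encoded-word or similar.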

The key is that the Header and Body names describe which encoding to use to support a similar set of characters to the Encoding in question.  Of course I’d recommend using UTF-8 (or, in the worst case, UTF-7) whenever possible in e-mail.  For that matter I’d recommend using Unicode whenever possible, but sometimes older protocols or other limitations prevent that.

Comments (1)

  1. I was asked about our use of the windows "ansi" code page names, as used in things like MIME types, http