One Codepage to rule them all: Unicode, C#, XNA & Fonts

We're all familiar with the time-worn verse:

Three Codepages for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Four Hundred and Seventy-Three for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Redmond where the Shadows lie.
One Codepage to rule them all, One Codepage to find them,
One Codepage to bring them all and in the darkness bind them
In the Land of Redmond where the Shadows lie.

The One Codepage referred to is, of course, Unicode. Sauron created Unicode in secret during the Second Age and then tricked mankind into adopting such encoding monstrosities as EBCDIC, Latin I and Shift-JIS. The defeat of Sauron led to the release of Unicode 1.0 - the first serious attempt to unify the world's characters into a single encoding. Unicode has been enhanced over the years and the latest version (5.0) claims to cover "all the characters for all the writing systems of the world, modern and ancient".

So how is Unicode relevant to the current discussion of programming with XNA?

It's relevant because C# adopted Unicode as the internal format for all strings: a .NET string is a sequence of UTF-16 code units. This means that C# (and thus, XNA) internally supports (pretty much) all the world's languages. You still need to worry about codepages when saving your source files and when exchanging data with the outside world (reading and writing files, for example), but once your code is running, all the strings are internally Unicode.
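To make the "external data" caveat concrete, here is a minimal sketch of that boundary. The class and method names (`EncodingBoundary`, `Decode`, `ToUtf8`, `CodeUnits`) are made up for illustration; only the `System.Text.Encoding` calls are real .NET API. The point is that bytes only become text once you say which encoding they were written in.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EncodingBoundary
{
    // Decode bytes that came from outside the program (a file, a socket)
    // into a .NET string. The caller has to know which encoding was used.
    public static string Decode(byte[] raw, Encoding source)
    {
        return source.GetString(raw);
    }

    // Re-encode a string for output, here as UTF-8.
    public static byte[] ToUtf8(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // The numeric values of the UTF-16 code units that make up the string.
    public static int[] CodeUnits(string text)
    {
        int[] units = new int[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            units[i] = text[i];
        }
        return units;
    }
}
```

For example, the four bytes 63 61 66 E9 decoded with `Encoding.GetEncoding("iso-8859-1")` give the four-character string "café", whose last code unit is 0x00E9. Passing the result to `ToUtf8` yields five bytes, because é takes two bytes in UTF-8. Once the data is inside the string, the original codepage no longer matters.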

Which leads to the next question: Can we display this text on the screen?

On Windows, the answer is "of course", assuming you have the proper fonts installed on your machine, and assuming that the person you give your binary to also has them installed. If the fonts are missing, then some sort of font substitution will take place (most likely messing up your carefully crafted interface). This is why games typically use bitmap fonts that are shipped along with the game. With bitmap fonts, you know that the text will be displayed correctly on the end-user's machine since you're in control of the entire rendering process.

On the Xbox 360, there are no system fonts available, so your only option is to use bitmap fonts. Thus, if you're at all concerned about cross-platform compatibility, you need to use bitmap fonts.

So, where can we get bitmap fonts?

There are a variety of bitmap font creation tools available. The ones considered here work by taking an outline font (provided by the OS) and rendering it to a bitmap at a fixed size. The ones that I've played with a bit include LMNOpc's Bitmap Font Builder and AngelCode's Bitmap Font Generator. AngelCode's in particular does an excellent job of arranging the glyphs tightly so that you're not wasting space in the texture.

However, all of the bitmap font creation tools that I've encountered suffer from one problem: lack of Unicode support. Even when selecting a font like "Lucida Sans Unicode", the only codepage options are ANSI, Eastern Europe, Greek, and so on...

Even worse, if you select a Japanese font you only get the first 256 glyphs in the font!

Even more worse, if you want to use these fonts in your XNA game, you need to create a mapping between Unicode (used in your code) and whatever codepage you used to extract the font. You can do this in code, or by tweaking the bitmap font to expose Unicode codepoints. Your choice. [1]
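For the "do it in code" option, a rough sketch of such a mapping might look like the following. It assumes a single-byte codepage and a font texture whose glyph slot N holds the glyph for byte value N; that layout, and the names `GlyphMap`, `Build` and `ToGlyphSlots`, are assumptions for illustration, not the output format of any particular tool.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class GlyphMap
{
    // Build a lookup from Unicode character to glyph slot (0..255) by
    // decoding every possible byte value with the font's codepage.
    public static Dictionary<char, int> Build(Encoding codepage)
    {
        var map = new Dictionary<char, int>();
        byte[] one = new byte[1];
        for (int slot = 0; slot < 256; slot++)
        {
            one[0] = (byte)slot;
            string decoded = codepage.GetString(one);

            // Skip bytes that do not decode to exactly one real character.
            if (decoded.Length != 1 || decoded[0] == '\uFFFD')
            {
                continue;
            }

            // If two bytes decode to the same character, keep the first slot.
            if (!map.ContainsKey(decoded[0]))
            {
                map.Add(decoded[0], slot);
            }
        }
        return map;
    }

    // Translate a Unicode string into glyph slots. Characters the font
    // does not cover get fallbackSlot.
    public static int[] ToGlyphSlots(string text, Dictionary<char, int> map, int fallbackSlot)
    {
        int[] slots = new int[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            int slot;
            slots[i] = map.TryGetValue(text[i], out slot) ? slot : fallbackSlot;
        }
        return slots;
    }
}
```

With `Encoding.GetEncoding("iso-8859-1")` as the codepage, 'é' maps to slot 233, while a character outside the codepage such as '€' falls back to whatever slot you pass in. For a font exported with, say, the Greek codepage you would pass `Encoding.GetEncoding(1253)` instead; note that on newer .NET runtimes the Windows codepages are only available after registering `CodePagesEncodingProvider`.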

Ugh! This is not good!

Why can't we just have Unicode bitmap fonts?

Indeed, why not? I'll continue with this topic in the next post...

[EDIT] Added link to followup post


[1]: Actually, as long as you stick with 7-bit ASCII characters, you should be OK. But where, I ask you, is the fun in that? All the interesting characters are >= 0x0080