Small Basic: Strings and Characters in Non-English Languages


Strings and Character Encodings in Non-English Languages

Many of the strings you deal with are likely English characters. For example, you might write this:

treasureChest = "My booty!"

That creates the string “My booty!” and assigns it to the treasureChest variable. When you use this variable in your program, you don’t have to think or worry about how the string is stored in your computer’s memory. You know that each character in the string is represented by a sequence of 1s and 0s, but you don’t need to care about the details of this encoding. But what if you want your program to display a greeting in French, Greek, or Chinese? To do that, it’s important to peek under the hood and learn how the characters are encoded!

For many years, characters in the English language were encoded using the American Standard Code for Information Interchange (ASCII, pronounced “ask’ee”) standard. Got a question? Just ASCII it.

The ASCII codes for the printable English characters are shown in the following figure. For example, the code for ! is 33, 0 is 48, A is 65, I is 73, and a is 97. ASCII codes 0 through 31 are assigned to nonprintable control characters; for example, code 8 is backspace and code 9 is horizontal tab. Code 32 is the space character.
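
You can check a few of these codes yourself. Here’s a minimal sketch, assuming you use Small Basic’s Text.GetCharacterCode() and Text.GetCharacter() methods and print the results to the text window:

```smallbasic
' Look up the codes of some characters mentioned above
TextWindow.WriteLine(Text.GetCharacterCode("!"))  ' 33
TextWindow.WriteLine(Text.GetCharacterCode("A"))  ' 65
TextWindow.WriteLine(Text.GetCharacterCode("a"))  ' 97

' Go the other way: turn a code back into a character
TextWindow.WriteLine(Text.GetCharacter(73))       ' I
```

Try other characters from your keyboard to explore the rest of the codes.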

Because it’s an American standard with only 128 codes, ASCII doesn’t support the characters used in most other languages. That’s why the Unicode standard, a universal encoding scheme, was developed. Unicode has room for more than a million characters and symbols. Every character is represented by a unique number (called a code point). For example, the code point for lowercase a is 97, which is the same as the ASCII code for lowercase a. A Unicode string is a sequence of these code points. The rule for translating code point numbers into 1s and 0s is called a Unicode Transformation Format (UTF). A popular format is UTF-8, which is widely used to encode email and web pages.
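
To make the idea of code points and UTF-8 more concrete, here’s a rough sketch that prints the code point of each character in a word, along with the bytes UTF-8 would use for it. The variable names (word, code, byte1, byte2, bytes) are made up for this example. It assumes that Text.GetCharacterCode() returns the code point of characters like é, and it only handles code points below 2048, which need one or two bytes in UTF-8:

```smallbasic
' Sketch: show each character's code point and its UTF-8 bytes.
' Handles only code points below 2048 (one- or two-byte UTF-8).
word = "café"
For i = 1 To Text.GetLength(word)
  ch = Text.GetSubText(word, i, 1)
  code = Text.GetCharacterCode(ch)
  If code < 128 Then
    ' Same as ASCII: one byte, equal to the code point
    bytes = code
  ElseIf code < 2048 Then
    ' Two bytes: 110xxxxx 10xxxxxx
    byte1 = 192 + Math.Floor(code / 64)
    byte2 = 128 + Math.Remainder(code, 64)
    ' Text.Append always joins text, so the numbers aren't added together
    bytes = Text.Append(Text.Append(byte1, " "), byte2)
  Else
    bytes = "(needs 3 or 4 bytes; not handled in this sketch)"
  EndIf
  TextWindow.WriteLine(ch + ": code point " + code + ", UTF-8 bytes " + bytes)
EndFor
```

The letters c, a, and f keep their single ASCII byte. For é, whose code point is 233 (assuming you typed it as a single character), the sketch works out the two bytes 195 and 169.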

Unicode only provides a mapping between characters and numbers; it has nothing to do with the font used to display a character. For example, look at the following statement:

ch = Text.GetCharacter(77824)

It assigns the hieroglyphic character 𓀀 (code point 77824) to the variable ch. However, if you try to display this character using GraphicsWindow.DrawText(), you probably won’t see this symbol. To see the symbol, you’ll need to install a font that includes it on your computer.
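
Here’s a small sketch of how the font affects what you see. It draws the Greek letter Ω (code point 937) in the graphics window. The font name "Arial" is only an assumption; use any font installed on your computer that includes the character you want to draw:

```smallbasic
' Pick a font that contains the character (the font name is an assumption)
GraphicsWindow.FontName = "Arial"
GraphicsWindow.FontSize = 48

' 937 is the code point of the Greek capital letter omega
ch = Text.GetCharacter(937)
GraphicsWindow.DrawText(20, 20, ch)
```

If you change the code to a character your font doesn’t include, you’ll likely see an empty box or a question mark instead of the symbol.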

Additional Resources:

Do you have any questions? Ask us! We’re full of answers and other fine things!

Head to the Small Basic forum to get the most answers to your questions: 

http://social.msdn.microsoft.com/Forums/en-US/smallbasic/threads/   

And go to http://blogs.msdn.com/SmallBasic to download Small Basic and learn all about it!

Small and Basically yours,

   – Ninja Ed & Majed Marji
