The code page on the server is not necessarily the code page on the client


It's not enough to choose a code page. You have to choose the right code page.

We have a system that reformats and reinstalls a network client computer each time it boots up. The client connects to the server to obtain a loader program, and the loader program then connects to the server to download the actual operating system. If anything goes wrong, the server sends an error message to the client, which is displayed on the screen while it's still in character mode. (No Unicode available here.)

Initially, we used FormatMessageA to generate the error message, but somebody told us we should use FormatMessageW followed by WideCharToMultiByte(CP_OEMCP). I'm not sure whether this is a valid suggestion, because the client hasn't yet installed Unicode support, so it is capable of displaying only 8-bit text, and using CP_OEMCP will use the OEM code page on the server, which doesn't necessarily match the OEM code page on the client.

What is the correct way of generating the error message string?

Now, mind you, the argument against using CP_OEMCP is the same argument against using FormatMessageA! In neither case are you sure that the code page on the server matches the code page on the client. If CP_OEMCP is wrong, then so too is FormatMessageA (which uses CP_ACP).

The correct solution is to use FormatMessageW followed by WideCharToMultiByte(x), where x is the OEM code page of the client. You need to get this information from the client to the server somehow so that the server knows what character set the client is going to use for displaying strings.
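As a rough illustration of that flow, here is a minimal sketch in Python rather than Win32. It assumes the client has somehow reported its OEM code page number to the server. The function name `encode_for_client` and the reported value 850 are hypothetical, and Python's `cp850`-style codecs stand in for the conversion that WideCharToMultiByte would perform.

```python
def encode_for_client(message: str, client_oem_codepage: int) -> bytes:
    """Convert a Unicode message to bytes in the client's OEM code page.

    client_oem_codepage is whatever number the client reported
    (hypothetical protocol detail; the article does not say how it is sent).
    """
    codec = f"cp{client_oem_codepage}"
    # Characters that do not exist in the client's code page become '?'
    # instead of raising an error.
    return message.encode(codec, errors="replace")


# The server holds the message as Unicode (the FormatMessageW step).
message = "Fichier introuvable: é"

# Assumption for this example: the client said its OEM code page is 850.
payload = encode_for_client(message, 850)

# The client interprets the bytes using its own code page and recovers
# the text the server intended.
shown_on_client = payload.decode("cp850")
```

The point of the design is that the server never consults its own code pages at all: the only code page in the conversion is the one the client will use to display the bytes.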

There's really nothing deep going on here. If you're going to display an 8-bit string, you need to use the same code page when generating the string as you will use when displaying it. Keep your eye on the code page.
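To see what goes wrong when the two code pages differ, here is a small Python sketch. The code pages are chosen purely for illustration (1252 as a server's ANSI code page, 437 as the client's OEM code page), and the helper `render` is a hypothetical stand-in for the client's character-mode display.

```python
def render(raw: bytes, display_codepage: int) -> str:
    """Interpret raw bytes the way a display using that code page would."""
    return raw.decode(f"cp{display_codepage}", errors="replace")


message = "café"

# Generated with the server's ANSI code page (what FormatMessageA would
# do on a server whose CP_ACP happened to be 1252).
as_server_ansi = message.encode("cp1252")

# Generated with the client's OEM code page, assumed here to be 437.
as_client_oem = message.encode("cp437")

# The client always displays with its own code page, 437.
garbled = render(as_server_ansi, 437)  # the é byte 0xE9 shows up as Θ
correct = render(as_client_oem, 437)   # round-trips to the original text
```

The ASCII letters survive either way, which is why this kind of bug tends to go unnoticed until a message contains an accented or non-Latin character.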

Comments (10)
  1. 640k says:

    7-bit codepage is enough for everyone.

  2. sandman says:

    Ick. Is there really such a thing as a 7-bit code page?

    I thought the 0-127 char positions of all the code pages were equivalent to what used to be colloquially called ASCII.

  3. Dean Harding says:

    sandman: that was his point. As long as you believe English is good enough for everyone…

  4. SvenGroot says:

    Depends on how you define 7-bit codepage. Many codepages allow multibyte characters, and an example of a multibyte character codepage that uses only 7 bits per byte is utf-7.

    And the first 128 positions of any given codepage do not need to match ASCII per se. The letters usually do, but the rest, not so much. A well-known example is 0x5C, which is the backslash in ASCII. This is different in many codepages, e.g. in JIS (Japanese) it’s a yen sign ¥. Which leads to the effect that many non-English versions of Windows use something other than a \ as the path separator. On a Japanese version of Windows, a path would look like C:¥Windows¥System32. This is still the case under Windows NT; although it probably doesn’t need to be the case for Unicode apps, people are used to it and changing it would mean Unicode and non-Unicode apps on the same machine would display the paths differently.

  5. Nathan says:

    One thing I’ve learned in my years of programming: if you want people to do the right thing, it should be the path of least resistance. The more hoops people have to jump thru, the more likely they are to botch it.

    Unicode is something that should have been sent back as half-baked, and let stew for a while until it’s as easy to use as ascii. It is NOT anywhere near as easy to use. And that’s bad for everyone — programmers, and end users.

  6. Dean Harding says:

    Err, in what way is Unicode not as "easy to use" as ASCII? This post, for example, would not even be required if everything had been Unicode.

  7. Michiel says:

    In all fairness, UTF-16 is just a bad idea. It turns a multi-byte encoding into a multi-word encoding, introducing endianness as an additional complexity. UTF-8 is a much cleaner solution, if only for making all ASCII text also UTF-8.

  8. John Elliott says:

    According to IBM’s codepage list, codepage 367 is 7-bit US-ASCII: <ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00367.pdf>. But I don’t think Windows includes support for it.

  9. KJK::Hyperion says:

    John Elliott: in Windows, US-ASCII is codepage number 20127

  10. Dmitry says:

    That’s why I hate when a computer (a program, OS, etc.) tries to talk to me in any language but English (which is not my mother tongue).

    Anything can go wrong: code page not supported, font does not have appropriate characters (ever seen ????? ????? instead of text in a critical error message?).

    In this particular case, there is a clear rule for client-server comms: "Never return text. Return a numeric error code, and let the client display the text."
