Why am I getting mojibake when I try to create a window?


A customer was compiling their program as Unicode, but since their data was almost all in ASCII, they were using the ANSI versions of the APIs. They registered their window class with the RegisterClassA function and created the window with the CreateWindowExA function. But the window title came out as Chinese mojibake.

But not all strings were coming out corrupted. Strings they passed to MessageBoxA, for example, were displayed correctly.

The customer shared their code that registered the window class and created the window.

WNDCLASSA wc = { };
wc.style = 0;
wc.lpfnWndProc = AwesomeWndProc;
wc.cbClsExtra = 0;
wc.cbWndExtra = 0;
wc.hInstance = MyAwesomeInstance;
wc.hIcon = MyAwesomeIcon;
wc.hCursor = LoadCursor(nullptr, IDC_ARROW);
wc.hbrBackground = (HBRUSH)(COLOR_WINDOW + 1);
wc.lpszClassName = "MyAwesomeClass";
if (!RegisterClassA(&wc)) return FALSE;

hwnd = CreateWindowExA(
    0, "MyAwesomeClass", "My awesome title",
    WS_OVERLAPPEDWINDOW,
    CW_USEDEFAULT, CW_USEDEFAULT,
    CW_USEDEFAULT, CW_USEDEFAULT,
    nullptr, nullptr, MyAwesomeInstance, 0);

Chinese mojibake usually means that somebody took an ANSI string and misinterpreted it as a Unicode string. The puzzle is to figure out where it happened.
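To see why the result looks Chinese in particular, here is a small portable sketch. It pairs up the bytes of an ANSI string the way a little-endian UTF-16 reader would and checks where the resulting code units land. The function names MisreadAsUtf16 and IsCjkUnifiedIdeograph are made up for this illustration; nothing like them exists in Windows.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reinterpret ANSI bytes as UTF-16LE code units: each pair of bytes
// becomes one 16-bit unit, with the first byte as the low half.
// A trailing odd byte is dropped.
std::vector<std::uint16_t> MisreadAsUtf16(const std::string& ansi)
{
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < ansi.size(); i += 2) {
        auto low = static_cast<unsigned char>(ansi[i]);
        auto high = static_cast<unsigned char>(ansi[i + 1]);
        units.push_back(static_cast<std::uint16_t>(low | (high << 8)));
    }
    return units;
}

// The main CJK Unified Ideographs block is U+4E00 through U+9FFF.
bool IsCjkUnifiedIdeograph(std::uint16_t unit)
{
    return unit >= 0x4E00 && unit <= 0x9FFF;
}
```

Feeding it "My awesome title" turns the 16 ASCII characters into 8 code units, starting with 0x794D (the bytes for "M" and "y"). Most pairs of lowercase ASCII letters land somewhere in the CJK ideograph range, which is why this kind of corruption tends to look Chinese.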

My psychic powers told me that the answer was in code they didn't provide:

Can you check that your window procedure is calling the correct DefWindowProc?

Indeed, that was the source of the problem. Their window procedure was registered with RegisterClassA, which means that it is an ANSI window procedure and will be given ANSI window messages. Their window procedure finished with a call to DefWindowProc, which, because the project was configured as Unicode, resolved to DefWindowProcW. They were passing ANSI window messages to a Unicode function, and that was the source of the mojibake.

The fix for this specific problem was to finish with a call to the DefWindowProcA function.

But really, the real problem is that they compiled a program as Unicode, while carefully avoiding every Unicode feature. If you miss a spot and accidentally do a Unicode thing, you might get lucky and trigger a compiler warning. Or you might be unlucky, and everything will compile, and something will subtly go wrong at runtime. And chasing down all those subtle errors will be time-consuming.

I don't know why they set the project to Unicode and then tried to avoid all the Unicode stuff. If they dislike Unicode so much, they may as well be clear about it: Set the project to ANSI.

Comments (9)

  1. skSdnW says:

    It is possible that most of their program is Unicode but perhaps they are hosting a legacy component that requires an ANSI host window?

    I once wrote a utility that had to run on 9x and NT (with Unicode support) and WinMain looked something like this:

    if (IsNT()) return App<WCHAR>::Run(); else return App<CHAR>::Run();

    While making the whole app a C++ template helps a little bit, I still ended up with some ugly macros, and A/W functions without string parameters are easy to get wrong. You also double your code size.

    1. I’m curious as to why you didn’t use the Microsoft Layer for Unicode on Win9X? I imagine that it was too limited somehow?

      1. skSdnW says:

        MSLU was only available in the early 2000s and this took place before that. It also had to be a single binary so I could not use tchar.h and produce two binaries.

    2. koro666 says:

      It’s all fun and games until you call Shell_NotifyIcon, and then your binary does not start on Win9x because Shell_NotifyIconW is not exported at all from SHELL32.DLL…

  2. Or they could use Unicode windows and convert their ASCII data to Unicode strings where needed to pass to Windows APIs.

    1. The way the customer phrased the problem, I got the impression that they intensely disliked Unicode and had no intention of using it. They wanted to just use ASCII all day. Their project was Unicode because Visual Studio defaults new projects to Unicode and I guess it didn’t occur to them to change the default?

      1. UTF8 Everywhere explicitly has you turn on Unicode and not use it so you get cast errors trying to pass UTF-8 strings to the Windows API.

        http://utf8everywhere.org/

  3. DWalker says:

    THIS. Although, it would be good to know if the advice given at that site is outdated. The site gives good reasons why you should turn on Unicode and then not use any of the Unicode features.

    1. DWalker says:

      My comment was meant to be a reply to Joshua___’s comment.
