How do the common controls convert between ANSI and Unicode?


Commenter Chris Becke asks how the common controls convert ANSI parameters to Unicode, since the common controls are Unicode internally.

Everything goes through CP_ACP, pretty much by definition. The ANSI code page is CP_ACP. That's what ACP stands for, after all.
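
To see why the value of CP_ACP matters, here is a minimal sketch in Python rather than Win32. It assumes Python's cp950, cp936 and cp1252 codecs are close enough stand-ins for the corresponding Windows code pages. The helper name ansi_to_unicode and its acp parameter are made up for illustration and only loosely mirror a call like MultiByteToWideChar(CP_ACP, ...).

```python
# Sketch only: Python codec names stand in for Windows code pages.
# cp950 = Traditional Chinese (Big5), cp936 = Simplified Chinese (GBK),
# cp1252 = Western European. Which one CP_ACP maps to depends on the system.


def ansi_to_unicode(raw: bytes, acp: str) -> str:
    """Decode 8-bit text as a conversion through the given ANSI code page would.

    `acp` is a hypothetical stand-in for the system's CP_ACP setting.
    Undecodable bytes become U+FFFD instead of raising an error.
    """
    return raw.decode(acp, errors="replace")


original = "中文"                 # what the application meant to show
raw = original.encode("cp950")   # bytes as a Traditional Chinese app stores them

# The same bytes yield different Unicode text under each code page.
for acp in ("cp950", "cp936", "cp1252"):
    print(acp, ansi_to_unicode(raw, acp))
```

Only the first line of output matches the original text. The bytes are identical in all three cases, and only the code page used to interpret them changes.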

Now, there are some function families that do not use ANSI. The console subsystem, for example, prefers the OEM character set for its 8-bit strings, and file system functions can go either way, based on the setting controlled by the SetFileApisToANSI and SetFileApisToOEM functions.
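
The difference between the OEM and ANSI character sets can be sketched the same way. This assumes a US-English configuration, where the OEM code page is 437 and the ANSI code page is 1252. Other systems use other pairs, and the Python codecs are only approximations of the Windows tables.

```python
# Sketch only: cp437 stands in for a US-English OEM code page and
# cp1252 for a US-English ANSI code page; other systems use other pairs.
BYTE = bytes([0xC4])

as_oem = BYTE.decode("cp437")    # box-drawing horizontal line, U+2500
as_ansi = BYTE.decode("cp1252")  # Latin capital A with diaeresis, U+00C4

print(f"OEM:  U+{ord(as_oem):04X}")
print(f"ANSI: U+{ord(as_ansi):04X}")
```

A single 8-bit value is a line-drawing character to a console program and an accented letter to a GUI program, which is why it matters which code page a function family uses.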

In the scenario Chris describes, I suspect that the problem is not the ANSI-to-Unicode conversion but rather that the font selected into the listview didn't support the necessary characters.

Comments (10)
  1. Anonymous says:

    the ANSI code page is CP_ACP. That’s what ACP stands for, after all.

    Presumably you mean CODE_PAGE_CP_ACP?

  2. Anonymous says:

    > Presumably you mean CODE_PAGE_CP_ACP?

    CP_ACP is defined in winnls.h – I can’t find a declaration for CODE_PAGE_CP_ACP

  3. Dean Harding says:

    Actually, he does say it’s running on a Traditional Chinese system. I think my point is still valid, though — in my experience it’s usually that the program is expecting CP_ACP to be something when in reality it’s actually something else (my wife is Korean, so I see lots of programs expecting a Korean CP_ACP, when my computer is English).

  4. Dean Harding says:

    My guess for Chris’s problem is that the Chinese app was running on an English (US) computer (or something that *wasn’t* Chinese), so you end up getting gibberish. Basically, CP_ACP is set to English, but the application was assuming that CP_ACP was set to Chinese.

    You can use the AppLocale[1] tool from Microsoft to "trick" the application into thinking it’s running with CP_ACP set to Chinese.

    [1] http://www.microsoft.com/globaldev/tools/apploc.mspx

  5. Anonymous says:

    Hello,

    I’ve noticed in the MultiByteToWideChar documentation that Windows 2000 and later support CP_THREAD_ACP in addition to CP_ACP.

    Do the common controls still use CP_ACP, or CP_THREAD_ACP instead?

  6. Anonymous says:

    > Presumably you mean CODE_PAGE_CP_ACP?

    CP_ACP is defined in winnls.h – I can’t find a declaration for CODE_PAGE_CP_ACP

    Mike, I think that was a joke on the redundancy of CP_ACP :)

  7. Anonymous says:

    At the time I posted the question I had a poor understanding of how a font’s character-set selection would influence GDI’s later selection of codepage.

    The situation was, I was embedding a listview control in a standard dialog box. I can’t remember what font was chosen – which is a problem because it is likely meaningful to this question.

    I then loaded some strings from a string table and added the strings to some STATIC, EDIT and ListView controls using the relevant ANSI APIs.

    All the user32 controls displayed the text correctly. The ListView, however, displayed a string that was corrupt in places: when I explored using LocaleExplorer, the result was consistent with interpreting the string using the other Chinese codepage.

    We solved the problem ultimately by fixing the font, but I was intrigued as to why the ListView – on that particular Windows / language combo – was second-guessing the codepage I’d use to be something other than CP_ACP.

  8. Anonymous says:

    The console subsystem isn’t even Unicode for .NET console output. You should be happy if it manages to output 8-bit ANSI.

  9. Anonymous says:

    The console can’t support Unicode, not even UTF-8, since this would break some apps.  In DOS there is ASCII (byte values 0 – 127) which maps directly to UTF-8.  However with UTF-8 anything past there is different from ASCII.  UTF-8 only uses it for non-English letters, math symbols, etc, but in DOS text environments you didn’t have GUIs so there were also block and line characters you could use to create nice-looking text interfaces.

    Since some DOS programs still use these, you can’t simply drop Unicode support into the console and its font. Output from Windows and DOS apps can be on the console at once, so if the DOS apps use the "extended ASCII" characters and your Windows apps use UTF-8, one of them would end up looking ugly and broken.

    At least that’s what I understand it to be given my limited understanding of Unicode and NTVDM.

    I don’t know when Unicode was first widely used but MS only introduced Unicode support at an OS-level with NT.  With 9x/ME to provide Unicode support to an app you needed to bundle unicows.dll (Microsoft Layer for Unicode).

  10. Anonymous says:

    The other day, Raymond Chen blogged about How do the common controls convert between ANSI and Unicode?