Does version 6 of the common controls support ANSI or not?


I mentioned in passing a few years ago that version 6 of the common controls supports only Unicode. And then other people stepped in to say, "Well, XYZ uses ANSI and that works for me." So does it support ANSI or doesn't it?

It does and doesn't.

All of the controls in the common controls library are internally Unicode. But not all controls in the library are created equal.

The first group is the traditional common controls. List view, tree view, those guys. These controls were never part of the window manager and have been internally Unicode on all Windows NT platforms. The ANSI messages such as LVM_SETITEMA are implemented by thunking to and from Unicode.
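
The thunking pattern can be sketched in portable C. This is only an illustration, not the actual common controls code: `item_t`, `set_item_text_w`, and `set_item_text_a` are hypothetical names, and the widening step assumes a Latin-1-style code page in which each byte maps to the same code point. Real Windows converts through the active code page (for example with MultiByteToWideChar).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ITEM_TEXT_MAX 64

/* Hypothetical control state: text is always stored as 16-bit units. */
typedef struct {
    uint16_t text[ITEM_TEXT_MAX];
} item_t;

/* The "real" implementation: Unicode in, Unicode stored. */
void set_item_text_w(item_t *item, const uint16_t *text)
{
    size_t i = 0;
    assert(item != NULL && text != NULL);
    while (text[i] != 0 && i < ITEM_TEXT_MAX - 1) {
        item->text[i] = text[i];
        i++;
    }
    item->text[i] = 0;
}

/* The ANSI entry point is only a thunk: widen, then forward.
   Assumption: Latin-1-style widening; Windows would use the
   active code page instead. */
void set_item_text_a(item_t *item, const char *text)
{
    uint16_t wide[ITEM_TEXT_MAX];
    size_t i = 0;
    assert(item != NULL && text != NULL);
    while (text[i] != '\0' && i < ITEM_TEXT_MAX - 1) {
        wide[i] = (uint16_t)(unsigned char)text[i];
        i++;
    }
    wide[i] = 0;
    set_item_text_w(item, wide);
}
```

Whichever entry point the caller picks, the stored text ends up in the same 16-bit form; the ANSI version just pays for a conversion on the way in.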

The second group is the controls that were traditionally part of the window manager itself. If you aren't using version 6 of the common controls, you will continue to use the versions built into the window manager, and those versions, for the most part, are also internally Unicode.

The one weirdo is the edit control. The edit control uses black magic voodoo to tell whether you created it with CreateWindowExA or CreateWindowExW, and its internal edit buffer is ANSI or Unicode accordingly. (Regular window classes don't have access to this magic voodoo. It's one of the historical weirdnesses of the edit control that date back to the old days.)

The internal character set goes largely unnoticed since the window manager automatically converts between Unicode and ANSI as necessary. For example, if you call SetWindowTextA to a Unicode edit control, the window manager will convert the string from ANSI to Unicode and send the Unicode string to the edit control. The one place the internal character set becomes visible to the outside world is with the EM_GETHANDLE and EM_SETHANDLE messages, because these messages access the internal buffer of the edit control. You therefore have to know whether your edit control is a Unicode or ANSI edit control so you know the correct format of that internal buffer.

When these window manager controls were ported into the common controls library, the voodoo was lost, since that magic is available only to internal window manager classes, and the common controls aren't internal window manager classes. Since the common controls library uses RegisterClassW to register the window class, the edit control that comes with the common controls is a Unicode edit control. In other words, if you use CreateWindowA to create an edit control from the common controls library, and you send it an EM_GETHANDLE message, the buffer you get back will be a Unicode buffer, not an ANSI one.
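
To see why that matters, here is a small portable C sketch of what happens when code reads a Unicode buffer as if it were ANSI. It does not call EM_GETHANDLE; the byte array merely stands in for the locked buffer, assuming it holds the text "Hi" as UTF-16LE (the layout Windows uses). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the edit control's buffer: "Hi" as UTF-16LE,
   followed by a 16-bit null terminator. */
const unsigned char kEditBuffer[] = { 'H', 0x00, 'i', 0x00, 0x00, 0x00 };

/* Reads the buffer as ANSI: stops at the first zero byte. */
size_t length_if_read_as_ansi(const void *buffer)
{
    assert(buffer != NULL);
    return strlen((const char *)buffer);
}

/* Reads the buffer as UTF-16LE: stops at the first zero 16-bit unit. */
size_t length_if_read_as_utf16le(const void *buffer)
{
    const unsigned char *p = (const unsigned char *)buffer;
    size_t units = 0;
    assert(buffer != NULL);
    while (p[0] != 0 || p[1] != 0) {
        p += 2;
        units++;
    }
    return units;
}
```

Read as ANSI, the text appears to be just "H", because the high byte of the first character looks like a terminator. Read as UTF-16, both characters are there.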

This wacky behavior with EM_GETHANDLE, as well as other even more subtle weirdnesses that come from the edit control in the common controls library always being internally Unicode, means that code which calls CreateWindowA and expects the result to be an internally ANSI edit control will be in for a bit of a surprise when it uses version 6 of the common controls library.

These and other subtle ANSI/Unicode discrepancies are why the common controls library, starting with version 6, requires a Unicode application. If you're an ANSI application and you create controls from the common controls library, you may encounter strange behavior. It'll mostly work, but things may be weird at the fringe.

Now, why not just get rid of all the ANSI support entirely? Why leave it in, even though it doesn't quite work perfectly? For the same reason the Windows XP common controls are not a separate library with separate window class names. As noted, there are programs that like to go hunting around into windows that don't belong to them. Some of those programs might stumble upon one of Explorer's list views and use various nefarious techniques to do things like stealing strings from another program's list view control. If support for the ANSI messages such as LVM_GETITEMA were removed entirely, then those sneaky programs would stop working.

You might say, "Well, tough for them." You'll say that until you discover that one of those sneaky programs happens to be one that you use every day, possibly even one that you wrote yourself. Oops. Now you're going to tell all your friends, "Don't upgrade to the next version of Windows. Its compatibility sucks."

Okay, so the common controls still have to be backward compatible with the ANSI messages that existed in version 5. But at least the new messages such as LVM_SETINFOTIP can be Unicode-only.

And it means that all you folks who are using version 6 of the common controls but haven't converted to Unicode are relying on a compatibility loophole. The ANSI support is there for the old programs that thought they were talking to a version 5 common control; it isn't there for you.

Comments (21)
  1. Anonymous says:

    Is this a tacit warning that things may not work so well in Windows 7?

    If so, I guess I’d better start porting hundreds of thousands of lines of code to Unicode, being careful not to break the old MBCS file format.

  2. Anonymous says:

    I have to admit that I never noticed that the second group of controls stopped being traditionally part of the window manager. I now can’t stop wondering why.

  3. Anonymous says:

    I guess it’s time to provide SxS USER32.DLL and other common DLLs. Solves compatibility problems. Unless the EXE is marked as Win6+ compatible, make it run as if it were Windows XP.

    And REMOVE ANSI SUPPORT from USER32. It’s about time. If those losers apps need to pull stuff out of third party windows, too bad.

    [Alexandre Grigoriev, meet Anthony Wieser. You two can duke it out. -Raymond]
  4. Anonymous says:

    Does this also imply that we should stop using all those _T("") macros, and _tcscpy etc, as effectively ANSI is dead?

  5. Anonymous says:

    Anthony Wieser,

    You only need those TCHAR things if you write dual-purpose code – compilable with either ANSI or UNICODE. As long as you’re not playing with dead OSes, just use L"strings" and "W" chars and functions.

  6. Anonymous says:

    Raymond,

    My suggestion to remove ANSI from USER32 was for V7 SxS USER32, which would only be used for apps marked with V7 compatibility. Besides that, you’ll finally be able to remove all those horrible compat hacks from it.

    [So a module linked with v6 USER32 which performs operations on a window created with v7 USER32 will lose the compat hacks? Good luck writing programs that support plug-ins – each plug-in might be written with a different version of USER32 yet they all need to talk to each other. The interop boundary when doing SxS is nasty. A lot of the SxS work in comctl32 was at the boundary and we didn’t get it all right the first time. -Raymond]
  7. Anonymous says:

    "The ANSI support is there for the old programs that thought they were talking to a version 5 common control; it isn’t there for you."

    In the end, if it breaks enough important programs you’re just going to keep piling on the compatibility fixes anyway.

    If I worked at Adobe I would make sure Photoshop relied on every undocumented behavior or compatibility fix I could find just to annoy you.

  8. Anonymous says:

    Raymond, your link to Anthony Wieser under Alexandre Grigoriev’s first comment just comes right back here to this entry.  I’m guessing that’s a mistake.

    [It links to Anthony Wieser’s comment on this page. -Raymond]
  9. Tihiy says:

    Hmm, excuse my stupidity, but why couldn’t the comctlv6 “EDIT” class be a superclass of the standard user32 edit control? Are hundreds of KBytes of code duplicated only for non-client painting and balloons? User32 exports EditWndProc (perhaps for another purpose), so why couldn’t comctlv6 play nice and use it (and maybe other tricks)? Or is it really done this way?

    This “side-by-side” WinSxS idea still frightens me (especially in Vista).

    [We tried that first. -Raymond]
  10. Anonymous says:

    “And it means that all you folks who are using version 6 of the common controls but haven’t converted to Unicode are relying on a compatibility loophole. The ANSI support is there for the old programs that thought they were talking to a version 5 common control; it isn’t there for you. “

    Unfortunately, I am sure there are lots and lots of apps that rely on this compatibility loophole, including, as mentioned in another blog article, Product X, in particular because of poor documentation.

  11. Anonymous says:

    Gosh darn it, it is called ASCII. ANSI is the American National Standards Institute, ASCII is the American Standard Code for Information Interchange. Somebody at Microsoft needs to figure this out and fix all the places where it is wrongly called ANSI in the documentation (I suppose asking for it to be fixed in publicly available source from Microsoft is too much to ask). I don’t want to be mean but I’ve seen the term ANSI misused a lot in computer circles, and I think a lot of that confusion is due to Microsoft mis-using the term for many years.

    [It’s not ASCII either. ASCII defines only the first 128 characters. The reason for the name ANSI is historical. You are welcome to propose another name, but ASCII is an even worse alternate name than ANSI. At least ANSI was correct at one point. ASCII never was. -Raymond]
  12. Anonymous says:

    "The interop boundary when doing SxS is nasty."

    I totally agree with this statement. I had to make a library recently and mess with Fusion to make sure the controls created by it were always v6 no matter the caller. You have to sprinkle xxxActCtx calls *everywhere*. Whenever the execution enters or leaves your code (and you better not forget one). USER32 does it for WndProcs for you, but for other callbacks you have to do it yourself.

    As for "removing the ANSI APIs from USER32", I wouldn’t dare go to such lengths. However, here’s an idea I got recently and even submitted in the Win7 "Send Feedback": Why not put all ANSI-API-related code in its own page? Like .textA or something. So it would ONLY be swapped in when an ANSI call is made. Right now, I bet the ANSI code is interleaved with the rest, so swapping in a page for a Unicode function may swap in some ANSI functions around it too.

  13. Anonymous says:

    We need the non-Unicode support also to make older programs draw the new controls using a manifest file, do we not? If all ‘ANSI’ support were dropped, we wouldn’t be able to use those. Even if it were only dropped from (the interface to) the new controls, we wouldn’t be able to use them unless we were content with Windows-95-like controls.

    Oh, and in response to a post above, it isn’t called ASCII either. For almost all functions I’ve encountered it is what just happens to be the local system encoding, which is usually Windows-1252 but can even be something like SJIS. Yes, that is mixed single/double byte encoding. Have fun.

  14. Anonymous says:

    I wonder if this explains why I am *sometimes* sent TTN_NEEDTEXTA and *sometimes* TTN_NEEDTEXTW in my non-unicode application using the new common controls?

  15. Anonymous says:

    Mixing ANSI and unicode is fun!

    I do stuff in Windows CE, which is strictly Unicode for everything. However, some ANSI functions remain because we need them. When you’re dealing with external stuff that uses 8-bit characters, you need them. That part is easy, since the external-facing code has to deal with it.

    But if you have third party code that assumes a char is 8-bits wide… things get fun since your glue code is now littered with Unicode and ANSI. Hilarity ensues if you mix your buffer pointers.

  16. Anonymous says:

    Sorry, Raymond. I didn’t fully comprehend the historical reasons for calling it ANSI. I suppose the correct name is Windows-1252, not ASCII.

    [That’d be wrong on any system where the Language for non-Unicode Programs is not US-English. -Raymond]
  17. Anonymous says:

    The reason I’ve resisted so far is that everything appeared to work (well, apart from CEditView in MFC, which doesn’t quite hack around it correctly), but this does at least explain the following:

    http://groups.google.co.uk/group/microsoft.public.vc.mfc/browse_thread/thread/9ffefbf89263d8dd?hl=en&ie=UTF-8

    So faced with a working program (that happens to interface to a lot of hardware via a serial port that uses 7 bit characters to communicate), I’ve resisted the temptation to have so much fun.

    What is the future of lstrcmpA and friends?  Deprecation?

  18. Anonymous says:

    Dear Raymond, btw –

    why doesn’t version 6 of the common controls support RealGetWindowClass?

  19. Anonymous says:

    Why does windows use 16Bit chars for unicode at all? I see no real advantages when comparing to utf-8.

    * UTF-8 gains one byte per char for ascii, and loses one for chinese characters etc.

    * Both UTF-8 and UTF-16 have no fixed character size

    * UTF-16 requires a whole new apiset whereas UTF-8 only requires a new codepage as windows "ANSI" functions already support multi byte chars.

    Was UTF-8 simply unknown to them or too new at the time? Or did they think 16 bits was enough for a character (perhaps because Unicode had fewer than 65k characters at that time), resulting in a fixed character size?

    [Check your history. Windows NT release date versus UTF-8 release date. -Raymond]
  21. Anonymous says:

    Three days into converting the 100,000 line project, and the code’s mostly running again.

    Along the way though, it’s amazing what you stumble across.

    First the good bits.  

    1. Serialized CStrings make it seamless to change between the formats.
    2. CStringA and CStringW classes make it simple to switch between the two character sets.

    Now the bits that made me write different code:

    1. Clipboard formats.

       a. The registered "Rich Text Format" seems to be ANSI only.

       b. CF_FILENAME changes width between the two versions if you use CFSTR_FILENAME.

    2. wfstream and fstream handle wide characters very differently, but ANSI the same.

    3. Handling the WM_GETTEXT message is ANSI/UNICODE dependent, but the parameter signature in MFC doesn’t indicate that.

    4. AnsiNext doesn’t really exist anymore (this code was really old), though CharNext and CharPrev do instead. Michael Kaplan points out here http://blogs.msdn.com/michkap/archive/2005/01/14/352802.aspx that surrogate pairs still don’t work with that API. Maybe they do now?

    5. Fonts. Well, Arial doesn’t do a very good job on some unicode characters. So, what should the default font be?

    6. MultiByteToWideChar doesn’t do a very good job on weird stuff like: ⓇⓈⓉ, and Michael’s example åüộåüộåüộåüộ gets expanded out.

    7. My head hurts.

Comments are closed.