How do I tell if a user’s entering Unicode data into my ANSI app?

#1:  Is there a really, really good reason you didn’t compile your app with the Unicode flag?  Just the fact that you’re asking this question is a pretty big red flag that some user might want to enter Unicode data into your app.  Supporting Unicode is pretty painless (at least if you plan for it from the start), so it seems like the planning process took a wrong turn somewhere :)  See the “Some Reasons to Make Your Applications Unicode” post.

Should you actually need to discover if Unicode data is being entered, you may be pretty stuck.  For input boxes and other system SDK stuff, the OS authors don’t usually want to write two versions of the code, so often there’s really only a “Unicode” version of whatever system thing you’re using.  If you call the ANSI version instead of the Unicode version, the OS has to convert your parameters from ANSI to Unicode, and then convert the results back from Unicode to ANSI when it’s done.  That can cause bad things to happen, like data loss or corruption because the code page didn’t have the character.  Often the application doesn’t even know that this happened, because the data was lost before the application ever got it.  The corruption usually shows up as characters replaced with ? (though the replacement can be a different character for different ANSI code pages, so if you just look for “?” your app won’t work in some locales).  Or, if best fit was used, you could get a look-alike character instead.  E.g.:  If they typed ∞ (infinity, U+221E), you get 8 for your input (∞ and 8 being quite obviously nearly the same thing, just tip your head a bit).
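To make the “lost before the application got the data” point concrete, here is a small, portable sketch in Python rather than Win32 code. It assumes code page 1252 as the ANSI code page, and `to_ansi_lossy` is a hypothetical helper name, not a system API. Python’s codecs also don’t do best-fit mapping, so this only shows the ? replacement case, not the ∞ to 8 case.

```python
def to_ansi_lossy(text: str, code_page: str = "cp1252") -> bytes:
    """Mimic a lossy Unicode -> ANSI conversion (assumed code page: 1252).

    Characters the code page cannot represent are replaced with b"?",
    which is roughly what an ANSI app ends up seeing.
    """
    return text.encode(code_page, errors="replace")


# The user typed an infinity sign, but the ANSI side only sees "?".
typed_infinity = to_ansi_lossy("\u221e")

# The user really typed a question mark.
typed_question = to_ansi_lossy("?")

# Both arrive as the same byte, so the app cannot tell them apart.
print(typed_infinity, typed_question, typed_infinity == typed_question)
```

Once the conversion has happened, a substituted ? and a real ? are the same byte, which is why inspecting the ANSI data afterwards can’t reliably tell you that anything was lost.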

So: If you really want to know if the data’s being lost, then call the *W Unicode APIs.  Then you can use WideCharToMultiByte() and do the conversion yourself, using the (slow) flag to see if anything was dropped/changed.  Of course if you’re calling the *W APIs anyway, you’re 99% of the way to being a Unicode app, so the easiest thing might be to just fix your app to be Unicode.
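The idea of doing the conversion yourself and checking for loss can be sketched like this. This is a portable Python illustration, not the Win32 call itself: it assumes code page 1252, and `find_lost_characters` is a hypothetical helper. On Windows, the equivalent check would presumably go through WideCharToMultiByte(), for example with its `lpUsedDefaultChar` output parameter and the `WC_NO_BEST_FIT_CHARS` flag, but the post doesn’t say exactly which flag it means, so treat that as an assumption.

```python
def find_lost_characters(text: str, code_page: str = "cp1252") -> list:
    """Return (index, character) pairs that the code page cannot represent.

    An empty list means the Unicode text converts to the assumed ANSI
    code page without dropping or substituting anything.
    """
    lost = []
    for index, ch in enumerate(text):
        try:
            # Strict encoding raises instead of silently substituting.
            ch.encode(code_page)
        except UnicodeEncodeError:
            lost.append((index, ch))
    return lost


# Start from the Unicode input (what a *W API would hand you).
for sample in ("caf\u00e9", "a\u221eb"):
    lost = find_lost_characters(sample)
    if lost:
        print(repr(sample), "would lose:", lost)
    else:
        print(repr(sample), "converts cleanly")
```

The important part is the starting point: the check only works because it sees the original Unicode text, which is exactly what you get by calling the *W APIs.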
