If you misinterpret ANSI text as Unicode, you usually get nonsense Chinese text. If you misinterpret Unicode text as ANSI, why do you usually just get the first character?
Okay, this one is a lot easier.
The Latin alphabet fits in the range U+0041 through U+007A. If you're using the UTF-16LE encoding (which is what Unicode means in the context of Windows), then the first byte will be the correct character, and the second byte will be zero, which will serve as the string terminator.
(char*)L"Abc" will act like "A".
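Here is a minimal sketch of that effect. It spells out the UTF-16LE bytes by hand so that it behaves the same on any platform; the function name is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the length an ANSI (byte-oriented) function sees
   when it is handed the raw bytes of the UTF-16LE string L"Abc". */
size_t ansi_length_of_utf16_abc(void)
{
    /* UTF-16LE stores each code unit as low byte, then high byte. */
    static const unsigned char utf16le_abc[] = {
        0x41, 0x00,   /* 'A' */
        0x62, 0x00,   /* 'b' */
        0x63, 0x00,   /* 'c' */
        0x00, 0x00    /* terminating null code unit */
    };

    /* strlen stops at the first zero byte, which is the high byte of 'A'. */
    return strlen((const char *)utf16le_abc);
}
```

The byte-oriented reader sees one character and then a terminator, so everything after the "A" is ignored.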
I remember looking at the registry and finding a registry
key directly under
In other words, the program stored its settings under
This bugged me enough that I dove in to figure out how this happened.
The program in question had a Windows 95 version
and a Windows NT version.
They compiled both versions from the same code base by
so that when compiled for Windows 95, it was an ANSI program,
and when compiled for Windows NT, it was Unicode.
The program came with a helper DLL, which was also compiled
as ANSI for Windows 95 and as Unicode for Windows NT.
The name of the DLL was not inside an
so even though the code was compiled twice, both versions
of the DLL had the same name.
.def file and the internal
library's header file did not contain any
So the Windows 95 version of
had an exported function called
which accepted an ANSI string.
And the Windows NT version of
had an exported function called
but which accepted a Unicode string.
The problem was that their Windows NT product shipped with the Windows 95 version of the helper DLL!
Since the DLL name was the same, and the function names were the same, the operating system happily loaded the DLL and imported the function name successfully, even though it was the wrong function.
As a result, the Windows NT version passed a Unicode string
to a function that interpreted it as an ANSI string,
and the registry key name
became misinterpreted as just
There are a few ways of avoiding the problem.
The obvious one is to abandon the Windows 95 version of the product. Because c'mon now.
Okay, but let's go back in time to a period when supporting Windows 95 was still a reasonable thing to do.
One option is to give the Windows 95 and Windows NT versions of the DLL different names.
That way, when a program linked to
but you accidentally put
HELPERA.DLL in the product
you would get a "DLL not found" error instead of running ahead
with the wrong DLL.
Mind you, this solution would catch the problem only if it occurred
But if the problem was that the code linked together some object files
compiled in ANSI mode and some object files compiled in Unicode mode,
say because you used the wrong version of a static library,
then the error would go undetected because both sets of object files
will look for the function
and if the module was linked with (say)
then both sets of object files will link to
even though half of them thought they were linking to
What they should have done was change the names of the exports.
Export two functions
Use an inline helper function or a macro in the header file
so that ANSI clients are directed to
Unicode clients are directed to
The implementation of the helper DLL need only implement the
versions of the functions corresponding to the desired character set.
In other words,
(If you use macros, then this happens automatically when you
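A rough sketch of the header-side redirection might look like the following. The names HelperNameLengthA, HelperNameLengthW, HelperNameLength, and HELPER_TEXT are hypothetical, chosen only for illustration, and the function bodies are placeholders. Both variants are defined in one file here purely so the sketch compiles on its own; in the design described above, each DLL would export only the variant for its own character set.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical exports: one distinct name per character set. */
size_t HelperNameLengthA(const char *name)    { return strlen(name); }
size_t HelperNameLengthW(const wchar_t *name) { return wcslen(name); }

/* Header-side macros: clients write HelperNameLength and HELPER_TEXT,
   and the UNICODE setting decides which export they actually reference. */
#ifdef UNICODE
#define HelperNameLength HelperNameLengthW
#define HELPER_TEXT(s)   L##s
#else
#define HelperNameLength HelperNameLengthA
#define HELPER_TEXT(s)   s
#endif
```

A client then calls HelperNameLength(HELPER_TEXT("Settings")) and gets the variant that matches how it was compiled. Since the exported names differ, an ANSI caller can no longer bind to a Unicode implementation by accident.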
This design solves a few problems.
If you package the wrong DLL, the file names will not match and you'll get an error at load time.
If you have a mix of object files, you will get a linker error because
HELPERA.LIB won't have entries for the Unicode versions, and vice versa.
If you really needed to support the mixed version, you could link to both
HELPERA.LIB and HELPERW.LIB. Each object file will pull the function it needs from the appropriate import library, and will bind to the corresponding DLL at runtime.
In the future, you might decide to merge the helper libraries into a single helper library that supports both character sets. Giving the functions distinct names allows this to happen. (This is what most of Windows does. For example,
kernel32.dll contains both ANSI and Unicode implementations of many functions, distinguished by function name.)
Moral of the story: If two functions are different, give them different names. (If you use mangled names, then the names will already be different due to different mangling.)