What is __wchar_t (with the leading double underscores) and why am I getting errors about it?


The Microsoft Visual C++ compiler has a compiler option called /Zc:wchar_t which lets you control what the symbol wchar_t means.

According to the C++ standard, wchar_t is a distinct native type, and that's what the Visual C++ compiler defaults to. However, you can set /Zc:wchar_t-, and that suppresses the intrinsic definition of wchar_t, allowing you to define it to whatever you like. And for Windows, this historically means

typedef unsigned short wchar_t;

because Windows predates the versions of the C and C++ standards that introduced wchar_t as a native type.

So now you have a problem if you are writing a library that will be consumed both by old-school code written with wchar_t defined as an alias for unsigned short and by new-school code written with wchar_t as an intrinsic type. What data type do you use for your string parameters?

Well, if your library uses C linkage, then you're in luck. Since the intrinsic wchar_t is a 16-bit unsigned integer in Visual C++, it is binary-compatible with unsigned short, so you can declare your function as accepting wchar_t in the header file, and each client will interpret it through their own wchar_t-colored glasses: Code wearing the /Zc:wchar_t glasses will see the native wchar_t type; code wearing the /Zc:wchar_t- glasses will see an unsigned short. And since C linkage is not decorated, you can export one function that accepts a wchar_t, and it will interoperate with either definition.
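
For example, a C-linkage declaration can be written in terms of wchar_t in the shared header, and clients built with either setting will bind to the same undecorated symbol. (A hypothetical sketch; the function name is made up.)

// SomethingC.h

// Clients compiled with /Zc:wchar_t read this parameter as "pointer to the
// native wchar_t"; clients compiled with /Zc:wchar_t- read it as "pointer to
// unsigned short". The two views are binary-compatible, and since C linkage
// does not encode the parameter type in the exported name, both kinds of
// clients link to the same function.
extern "C" bool DoSomethingC(const wchar_t* s);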

That works for undecorated functions, but what about languages like C++ that use decoration to encode the types of the parameters? Which decoration do you use?

Let's do both.

What you do is write two versions of your function, one that takes an unsigned short and one that takes a __wchar_t. That double-underscore version represents "The native type for wchar_t that is used by /Zc:wchar_t."

In other words, /Zc:wchar_t results in the compiler internally doing the equivalent of

typedef __wchar_t wchar_t;

It makes the symbol wchar_t an alias for the internal __wchar_t type.
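
You can see the alias relationship with a quick compile-time check. (A sketch; it assumes a C++11 compiler for static_assert and <type_traits>.)

#include <type_traits>

// Under /Zc:wchar_t (the default), wchar_t is just another name for the
// native __wchar_t type, so this assertion holds. Under /Zc:wchar_t-,
// wchar_t is a typedef for unsigned short instead, and it would fail.
static_assert(std::is_same<wchar_t, __wchar_t>::value,
              "wchar_t is the native type under /Zc:wchar_t");

// Either way, the native wide character type is 16 bits in Visual C++.
static_assert(sizeof(__wchar_t) == 2, "16-bit wide characters");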

So let's say you have a function called DoSomething that takes a wide string, and you want to accept clients compiled with either setting for /Zc:wchar_t.

// Something.h

bool DoSomething(const __wchar_t* s);
bool DoSomething(const unsigned short* s);

This declares two versions of the function. The first will be matched by code compiled with /Zc:wchar_t. The second will be matched by code compiled with /Zc:wchar_t-.

Your implementation goes like this:

// Something.cpp
#include <Something.h>

bool DoSomethingWorker(const wchar_t* s)
{
 ... implementation ...
}

bool DoSomething(const __wchar_t* s)
{
    return DoSomethingWorker(reinterpret_cast<const wchar_t*>(s));
}

bool DoSomething(const unsigned short* s)
{
    return DoSomethingWorker(reinterpret_cast<const wchar_t*>(s));
}

As noted earlier, callers who compile with /Zc:wchar_t will match the first version of DoSomething; callers who compile with /Zc:wchar_t- will match the second. But both of them funnel into a common implementation, which we declare with wchar_t, so that it matches the /Zc:wchar_t setting used by the library itself.
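
From the client's point of view nothing special is required; a wide string literal picks whichever overload matches that client's definition of wchar_t. (A hypothetical client snippet.)

// Client.cpp
#include <Something.h>

void Test()
{
    // L"..." has type const wchar_t*. A client built with /Zc:wchar_t
    // therefore matches the __wchar_t overload; a client built with
    // /Zc:wchar_t- matches the unsigned short overload. The same source
    // code works under either setting.
    DoSomething(L"hello");
}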

Okay, so to answer the opening question: __wchar_t is the name for the intrinsic data type for wide strings. If you compile with /Zc:wchar_t, then that's the data type that wchar_t maps to. The funny name exists so that code compiled with /Zc:wchar_t- can access it too, and so that code which wants to be /Zc:wchar_t-agnostic can explicitly refer to the internal native type.

Comments (14)
  1. Pietro Gagliardi (andlabs) says:

    …I needed to do this just yesterday. Freaky o_O

    My code, however, is intended to run on more than just Windows, so it can’t use the name wchar_t except on Windows. (It also must work on both C and C++.) Instead, my base functions use uint16_t, and my code looks like

    #ifdef __cplusplus
    } // close the extern "C"
    #if defined(_MSC_VER) && defined(_WCHAR_T_DEFINED) && defined(_NATIVE_WCHAR_T_DEFINED)
    inline T func(…, wchar_t *arg, …)
    {
        return func(…, reinterpret_cast<uint16_t *>(arg), …);
    }

    I wonder if using __wchar_t directly has any advantages, or if what I’m doing has any gotchas that I don’t know about. __wchar_t works regardless of the setting of /Zc:wchar_t (as you said), so one advantage would be only needing to check for _MSC_VER…

    1. Karellen says:

      C has had wchar_t since C90, i.e. since it was first standardised, 26 years ago. What C compiler are you using that doesn’t support wchar_t?

      1. Pietro Gagliardi (andlabs) says:

        The size of wchar_t is implementation defined; there is no guarantee that wchar_t is 16-bit unsigned, and there is no guarantee that it represents UTF-16.

        1. The Windows ABI says that Unicode is represented as UTF-16LE. How you choose to express that to the compiler of your choice is a matter between you and your compiler.

          1. Pietro Gagliardi (andlabs) says:

            And of course, as a consequence, most compilers use 16-bit integers on Windows for wchar_t, so you should be able to take a Windows program written for one compiler and compile it for another. Therefore, I can safely use this on Windows. My point was that I needed to do more than just Windows, and then the C standard rules of wchar_t kick in.

            As another example, OS X is another system that uses UTF-16 for its internal string representation in both the Core Foundation and Cocoa APIs and its wchar_t is 4 bytes long. In fact, there is no function in either API that takes wchar_t or wchar_t* values! (Their names for the UTF-16 type are UniChar and unichar, respectively.)

            My original question though still stands: is there anything wrong with the code I wrote yesterday that would make switching to Raymond’s code in the post a better option?

          2. Mike says:

            “The Windows ABI says that Unicode is represented as UTF-16LE.”
            I think we both (hopefully we all) know that’s untrue historical revisionism.
            Windows Unicode is, has always been, and can never become anything but UCS-2.
            Citation needed? NTFS $UpCase.
            Maybe _some_ APIs (GDI glyphs?) can handle UTF-16 properly, but the filesystem most certainly cannot, and since NTFS is _integral_ to Windows (it can’t even function nowadays without it – or so MS claims), that’s the defining part.

          3. @Mike
            Windows was UCS-2 originally, yes. It converted to UTF-16LE in Windows 2000. That was about 17 years ago, it might be time to re-examine a few of your long-cherished beliefs at this point.

          4. Joshua says:

            @Joshua Bowman: Would you like some filenames with malformed UTF-16 surrogates to chew on?

        2. Karellen says:

          Ah, you’re specifically dealing with UTF-16 code units, rather than unencoded Unicode code points.

          Yeah, uint16_t is a much better portable cross-platform fit for that than wchar_t. Sorry I misunderstood what you meant about not being able to use wchar_t.

        3. wchar_t is an internal text-encoding. uint16_t is a binary encoding. You don’t write text to files, you write binary bits in a specified encoding, and you convert to the system’s or your library’s native text encoding (utf8, utf16le, wchar_t, etc) upon reading.

          Passing any non-specific C type between compilers, programs, or computers should be considered harmful. All real-time interfacing must be done with binary types (which is why COM is all binary types). Only compile-time interfacing can be guaranteed.

          Meanwhile, any internal function can work just as well with wchar_t as with uint16_t even if the host defines it as uint32_t, as long as you already dealt with the correct decoding during read, instead of just reinterpreting it later. (This is why Raymond’s post specifically deals with Visual Studio, and doesn’t claim to be applicable to any other compilers. It shouldn’t even be considered gospel for all Windows compilers.)

  2. MV says:

    It seems wasteful to have two versions of the DoSomething() function, but isn’t it also the case that the compiler (or maybe it’s the linker) will notice that the two functions are bit-for-bit equivalent once they’re compiled, and fold them back together anyway?

    1. Seis says:

      If you copy and paste the implementation, VC++’s /OPT:ICF will merge the two functions. If you make one a wrapper for the other then no merging is needed of course.

    2. Yep, the compiler and linker should be smart enough to optimise away the functions entirely, and you get left with one function with three exported names in the DLL.

    3. Martin Bonner says:

      My approach would actually be to implement DoSomething(const __wchar_t*) in the library, and then have the header contain:

      inline bool DoSomething(const unsigned short* arg) { return DoSomething(reinterpret_cast<const __wchar_t*>(arg)); }

      The compiler will optimize that inline function away to nothing. (Note that Raymond’s implementation only had one real function, and a couple of short forwarding functions, so the overhead of that is not going to be high.)
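
      Spelled out, the header-only variant might look something like this. (A sketch only; the exported function keeps the __wchar_t signature and the header supplies the inline bridge.)

      // Something.h -- header-only forwarder variant

      // The only function the library actually exports takes the native
      // type, which can be named under either /Zc:wchar_t setting.
      bool DoSomething(const __wchar_t* s);

      // Clients built with /Zc:wchar_t- call this inline wrapper, which
      // the compiler collapses into a direct call to the export above.
      inline bool DoSomething(const unsigned short* s)
      {
          return DoSomething(reinterpret_cast<const __wchar_t*>(s));
      }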

Comments are closed.
