Going for the facts: Who is displaying this message box?


A customer wanted to know whether Create­Process had a problem with Unicode.

A problem with it? Quite the contrary. Create­Process loves Unicode! In fact, if you call the ANSI version, it converts the strings to Unicode and then finishes the work in Unicode.
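
For illustration, here is a minimal sketch of calling the Unicode entry point directly with non-ASCII characters on the command line. (The executable name and arguments are invented; the one subtlety is that CreateProcessW requires a writable command-line buffer, so it is declared as an array rather than passed as a string literal.)

#include <windows.h>

int wmain(void)
{
    // CreateProcessW may modify the command line in place, so it must be writable.
    wchar_t commandLine[] = L"custom.exe --user Администратор";
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi;

    if (CreateProcessW(NULL, commandLine, NULL, NULL, FALSE,
                       NORMAL_PRIORITY_CLASS, NULL, NULL, &si, &pi)) {
        // We don't need the handles, so close them right away.
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
    return 0;
}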

Okay, here’s the customer’s problem.

We have a custom application written in managed code. When we launch the process from unmanaged code via Create­Process, we sometimes get a bogus error message:

WARNING! The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.

The filename is well under the 260-character limit and the directory name is well under the 248-character limit. We have isolated the problem to be related to whether we put Unicode characters in the command line arguments. If the command line arguments are all ASCII, then no message appears.

In case it matters, here’s our code to launch the custom application.

STARTUPINFO si = { sizeof(si) };
PROCESS_INFORMATION pi;
if (CreateProcess(NULL, commandLine, 0, 0, FALSE,
                  NORMAL_PRIORITY_CLASS, 0, 0,
                  &si, &pi)) ...
 // (in our case, the call succeeds)

What do we have to do to get CreateProcess to accept non-ASCII characters on the command line without displaying an error message?

(Note that Unicode is a superset of ASCII. All ASCII characters are also Unicode characters. The customer is making the common mistake of saying Unicode when they mean Unicode-not-ASCII.)

Actually, that error message is not coming from Create­Process. It’s coming from the custom application.

We have the source code for our custom application and it does not display this message. The custom application actually receives the command line just fine (be it Unicode or not), but if there is Unicode in the command line, we get the message above.

The message box may not be coming from your code, but it’s still coming from your application. Why not hook up a debugger when the message box is up, then take a stack trace to see whose idea it was to display the message box?

The customer connected a debugger and determined that the message was coming from a third-party library that their custom application uses. Now they know whom to talk to in order to solve the problem.
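
For readers who want a quicker first cut before breaking out the debugger, here is a small sketch (my own illustration, not part of the original exchange): given the window handle of the mystery message box (from Spy++ or a similar tool), you can at least identify the owning process and thread. Knowing whose code inside that process asked for the box still requires the stack trace.

#include <windows.h>
#include <stdio.h>

// Report which thread and process created a given window.
void ReportWindowOwner(HWND hwnd)
{
    DWORD processId = 0;
    DWORD threadId = GetWindowThreadProcessId(hwnd, &processId);
    printf("Window created by thread %lu in process %lu\n",
           threadId, processId);
}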

Comments (19)
  1. Andrew says:

    +1 for the correct use of "whom" in the last sentence.

  2. Joshua says:

    Hmmm, and if the message box window is owned by CSRSS? (Yes I actually got that.)

    [Then the message was being put up by Windows after all. Thanks for going for the facts. -Raymond]
  3. alegr1 says:

    >Hmmm, and if the message box window is owned by CSRSS? (Yes I actually got that.)

    Then do Ctrl+Shift+Esc, and see under what account TASKMGR will start.

  4. Joshua says:

    @alegr1: My own user.

    Apparently some people forgot that if you tag a call to MessageBox with MB_SERVICE_NOTIFICATION, the resulting window is owned by CSRSS regardless of which process called MessageBox.
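
    For reference, a minimal sketch of a MessageBox call using this flag (the caption and text are invented):

    #include <windows.h>

    // With MB_SERVICE_NOTIFICATION the box is displayed by the system on the
    // caller's behalf (historically via CSRSS), which is why "who owns this
    // window?" tools do not point back at the process that made the call.
    void WarnFromService(void)
    {
        MessageBoxW(NULL,
                    L"Something went wrong.",
                    L"My Service",
                    MB_OK | MB_ICONEXCLAMATION | MB_SERVICE_NOTIFICATION);
    }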

  5. Joshua says:

    @Joker_vD: I asked for the ability to set UTF-8 as the application local code page and got ignored.

    It turns out there are security reasons why you don't want UTF-8 as the system code page.

    (Buffer overflow when UNICODE -> ANSI results in a string that takes up more bytes than UNICODE.)

    Can't come up with a good reason not to allow it at application level. I've seen many programs that would immediately take advantage of full UNICODE once set.

    I'd fix your third-party supplied DLL by placing a modified C library in the application directory if it weren't for the fact that SxS's signing blocks any good way of providing that.

  6. Zan Lynx' says:

    @Joker_vD

    That function would be better named to_ucs2, but it does seem to do that job. It should probably assert(s[i]<128).

    Windows WAS UCS2 until it changed at some point (Windows 2000?) to UTF-16. So that program may be an older program, or written by a programmer from the NT days, or is a piece of code copied from an older program.
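
    A sketch of what that renaming-plus-assert might look like, applied to the function quoted further down in the thread (the name and details here are illustrative, not from the actual program):

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>
    #include <wchar.h>

    // Same byte-widening conversion, but the name and the assert make the
    // ASCII-only assumption explicit instead of silently mangling other input.
    wchar_t* to_ucs2_from_ascii(const char* s)
    {
        if (!s) return NULL;
        size_t len = strlen(s);
        wchar_t* result = (wchar_t*)malloc((len + 1) * sizeof(wchar_t));
        for (size_t i = 0; result && i <= len; ++i) {
            assert((unsigned char)s[i] < 128);  /* only 7-bit ASCII is valid */
            result[i] = (wchar_t)(unsigned char)s[i];
        }
        return result;
    }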

  7. Karellen says:

    AIUI, the trouble with allowing UTF-8 as a "real" codepage is that you need MB_LEN_MAX to be >= 4 in order for users of wctomb() and friends to work. Because the users of those functions have buffers of size MB_LEN_MAX to put the resulting bytes into, as they were guaranteed that that would be enough. And you can't change MB_LEN_MAX without breaking ABI compatibility with, well, every existing binary out there that uses it. Only, when Win32 was created, there were no multibyte character sets which used more than 2 bytes per code point, so MB_LEN_MAX on Win32 is a perfectly reasonable 2, and cannot be changed. :-(

    (I think there are some other equivalent backwards-compatibility ABI constraints which are more Win32-y than libc-y, but MB_LEN_MAX is the one that sticks in my head.)
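
    A minimal sketch of the pattern in question (a hypothetical caller, not from any particular program):

    #include <limits.h>
    #include <stdlib.h>

    // Perfectly conforming legacy code: the buffer is sized by MB_LEN_MAX.
    // With MB_LEN_MAX == 2, letting the active code page be UTF-8 (up to
    // 4 bytes per code point) would make this buffer too small.
    void convert_one(wchar_t wc)
    {
        char buf[MB_LEN_MAX];
        int bytes = wctomb(buf, wc);  /* bytes written, or -1 on failure */
        (void)bytes;
    }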

  8. Joshua says:

    Wow. No wonder I never found why it was so touchy. Any program calling wctomb is doing it wrong already. Just let it fail if it would require > 2 bytes. If you're calling wctomb you have to be prepared to handle it failing.

  9. alegr1 says:

    @Joshua:

    There was one time when I looked at Task Manager and saw that it was running as SYSTEM. I wonder if that was caused by some topmost window owned by a SYSTEM-owned process handling Ctrl+Shift+Esc.

  10. Joshua says:

    @alegr1: I'll bet that was a security bug at some point.

  11. Joker_vD says:

    I am appalled at libraries that use MessageBox or whatever to report critical errors instead of… I don't know, returning an error code from their InitLibrary function?

    Also, will we *ever* have the ability to set CP_UTF8 as the system codepage? As of right now, working with all text strings as plain std::string with UTF-8-coded text and calling widen()/narrow() when you need to call a Win32 API function works amazingly well, but man, fixing third-party libraries to, say, call wfopen() instead of fopen() is quite time-consuming. That's assuming you even have the sources to patch.

    And I personally will never forget a program that had this to_unicode function:

    wchar_t* to_unicode(char* s) {
        if (!s) return 0;
        size_t len = strlen(s);
        char* result = malloc((len + 1) * 2);
        for (int i = 0; result && (i <= len); ++i) {
            result[2*i] = s[i]; result[2*i+1] = 0;
        }
        return result;
    }

  12. Joker_vD says:

    Don't even start with the whole family of mbtowcXxx/wctombXxx functions, because they're useless. You can't convert a UTF16-coded string to a UTF8-coded string with them because they depend on LC_CTYPE to perform conversion, and you can't set your locale to ".65001". And even if you could, the locale is a process-wide setting, so you shouldn't touch it at all UNLESS you create a separate thread to perform all locale-dependent string operations on.

    So you write "std::wstring widen(const std::string&, UINT = 65001)" and "std::string narrow(const std::wstring&, UINT = 65001)" wrappers for WideCharToMultiByte/MultiByteToWideChar and use them when you need to. At least you only have to write and compile them once, then link them in.
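
    For concreteness, a minimal sketch of such wrappers (error handling omitted):

    #include <string>
    #include <windows.h>

    // Thin wrappers over the Win32 conversion functions, defaulting to UTF-8 (code page 65001).
    std::wstring widen(const std::string& s, UINT cp = 65001)
    {
        if (s.empty()) return std::wstring();
        int n = MultiByteToWideChar(cp, 0, s.data(), (int)s.size(), nullptr, 0);
        std::wstring w(n, L'\0');
        MultiByteToWideChar(cp, 0, s.data(), (int)s.size(), &w[0], n);
        return w;
    }

    std::string narrow(const std::wstring& w, UINT cp = 65001)
    {
        if (w.empty()) return std::string();
        int n = WideCharToMultiByte(cp, 0, w.data(), (int)w.size(),
                                    nullptr, 0, nullptr, nullptr);
        std::string s(n, '\0');
        WideCharToMultiByte(cp, 0, w.data(), (int)w.size(),
                            &s[0], n, nullptr, nullptr);
        return s;
    }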

    @Zan Lynx'

    "but it does seem to do that job. It should probably assert(s[i]<128)" — no, it doesn't. That program used this function to convert usernames and passwords so that it could perform NTLM authentication. Well, guess what? The default username for the administrator account on Russian Windows is Администратор, and we use non-transliterated usernames in "surname-dot-initials" format, like иванов.нп, so that program never worked at all. It's broken. It works only for languages using basic Latin, hell, it's broken even for Latin-1, because Windows-1252 is different from ISO-8859-1, although it will work for most of the time.

  13. ender says:

    If you don't know where the message box is coming from, an easy way to find out is to run Process Explorer and drag its target toolbar button to the window – Procexp will then highlight the process that owns the window.

  14. Joshua says:

    @Mike Dimmick: I only want to change the codepage for programs that don't know what Unicode is and so definitely don't call wctomb.

  15. Mike Dimmick says:

    @Joker_vD: The point is that you cannot set your current codepage to UTF-8 because any legacy code that *does* use wctomb to convert to 'the user's current codepage' will break, since they didn't create a big enough buffer.

    @Karellen: Visual Studio 2013 apparently defines MB_LEN_MAX to 5 – see msdn.microsoft.com/…/296az74e.aspx . Using the 'Other Versions' drop-down shows that this changed from 2 to 5 in Visual Studio 2005. That of course doesn't help programs that don't use the constant, or haven't been recompiled with a newer compiler.

    Of course it would be possible to shim broken programs, setting the code page to something else, but that wouldn't help a user of those apps because they would presumably be using a script that didn't *have* an ANSI codepage. The advice for a very long time has been 'use Unicode APIs', so the Windows team has very little interest in making UTF-8 work (even if the convention on *nix has been to make the byte-oriented APIs UTF-8-aware, and code would be more easily portable if Windows were to follow suit).

  16. morlamweb says:

    @ender: ProcExp only shows you the process that created the window.  Sometimes you have to dig deeper.  Processes, after all, don't run code; threads do, and threads could easily call out to third-party code, and who knows what that code will do.  Process Monitor would be a better choice.  Start ProcMon, reproduce the error, then use its target toolbar button on the message box.  It'll automatically filter the events to just that process, and you can also see the active thread IDs, and the stack for each event.  Even if you don't have symbols for the Microsoft modules (which you should!), you'll be able to see the module names, and any module names that aren't MS or your code should jump out as a red flag.  Symbols make it easier, too; if you see the third-party library in the stack right below "CreateWindow" then you've got your culprit right there.

  17. Cowardly Anon Moose says:

    @Joker_vD: that to_unicode function works fine for converting ASCII to UTF-16 or UCS-2. Was it documented as only doing that?

  18. Joker_vD says:

    @Cowardly Anon Moose: …it's from the source code of a (cross-platform) proxy that can perform NTLM-authentication to an upstream proxy (usually it's Microsoft ISA). It doesn't say anywhere in the manual that user names and users' passwords must be ASCII-only, and given that Windows usernames are rarely ASCII-only in this part of the Earth, I consider this program to be hopelessly broken.

    Okay, I understand, Linux doesn't have MultiByteToWideChar, but why not use iconv? Or ICU? Why not make this function call MultiByteToWideChar on Windows?

  19. Joshua says:

    @Joker_vD: I've used exactly that, when converting known 7-bit ASCII to UTF-16. Anyway, given that it is open source, you could fix it easily. My guess is that the program is so old that iconv didn't exist. The Linux world was very late in moving to Unicode, quite possibly because the console will never support it (VGA text mode …).

Comments are closed.