A shell extension is a guest in someone else’s house; don’t go changing the code page


A customer reported a problem with their shell extension:

We want to format a floating point number according to the user's default locale. We do this by calling snprintf to convert the value from floating point to text with a period (U+002E) as the decimal separator, then using GetNumberFormat to apply the user's preferred grouping character, decimal separator, etc. We found, however, that if the user is running in (say) German, the snprintf function sometimes (but not always) follows the German locale and uses a comma (U+002C) as the decimal separator with no thousands separator. This format prevents the GetNumberFormat function from working, since it requires the decimal separator to be U+002E. What is the recommended way of formatting a floating point number according to the user's locale?

The recommended way of formatting a floating point number according to the user's locale is indeed to use a function like snprintf to convert it to text with U+002E as the decimal separator (and meeting the other input requirements of GetNumberFormat, such as no grouping characters), then use GetNumberFormat to apply the user's locale preferences.

The snprintf function follows the C/C++ runtime locale to determine how the floating point number should be converted, and the default C runtime locale is the so-called "C" locale which indeed uses U+002E as the decimal separator. Since you're getting U+002C as the decimal separator, somebody must have called setlocale to change the locale from "C" to a German locale, most likely by passing "" as the locale, which means "follow the locale of the environment."
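Here is a minimal sketch of that behavior in portable C. The helper names (current_numeric_locale, decimal_point_is_period, format_with_runtime_locale) are made up for illustration; only setlocale, localeconv, and snprintf come from the C runtime. Passing NULL as the second argument to setlocale queries the current locale without changing it.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Query (do not change) the numeric locale: a NULL second argument
   makes setlocale return the name of the current setting. */
const char *current_numeric_locale(void)
{
    return setlocale(LC_NUMERIC, NULL);
}

/* Returns 1 if the runtime locale currently formats numbers with
   U+002E as the decimal separator, 0 otherwise. */
int decimal_point_is_period(void)
{
    const struct lconv *conv = localeconv();
    return strcmp(conv->decimal_point, ".") == 0;
}

/* Plain snprintf: the decimal separator in the output depends on
   whatever locale happens to be in effect for the C runtime. */
int format_with_runtime_locale(char *buf, size_t size, double value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.2f", value);
}
```

In a process where nobody has called setlocale, current_numeric_locale() reports "C" and format_with_runtime_locale turns 1234.5 into 1234.50. After a setlocale(LC_NUMERIC, "") in a German environment, the very same call would be expected to produce 1234,50 instead.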

Our shell extension is running in Explorer. Under what conditions will Explorer call setlocale(LC_NUMERIC, "")? What should we do if the locale is not "C"?

As it happens, Explorer never calls setlocale. It leaves the locale set to the default value of "C". Therefore, the call to snprintf should have generated a string with U+002E as the decimal separator. Determining who was calling setlocale was tricky since the problem was intermittent, but after a lot of work, we found the culprit: some other shell extension loaded before the customer's shell extension and decided to change the carpet by calling setlocale(LC_ALL, "") in its DLL_PROCESS_ATTACH, presumably so that its calls to snprintf would follow the environment locale. What made catching the miscreant more difficult was that the rogue shell extension didn't restore the locale when it was unloaded (not that that would have been the correct thing to do either), so by the time the bad locale was detected, the culprit was long gone!
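For what it's worth, one way a test harness might catch this kind of carpet-changing is to snapshot the locale before calling into a component and compare it afterward. This is not how the investigation above was carried out; it is only a hypothetical sketch, and every name in it (extension_entry_fn, call_and_check_locale, and the two sample extensions) is invented for illustration. It also sees only the C runtime that the harness itself is linked against.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "some entry point in an extension". */
typedef void (*extension_entry_fn)(void);

/* Calls entry() and reports whether the process-wide locale changed.
   Returns 1 if it changed, 0 if it did not, -1 if the check failed. */
int call_and_check_locale(extension_entry_fn entry)
{
    const char *current;
    const char *after;
    char *before;
    size_t len;
    int changed;

    assert(entry != NULL);

    /* NULL means "query"; the returned string may be overwritten by
       later setlocale calls, so take a private copy. */
    current = setlocale(LC_ALL, NULL);
    if (current == NULL) return -1;
    len = strlen(current) + 1;
    before = (char *)malloc(len);
    if (before == NULL) return -1;
    memcpy(before, current, len);

    entry();

    after = setlocale(LC_ALL, NULL);
    changed = (after == NULL) || (strcmp(before, after) != 0);
    free(before);
    return changed;
}

/* Sample extensions for the sketch. */
void well_behaved_extension(void) { /* leaves global state alone */ }
void rogue_extension(void) { setlocale(LC_ALL, ""); /* changes the carpet */ }
```

Whether call_and_check_locale(rogue_extension) actually reports a change depends on the environment: if the environment locale is itself "C", the rogue call is a no-op, which is part of why problems like this are intermittent.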

That other DLL used a global setting to solve a local problem. Given the problem "How do I get my calls to snprintf to use the German locale settings?" they decided to change all calls to snprintf to use the German locale settings, even the calls that didn't originate from the DLL itself. What if the program hosting the shell extension had done a setlocale(LC_ALL, "French")? Tough noogies; the rogue DLL just screwed up the host program, which wants to use French locale settings but is now being forced to use German ones. The program probably won't notice that somebody secretly replaced its coffee with Folgers Crystals. It'll be a client who notices that the results are not formatted correctly. The developers of the host program, of course, won't be able to reproduce the problem in their labs, since they don't have the rogue shell extension, and the problem will be classified as "unsolved."

What both the rogue shell extension and the original customer's shell extension should be using is the _l variety of string formatting functions (in this case _snprintf_l, although _snprintf_s_l is probably better). The _l variety lets you pass an explicit locale which will be used to format that particular string. (You create one of these _locale_t objects by calling _create_locale with the same parameters you would have passed to setlocale.) Using the _l technique solves two problems:

  1. It lets you apply a local solution to a local problem. The locale you specify applies only to the specific call; the process's default locale remains unchanged.
  2. It allows you to ensure that you get the locale you want even if the host process has set a different locale.
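As a rough sketch of the _l technique, the helper below formats a double with U+002E as the decimal separator no matter what the process locale is. The function name format_double_c_locale is made up for illustration. The Microsoft branch uses _create_locale, _snprintf_s_l, and _free_locale; the other branch is an assumption added so the sketch also builds on POSIX systems, where newlocale and uselocale change the locale for the calling thread only.

```c
#if !defined(_MSC_VER) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* for newlocale/uselocale/freelocale */
#endif

#include <assert.h>
#include <locale.h>
#include <stdio.h>
#if defined(__APPLE__)
#include <xlocale.h>
#endif

/* Formats value with two decimals using the "C" numeric locale.
   Returns the number of characters written (excluding the null
   terminator), or -1 on failure or truncation. */
int format_double_c_locale(char *buf, size_t size, double value)
{
    int n;
    assert(buf != NULL && size > 0);

#if defined(_MSC_VER)
    /* Explicit locale object: applies to this one call only. */
    _locale_t c_locale = _create_locale(LC_NUMERIC, "C");
    if (c_locale == NULL) return -1;
    n = _snprintf_s_l(buf, size, _TRUNCATE, "%.2f", c_locale, value);
    _free_locale(c_locale);
    return n;
#else
    /* POSIX fallback (assumption, not from the article): switch the
       calling thread's locale, format, then switch back. */
    locale_t c_locale = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    locale_t previous;
    if (c_locale == (locale_t)0) return -1;
    previous = uselocale(c_locale);
    n = snprintf(buf, size, "%.2f", value);
    uselocale(previous);
    freelocale(c_locale);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
#endif
}
```

The resulting string can then be handed to GetNumberFormat. Creating the locale object on every call keeps the sketch short; real code would more likely create it once and reuse it.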

If either the customer's DLL or the rogue DLL had followed this principle of not using a global setting to solve a local problem, the conflict would not have arisen.

Comments (26)
  1. Anonymous says:

    The problem is that these global settings were from a time before threads, where they made more sense. In the current world with threads, they only make sense if you are sure your program will always have a single thread (not possible on Windows because some things like to inject threads into all processes), or if you only call them on your process startup, before you create other threads (again not possible on Windows because some things manage to inject threads into your process even before it starts up).

    On Unix (where setlocale came from) it makes more sense, since a lot of processes are still single-threaded and people do not run around injecting threads into other people's processes.

    [Even without threads, one component shouldn't run around changing global settings unless it knows that the rest of the process is okay with it. Imagine if calling dbopen() changed your locale. -Raymond]
  2. Anonymous says:

    (not possible on Windows because some things like to inject threads into all processes)

    BLODA. Do not inject into any processes you have not tested against.

  3. Anonymous says:

    BSD, as well as Mac OS X, which is based on BSD, also have the _l variants of many functions, but oddly other POSIX/*nix systems do not; they're not part of any major standard.

  4. jader3rd says:

    If there was a pattern for plugins, which would prevent plugins from changing settings for the executable, that would help avoid these situations.

    Of course that does break scenarios where you load the plugin with the explicit purpose of changing settings.

    [But how would you know that the settings change request was coming from the plugin rather than the main program? You can't trust the return address. -Raymond]
  5. Anonymous says:

    I had run into this exact problem when loading the system PDH DLL on NT 4.0 and Win2K (support.microsoft.com/…/884536). Not sure if later Windows versions also exhibit this, but I hope MS folks are also reading this blog!

  6. Anonymous says:

    It sounds like what's needed is a way for a process to say "I don't expect any global settings changes from here on", so that Explorer can indicate that it's done initialising, and any further global state change is a bug in an extension. You then need suitable magic to break into a debugger if a global state change is attempted, so that buggy extensions can be caught and their developers re-educated.

    Of course, that's bound to be a lot harder than it sounds, and cause other nightmares; I'm not sure how I'd justify it out of the minus 100 points area. For starters, I'm not sure I could list all the places that change global state like that.

  7. Anonymous says:

    "…developers re-educated."

    That sounds violent!

    :-)

  8. jader3rd says:

    "But how would you know that the settings change request was coming from the plugin rather than the main program?"

    From today's architecture I don't think that you can. I don't have a solution for this, I just wish there was one.

  9. Anonymous says:

    @Simon Farnsworth: easy. When done initializing these things, patch the functions in question so the first byte is int 3. Extension calls function, extension gets debugger breakpoint instead.

  10. Anonymous says:

    I immediately thought of the Chris Farley SNL spoof of the Folgers coffee commercial.

    http://www.hulu.com/…/saturday-night-live-schillervision-hidden-camera

  11. Anonymous says:

    Joshua:

    If I remember correctly, each executable module has its own import table and it uses that to look up the routines from the CRT, so the plugin would use a different import table. So how could Explorer patch the functions from an unknown DLL? Also the DLL could change the locale in the DllMain function, and this is executed long before Explorer could get control and patch it. Also, what happens if the plugin was linked using the static CRT?

  12. Anonymous says:

    @Crescens2k: forgive me if I am wrong (I am more used to Unix which usually has a single C runtime per process), but isn't setlocale() and friends part of the C runtime? So a plugin linked with a static CRT would have its own separate view of the locale. The same for mixing CRTs (msvcrt.dll msvcr70.dll msvcr71.dll msvcr80.dll…), each one would have its own separate view of the locale.

    Or does setlocale() forward to something in kernel32.dll or other shared DLL used by all C runtimes?

  13. Anonymous says:

    If I remember correctly, each executable module has its own import table and it uses that to look up the routines from the CRT, so the plugin would use a different import table.

    You remember correctly. That is why I patch the first byte as returned by GetProcAddress().

    Also the DLL could change the locale in the DllMain function, and this is executed long before explorer could get control and patch it.

    Shell plugins are loaded dynamically.

    Also, what happens if the plugin was linked using the static CRT?

    Fine. Either the static CRT has its own copy of the locale in which case we don't care or there's a lower down API we can patch and therefore do.

  14. Anonymous says:

    I think that all API calls that can change global system behaviour must be passed a token showing that the user really wants to do that (just like another "Run As Administrator" one.) Executables calling DLL functions need to pass the token to them. Add something like a UAC setting to let the user decide whether to trust any legacy programs that don't implement token passing to change settings.

    Now you need not trust the return address to block rogue applications from changing global system behaviour.

  15. Anonymous says:

    @immibis: My approach does not require a time machine. Just modify the original API calls to accept and process a token under new names (like "*_withtoken"), then create another set of APIs with the original API names as aliases to the newly created functions, with token = null.

    Now change the next platform SDK's headers to include the appropriate function declarations. Of course making new programs call the new APIs takes work, but you are targeting a new Windows version anyway and have to accept the change if you really want the behaviour change to be global. After all, you can always use the old function signature to let the user decide whether to trust the things you've done, so it wouldn't really be breaking things.

    Yet that'd be a lot of (somewhat unnecessary) work indeed.

    [You forget that we're talking about plug-ins here. Once the host program recompiles with the new APIs, all old-style plug-ins stop working, since they were written before the _withtoken API was invented. They didn't get a chance to opt in/out since they're just a plug-in. -Raymond]
  16. Anonymous says:

    "I think that all API calls that can change global system behaviour" – But it doesn't… oh, you saw "global setting" and stopped reading. The locale is "global" (at least, within a CRT dll) in the sense that it applies to the whole process [rather than just the shell extension], not the system or even the session.

  17. Anonymous says:

    Cheong, your idea is to pop up a dialog saying "Something inside of Explorer wants to change the locale. Do you want to let this happen?"  Users are going to go "Buh?" and click a random button.

    Worse, what does Explorer do if you click No?  Unload the extension that called setlocale?  The user's never going to connect the two events.

  18. "[But how would you know that the settings change request was coming from the plugin rather than the main program? You can't trust the return address. -Raymond]"

    Forgive me if I'm wrong, but it seems like the reason for changing the carpet is usually ignorance… In that case, any roadblock like that would *hopefully* make the developer(s) notice they're doing something wrong. The same argument applies to Cheong's idea (passing a token around).

    Of course, you sadly can't implement either, since you don't have a time machine.

    [Wishful thinking. Most roadblocks just make people frustrated that they hit another roadblock. (See: SetActiveWindow -> SetForegroundWindow.) All that'll happen is that people will do "#define setlocale(c, l) setlocalewithtoken(c, l, GetTokenForSettingsChange())" -Raymond]
  19. Anonymous says:

    @Joshua

    You've brushed over what (in my opinion) is the hardest bit of implementing such a function; identifying all the places you need to patch. It's trivial, given such a list, to patch all of them, such that they trap into the debugger when they would otherwise set shared state, but producing the list and identifying all the corner cases is hard. To take just one example, setlocale() should only trap if it's called with a non-NULL second argument; if the second argument is NULL, it's a query, and that might be legitimate.

    Your proposed patch mechanism therefore fails at the first hurdle; you can't just patch the first byte to 0xCC, as that doesn't check that the second argument passed to setlocale is non-NULL. You need something more sophisticated, which calls INT 3 if (and only if) the call tries to change the state.

  20. Anonymous says:

    @Simon Farnsworth: I didn't read the manual before I posted it. Anyway, I have written stuff that acts only if the patched function has a particular argument before. It's not that hard.

  21. Anonymous says:

    @Joshua

    You're still ignoring the hard bit; we don't have a full list of all places where global state is changed. If we knew exactly what to patch, and under what conditions we needed to break to the debugger, we'd have a trivial problem.

    You keep restating that if we ignore the hard part of the problem, it becomes simple. Well, duh. Of course the problem's trivial if we ignore the hard part. Restating it in terms that you might find easier to deal with:

    "We wish to patch all functions that can change global state, such that when they are entered with appropriate preconditions to trigger a change of global state, the code enters the debugger instead of changing global state".

    You've managed the "patch" bit. But you have no idea what the "all functions" bit is, nor what the "appropriate preconditions" are; it's not impossible that (for example) setlocale() is implemented as an inline function unconditionally changing a global variable, and you need to use heavy-duty techniques to catch anything writing to that location, compare the value they're writing to the value already in place, and if it's a change, break into the debugger. If that is the case for any of the functions we're talking about, it's possible that the runtime cost of this feature is sufficiently high that it's not worth implementing.

  22. Anonymous says:

    Joshua:

    You remember correctly. That is why I patch the first byte as returned by GetProcAddress().

    Hmm I'm not quite sure what you are wanting to patch here? What GetProcAddress returns is the pointer to a function, so it would be pointing at code in the code segment. This could be in a page which is read only so changing it would fail. Also, what happens if you overwrite an instruction like a stack push for the ebp/rbp, or part of the sub instruction it uses to allocate stack space?

    Shell plugins are loaded dynamically.

    Yes, and LoadLibrary doesn't return until after DllMain has returned, so no matter what you try it has already executed by the time you get a handle to it. It is in the LoadLibrary documentation.

    "If the specified module is a DLL that is not already loaded for the calling process, the system calls the DLL's DllMain function with the DLL_PROCESS_ATTACH value. If DllMain returns TRUE, LoadLibrary returns a handle to the module. If DllMain returns FALSE, the system unloads the DLL from the process address space and LoadLibrary returns NULL."

    Since CoLoadLibrary is documented to be equivalent to LoadLibraryEx, and since this is documented to be the same as LoadLibrary when just normally loading a DLL, then these functions will only return if there is no DllMain entry point or DllMain has run and returned TRUE.

  23. Anonymous says:

    Made one mistake in my last post. In the last paragraph I missed out some words and made it imply that LoadLibrary etc. don't return unless the DllMain executes successfully.

    It should have been

    "then these functions will only return a valid handle if there is no DllMain entry point or DllMain has run and returned TRUE."

    One more thing I meant to point out but forgot to. Explorer is linked against the system CRT, which is msvcrt.dll; extensions are usually written using VS, and they would thus link against msvcrxx.dll, so how could Explorer use GetProcAddress to get setlocale and patch it so that plugins would be affected? Explorer only knows about msvcrt.dll after all.

  24. Anonymous says:

    Ok, so just add a checkbox option "Prevent and notify known buggy behaviour of shell plugins" to Explorer so that if a plugin calls any of the APIs that have *_withtoken versions, Explorer suspends the loading of that plugin and waits until the other plugins are loaded.

    The shell will show an icon in the notification area to tell the user what happened. When the user clicks on it, it'll show a list of plugins in which it detected known buggy behaviour and ask whether to disable them (much like what you see when starting IE9 with a slow plugin). The shell will continue to load the plugins if the user selects "Don't disable".

    [Explorer loads plugins on demand, not at startup like IE does. So you right-click on a .foo file, the FOO context menu plugin does something bad, and then… what exactly? (Remember, plugins also execute in no-UI scenarios, such as namespace extensions, so there's no guarantee that there's a human being around to answer any questions.) Oh, and as Crescens2k noted, the "bad API" isn't even something Explorer knows about. It's in a DLL that doesn't come with Windows. So Explorer wouldn't know about these "bad APIs" in order to patch them. -Raymond]
  25. Anonymous says:

    Figures. I only link to msvcrt.dll so I had a really dumb idea.

  26. Anonymous says:

    A command prompt application needs to use the standard VGA palette, as used by such apps since DOS-ages.

    In Windows the cmd prompt uses some other palette with very ugly results.

    Currently, the palette set by the app isn't applied permanently, and the user palette is restored when a new prompt is opened. Earlier the palette was applied permanently. I have not figured out the logic for whether the palette is permanently saved. There was some refactoring of the code between the current and earlier versions, but nothing stands out.

    In any case, either change cmd to use the VGA palette or face the risk of losing your cmd palette if you use my application. I believe in time this app will be the most popular cmd app, as it's a 1:1 repro of the most popular DOS app which stopped working in 64-bit Windows.

    [I don't understand. Cmd.exe does not change the palette; it uses whatever palette the console system gives it. Try it: use the Properties dialog to edit the color palette, then launch cmd.exe. Observe that the custom color palette remains intact. And what this has to do with shell extensions I have no idea. -Raymond]
