Consequences of using variables declared __declspec(thread)


As a prerequisite, I am going to assume that you understand how TLS works, and in particular how __declspec(thread) variables work. There's a quite thorough treatise on the subject by Ken Johnson (better known as Skywing), who comments quite frequently on this site. The series starts here and continues for a total of 8 installments, ending here. That last page also has a table of contents so you can skip over the parts you already know to get to the parts you don't know.

Now that you've read Ken's articles...

No, wait, I know you didn't read them and you're just skimming past them in the hopes that you will be able to fake your way through the rest of this article without having read the prerequisites. Well, okay, but don't be surprised when I get frustrated if you ask a question that is answered in the prerequisites.

Anyway, as you learned from Part 5 of Ken's series, the __declspec(thread) model, as originally envisioned, assumed that all DLLs which use the feature would be present at process startup, so that all the _tls_index values can be computed and the total sizes of each module's TLS data can be calculated before any threads get created. (Well, okay, the initial thread already got created, but that's okay; we'll set up that thread's TLS before we execute any application code.)

If you loaded a __declspec(thread)-dependent module dynamically, bad things happened. For one, TLS data was not set up for any pre-existing threads, since those threads were initialized before your module got loaded. Windows doesn't have a time machine where it can go back in time to when those threads were initialized and pre-reserve space for the TLS variables your new module needed. Nope, your module is just out of luck with respect to those pre-existing threads, and if it tries to use __declspec(thread) variables, it'll find that its TLS slot never got initialized, and there's no data there to access.
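The startup-time bookkeeping, and why a late arrival misses out, can be sketched as a small C simulation. This is an illustrative model, not the real loader. Module, ThreadTls and the helper functions are hypothetical names. Each thread's storage is loosely modeled on the per-thread array of TLS block pointers that compiled code indexes with _tls_index. The blocks are zero-filled, whereas the real loader copies each module's initialization template. The model shows three things:

- Indices are handed out before any thread exists.
- Each thread gets its own copy of each module's data.
- A thread built earlier has no block for a module that shows up afterward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of static TLS bookkeeping; not the real loader. */
typedef struct {
    size_t tls_size;     /* bytes of __declspec(thread) data in the module */
    unsigned tls_index;  /* plays the role of the module's _tls_index */
} Module;

/* One thread's view: one data block per module, indexed by tls_index. */
typedef struct {
    void **blocks;
    unsigned count;
} ThreadTls;

/* Process startup: every module is present, so indices 0..n-1 can be
   handed out before any application code runs. */
static void assign_tls_indices(Module *mods, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        mods[i].tls_index = i;
}

/* Thread creation: allocate a zeroed block for each module known right now.
   (The real loader copies each module's TLS initialization template.) */
static ThreadTls create_thread_tls(const Module *mods, unsigned n)
{
    ThreadTls t = { calloc(n, sizeof(void *)), n };
    assert(t.blocks != NULL);
    for (unsigned i = 0; i < n; i++) {
        t.blocks[mods[i].tls_index] = calloc(1, mods[i].tls_size);
        assert(t.blocks[mods[i].tls_index] != NULL);
    }
    return t;
}

/* A thread built before a module arrived has no entry for it,
   and nothing in this model goes back and adds one. */
static bool thread_has_block(const ThreadTls *t, const Module *m)
{
    return m->tls_index < t->count && t->blocks[m->tls_index] != NULL;
}

/* The access pattern: blocks[tls_index] plus the variable's offset. */
static void *tls_address(const ThreadTls *t, const Module *m, size_t offset)
{
    assert(thread_has_block(t, m) && offset < m->tls_size);
    return (char *)t->blocks[m->tls_index] + offset;
}

static void free_thread_tls(ThreadTls *t)
{
    for (unsigned i = 0; i < t->count; i++)
        free(t->blocks[i]);
    free(t->blocks);
    t->blocks = NULL;
    t->count = 0;
}
```

In this model, a write through one thread's block never shows up in another thread's block, which is the whole point of a thread variable. The only way for a thread to have storage for a module is for that module to be known when the thread's array is built.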

Unfortunately, there's an even worse problem, which Ken quite ably elaborates on in Part 6: The _tls_index variable inside the module arrived after the train left the station. All those TLS indices were assigned at process initialization. When the module loads dynamically, its _tls_index variable just sits there, and nobody bothers to initialize it, leaving it at its default value of zero. (Too bad the compiler didn't initialize it to TLS_OUT_OF_INDEXES.) As a result, the module thinks that its TLS variables are at slot zero in the TLS array, leading to what Ken characterizes as "one of the absolute worst possible kinds of problems to debug": Two modules both think they are the rightful owners of the same data, each with a different concept of what that data is supposed to be. It'd be like a bug in HeapAlloc that returned the same pointer to two separate callers. Each caller would use the memory, cheerfully believing that the values it writes to the memory will be there when it comes back.
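The slot-zero collision can be modeled in a few lines of C. This is a hypothetical simulation, not loader code. Module, ThreadTls, startup_register and thread_int are illustrative names, and the EXE is assumed to be the module that received index 0 at startup. The late DLL's index is never written, so it keeps the zero it started with. As a result, the address computation lands in the EXE's block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the behavior described above; names are illustrative. */
typedef struct {
    size_t tls_size;     /* bytes of __declspec(thread) data */
    unsigned tls_index;  /* stands in for _tls_index, a zero-initialized global */
} Module;

#define MAX_SLOTS 8

typedef struct {
    void *blocks[MAX_SLOTS]; /* per-thread data blocks, indexed by tls_index */
    unsigned count;
} ThreadTls;

/* Startup only: give a module the next index and this thread a block for it. */
static void startup_register(Module *m, unsigned *next_index, ThreadTls *t)
{
    assert(*next_index < MAX_SLOTS);
    m->tls_index = (*next_index)++;
    t->blocks[m->tls_index] = calloc(1, m->tls_size);
    assert(t->blocks[m->tls_index] != NULL);
    t->count = *next_index;
}

/* The access path trusts tls_index blindly: nothing checks that the
   block in that slot actually belongs to module m. */
static int *thread_int(const ThreadTls *t, const Module *m, size_t offset)
{
    assert(m->tls_index < t->count);
    return (int *)((char *)t->blocks[m->tls_index] + offset);
}
```

Nothing in the access path compares the block's owner against the module asking for it. That is why the corruption is silent rather than an immediate crash.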

What truly frightens me is that there's at least one person who considers this horrific data corruption bug a feature. webcyote calls this bug "sharing all variables between the EXE and the DLL" and complained that fixing the bug breaks programs that "depend on the old behavior". That's like saying "We found that if we use this exact pattern of memory allocations, we can trick HeapAlloc into allocating the same memory twice, so we will have our EXE allocate some memory, then perform the magic sequence of allocations, and then load the DLL, and then the DLL will call HeapAlloc to allocate some memory, and it will get the same pointer back, and now the EXE and DLL can share memory."

Whoa.

Mind you, this crazy "EXE and DLL sharing thread variables" trick is extremely fragile. You have to intentionally delay loading the DLL until after process startup. (If you load it as part of an explicit dependency, then you don't trigger the bug and the DLL gets its own set of variables as intended.) And then you have to make sure that the EXE and DLL declare exactly the same variables in exactly the same order and link the OBJ files in exactly the right sequence, so that all the offsets match. Oh, and you have to make sure your DLL is loaded only into the EXE with which it is in cahoots. If you load it into any other EXE, it will start corrupting that EXE's thread variables. (Or, if the EXE doesn't use thread variables, it'll corrupt some other random DLL's thread variables.)
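Why the offsets must match can be illustrated with a rough C model. The assumption, for illustration only, is that a struct stands in for each module's block of thread variables, since those variables are laid out one after another within the module's TLS data. The struct and field names are hypothetical. Declaring the same two variables in a different order is enough to break the overlay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the EXE's block of thread variables. */
struct exe_thread_vars {
    int request_count;
    double last_latency;
};

/* The "same" variables, declared in a different order in the DLL. */
struct dll_thread_vars {
    double last_latency;
    int request_count;
};

/* The first declared variable always sits at the start of the block. */
static_assert(offsetof(struct exe_thread_vars, request_count) == 0,
              "first member is at offset 0");

/* The accidental overlay only "shares" correctly if every variable sits
   at the same offset in both layouts. */
static bool layouts_line_up(void)
{
    return offsetof(struct exe_thread_vars, request_count) ==
               offsetof(struct dll_thread_vars, request_count) &&
           offsetof(struct exe_thread_vars, last_latency) ==
               offsetof(struct dll_thread_vars, last_latency);
}
```

With the orders swapped, the DLL's request_count lands on bytes the EXE uses for last_latency, so each module silently reinterprets the other's data. Nothing at compile or link time warns about this, because neither module knows the other's layout exists.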

If the feature had been intended to be used in this insane way, they would have been called "shared variables" instead of "thread variables". No wait, they would have been called "thread variables that sometimes end up shared under conditions outside your DLL's control."

I wonder if webcyote also drives a manual transmission and just slams the gear stick into position without using the clutch. Yes, you can do it if you are really careful and get everything to align just right, but if you mess up, your transmission explodes and spews parts all over the road.

Don't abuse a bug in the loader. If you want shared variables, then create shared variables. Don't create per-thread variables and then intentionally trigger a bug that causes them to overlay each other by mistake. That's such a crazy idea that it probably never occurred to anyone that somebody would actually build a system that relies on it!

Exercise: A customer ran into a problem with the "inadvertently sharing variables between the EXE and the DLL" bug. Here is the message from the customer liaison:

My customer has a DLL that uses static thread local storage (__declspec(thread)), and he wants to use this DLL from his C# program. Unfortunately, he is running into the limitation when running on Windows XP that DLLs which use static thread local storage crash when they try to access their thread variables. The customer cannot modify the DLL. What do you recommend?

Update: Commenter shf gives the most complete answer.

Comments (32)
  1. Dan Bugglin says:

    I would recommend the customer secure the ability to modify the DLL, find a different [working] DLL that does what he wants, find a way to do what he wants to do in pure C#, or recode his app in C++ so he can use the DLL as intended.

  2. Owen Shepherd says:

    And if using a different DLL or modifying this one is absolutely impossible, I would suggest making a C++ loader that links against the DLL and then loads the .NET Framework…

    This is only for the /absolutely/ impossible case, however.

  3. configurator says:

    You can in fact drive a manual transmission without using the clutch. The car needs to have been thoroughly abused before by people who can't drive a manual, but eventually the clutch becomes just a convenience.

  4. configurator says:

    Also, I notice a recurring theme in your posts. "Windows doesn't have a time machine." I guess this is a major bug in Windows, because it's causing so many other problems.

    I'm sure they're working on it though.

  5. Vilx- says:

    I know this is nitpicking and missing the point, but…but… that's why we're programmers – we obsess about tiny details, right? :) OK, so with my conscience cleared:

    Actually, for decades now, all manual gearboxes have had protections so that they don't spew parts all over the ground. You simply can't put it into gear if everything isn't aligned. That includes reverse gear while driving forwards and vice versa. Even Wikipedia writes about it: en.wikipedia.org/…/Manual_transmission

  6. Someone You Know says:

    @configurator:

    Yet another case where Microsoft sucks compared to Apple. Mac OS has had a Time Machine since 2007!

  7. SimonRev says:

    I suppose another (fragile) workaround in the case that the customer cannot use a non-bugged DLL would be to create a bunch of dummy thread local variables in his app as placeholders for the ones that the DLL will be using and just never access those.  That series of blogs tells how you would control the order they are allocated, although I am not sure how you could apply that to C#, if it is even possible.

  9. Am I missing something says:

    (If you load it as part of an explicit dependency, then you don't trigger the bug and the DLL gets its own set of variables as intended.)

    Make the C# program load the dll on startup, instead of delay loading…

  10. Staffan Gustafsson says:

    If the customer writes a wrapper in C++/CLI that exposes the needed methods as .NET methods, the DLL would be loaded as a dependency at startup.

    Wouldn't that work?

  11. Ivo says:

    Hey, wait a minute!

    It appears the __declspec(thread) problem is fixed in Vista. I didn't read Ken's article yet (it is covered in part 7, and I'm still reading part 4), but I'm fairly certain it involves a time machine that was added in Vista.

    What I don't understand is – now that you have a time machine, why can't you add the fix to XP???

    [Wouldn't that break existing Windows XP systems? Changing something this fundamental to the system in a service pack makes customers unhappy. -Raymond]
  12. Joshua says:

    I've been scolded before for saying that any program that is not 100% correct deserves to be 0% correct.

    Unfortunately, my experience is sufficient to know that anyone who depends on a bug deserves to be broken. When a bug causes problems for people who are trying to code as intended, it needs to be fixed rather than added to the long list of things that don't work right because of some bug that has been preserved for backwards compatibility. This goes double for security bugs.

  13. Ivo says:

    [Wouldn't that break existing Windows XP systems? Changing something this fundamental to the system in a service pack makes customers unhappy. -Raymond]

    Won't be a problem. Use the time machine (from Vista) and fix it in the original XP. No need for a service pack. The incompatible software will not have been written. (I hope I got the tense correct. Time travel screws up the whole tense system).

  14. Gabe says:

    Ivo: Unfortunately, the time machine cannot go back in time to before the machine itself was invented. This means that XP cannot be fixed with the time machine from Vista (nor the one apparently in OS X).

  15. Steve says:

    "That's such a crazy idea that it probably never occurred to anyone that somebody would actually build a system that relies on it!"

    Perhaps your blog should be more widely read then. My first thought, verbatim, after you introduced the pathological behavior was "Oh God, someone probably depends on it".

  16. Ivo says:

    Gabe: Yes, I realize that now. Silly me.

    It would have been a bad idea anyway. If the problem got fixed in XP, the customer wouldn't have had the problem, Raymond wouldn't have written the article, I wouldn't have suggested to go fix XP, and XP probably wouldn't have been fixed. That's just a time paradox waiting to happen. Those are never fun. Side effects can range from a mild hangover to the destruction of the multiverse.

  17. acq says:

    The webcyote question you mentioned is on the 30th page now:

    blogs.msdn.com/…/407234.aspx

    Is there really anybody who used this "feature" for anything but writing exploits? I'm not able to figure that out.

  18. Dmytro says:

    Make a C++ wrapper application for this DLL. The wrapper application loads the DLL at the right time. Use IPC to communicate between the C# app and the C++ wrapper app.

    [The DLL interface may not be IPC-friendly. And redesigning the interface to an application is a lot of (risky) work. Can you think of something simpler? -Raymond]
  19. tobi says:

    I recommend that the C# program allocate 1000 TLS slots and not use them. The DLL can then use the slots starting at index 0, as long as it needs fewer than 1000 slots.

  20. Sam says:

    Seems like you should use alternative methods instead of __declspec(thread), ones that allow dynamic loading.

    I've always thought TLS to be limited…

  21. Brooks Moses says:

    So, um, giving what I thought was the obvious answer to the exercise (but everyone else who's answered it has said other things): The solution that I'd recommend is simply to load the offending DLL at process startup.

    (Actually, I see that Dmytro was suggesting essentially the same thing.)

    [Incomplete solution. How do you specify that you want a C# app to load a DLL at startup? -Raymond]
  22. Semi Essessi says:

    What I don't understand is – now that you have a time machine, why can't you add the fix to XP???

    [Wouldn't that break existing Windows XP systems? Changing something this fundamental to the system in a service pack makes customers unhappy. -Raymond]

    No, not if you use the time machine to change it in the original XP, and any previous OS you want to retain compatibility with. :)

  23. scorpion007 says:

    The problem can be summarized as "static TLS doesn't work with explicit linking of DLLs in XP and older". Does C# support a moral equivalent of implicit linking? I.e. __declspec(dllimport)? I suppose not, otherwise that would be a trivial solution.

  24. Worf says:

    Hrm… isn't this what those .manifest files are for? Can't he simply use that method to force-preload a DLL? The loader parses those files if present before it even bothers with executing code…

    (I admit, I didn't read it yet. It would be helpful to get this as homework the night before…)

  25. Alex Cohn says:

    IMHO, the exercise contradicts itself: if they know that the crash happens when the DLL tries to access its TLS, why can't they just recompile the DLL to resolve the problem? Much more likely, the report would read as follows:

    "My customer has a DLL that works just fine, and he wants to use this DLL from his C# program. Unfortunately, he is running into the limitation when running on Windows XP that the DLL crashes quite often. The customer cannot modify the DLL. What do you recommend?"

    Isn't such a scenario familiar to you?

  26. Myria says:

    I *really* do not recommend this trick, but it would actually work.  This is a disgusting, horrible hack that you should not use in production code.

    1. Hex edit the import reference to "mscoree.dll" in the C# EXE file to say some other DLL name for your hack, like "dllhack.dll".

    2. Write down the image directory information for the "COM+" descriptor in the IMAGE_OPTIONAL_HEADER32::DataDirectory.

    3. Zero the COM+ descriptor directory information in the main executable's IMAGE_OPTIONAL_HEADER32::DataDirectory[IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR].

    4. Create a DLL named dllhack.dll that statically imports the DLL that you need to load at process startup.

    5. In dllhack.dll's DllMain for DLL_PROCESS_ATTACH, use VirtualProtect and overwrite the main EXE's IMAGE_OPTIONAL_HEADER32::DataDirectory[IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR] with the original values you wrote down in step 2.

    Step 3 prevents ntdll.dll from using its special-case handling of COM+ (.NET) executables.  After step 5, the fallback mechanism in the main EXE's entry point will do the correct thing and load mscoree.dll.

  27. configurator says:

    @Worf: The manifest files aren't read until the .NET Framework reads them, i.e., after the process starts.

    Can't you just add a static import to the .net dll or exe after having compiled it?

  28. Medinoc says:

    I'm surprised that LoadLibrary() doesn't return an error value when encountering a DLL that uses implicit TLS.

    Of course, patching this now on Windows XP would break (already broken) programs that rely on the old behavior, but why did LoadLibrary not reject loading attempts in the first place?

  29. Anthony Wieser says:

    How to load in C#?  

    How about building a C++ EXE that statically links in the DLL and then creates an AppDomain, loads the target C# assembly into it, and executes a named method to launch it?

    Hey presto!

  30. Shf says:

    Here's the simplest method I can think of. Others have made suggestions close to this, but not exactly:

    1. Create a C++/CLI Exe project.

    2. Statically link the unmanaged Dll to the C++/CLI project – that will cause the OS loader to initialize the Dll's TLS data.

    3. Convert the C# project with the entry point (i.e., Program.Main()) to output a DLL (not strictly necessary, but it avoids the confusion of having two EXEs).

    4. Add a reference to the C# assembly to the C++/CLI project.

    5. In the C++/CLI project's entry point, call the C# assembly's Program.Main() to pass control to the C# code.

    There's a solution in one line of code (and one extra project).

    [This was the solution I had in mind (though you were much more thorough than me). The nice thing about this solution is that you don't need to change any existing binaries. -Raymond]
  31. dll hell says:

    [Wouldn't that break existing Windows XP systems? Changing something this fundamental to the system in a service pack makes customers unhappy. -Raymond]

    And that's why Vista users are unhappy!

  32. Bruno Martinez says:

    What's the correct way to get 'shared variables' as webcyote wanted?  I believe you can't export variables from an .exe in Windows.

    [Then export them from the DLL. -Raymond]
