CRT Startup

In my previous post, Early Debugging, I demonstrated how early you can get using a user-mode debugger.

Normally we don't need to start that early. There are other places we would want to start from:

  • The OEP (Original Entry Point) of the EXE module. WinDbg has a predefined pseudo-register called $exentry that makes this a lot easier, as mentioned in Data Breakpoint.
  • The startup or initialization of a runtime. I've covered the managed runtime startup in Yet Another Hello World.

Now let's talk a bit about the native C/C++ runtime. When you write applications in C/C++ on Windows, you are normally already using the CRT, unless you explicitly tell the linker not to use it, as I did in A Debugging Approach to IFEO.

The CRT (C Runtime Library) comes with Windows and with the Visual C++ Redistributable (let's not talk about the special version that serves the CLR). You can also link a static version into your EXE/DLL.

The CRT provides the fundamental C++ runtime support. Some obvious features are:

  • setting up the C++ exception model
  • making sure the constructors of global variables get called before entering the main function
  • parsing command line arguments and calling the main function
  • initializing the heap
  • setting up the atexit chain

Let's get to the code:

 /* crtexport.cpp */

#define WIN32_LEAN_AND_MEAN

#include <Windows.h>

class CFoobar
{
public:
  CFoobar()
  {
    OutputDebugString(TEXT("CFoobar::CFoobar()\n"));
  }
  ~CFoobar()
  {
    OutputDebugString(TEXT("CFoobar::~CFoobar()\n"));
  }
};

CFoobar g_foobar;

__declspec(dllexport)
BOOL WINAPI Foobar()
{
  return TRUE;
}

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvContext)
{
  switch(fdwReason)
  {
  case DLL_PROCESS_ATTACH:
    OutputDebugString(TEXT("DLL_PROCESS_ATTACH\n"));
    break;
  case DLL_PROCESS_DETACH:
    OutputDebugString(TEXT("DLL_PROCESS_DETACH\n"));
    break;
  case DLL_THREAD_ATTACH:
    OutputDebugString(TEXT("DLL_THREAD_ATTACH\n"));
    break;
  case DLL_THREAD_DETACH:
    OutputDebugString(TEXT("DLL_THREAD_DETACH\n"));
    break;
  default:
    DebugBreak();
  }
  return TRUE;
}

Note: don't put DebugBreak inside a DLL entry point as I did, unless you understand that the loader lock would make the JIT debugger unhappy.

 /* crtimport.cpp */

#define WIN32_LEAN_AND_MEAN

#include <Windows.h>

BOOL WINAPI Foobar();

int main()
{
  Foobar();
  return 0;
}

cl.exe /LD /Zi crtexport.cpp

cl.exe /Zi crtimport.cpp crtexport.lib

Set two breakpoints, one at DllMain and one at the main function, then launch the application in the Visual Studio debugger and look at the call stack each time a breakpoint is hit.

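Since our DLL is statically imported (resolved by the loader through the import table at process startup, rather than loaded with LoadLibrary), the entry point of the DLL is executed before the entry point of the EXE.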

As you might have noticed, the actual OEP of the DLL is _DllMainCRTStartup, not DllMain. You can double-click the crtexport.dll!_DllMainCRTStartup frame to bring up the CRT startup code and start reading - on my machine the startup code is located at C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\crt\src\dllcrt0.c.

Also, by taking a look at the Output window, we can see that CFoobar::CFoobar() has already been called, which means the global object was initialized before entering our DllMain. This is of course done by the CRT initialization code in __DllMainCRTStartup (the worker function that _DllMainCRTStartup calls), which understands the contract between the compiler and the runtime.

Now that you understand how the constructors of global variables get called, think about the destructor semantics:

  1. Is it possible for a global variable to be destructed on a different thread from the one that constructed it?
  2. What if an exception is thrown from a global variable's constructor or destructor?

The actual OEP for the EXE is mainCRTStartup (the linker's default entry point for a console application), which hands off to __tmainCRTStartup to do the real work. You can double-click the crtimport.exe!__tmainCRTStartup frame and take a look at the code - on my machine the startup code is located at C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\crt\src\crt0.c.

As mentioned in The Main Thread Problem, __tmainCRTStartup runs in the "main thread", and it kills all the other threads before destroying the global variables. One thing worth mentioning is that the CRT uses _endthreadex instead of calling ExitThread directly, since _endthreadex frees the thread's CRT data (the _tiddata block) before the thread exits, while ExitThread knows nothing about that block. Neither call runs destructors for objects still alive on the thread's stack.

A few more questions:

  1. What if different versions of the CRT are loaded into a single process?
    1. mixing debug and release versions of the CRT
    2. mixing static and dynamic versions of the CRT
    3. mixing different major versions of the CRT
  2. What would happen if an exception is thrown across a module boundary (e.g. from a DLL function to a caller that belongs to the EXE)?
  3. Can I use CRT functions without initializing the CRT?