Porting C++ native code to managed (/clr) – Threads


In the native world, one interacts with the OS directly by calling Win32 APIs to manage resources (allocating/deallocating memory, opening/closing handles). In the managed world, one relies on the CLR, totally or at times partially, to do the same (e.g. the GC does memory management for us). Without an understanding of how the CLR does this work for us, we may at times arrive at false conclusions. I was recently working on a test application which seemed to leak threads when compiled with the /clr switch. The following functions were used to create threads.


unsigned int __stdcall WorkerThread(void *ptr)
{
      THREADINFO *ti = (THREADINFO*)ptr;
      ::Sleep(100);
      delete ti;
      return 1;
}


 


void CTest::CreateWorkers(void *ptr)
{
      int counter = 0;
      while (_run)
      {
            ::Sleep(100);

            THREADINFO *ti = new THREADINFO;

            DWORD ui;
            HANDLE threadHandle = CreateThread(NULL, 4096,
                  (LPTHREAD_START_ROUTINE)WorkerThread, ti,
                  CREATE_SUSPENDED, &ui);
            ti->threadID = threadHandle;

            ResumeThread(threadHandle);
            WaitForSingleObject(threadHandle, INFINITE);
            int a = CloseHandle(threadHandle);

            counter++;
      }
}
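For context, here is the same create-and-wait pattern in portable C++11. This is only an illustrative sketch, not the original code: ThreadInfo, Worker, and RunWorkers are hypothetical names, and ThreadInfo stands in for the blog's THREADINFO. The point is that joining the std::thread releases the OS thread handle, playing the role of the WaitForSingleObject + CloseHandle pair above:

```cpp
#include <atomic>
#include <memory>
#include <thread>

// Minimal stand-in for the blog's THREADINFO (hypothetical).
struct ThreadInfo {
    int payload = 0;
};

std::atomic<int> g_completed{0};

// The worker owns its ThreadInfo via unique_ptr, so it is freed
// automatically when the thread function returns -- no manual delete.
void Worker(std::unique_ptr<ThreadInfo> ti) {
    ++g_completed;
}

int RunWorkers(int count) {
    for (int i = 0; i < count; ++i) {
        std::thread t(Worker, std::make_unique<ThreadInfo>());
        // join() waits for the worker and releases the underlying
        // OS thread handle (the WaitForSingleObject + CloseHandle pair).
        t.join();
    }
    return g_completed.load();
}
```

Note that this does not change the CLR-side behavior discussed below: when such code is compiled with /clr, the CLR still creates per-thread bookkeeping the moment a thread executes managed code.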


Note that this is C++ code that was being compiled with /clr for testing purposes, as a step towards porting the application to .NET; this is not a recommended way to create managed threads in an application. When this application is compiled with /clr and run, it seems to leak thread handles. If an application appears to leak resources, the first thing to do is to confirm that a resource is indeed being leaked, and to find out which resource the application is leaking.


STEP 1. Use Process Explorer / Performance Monitor to see what type of resource the application is leaking. For native memory, look at the Virtual Bytes and Private Bytes counters in Performance Monitor; for managed memory, look at the # Bytes in all Heaps counter. For handles, Process Explorer gives a good idea of which type of handle the application is leaking. In this case it looked like the application was leaking thread handles.


STEP 2. Once it is established that the application is leaking handles, and we know which type of handle is being leaked, the next step is to enable stack traces for the opening and closing of handles. The !htrace extension command comes in handy here; the stack traces are generated by ntdll, and !htrace just enables them. Launch the application under windbg and use !htrace -enable to enable stack tracing for handles.


0:002> !htrace -enable


Handle tracing enabled.


Handle tracing information snapshot successfully taken.


STEP 3. After running the application for some time, when you are sure that enough handles have been leaked, run !htrace -diff. The output of this command is the stack trace of handles that have been opened but not yet closed. In this case it showed us the stack trace of thread creation: we are creating threads using the Win32 API CreateThread and then running managed code on them. Since we rely on the runtime to manage system resources for us, before proceeding further it is a good idea to understand how the runtime manages the resource in question, which in this case is threads.


There are two types of threads that the CLR needs to keep track of: those that start executing in managed code, created by calling Thread.Start, and those that were created in the native world but later execute managed code. The CLR needs to maintain information about both kinds of threads in order to perform operations like GC, CAS security checks, etc. To see a list of the threads belonging to either category, along with some of the information the CLR maintains about them, run the following command after loading sos.dll.


0:004> !threads


ThreadCount: 183


UnstartedThread: 0


BackgroundThread: 182


PendingThread: 0


DeadThread: 0


Hosted Runtime: no


                                      PreEmptive   GC Alloc           Lock


       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception


   0    1 1fc6c 0028b500      6020 Enabled  02c750e8:02c75fe8 00255be0     0 STA


   2    2 21640 00294730      b220 Enabled  00000000:00000000 00255be0     0 MTA (Finalizer)


   3    3 abcc 002d01a8       220 Enabled  00000000:00000000 00255be0     0 Ukn


XXXX    4 abb0 002d1888  80010220 Enabled  00000000:00000000 00255be0     0 Ukn


XXXX    5 b7ec 002d2070  80010220 Enabled  00000000:00000000 00255be0     0 Ukn


XXXX    6 2177c 002d2880  80010220 Enabled  00000000:00000000 00255be0     0 Ukn


XXXX    7 abc0 002d30b8  80010220 Enabled  00000000:00000000 00255be0     0 Ukn


XXXX    8 ab98 002d38f0  80010220 Enabled  00000000:00000000 00255be0     0 Ukn


XXXX    9 216e8 002d4128  80010220 Enabled  00000000:00000000 00255be0     0 Ukn


XXXX    a 1fb78 002d4960  80010220 Enabled  00000000:00000000 00255be0     0 Ukn


XXXX    b 216b8 002d5198  80010220 Enabled  00000000:00000000 00255be0     0 Ukn


XXXX    c 216f8 002d59d0  80010220 Enabled  00000000:00000000 00255be0     0 Ukn


XXXX    d 216c8 002d6208  80010220 Enabled  00000000:00000000 00255be0     0 Ukn


</snip>


You may want to run !help threads to see exactly what all these columns indicate. ThreadOBJ is the address of the data structure in which the CLR keeps the necessary information about each thread. PreEmptive GC indicates whether the thread can be preempted by another thread that has to perform a garbage collection. As a developer you are not required to understand how the CLR maintains this information, and it is an implementation detail that can change over time, but having a fair understanding of it helps in debugging.


XXXX in the first column indicates that the thread is a dead thread, i.e. it has been detached from its corresponding OS thread. There were a lot of such dead threads in the output of !threads. Note that these dead threads are cleaned up at a later time by the finalizer thread of the application.


In this particular case, a GC had not yet been triggered and the finalizer thread was not being invoked, so these dead thread objects were accumulating, giving the impression that thread handles were being leaked. It is important to realize that the GC is generational and self-tuning in nature, so it is generally not a good idea to call GC.Collect in code; however, at times it may be justified, and this blog talks about it in detail.


In order to prevent dead threads in this test scenario, we periodically called GC.Collect. However, for creating worker threads in managed applications there are the following better options:


1) ThreadPool.QueueUserWorkItem

2) CCR's dispatcher object


-Manish Jawa

Comments (1)

  1. Sheila says:

    Finally an explanation that makes sense.

    I am having this exact same problem with a server application which spawns native threads for each client connection. As part of the initialization it calls a very small amount of interop C++ code, but does the rest of its work in native code. When the client disconnects the thread exits.

    Since the clients don't typically connect and disconnect very often (it's too expensive) it took a long, long time for the handle count to get big enough to look out of place. However, at some point the client scripted an application that connected and disconnected once per minute. But worse, because the server didn't ever execute very much managed code, the GC never thought there was enough memory pressure to warrant any collection.  The dead threads started really building up, and the handle count kept increasing until eventually the server ran out of handles and failed catastrophically. This took weeks to happen, though.

    For a long time I was chasing what I thought were "missing" Dispose calls somewhere under the interop code, until eventually I realized that it was still leaking handles even if I commented out the interop code so that it didn't actually do anything. The simple act of crossing from native to managed code allocated 5 handles per thread.

    Sounds like forcing collection is the way to go…