Does the CLR really call CoInitializeEx on the first call to unmanaged code, even if you don't deal with COM at all and are just calling native code via p/invoke?


Some time ago, I called out this part of the documentation regarding managed and unmanaged threading:

On the first call to unmanaged code, the runtime calls CoInitializeEx to initialize the COM apartment as either an MTA or an STA apartment. You can control the type of apartment created by setting the System.Threading.ApartmentState property on the thread to MTA, STA, or Unknown.
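
For reference, here is a minimal sketch of how a program might request an apartment type before a thread starts. It assumes the .NET Framework on Windows; the class name ApartmentDemo and the helper Observe are made up for illustration. It uses Thread.SetApartmentState and Thread.GetApartmentState rather than the older property that the quoted text mentions.

```csharp
using System;
using System.Threading;

static class ApartmentDemo
{
    // Starts a thread with the requested apartment state and reports
    // what the thread itself observes once it is running.
    // (Hypothetical helper, not part of any library.)
    public static ApartmentState Observe(ApartmentState requested)
    {
        ApartmentState observed = ApartmentState.Unknown;
        var thread = new Thread(() => {
            observed = Thread.CurrentThread.GetApartmentState();
        });
        // The state has to be chosen before Start(); it cannot be
        // changed once the thread is running.
        thread.SetApartmentState(requested);
        thread.Start();
        thread.Join();
        return observed;
    }

    public static void Main()
    {
        Console.WriteLine(Observe(ApartmentState.STA));
        Console.WriteLine(Observe(ApartmentState.MTA));
    }
}
```

A thread that never has its state set is the case the experiment below looks at.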

Commenter T asks, "Does it do this even if you don't deal with COM at all and call native code through a P/Invoke?"

Well, the documentation says it does, and we can confirm with an experiment:

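using System.Runtime.InteropServices;

class Program
{
    public static void Main()
    {
        // Do the p/invoke on a fresh thread so that the thread's
        // startup can be observed in the debugger.
        var thread = new System.Threading.Thread(
            () => {
                System.Console.WriteLine("about to p/invoke");
                GetTickCount();
            });
        thread.Start();
        thread.Join();
    }

    // A native function that has nothing to do with COM.
    [DllImport("kernel32.dll")]
    extern static uint GetTickCount();
}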

Run this program with a breakpoint on CoInitializeEx.

The first time the breakpoint hits is with this stack:

rax=00007ffebc529b70 rbx=00000000007c6100 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000001 rdi=0000000000000002
rip=00007ffebc529b70 rsp=000000000056f038 rbp=000000000056f0b0
 r8=0000000000000001  r9=0000000000000000 r10=0000000000000000
r11=0000000000000037 r12=0000000000004000 r13=0000000000000001
r14=0000000000000001 r15=0000000000000001

combase!CoInitializeEx
clr!Thread::SetApartment
clr!SystemDomain::SetThreadAptState
clr!SystemDomain::ExecuteMainMethod
clr!ExecuteEXE
clr!_CorExeMainInternal
clr!CorExeMain
mscoreei!CorExeMain
MSCOREE!CorExeMain_Exported
KERNEL32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

This call is initializing the main thread of the process. The flags passed to this first call to CoInitializeEx (the second parameter, which on x64 arrives in rdx) are 0, which is the value of COINIT_MULTITHREADED, so the thread joins the MTA.
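
To make the flag reading concrete, here is a small sketch that decodes a dwCoInit value. The two constants carry the values declared in objbase.h; the class name CoInitFlags and the helper Model are invented for this illustration and are not part of any library.

```csharp
using System;

static class CoInitFlags
{
    // Values as declared in objbase.h.
    public const uint COINIT_MULTITHREADED     = 0x0;
    public const uint COINIT_APARTMENTTHREADED = 0x2;

    // Returns the name of the threading model selected by a dwCoInit value.
    // (Hypothetical helper for illustration.)
    public static string Model(uint dwCoInit)
    {
        return (dwCoInit & COINIT_APARTMENTTHREADED) != 0
            ? "COINIT_APARTMENTTHREADED"
            : "COINIT_MULTITHREADED";
    }

    public static void Main()
    {
        uint rdx = 0; // the value of rdx in the register dump above
        Console.WriteLine(Model(rdx)); // prints COINIT_MULTITHREADED
    }
}
```

Since COINIT_MULTITHREADED is itself zero, passing no flags at all and asking for the MTA are the same request.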

The next time the breakpoint hits is with this stack:

rax=00000000ffffffff rbx=00000000007d1180 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000001 rdi=00000000007d1180
rip=00007ffebc529b70 rsp=000000001a6af9a8 rbp=000000001a6afa20
 r8=000000001a6af948  r9=0000000000000000 r10=00000000007f0340
r11=00000000007f0328 r12=0000000000004000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000

combase!CoInitializeEx
clr!Thread::SetApartment
clr!Thread::DoExtraWorkForFinalizer
clr!WKS::GCHeap::FinalizerThreadWorker
clr!ManagedThreadBase_DispatchInner
clr!ManagedThreadBase_DispatchMiddle
clr!ManagedThreadBase_DispatchOuter
clr!WKS::GCHeap::FinalizerThreadStart
clr!Thread::intermediateThreadProc
KERNEL32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

From the name FinalizerThreadStart, this is clearly the finalizer thread.¹

Next.

rax=00000000ffffffff rbx=000000000039eb20 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000001 rdi=0000000000000000
rip=00007ffebc529b70 rsp=000000001a5af3d8 rbp=000000001a5af450
 r8=0000000000000000  r9=000000001a5af3f0 r10=0000000000000000
r11=0000000000000286 r12=0000000000004000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000

combase!CoInitializeEx
clr!Thread::SetApartment
clr!Thread::PrepareApartmentAndContext
clr!Thread::HasStarted
clr!ThreadNative::KickOffThread
clr!Thread::intermediateThreadProc
KERNEL32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

Okay, this looks like it's kicking off a new thread. I inferred this from the presence on the stack of the function which is deviously named KickOffThread.

And the flags passed to this call to CoInitializeEx are 0, which once again means that it defaults to MTA.

There, we have confirmed experimentally that, at least in this case, the implementation matches the documentation.

That the implementation behaves this way is not surprising. The CLR does not have insight into the GetTickCount function, so it does not know a priori whether that function will create any COM objects. After all, we could have been p/invoking to SHGetDesktopFolder, which does use COM. Given that the CLR cannot tell whether a native function is going to use COM or not, it has to initialize COM just in case.
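
To see why the CLR cannot tell the two cases apart, here is a rough sketch of what a p/invoke to SHGetDesktopFolder might look like. It assumes Windows; the class name ShellDemo and the helper GetDesktopFolderHResult are made up for illustration. The interface pointer is taken as a raw IntPtr and released immediately, since the point is only the shape of the declaration.

```csharp
using System;
using System.Runtime.InteropServices;

static class ShellDemo
{
    // From the CLR's point of view this declaration looks just like
    // the one for GetTickCount: a DLL name and a signature.
    [DllImport("shell32.dll")]
    static extern int SHGetDesktopFolder(out IntPtr ppshf);

    // Calls the function and returns its HRESULT.
    // (Hypothetical helper for illustration.)
    public static int GetDesktopFolderHResult()
    {
        IntPtr folder;
        int hr = SHGetDesktopFolder(out folder);
        if (hr >= 0 && folder != IntPtr.Zero)
        {
            // We only wanted to make the call; drop the reference.
            Marshal.Release(folder);
        }
        return hr;
    }

    public static void Main()
    {
        Console.WriteLine("SHGetDesktopFolder returned 0x{0:X8}",
                          GetDesktopFolderHResult());
    }
}
```

Nothing in the DllImport attribute says whether the target uses COM, so the runtime has nothing to go on.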

¹ Or somebody who is trying to mislead us into thinking that it is the finalizer thread. I tend to discount this theory because as a general rule, code is not intentionally written to be impossible to understand.

Comments (13)
  1. Zhila says:

    Yay! CLR Week! (Or at least a CLR day.)

  2. Boris says:

    A CLR Week would be pointless now that Raymond has begun using more and more C# examples in his other blog posts. Unless a C# example would only work on the Windows Runtime, every week is potentially a CLR week.

  3. kantos says:

    I cheated... github.com/.../search nice thing about open source... we can just go look now instead of having to badger Raymond

  4. Jason says:

    GetTickCount is a red herring. The first unmanaged call is in the internals of System.Console.WriteLine, not GetTickCount.

    The mscorlib classes use P/Invoke internally for almost everything involving the system.

  5. Roger says:

    RE: "code is not intentionally written to be impossible to understand"

    Serious question that I've been too afraid to ask for too long... what about the c++ STL?  Why hasn't new code that's been added to the STL followed more modern naming practices?  Why hasn't the older code been changed after re-work for c++11/14 anyhow?

    As it stands, it still looks like a bunch of academics sat around at a table 25 years ago and decided to code things up on compilers that only support 6 character names while being purposely obtuse and over protective of naming conflicts.

    Or does this fall outside of the "generally" bucket?

    [Yeah, that code is nuts. Part of it is that private identifiers must begin with an underscore and capital letter in order to avoid conflicting with anything defined by the application. Though I have to admit, after staring at it for a while, it actually becomes almost readable. -Raymond]
  6. vs says:

    Hello Raymond,

    sorry for the OFFTOPIC question, but the Suggestions Box 4 comments are long time closed.

    Would you be willing to explain the reasons behind MSI GUID Compression?

    http://www.symantec.com/.../brief-note-installer-guids

  7. kantos says:

    @Roger The standard library code must be done extremely defensively. It will be (ab)used everywhere and tested to the limit. Common (stupid) things developers do that are allowed by the standard are overloading operator, and operator&. Or defining their own std namespace with conflicting names and then doing a using namespace std;

    Perhaps Raymond can get STL (Stephan T. Lavavej) to do a guest post about it.

    [Also, the standard library code has to be careful never to use an identifier that an app may have #define'd, which is why all the internal identifiers are horrible-looking. -Raymond]
  8. Mark says:

    vs:

    The GUID compression isn't the weirdo, GUIDs are:

     stackoverflow.com/.../37923

    The DWORD 0xD0F23C3F is actually stored as the byte sequence 3F 3C F2 D0, so that's how the "compressed" GUID starts. The only thing compressed about it is removing the hyphens and braces.

  9. Matt says:

    @vs:

    > Would you be willing to explain the reasons behind MSI GUID Compression?

    "GUID compression" is pretty simple. A GUID is defined as a DWORD-WORD-WORD-WORD-WORD-BYTE-BYTE-BYTE-BYTE-BYTE-BYTE. In little-endian this means that the GUID "D0F23C3F-CA74-460F-9ADB-49CBD57F9688" is stored by the byte sequence 3F3CF2D074CA0F46DB9A49CBD57F9688, which is the format the MSI installer uses.

    Tl;dr: "GUID compression" within installers is just the little-endian bytewise representation of a GUID, expressed in hexadecimal form.

  10. Matt says:

    > That the implementation behaves this way is not surprising. After all, the CLR does not have insight into the Get­Tick­Count function. It does not know a priori whether that function will create any COM objects. After all, we could have been p/invoking to SHGet­Desktop­Folder, which does use COM. Given that the CLR cannot tell whether a native function is going to use COM or not, it has to initialize COM just in case.

    Wouldn't a better solution have been to say if you want to PInvoke SHGetDesktopFolder, you need to PInvoke CoInitializeEx first? If we can expect programmers in the Win32 context to call CoInitializeEx before they call SHGetDesktopFolder, why can't the CLR team expect C# programmers to PInvoke CoInitializeEx before they PInvoke SHGetDesktopFolder?

  11. Medinoc says:

    @Matt: Except that according to the Symantec blog post, it swaps not just bytes, but nibbles as well.

  12. Random832 says:

    > Given that the CLR cannot tell whether a native function is going to use COM or not, it has to initialize COM just in case.

    These functions' native callers are smart enough to call CoInitializeEx if they need to. There are a million unsafe things you can do if you misuse P/Invoke - why does this one in particular have training wheels on?

    [The CLR already has to manage its own use of COM. So you're saying that there should be two different people both trying to manage COM? What if they disagree? (E.g., you p/invoke to CoInitializeEx(MTA), but the thread is marked [STA].) -Raymond]
  13. Mark says:

    Medinoc: you're right, I forgot that step. It actually swaps the nibbles of every byte (and as it doesn't swap the first two bytes of the second half, I'm guessing it's not just reversing strings). Here's an example from my machine:

    GUID {E5B21F11-6933-4E0B-A25C-7963E3C07D11}

    Bytes 11 1F B2 E5 33 69 0B 4E A2 5C 79 63 E3 C0 7D 11

    Product 11F12B5E3396B0E42AC597363E0CD711
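
The transformation described in the last comment can be sketched as follows. This is inferred only from the single example given there, not from any installer documentation. It works on the text form of the GUID: reverse the digits within each of the first three groups, then swap the two digits of each remaining byte. The class name PackedGuid and the method Pack are made up for illustration.

```csharp
using System;
using System.Text;

static class PackedGuid
{
    // Rearranges the 32 hex digits of a GUID the way the example
    // in the comment above appears to. (Hypothetical helper.)
    public static string Pack(Guid guid)
    {
        // 32 hex digits in display order, no braces or hyphens.
        string hex = guid.ToString("N").ToUpperInvariant();
        var sb = new StringBuilder(32);

        // First three groups (8, 4, and 4 digits): reverse each group.
        int pos = 0;
        foreach (int len in new[] { 8, 4, 4 })
        {
            for (int i = len - 1; i >= 0; i--)
            {
                sb.Append(hex[pos + i]);
            }
            pos += len;
        }

        // Remaining eight bytes: swap the two digits of each byte.
        for (; pos < 32; pos += 2)
        {
            sb.Append(hex[pos + 1]).Append(hex[pos]);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var guid = new Guid("{E5B21F11-6933-4E0B-A25C-7963E3C07D11}");
        Console.WriteLine(Pack(guid)); // prints 11F12B5E3396B0E42AC597363E0CD711
    }
}
```

Reversing the digits of a little-endian group is the same as swapping the nibbles of each stored byte, which is why the two descriptions in the comments agree.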

Comments are closed.
