FCall and GC hole - first post about Rotor

An example of FCall

My friend Joel Pobar wrote a great post demonstrating how to add new code to Rotor that exposes more internal EE (Execution Engine) information to the managed world. It is a very good example that covers both the BCL and the EE, and how the two parts interact with each other. As the example shows, BCL code can call into the EE through a special type of method called an “FCall”, like this:

FCIMPL1(MethodBody *, COMMember::GetMethodBody, MethodDesc **ppMethod)
{
    MethodDesc* pMethod = *ppMethod;
    METHODBODYREF MethodBodyObj = NULL;
    HELPER_METHOD_FRAME_BEGIN_RET_0();
    GCPROTECT_BEGIN(MethodBodyObj);
    TypeHandle thMethodBody(g_Mscorlib.FetchClass(CLASS__METHOD_BODY));
    MethodBodyObj = (METHODBODYREF)AllocateObject(thMethodBody.GetMethodTable());
    Module* pModule = pMethod->GetModule();
    COR_ILMETHOD_DECODER MethodILHeader(pMethod->GetILHeader(), pModule->GetMDImport(), TRUE);
    MethodBodyObj->maxStackSize = MethodILHeader.GetMaxStack();
    GCPROTECT_END();
    HELPER_METHOD_POLL();
    HELPER_METHOD_FRAME_END();
    return (MethodBody*)OBJECTREFToObject(MethodBodyObj);
}
FCIMPLEND

As I said before, I'm going to try to explain some Rotor code in my blog, so let me start by analyzing a small problem in that piece of code. I won't call it a bug because it happens to be harmless here, but it does violate some CLR coding rules.

Some basic bricks of an FCall

First let's take a glance at those amazing macros:

  • FCIMPL1/FCIMPLEND: defined in vm/fcall.h. They just adjust the calling convention between managed code (BCL) and unmanaged code (EE). There are different flavors of FCIMPL, which serve calls with different numbers or types of arguments (to match the argument-passing and enregistration rules of the managed world).

  • HELPER_METHOD_FRAME_BEGIN_RET_0/HELPER_METHOD_FRAME_END: defined in vm/fcall.h. For operations like GC (Garbage Collection) and EH (Exception Handling) to work correctly, frames have to be set up on the stack, especially at the boundary between managed and unmanaged code. Frames are a topic that could take a blog entry of its own, so I won't cover them in depth here. What you need to know now is that, for performance reasons, an FCall doesn't set up a frame by default. So when an FCall wants to throw an exception or allow a GC to happen, it has to set up a HelperMethodFrame (vm/frames.h) first. This job is done by the HELPER_METHOD_FRAME_BEGIN* macros. HELPER_METHOD_FRAME_END tears the frame down from the stack. Anything that may trigger a GC or throw an exception has to happen in the range guarded by this frame. In the code above, some operations could trigger a GC (AllocateObject certainly can; I'm not sure about the others, and it would be hard work to trace every code path to find out), so a HelperMethodFrame has to be established before that call.

  • GCPROTECT_BEGIN/GCPROTECT_END: defined in vm/frames.h. Similar to HELPER_METHOD_FRAME_BEGIN*, GCPROTECT_BEGIN sets up a GCFrame (vm/frames.h) and GCPROTECT_END pops the frame off. When a GC happens, it needs to find all object references on the stack to trace which objects in the managed heap are still alive, and when it moves an object (to compact the heap) it needs to update the references on the stack with the object's new location. For managed code, the JIT generates all the information the GC needs. But in the unmanaged part of the CLR, whatever code holds references to managed objects is responsible for reporting those references itself. A GCFrame serves this purpose. If an unmanaged method pushes a GCFrame onto the stack, the frame will report the protected reference (the argument to GCPROTECT_BEGIN) during a GC. In our example, MethodBodyObj is an object reference, so we set up a GCFrame for it.

  • HELPER_METHOD_POLL: defined in vm/fcall.h. This macro is meant to do a GC poll within the range of a HelperMethodFrame. GC polling is another complicated topic that I don't want to go into here. Basically, a poll allows a GC requested by another thread to happen. Without a poll, a thread that wants to perform a GC might be blocked, and then all managed threads in the application would be blocked too.

  • OBJECTREFToObject: defined in vm/vars.hpp. It is used to extract a plain object pointer from an ObjectRef. An ObjectRef is a naked pointer in the free build, but a wrapper with some very useful checking in the debug build.
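To make the GCFrame idea more concrete, here is a toy model of it in plain C++. This is only a sketch under my own assumptions: ToyObject, ToyGCFrame, ToyThread, ToyProtect and ToyHeap are hypothetical names I made up, and the "collector" is a trivial copying one that knows about nothing except the frame chain. It is not how Rotor implements any of this. It also copies an object once per reporting slot, which a real collector would never do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Everything here is a hypothetical toy; none of these are Rotor types.
struct ToyObject { int payload; };

constexpr int kPoison = -1;  // written into abandoned objects after a collection

// One frame reports the address of one local object reference.
struct ToyGCFrame {
    ToyObject** slot;  // where the protected reference lives on the stack
    ToyGCFrame* next;  // next-older frame on the same thread
};

struct ToyThread {
    ToyGCFrame* topFrame = nullptr;  // head of the frame chain
};

// Plays the role of GCPROTECT_BEGIN (constructor) and GCPROTECT_END (destructor).
class ToyProtect {
public:
    ToyProtect(ToyThread& thread, ToyObject*& ref)
        : thread_(thread), frame_{&ref, thread.topFrame} {
        thread_.topFrame = &frame_;  // push the frame
    }
    ~ToyProtect() { thread_.topFrame = frame_.next; }  // pop the frame
    ToyProtect(const ToyProtect&) = delete;
    ToyProtect& operator=(const ToyProtect&) = delete;

private:
    ToyThread& thread_;
    ToyGCFrame frame_;
};

// A two-space heap: collect() copies reported objects and fixes up their slots.
class ToyHeap {
public:
    explicit ToyHeap(std::size_t capacity) : from_(capacity), to_(capacity) {}

    ToyObject* allocate(int payload) {
        assert(used_ < from_.size());
        from_[used_].payload = payload;
        return &from_[used_++];
    }

    void collect(ToyThread& thread) {
        std::size_t live = 0;
        for (ToyGCFrame* f = thread.topFrame; f != nullptr; f = f->next) {
            if (*f->slot == nullptr) continue;
            to_[live] = **f->slot;    // move the object
            *f->slot = &to_[live];    // update the reported reference
            ++live;
        }
        for (ToyObject& dead : from_) dead.payload = kPoison;
        from_.swap(to_);  // pointers into the old to-space stay valid
        used_ = live;
    }

private:
    std::vector<ToyObject> from_;
    std::vector<ToyObject> to_;
    std::size_t used_ = 0;
};

struct DemoResult { int reportedValue; int unreportedValue; };

DemoResult runProtectDemo() {
    ToyThread thread;
    ToyHeap heap(4);
    ToyObject* reported = heap.allocate(42);
    ToyObject* unreported = heap.allocate(7);
    {
        ToyProtect guard(thread, reported);  // only this reference is reported
        heap.collect(thread);
    }
    // reported was updated to the moved copy; unreported still points at the
    // abandoned (poisoned) location.
    return {reported->payload, unreported->payload};
}
```

Calling runProtectDemo() returns 42 for the reported reference and kPoison for the unreported one. The unreported pointer only remains readable because the toy heap never frees its memory; in a real runtime it would be a dangling pointer.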

The problem

1. GC hole. In COMMember::GetMethodBody, HELPER_METHOD_POLL is placed after GCPROTECT_END. As I explained above, GCPROTECT_END pops the GC frame that protects the object reference, but HELPER_METHOD_POLL allows a GC to happen in another thread. So there is a chance (although a very small one) that another thread performs a GC after the GC frame has been popped. In such a GC, MethodBodyObj won't be reported. So the GC might not know that the object referenced by MethodBodyObj is still alive (if there is no other reference to the object) and collect it; or (if there are other references) the GC might move the object without updating MethodBodyObj with the new address, leaving MethodBodyObj holding a “stale” object pointer. In either case, COMMember::GetMethodBody will return a bogus object and the program might crash later in an unexpected way. We call this kind of error a “GC hole” in the CLR. GC holes are hard to detect because GC is non-deterministic. The funny thing is that nothing bad can happen in this method, because HELPER_METHOD_POLL is actually defined as a no-op in this version:

// This is the fastest way to do a GC poll if you have already erected a HelperMethodFrame
// #define HELPER_METHOD_POLL() { __helperframe.Poll(); INDEBUG(__fCallCheck.SetDidPoll()); }

#define HELPER_METHOD_POLL() { }

I don't know why we use an empty macro for HELPER_METHOD_POLL, but I'm sure it is supposed to be the version that is commented out. In later versions we may uncomment that line to make HELPER_METHOD_POLL take effect. So although this FCall doesn't cause any trouble for now, it might later. The corrected version should be:

HELPER_METHOD_FRAME_BEGIN_RET_0();
GCPROTECT_BEGIN(MethodBodyObj);
...

HELPER_METHOD_POLL();
GCPROTECT_END();
HELPER_METHOD_FRAME_END();
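Why the order of the poll and the unprotect matters can be simulated in a few lines of ordinary C++. Again, this is a sketch under my own assumptions: ToyRuntime and its protect/unprotect/requestGC/poll methods are hypothetical stand-ins, not EE APIs, and the "GC from another thread" is reduced to a pending flag that poll() acts on. Collected objects are parked in a graveyard instead of being freed, so that the stale pointer can be inspected safely.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical toy types; they only imitate the behavior described above.
struct ToyObject {
    int maxStackSize;
    bool collected;
};

class ToyRuntime {
public:
    ToyObject* allocate(int maxStack) {
        heap_.push_back(std::unique_ptr<ToyObject>(new ToyObject{maxStack, false}));
        return heap_.back().get();
    }
    void protect(ToyObject** slot) { roots_.push_back(slot); }  // like GCPROTECT_BEGIN
    void unprotect() { roots_.pop_back(); }                     // like GCPROTECT_END
    void requestGC() { gcPending_ = true; }  // "another thread wants a GC"
    void poll() {                            // like an enabled HELPER_METHOD_POLL
        if (!gcPending_) return;
        gcPending_ = false;
        collect();
    }

private:
    // Moves every reported object and updates its slot; everything else dies.
    void collect() {
        std::vector<std::unique_ptr<ToyObject>> survivors;
        for (ToyObject** slot : roots_) {
            if (*slot == nullptr) continue;
            survivors.push_back(std::unique_ptr<ToyObject>(new ToyObject(**slot)));
            *slot = survivors.back().get();
        }
        for (std::unique_ptr<ToyObject>& old : heap_) {
            old->collected = true;
            graveyard_.push_back(std::move(old));  // keep memory readable for the demo
        }
        heap_ = std::move(survivors);
    }

    std::vector<std::unique_ptr<ToyObject>> heap_;
    std::vector<std::unique_ptr<ToyObject>> graveyard_;
    std::vector<ToyObject**> roots_;
    bool gcPending_ = false;
};

// Original ordering: unprotect first, then poll. Returns whether the
// reference we would hand back still points at a live object.
bool pollAfterUnprotect(ToyRuntime& rt) {
    ToyObject* body = rt.allocate(8);
    rt.protect(&body);
    rt.requestGC();
    rt.unprotect();
    rt.poll();  // GC runs while body is not reported
    return !body->collected;
}

// Corrected ordering: poll while the reference is still protected.
bool pollBeforeUnprotect(ToyRuntime& rt) {
    ToyObject* body = rt.allocate(8);
    rt.protect(&body);
    rt.requestGC();
    rt.poll();  // GC runs, body is reported and updated
    rt.unprotect();
    return !body->collected;
}
```

In this toy, pollAfterUnprotect returns false (the reference is stale) and pollBeforeUnprotect returns true. The real hole is much harder to hit because it depends on another thread requesting a GC at exactly the wrong moment.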

2. If you dig deep into the code of HELPER_METHOD_FRAME_BEGIN*, you will find that those macros do a GC poll themselves. So unless the FCall does some very time-consuming work, there is no need for another poll. A refined version of our sample would therefore be:

HELPER_METHOD_FRAME_BEGIN_RET_0();
GCPROTECT_BEGIN(MethodBodyObj);
...

GCPROTECT_END();
HELPER_METHOD_FRAME_END();

3. Because setting up a HelperMethodFrame usually means the code wants to allow GC, for convenience there are versions of HELPER_METHOD_FRAME_BEGIN* that protect object references themselves. With those, a separate GCFrame is not needed. So the FCall could be written this way:

HELPER_METHOD_FRAME_BEGIN_RET_1(MethodBodyObj);

...

HELPER_METHOD_FRAME_END();
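The shape of this last version, one scope that both polls on entry and protects a reference, can be imitated with a small RAII class. This is a hypothetical sketch, not the real macro expansion: ToyRuntime and ToyHelperFrame are names I invented, and registering the slot before polling in the constructor is my own choice rather than something taken from vm/fcall.h.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical toy types, not Rotor's.
struct ToyObject { int maxStackSize; };

class ToyRuntime {
public:
    // Allocation is a GC trigger point: a pending collection runs first.
    ToyObject* allocate(int maxStack) {
        collectIfPending();
        heap_.push_back(std::unique_ptr<ToyObject>(new ToyObject{maxStack}));
        return heap_.back().get();
    }
    void pushRoot(ToyObject** slot) { roots_.push_back(slot); }
    void popRoot() { roots_.pop_back(); }
    void requestGC() { gcPending_ = true; }
    void poll() { ++polls_; collectIfPending(); }
    int pollCount() const { return polls_; }

private:
    void collectIfPending() {
        if (!gcPending_) return;
        gcPending_ = false;
        std::vector<std::unique_ptr<ToyObject>> survivors;
        for (ToyObject** slot : roots_) {
            if (*slot == nullptr) continue;
            survivors.push_back(std::unique_ptr<ToyObject>(new ToyObject(**slot)));
            *slot = survivors.back().get();  // update the protected reference
        }
        for (std::unique_ptr<ToyObject>& old : heap_) {
            old->maxStackSize = -1;                // poison abandoned objects
            graveyard_.push_back(std::move(old));  // keep memory readable
        }
        heap_ = std::move(survivors);
    }

    std::vector<std::unique_ptr<ToyObject>> heap_;
    std::vector<std::unique_ptr<ToyObject>> graveyard_;
    std::vector<ToyObject**> roots_;
    bool gcPending_ = false;
    int polls_ = 0;
};

// One object that sets up the "frame", protects one reference, and polls once.
class ToyHelperFrame {
public:
    ToyHelperFrame(ToyRuntime& rt, ToyObject** slot) : rt_(rt) {
        rt_.pushRoot(slot);
        rt_.poll();  // the frame setup does the poll, so the body needs none
    }
    ~ToyHelperFrame() { rt_.popRoot(); }
    ToyHelperFrame(const ToyHelperFrame&) = delete;
    ToyHelperFrame& operator=(const ToyHelperFrame&) = delete;

private:
    ToyRuntime& rt_;
};

int getMaxStackViaHelperFrame(ToyRuntime& rt) {
    ToyObject* body = nullptr;
    {
        ToyHelperFrame frame(rt, &body);  // no separate protect/unprotect pair
        body = rt.allocate(8);
        rt.requestGC();
        (void)rt.allocate(1);  // triggers the pending GC; body is moved and updated
    }
    return body->maxStackSize;
}
```

Here getMaxStackViaHelperFrame returns 8 even though a collection moved the object in the middle of the scope, and the runtime records exactly one poll, the one done by the frame setup.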

I hope this post has given you a taste of how FCalls work in the CLR. Actually, most of what I talked about here can be found in the comments at the beginning of vm/fcall.h. And if you want to see more examples of FCalls, just search for FCIMPL in the vm directory and you will find plenty of them. Then you will see how the CLR builds a beautiful object-oriented world in an old-fashioned and kinda dirty way.

 This posting is provided "AS IS" with no warranties, and confers no rights.