ASP.NET Tip: How to avoid creating a GC Hole

There are only a few things that can make a .NET process crash.  The most common one is an unhandled exception.  Another way it can happen is by creating a GC Hole.

What is a GC Hole

So first a little background on what I mean by a GC Hole.  A GC Hole is any corruption that happens inside of the managed heaps.  Under normal circumstances, this cannot happen as you don’t have pointers that reference objects in the heap so you can’t corrupt them.  This corruption is generally seen by the GC (Garbage Collector) when it is trying to compact the heaps and release objects no longer referenced.

How does a GC Hole get created

So if we don’t have access to pointers (not counting managed C++ and that is a whole different conversation), how can this occur?  Well the most common way is by making a native call (P/Invoke) that returns data.  There are actually two things that can happen during this process to cause a GC Hole.

The first is a buffer overrun.  You can imagine passing a byte array to a native function and having that array be 200 bytes.  If the native function returns 400 bytes worth of data, it will write past the end of the object and corrupt the next object in the managed heap.
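To make the overrun case concrete, here is a minimal sketch of the usual defense: tell the native side how big the buffer is and never copy more than that.  FakeNativeFill is a hypothetical stand-in I made up for a real native function (it is not part of any library), and the buffer is pinned with a GCHandle only so the stand-in can write through a raw address.

```csharp
using System;
using System.Runtime.InteropServices;

public static class BoundedNativeWrite
{
    // Hypothetical stand-in for a native function: it writes through a raw
    // pointer and only knows the capacity the caller passes in.
    public static int FakeNativeFill(IntPtr destination, int capacity, byte[] payload)
    {
        // Clamp the copy so it can never run past the end of the buffer.
        int count = Math.Min(payload.Length, capacity);
        Marshal.Copy(payload, 0, destination, count);
        return count;
    }

    public static void Main()
    {
        byte[] buffer = new byte[200];
        byte[] payload = new byte[400]; // more data than the buffer can hold

        // Pin the buffer so its address is stable while we write through it.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            int written = FakeNativeFill(handle.AddrOfPinnedObject(), buffer.Length, payload);
            Console.WriteLine($"Wrote {written} of {payload.Length} bytes");
            // prints "Wrote 200 of 400 bytes"
        }
        finally
        {
            handle.Free();
        }
    }
}
```

A real native API only protects you this way if it accepts a length argument and honors it, so check the documentation of the function you are calling and always pass the true size of the array.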

The second happens if the object being passed to a native function isn’t pinned first.

Pinning is a whole topic on its own.  Pinning objects in memory can cause a number of problems, and a great deal of information on the most common one can be found here.

Back to the GC Hole issue.  If the following events occur, it will cause a GC Hole:

  1. Allocate a byte array to be passed to a native call, do not pin it
  2. Make the native call passing the byte array
  3. While the native call is processing the request, a Garbage Collection occurs
  4. Native call returns

So what is the problem with the GC running before the native call returns?  Well, the GC has the ability to compact the heap (move objects around so they are next to each other), and if it moves our byte array to a new location, the native call will still hold the old address as the place to write its data.  When it writes its result, the data lands in the memory where the byte array used to be, corrupting whatever occupies that location now.

For example, if we had this when making the native call:

+-------+-------+-----------------+-----------------+--------+
|  int  |  int  |      Free       |      byte       |  Free  |
+-------+-------+-----------------+-----------------+--------+

Then after the GC runs, it will compact the ints and the byte array next to each other, like this:

+-------+-------+-----------------+--------------------------+
|  int  |  int  |      byte       |           Free           |
+-------+-------+-----------------+--------------------------+

Now assume objects get created inside that free block.  We would then have something like:

+-------+-------+-----------------+--------+--------+--------+
|  int  |  int  |      byte       | String | String |  Free  |
+-------+-------+-----------------+--------+--------+--------+

Then when the native call writes its result, it will write to the original location of the byte array and thus corrupt the two strings.
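Here is a minimal sketch of the fix for this second case: pin the array with a GCHandle for as long as native code might hold its address.  WriteWhilePinned is a hypothetical helper of my own, and Marshal.WriteByte stands in for the native write.  As far as I know, the interop marshaller already pins a byte array passed directly to a synchronous P/Invoke call, so explicit pinning matters most when native code keeps the address after the call returns, as with asynchronous I/O.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedBufferExample
{
    // Pins the buffer, simulates a native write of `value` into its first byte,
    // and reports whether the buffer's address survived a collection unchanged.
    public static bool WriteWhilePinned(byte[] buffer, byte value)
    {
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr before = handle.AddrOfPinnedObject();

            // Request a collection; a pinned object must not be relocated.
            GC.Collect();

            // Stand-in for the native code writing through the address it was given.
            Marshal.WriteByte(before, 0, value);

            return before == handle.AddrOfPinnedObject();
        }
        finally
        {
            // Always unpin; long-lived pins tend to fragment the heap.
            handle.Free();
        }
    }

    public static void Main()
    {
        byte[] buffer = new byte[200];
        bool stayedPut = WriteWhilePinned(buffer, 0x2A);
        Console.WriteLine($"Address unchanged: {stayedPut}, buffer[0] = {buffer[0]}");
        // prints "Address unchanged: True, buffer[0] = 42"
    }
}
```

The try/finally matters: a handle that is never freed keeps the array pinned for the life of the process, which trades a GC Hole for a fragmentation problem.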

How to find a GC Hole and fix it

This is something I will address shortly in another blog post.  We will look into troubleshooting a GC Hole and see what we can do to find the problem.

 

I hope that this post will help developers understand the areas where they need to be careful and help keep everyone from creating GC Holes in the first place.  For information on how to pin an object correctly, take a look at GCHandles, Boxing and Heap Corruption.

More great information can be found in Asynchronous operations, pinning.