NTDebugging Puzzler 0x00000007: Interlocked functions


Today, we will have some fun with interlocked functions.


The following section of code is reentrant.  A “well meaning” developer used interlocked functions to avoid serializing on a global table lock.


Initial smoke testing shows that the code works fine.  Sometimes things are not as they appear when doing initial code review.  After several hours of heavy stress testing, the developer finds the machine has bugchecked.  Analysis of the dump showed that the caller of this function had steamrolled through nonpaged pool writing stacks on top of everyone’s pool memory.


The goal of today’s puzzler is to find the bug and describe how it could be fixed with minimal timing impact.


Here are a few details before you begin.


1.       The variable gIndex is an unsigned long global.

2.       The gLogArray memory was allocated from nonpaged pool and the size of this allocation is correct.




00:   PLOG_ENTRY GetNextLogEntry ()


01:         ULONG IterationCount = MAX_RECORDS;


02:         PLOG_ENTRY pEntry;


03:         do


04:               if (InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS) == MAX_RECORDS)


05:                     pEntry = &gLogArray[0];


06:               else


07:                     pEntry = &gLogArray[InterlockedIncrement(&gIndex)];


08:               –IterationCount;


09:         } while(InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0 && (IterationCount > 0));


0a:         return (IterationCount > 0) ? pEntry : NULL;





Happy hunting,


Dennis Middleton “The NTFS Doctor”


  [Update: our answer. Posted 6/10/2008]


Thanks for all the great posts!  I saw many interesting answers, and a few unique solutions that I hadn’t considered.


Bug Description


A slight time gap exists between the InterlockedCompareExchange and the InterlockedIncrement.  For this reason, several threads could pass the check for MAX_RECORDS prior to doing the increment.


Assume that N is the number of threads that pass the check for MAX_RECORDS while gIndex is at a particular value.

If N or more threads are able to pass the check while gIndex is equal to MAX_RECORDS-(N-1), then gIndex would be incremented beyond MAX_RECORDS.


For example, let’s assume that 3 threads passed the check while gIndex was at MAX_RECORDS-2.  Then after the three increments occur, gIndex would be equal to MAX_RECORDS+1.  From that point, invalid pointers would be passed out to the caller.


Possible Solutions


There are several ways to solve this problem.  Some are more efficient than others.  I would avoid doing checks for MAX_RECORDS-1, MAX_RECORDS, or MAX_RECORDS+1 (interlocked or not) since there could potentially be more than two threads involved in the race condition.  Such a solution would only reduce the likelihood of an overflow.


There were a few posts suggesting a lock or critical section for the section of code between the compare and increment.  That would be one solution, but it would also do away with the benefits of using interlocked functions.


In keeping with the philosophy of keeping the code fast and simple, here’s a solution that gives a very good result with minimal impact.


1.       I removed the if () {} else {} and simply allowed gIndex to increment unchecked.  With this change, gIndex can approach its max unsigned long value and possibly wrap – we need to keep the array index in check.

2.       The modulus operator (added to line 4 below) will divide the incremented gIndex by MAX_RECORDS and use the remainder as the array index.  When dividing, the resultant remainder is always smaller than the divisor (MAX_RECORDS).  For this reason, the array index is guaranteed to be smaller than MAX_RECORDS.  As even multiples of MAX_RECORDS are reached, the array index resets back to zero mathemagically and no interlocked compare is even necessary.



00:   PLOG_ENTRY GetNextLogEntry ()


01:         ULONG IterationCount = MAX_RECORDS;


02:         PLOG_ENTRY pEntry;


03:         do



04:               pEntry = &gLogArray[InterlockedIncrement(&gIndex) % MAX_RECORDS];


05:               –IterationCount;


06:         } while(InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0 && (IterationCount > 0));


07:         return (IterationCount > 0) ? pEntry : NULL;




With the fix in place, the code is smaller, faster, easier to read, and most of all – the bug is gone.  When developing code, always try to think small.             


Best Regards,

NTFS Doctor




Comments (19)

  1. Matthew says:

    Seems to be a simple race condition between the InterlockedCompareExchange at line 0x04 and the InterlockedIncrement at line 0x07.  At MAX_RECORDS-1, two threads could both pass the InterlockedCompareExchange and then both call InterlockedIncrement pushing gIndex up to MAX_RECORDS+1 and killing memory from then on.

  2. JM says:

    Scenario: gIndex == MAX_RECORDS-1. Thread 1 passes through line 4. Context switch. Thread 2 passes through line 4. Thread 2 passes through line 7. gIndex == MAX_RECORDS. Context switch. Thread 1 passes through line 7. gIndex == MAX_RECORDS + 1. Threads will now start scribbling past the end.

    The obvious fix would be to use a second loop to set gIndex to the correct value:

    ULONG index;

    do {

      index = gIndex;

    } while (InterlockedCompareExchange(&gIndex, (index + 1) % MAX_RECORDS, index) != index);

    pEntry = &gLogArray[index];

    This is probably not what you have in mind when you want a "minimal timing impact" fix, though. So I’ll leave it to people with more experience in writing error-prone code like this.

  3. Tal Rosen says:

    Issue: gIndex might pass MAX_RECORDS.

    Lets assume gIndex = (MAX_RECORDS-1).

    its possible for two (or more) threads to execute InterlockedIncrement(&gIndex) on line 07.

    Possible fix:




    } while( (pEntry >= &gLogArray[MAX_RECORDS] || InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0) && (IterationCount > 0));

  4. Tal Rosen says:

    Oops 🙁 my "fix" got things worse.

    I forgot to zero out gIndex.

  5. Mo says:

    The problem is between line 4-7. gIndex boundary checking and incrementing are not protected.

    Illustrated with two threads:

    T1: line 4, gIndex=MAX_RECORD-1

    T2: line 4, gIndex=MAX_RECORD-1

    T1: line 6,7, gIndex=MAX_RECORD

    T2: line 6,7, gIndex=MAX_RECORD+1

    I would fix it to:

    pEntry = &gLogArray[InterlockedIncrement(&gIndex)];

    if (InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS+1) == MAX_RECORDS+1)

       pEntry = &gLogArray[0];

  6. Tal Rosen says:

    My second try…

    Keep old_value from line 4 call to InterlockedCompareExchange.

    Replace line 7 InterlockedIncrement with:

    if (InterlockedCompareExchange(&gIndex,old_value+1, old_value) == old_value)

       pEntry = &gLogArray[old_value+1];


       goto line_04; //some other thread messed up…lets try again

  7. Myth says:

    i think this is related to index point out the allocated gLogArray boundary, cause when gIndex==(MAX_RECORDS-1) and execute to Line07, it will increaase to MAX_RECORDS and continue to return the &gIndex[MAX_RECORDS] which is unallocated;

    setting Line04 to check with (MAX_RECORDS) should solve the problem

  8. eranb says:

    Suppose gIndex = MAX_RECORD -1.

    Two consumers come:

    -Both compare exchange, no go since no wraparound.

    -Both increment. Boom, one of them is given an out of boundary memory location.

    To watch for this race, you must not unconditionally increment the counter. Instead,

    do the following:

    localVar = compareExchange(…).

    if (localVar != MAX_RECORDS){

     localVar1 = compareExchange(&gIndex,localVar,localVar + 1);

     If (localVar1 != localVar + 1){





    against the return value of the original compareExchange and try to put

  9. Slavo T. says:


     when gIndex is MAX_RECORDS-1, between line 4-7 another processes could increment gIndex. Then gIndex could cross MAX_RECORDS value, and at line 9 InterlockedCompareExchange could rewrite memory IterationCount times. Line 4 alone is ineffective for guarding array range.

     How to fix? FastMutex for lines 4-7?

    That’s idea.

    Slavo Tomascik

  10. On the while condition, if the first condion:

    InterlockedCompareExchange(&pEntry->Active, 1, 0) results on true (meaning that pEnty active is 1 and owned by the thread), we have a final condition on the return:

    return (IterationCount > 0) ? pEntry : NULL;

    but if iterationCount is zero (and pEntry was owned by the thread) the function will return NULL, but the log entry will be owned by the thread and lost.

    To fix this we can swap the while condition:

    while((IterationCount > 0)&& (InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0));

    With the Swap, the thread that reaches 0 on iteration count could miss a log entry, but at least will not leak it.

  11. calin_iaru says:

    The IterationCount starts from MAX_RECORDS and continues up until is 0; that means that there are at most MAX_RECORDS loops. This leads to the conclusion that the array size is MAX_RECORDS and not MAX_RECORDS+1.

    When gIndex is MAX_RECORDS-1, the  pEntry pointer takes the address of gLogArray[MAX_RECORDS] which means that it passes the memory boundary.

    The fix is to compare the gIndex with MAX_RECORDS-1 as in:

    if (InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS-1) == MAX_RECORDS-1)

  12. zzz says:

    Two bugs:

     1- it uses gLogArray[MAX_RECORDS]

     2- it returns NULL even if it founds a valid log array record.

    it should be like below:

    PLOG_ENTRY GetNextLogEntry ()


         ASSERT (MAX_RECORDS > 0);

         ULONG IterationCount = MAX_RECORDS;

         PLOG_ENTRY pEntry;      



               if (InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS – 1) == MAX_RECORDS – 1)

                     pEntry = &gLogArray[0];


                     pEntry = &gLogArray[InterlockedIncrement(&gIndex)];

         } while(InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0 && (–IterationCount > 0));

         return (IterationCount > 0) ? pEntry : NULL;


  13. Niclas Lindgren says:

    Just a quick stab at it. But it seems to be a variation of the double check problem. You check something and then update it outside the locked path.

    Above you check if you have reached the end of the circular buffer. Let’s say 3 cores do that simultanously when gIndex is MAX_RECORDS-1, then those 3 cores will increment gIndex passed MAX_RECORDS and then next time around the loop will go haywire on you update whatever memory it can find when setting the Active flag for IterationCount location passed this circular buffer.

    So two problems, double check and ill controlled invariant (>MAX_RECORDS is a safer bet, still completely wrong, but..)

  14. Niclas Lindgren says:

    Didn’t see that we were supposed to give a slight modification that could solve the problem.

    One way might be to do the increment before the check, a bit ugly (all 3 cores will contend for index 0) but safe (you are guarding the buffer usage, not the actual index), at least I think it is safe. I usually don’t play with these kind of things as it is very easy to get it wrong. Just grab the spinlock and create a critical path and be done with it (unless this is some really crucial kernel code path)

    Also I just realized I was talking about cores and not threads. Are the interlocked barriers? I think so, so it should be safe.

  15. Infro says:

    #1 Not multi-thread safe, This line can be paused before execution, queing it up

    thread one pauses right before pEntry = &gLogArray[InterlockedIncrement(&gIndex)]; thread two executes InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS) == MAX_RECORDS)

    #2 IterationCount’s usage isn’t thread safe —

    #2 is really bad though, because it would typically assemble int a

    mov eax,[value]

    sub eax,1

    mov [value],eax

    instead of

    sub DWORD PTR [value],1 which means that thread switching is three times more likely to screw with the value, need to use interlock to decrement interationcount (among other things)

  16. Infro says:

    Part 2: How it could be fixed

    (Yay, I have some extra time to spare!) (Lies, I know)

    PLOG_ENTRY GetNextLogEntry () {

       LONG lIndex;

       LONG IterationCount = MAX_RECORDS + 1; \Don’t Cycle Through Entire List More than Once \Assuming we allocated, MAX_RECORDS+1 and MAX_RECORDS is the actualy max index

       PLOG_ENTRY pEntry; \Pointer to our Data

       do {

           lIndex=InterlockedIncrement(&gIndex) % MAX_RECORDS;

    if(InterlockedDecrement(&IterationCount) < 0) break;

           pEntry = &gLogArray[lIndex];

       } while( InterlockedCompareExchange(&pEntry->Active, 1, 0) );

       return (IterationCount > 0) ? pEntry : NULL;


    Sidenote, noticed ULONG, should be LONG, or else you can’t be sure to break on (IterationCount > 0)

  17. eranb says:

    The problem happens when gIndex equals MAX_RECORDS -1. If more than one thread come in the same time and do the compare, all will fail and increment. The first will get a good record, the rest will walk over others memory. One way to solve this is by using module operation and never compareExchange to make sure you never overflow.


  18. Tal Rosen says:


    So, how about totally omitting InterlockedIncrement calls?

    We don’t need gIndex, let’s just try a random number.

    04  pEntry = &gLogArray[rand() % MAX_RECORDS];

    Even with your InterlockedIncrement version, we are not guaranteed to test all slots.


  19. Jason Evans says:


    This blog entry might be of interest to someone whi’s been having a go at solving this puzzle: