Jinx: Visual Studio plug-in for debugging multi-threaded code

Today I’m going to introduce a plug-in for Visual Studio (still in beta) that helps to speed up finding concurrency bugs in multi-threaded applications.

Example of a concurrency bug

Consider an application that has two threads (“Thread A” and “Thread B”) that share a common stack. Each thread reads (pops) one value off the stack and then writes (pushes) one value back onto the stack; between pushes and pops the threads do other work. During testing the application almost always works correctly, but occasionally it crashes. The test team records data inputs, machine configurations, and how the application was used; unfortunately, after hours of this they are still unable to reliably reproduce the crash. It turns out that this is a concurrency bug (a bug that occurs only if the events that produce the crash happen with exactly the “right” timing) in the following stack_push() function, and that dependence on timing is what makes the bug so hard to reproduce.

// Push the given node onto the stack
void stack_push(struct stack *stack, struct node *node) {
    struct node *old_top;
    static int count = 0;

    while (1) {
        old_top = stack->top;
        node->next = old_top;

        // BUG: this check and the write below are not one atomic step,
        // so another thread can change stack->top in between
        if (stack->top == old_top) {
            stack->top = node;
            break;
        }
    }

    // only valid if there are no concurrent pops
    assert_stack_contains(stack, node);
}

If the threads execute in the following order:

Step  Thread A            Thread B
1     Read stack->top
2                         Read stack->top
3                         Write stack->top
4     Write stack->top

Thread B’s push operation is overwritten by Thread A’s push operation, and the node that Thread B pushed is lost. During ordinary execution it can take hundreds or thousands of runs for this exact sequence of events to occur; this is where the Jinx concurrency debugger comes in.
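To make the lost push concrete, here is a minimal sketch that replays the same interleaving by hand on a single thread, so the outcome is the same on every run. The struct layouts and the helper names (stack_contains, replay_lost_push) are my own assumptions for illustration; the original sample does not show its type definitions, and this has nothing to do with how Jinx itself works.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed minimal layouts; the original sample does not show them. */
struct node  { struct node *next; };
struct stack { struct node *top; };

/* Walk the list and report whether n is reachable from the top. */
static int stack_contains(const struct stack *s, const struct node *n) {
    for (const struct node *cur = s->top; cur != NULL; cur = cur->next) {
        if (cur == n) return 1;
    }
    return 0;
}

/* Replays the problem interleaving step by step on one thread.
   Returns 1 if Thread B's node was lost, 0 otherwise. */
static int replay_lost_push(void) {
    struct stack s = { NULL };
    struct node a = { NULL }, b = { NULL };

    /* Thread A: read stack->top, link its node, evaluate the check. */
    struct node *a_old = s.top;
    a.next = a_old;
    int a_ok = (s.top == a_old);

    /* Thread B: does the same before Thread A has written anything. */
    struct node *b_old = s.top;
    b.next = b_old;
    int b_ok = (s.top == b_old);

    /* Both checks pass because nothing has been written yet. */
    assert(a_ok && b_ok);

    /* Thread B writes first, then Thread A overwrites it. */
    s.top = &b;
    s.top = &a;

    return stack_contains(&s, &a) && !stack_contains(&s, &b);
}
```

Calling replay_lost_push() returns 1: Thread A's node is on the stack, but Thread B's node is no longer reachable from stack->top.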

Introduction to Jinx

Jinx works by making a copy of the application’s state while it is being executed, and then runs multiple "simulations" of the application in the background, trying to force concurrency bugs to appear. Since concurrency bugs normally occur in or around code that accesses shared data, Jinx adds artificial wait states to the simulations so that shared memory accesses occur as close together as possible. In this way, it can potentially reproduce concurrency issues such as the one demonstrated above in far fewer runs than it would take for the right order of events to occur naturally on the system.

Unfortunately, once the bug is reproduced, locating the problem code can be much harder. One issue that can interfere with locating the problem is called overshoot. Overshoot occurs when one thread causes another thread to crash and the problem thread then continues to execute for a short period of time before the processor halts all of the threads. The problem thread is now at a location that is nowhere near where the bug occurred, making discovery of the faulty code difficult. To address overshoot, Jinx introduces a feature called SmartStop, which holds the problem thread on the last line of code that communicated with the shared data, making discovery of the offending code much easier. In the example above, SmartStop would stop Thread A in the stack_push() function, since this was the last point of communication before the crash.
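Once the debugger has pointed you at stack_push(), one possible repair is to make the check and the write a single atomic step. The sketch below is my own suggestion rather than anything taken from the Jinx samples: it assumes a compiler with C11 atomics (stdatomic.h), assumes the struct layouts shown, and uses a hypothetical name, stack_push_atomic. On Windows, InterlockedCompareExchangePointer plays a similar role.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed layouts; top is atomic so it can be compared and swapped safely. */
struct node  { struct node *next; };
struct stack { _Atomic(struct node *) top; };

/* Push node so that "top is still old_top" and "top = node"
   happen as one indivisible operation. */
static void stack_push_atomic(struct stack *stack, struct node *node) {
    assert(stack != NULL && node != NULL);

    struct node *old_top = atomic_load(&stack->top);
    do {
        node->next = old_top;
        /* If another thread changed top, the call fails, old_top is
           refreshed with the current top, and the loop retries. */
    } while (!atomic_compare_exchange_weak(&stack->top, &old_top, node));
}
```

This only addresses the race between concurrent pushes. A lock-free pop running alongside it needs extra care (the well-known ABA problem), so protecting the whole stack with a mutex may be the simpler choice.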

Getting started with Jinx

To get started with Jinx, download and install the beta from https://petravm.com/beta/jinx.msi

Review the Jinx documentation and code samples (samples are included in C and C#), and work through the short C tutorial in the documentation.

NOTE: Jinx gives three debugging options, including the ability to debug the running operating system. The default mode is to debug the most recently registered program, which is the recommended setting for application debugging.
