Getting the most out of CHESS

Hi, this is Shaz Qadeer from the CHESS team.  If you have visited our blog before, you have probably read our articles about systematic concurrency testing using CHESS, design of the CHESS scheduler, and the advantages of CHESS testing over stress testing.  Hopefully, these articles have convinced you that there is substantial value in using CHESS to test your concurrent programs.  In this article, I will try to explain the best practices in using CHESS.  In other words, I will attempt to answer the question:  How do I get the most out of CHESS?

Before I answer that question, I would like to review the anatomy of a test suitable for being used with CHESS.  A CHESS test comprises three functions: ChessTestStartup, ChessTestRun, and ChessTestShutdown.  Given these functions, CHESS performs the following computation:

ChessTestStartup();
while (exists unexplored schedule) {
    ChessTestRun();
}
ChessTestShutdown();

After executing ChessTestStartup once, ChessTestRun is executed repeatedly, each execution taking a different schedule of the multithreaded scenario in ChessTestRun.  If an execution results in a crash, manifested either as an assertion failure or an uncaught exception, then CHESS terminates.  At this point, CHESS provides you with the capability to execute ChessTestRun along the thread schedule that caused the crash. If no error is found, the test finishes after executing ChessTestShutdown.

The most important aspect of using CHESS is to identify the concurrency scenario to be tested.  For example, suppose we want to test a concurrent non-blocking queue data structure that supports two operations Enqueue and Dequeue.  There are several interesting concurrency scenarios for this data structure, e.g., two Enqueue operations in parallel, two Dequeue operations in parallel, an Enqueue and a Dequeue in parallel, etc.  Each of these scenarios should be given to CHESS as a separate test.  Remember that CHESS is systematically searching the huge set of executions of a concurrent program and the number of executions increases exponentially with the size of the program.  You can help CHESS by factoring out your large test into a collection of smaller scenario tests. 

To test the first scenario of two parallel Enqueue operations, ChessTestStartup and ChessTestShutdown should be empty and ChessTestRun should perform the following sequence of operations:

1.  Allocate and initialize the queue.
2.  Create two threads each calling the Enqueue operation once, wait for the threads to finish, and reset the queue.
3.  Free the queue.

Two aspects of step 2 in ChessTestRun are worth noting.  First, it makes each thread perform an operation only once.  This is in sharp contrast to the approach of stress testing, where each thread is usually made to repeat the same operation many times.  Repetition is built into the CHESS framework (see the pseudo-code above), so your test does not have to do it.  Second, ChessTestRun waits for all child threads to finish and resets the queue state.  This is important because CHESS expects every execution of ChessTestRun to end in a state that is equivalent to the state at the beginning.  You can do this by simply waiting for all threads, releasing all handles and resetting any global state used by the test. If ChessTestRun does not satisfy this property then CHESS might be unable to reproduce a bug discovered during its search.  We have observed that most scenarios naturally satisfy this property.  But sometimes global or static variables are initialized lazily during the first execution of ChessTestRun, thereby making the states at the beginning and end of the first execution of ChessTestRun different from each other.  This problem is easily fixed though; just call ChessTestRun a couple of  times at the end of ChessTestStartup.  This is often sufficient to trigger all lazy initialization in the test. With this trick, subsequent calls to ChessTestRun have the same begin and end states.

For testing the non-blocking queue, we didn’t really use ChessTestStartup and ChessTestShutdown.  These functions tend to be used relatively infrequently for the expensive part of test setup and teardown.  For example, if the test connects to a database server, then the connection to the server could be established in TestStartup and torn down in TestShutdown.

Now, imagine that CHESS successfully generates all executions of the test with two threads each calling the Enqueue operation without finding an error.  What does that say about the executions of a test with three such threads?  Theoretically, the test with three threads may uncover a bug that does not occur with two threads.  At the same time, the number of possible executions grows exponentially with the number of threads; consequently, testing with large number of threads will consume exponentially more testing resources.  So what should our testing strategy be? 

The systematic testing approach of CHESS is predicated upon the small-scope hypothesis: the likelihood of finding new errors in a parameterized system goes down as the parameter values are increased.  In other words, systematic exploration of executions for small parameter values is highly effective in flushing out many bugs.  A large amount of empirical evidence supporting this hypothesis has been obtained by us and by others working in the broad area of formal analysis of computer systems. An implication of the small-scope hypothesis is that an iterative strategy works best.  Start with a small number of threads and increase the number gradually depending on your testing resources.  This strategy also has the advantage that you get partial yet quantifiable coverage as exploration for a given number of threads is finished.

Another way to look at this testing strategy is as follows. If a bug requires a minimum of three threads, run CHESS with exactly three threads. Unlike stress, CHESS is guaranteed to find this bug. By iteratively increasing the number of threads, you are increasing the complexity of the bugs you find and the coverage achieved. Another advantage of this iterative strategy is that for every bug found, you find the simplest scenario that triggers the bug. This tremendously improves your debugging experience.

Finally, I would like to mention some common difficulties our users have encountered in creating a test suitable to be used with CHESS.  For each issue, we also describe a possible workaround.

1.  A background thread starts during ChessTestStartup and finishes during ChessTestShutdown.  This thread executes outside the control of CHESS, since CHESS controls the scheduling of only the threads created during the execution of ChessTestRun.  If this thread modifies the state of the test, then CHESS’ mechanism for reproducing executions may get confused.  The simplest workaround is to make ChessTestStartup and ChessTestShutdown be a part of ChessTestRun, thereby bringing the background thread under the control of CHESS.  Of course, you will now pay the performance penalty of having the startup and shutdown code exceuted during each iteration of ChessTestRun.

2.  ChessTestRun uses a custom threadpool, perhaps implemented using System.Threading or WIN32 threads, to schedule tasks asynchronously.  In such threadpool implementations, threads are often allocated dynamically and the number of threads may not be the same at the beginning and end of ChessTestRun.  In this case, the fix is to implement a shutdown function that tears down all internal threads in the threadpool and ensure that this function is called at the end of ChessTestRun.

Well, that is all I have to say for now.  There is a lot of information here on how to use CHESS effectively.  I hope it helps all you CHESS enthusiasts out there.  Please do let me know if you have any questions.

Best regards,
Shaz