A Lesson in Multi-Threaded SQL Programming

I don't think there's a developer alive today who hasn't, at one time or another, struggled with a bug that, no matter how much you review your code and decide the code is correct, trace through it and debug it, you just can't figure it out.  A bug so insidious that you just swear there's gotta be something horribly wrong with your operating system, your debugger, your hardware, etc.  One where you're about ready to tear your hair out, give up and completely redesign the entire program to work around it... then by sheer dumb luck or inspiration, you stumble upon the answer and you can't resist the urge to thwack yourself in the forehead and go "DUH!"

I ran into one of these on my quest to develop the ultimate build automation solution.  I slaved away until 9:30pm on a Friday night trying to figure out why my very simple SQL queries were getting unpredictable results.  "SELECT COUNT(*) FROM table WHERE condition", right?  Should return a number greater than or equal to zero, and in my particular test case, I should always get 1 as a result.  But my query was sometimes returning an empty result set, and at other times was getting 0 or 2 as a response.  Whenever I tried the query manually in Management Studio, I got the correct result, so there's no way this should have NOT worked in the program.  I beat my head against it for hours, even enlisting the help of another dev to look over my shoulder and see if he could spot anything obvious.  Everything I tried failed to fix the problem, but moreover, I couldn't actually see anything wrong with my code.  Additionally, the debugger was telling me that my program was throwing occasional InvalidOperationExceptions and FormatExceptions, neither of which should ever happen with all the try/catch blocks and safety checking code I had in place.  Definitely some odd behavior here.

Here's where I need to explain a little more about how the program is structured: With its current (and admittedly inefficient) structure, my build backend service runs two separate timers at different times: One timer runs every ten seconds to query for new jobs and send a heartbeat to the database that allows the frontend to notify users if a backend has gone down for some reason.  Another timer runs more frequently while a job is being processed to see if a flag has been set by the frontend stating that the job should be aborted.  While not true multi-threaded programming, this event-driven system has the effect of being multi-threaded, and the "main" thread would take care of reading job data and updating the job's status periodically when individual tasks finished processing.

All of these events were running on a single SQL connection using objects in the System.Data.SqlClient namespace (.NET 2.0).  And it didn't hit me until I left for the night and was on my way home what the problem was.  I realized that my empty result set was the correct response to the "Check for Abort" query, and a value of 2 would be returned when I'd query for the current status of the job.  It turned out that the multi-threading issue was causing the returns for my queries to get mixed up!  None of the MSDN documentation I'd come across had covered this particular aspect – either there's a genuine .NET bug I discovered here, or everyone assumes that you'll only ever use a SQL connection in a single-threaded app.  But once I'd figured out how the buffers were getting crossed, I logged back in as soon as I got home and gave each "thread" its own connection, to ensure that no connection would ever have to serve more than one purpose and would only ever process things linearly.

Voila, it worked!  Now, not only was I getting the correct results to every query, but the debugger no longer reported ANY exceptions being thrown, ANYWHERE in the program.  Usually, these "Duh" moments come when you realize you have a misplaced comma or an order-of-operations issue, or an off-by-one counting problem, that sort of thing.  In this case, it was slightly more complicated, but something I should have figured out much sooner considering this wasn't my first foray into multi-threaded programming.  (The program does have actual background threads as well, but they weren't affected by this particular problem.)

So, if you ever find yourself needing to run SQL queries and commands from multiple threads, don't try to share a connection between them.  One thread per connection.

- Matt

Technorati Tags: XPe,Embedded,SQL