I’ll never forget the first time I heard someone mention a “Race Condition”. In my younger years, when I was more worried about what people thought of me, I was too embarrassed to admit that I didn’t really know what they were talking about. I knew it had something to do with multi-threaded programs, the unpredictability of execution order, locking and blah, blah, blah… To be honest, it all seemed like noise to me at the time. After all, most of the programs I wrote never had to contend with it.
Then multi-core, hyper-threaded systems became commonplace, I started working on Windows Workflow Foundation, and dealing with multiple threads and race conditions became unavoidable. My current computer has a six-core, hyper-threaded Intel Core i7, which means Windows Task Manager shows 12 CPUs. When I started using NCrunch to run my unit tests in parallel, I found many threading bugs and race conditions in my code.
Why is this so difficult?
“With great power, comes great responsibility” – Spiderman
It just is… What can you say? Let’s just agree that your life would be simpler if you always used a request-per-thread model and avoided access to shared state. You can do a lot with this style of programming, but one day you will have to cross that line, and when you do… don’t hurt yourself.
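Here is a rough Python illustration of the "no shared state" style, using a thread pool as a practical stand-in for one thread per request. The handler name and the request shape are made up for the example. Each handler reads only its own input and returns a fresh result, so no locks are needed.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(request: dict) -> dict:
    # Hypothetical handler: everything it touches is local or passed in,
    # so two requests can never race each other over shared data.
    total = sum(request["items"])
    return {"id": request["id"], "total": total}

requests = [{"id": i, "items": list(range(i + 1))} for i in range(5)]

with ThreadPoolExecutor(max_workers=5) as pool:
    # map() runs handlers concurrently but returns results in input order.
    responses = list(pool.map(handle_request, requests))

print(responses[4])  # {'id': 4, 'total': 10}
```

The moment two handlers need to read and write the same object, you have crossed the line the paragraph above describes.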
Why is this a race?
Imagine each of the runners in the picture above is a CPU core. They are all running at once. If they don’t interfere with or depend on each other, there’s no problem, but what if…
What if one runner has to wait for another runner to hand off some data before it can continue? Now you have to coordinate the race. There are two ways to do this: locking and waiting. Jeff Richter taught me that waiting is better than locking. In fact, the best designs avoid locks entirely.
Imagine a relay race where the runners arrive at the stadium to find a guard who will allow one, and only one, runner on the track at a time. Each runner enters the stadium, runs their leg of the race and exits, and only then is the next runner allowed in. Sure, it works, but there are problems: performance, deadlocks, livelocks, recursion and others that I won’t get into here.
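The guard at the gate is a mutual-exclusion lock. The sketch below is a Python analogy, not Workflow code: `threading.Lock` plays the guard and a shared list is the track. Every runner gets its turn, but notice what the lock does *not* promise.

```python
import threading

track_guard = threading.Lock()  # the guard: one runner on the track at a time
legs_run = []                   # shared state, only touched while holding the lock

def runner(name: str) -> None:
    with track_guard:           # wait at the gate until the guard lets us in
        legs_run.append(name)   # run our leg while we own the track
    # leaving the with-block releases the lock: we exit the stadium

threads = [threading.Thread(target=runner, args=(f"runner{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Sorted, because the lock guarantees exclusivity, not order.
print(sorted(legs_run))  # ['runner0', 'runner1', 'runner2', 'runner3']
```

The lock guarantees that only one runner is on the track, but which runner gets in first is still up to the scheduler.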
Now imagine that all the runners enter the stadium, each takes their place, and then the race begins. Only one runner on our relay team is active while the others… wait. They wait until a signal (in this case the baton) tells them that the previous runner (thread) is done and they can begin.
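The baton version can be sketched in Python with one `threading.Event` per hand-off. This is an illustration of the waiting style, not an actual .NET or Workflow API. Every thread starts up front and blocks on its own event, and each runner sets the next runner's event when it finishes. Because the signals fully determine the order, the outcome here is deterministic.

```python
import threading

NUM_RUNNERS = 4
# batons[i] is set when runner i is allowed to start.
batons = [threading.Event() for _ in range(NUM_RUNNERS)]
finish_order = []

def runner(i: int) -> None:
    batons[i].wait()               # stand in place until we receive the baton
    finish_order.append(i)         # run our leg
    if i + 1 < NUM_RUNNERS:
        batons[i + 1].set()        # hand the baton to the next runner

# Everyone takes their place first...
threads = [threading.Thread(target=runner, args=(i,)) for i in range(NUM_RUNNERS)]
for t in threads:
    t.start()

batons[0].set()                    # ...then the starting signal reaches runner 0
for t in threads:
    t.join()

print(finish_order)  # [0, 1, 2, 3]
```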
Fine, but… what if there were three runners and three batons, and our runner must start when either of two particular runners finishes, but if the third runner finishes first, the race is over and the team is disqualified?
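One way to express that rule is a Python sketch built on `concurrent.futures.wait` with `FIRST_COMPLETED`. The leg names A, B and C, the simulated durations, and the tie-breaking rule are all assumptions made for the example. The point is that the code reacts to whichever batons have actually arrived instead of assuming an order.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def leg(name: str, seconds: float) -> str:
    time.sleep(seconds)  # simulate a leg of unpredictable duration
    return name

def anchor_leg(durations: dict) -> str:
    """Hypothetical rule: the anchor starts when A or B finishes,
    but if C finishes first the team is disqualified."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {name: pool.submit(leg, name, secs)
                   for name, secs in durations.items()}
        # Block until at least one of the three batons arrives.
        done, _ = wait(futures.values(), return_when=FIRST_COMPLETED)
        # Several legs may finish together; this sketch lets C win ties.
        if futures["C"] in done:
            return "disqualified"
        return "anchor starts"
    # Leaving the with-block waits for the remaining legs to finish.

print(anchor_leg({"A": 0.05, "B": 0.5, "C": 0.5}))
print(anchor_leg({"A": 0.5, "B": 0.5, "C": 0.05}))
```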
Where you will get into trouble is when you assume that you know the outcome of a race. You may notice that when you run your program in the debugger, the race always ends in the order [a, b, c]. So you ship your program, only to find that once in a while a customer reports an error that can only occur when the order is [c, a, b]. How can this be? You set up a test environment, run the program a million times, and it never produces [c, a, b], so you close the bug as “not repro”. Is the customer just crazy? Or unlucky? Or did you create a program with a threading bug caused by a race condition?
In a race, at any given point in time t, there are only three possible outcomes:
- None – none of the racers have completed the race
- All – all have completed the race
- Any – some have completed the race
If you make any other assumption about the order of the race participants, you will be wrong sooner or later.
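Those three outcomes can be written as a small Python check over a set of `concurrent.futures.Future` objects. The function name `race_state` is made up for this illustration. It only counts how many racers are done. It deliberately says nothing about who finished first, and its answer is only true at the instant you ask.

```python
from concurrent.futures import Future

def race_state(racers: list) -> str:
    """Classify a race at the instant of the call: "None", "Any" or "All".
    The answer says nothing about the order in which racers finished,
    and it may already be stale by the time the caller acts on it."""
    finished = sum(1 for racer in racers if racer.done())
    if finished == 0:
        return "None"
    if finished == len(racers):
        return "All"
    return "Any"  # some, but not all, have finished

a, b, c = Future(), Future(), Future()  # created by hand only for the demo
print(race_state([a, b, c]))  # None
b.set_result("b")
print(race_state([a, b, c]))  # Any
a.set_result("a")
c.set_result("c")
print(race_state([a, b, c]))  # All
```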
Since I’m currently thinking about API design, I’ve been asking myself this question:
“Can I make Windows Workflow Foundation easier to use?”
One reason it is difficult is that we have created a race and asked you to solve it. Any time you run a workflow, you are engaging in multi-threaded programming. You discovered this the first time you called WorkflowApplication.Run() and found that Run only starts the workflow; it doesn’t wait for it to finish.
How do you know when the workflow is done? Well, that depends…
Imagine a race with four lanes:
- In lane 1 – Idle
- In lane 2 – Faulted
- In lane 3 – Completed
- In lane 4 – Canceled
Run is like the starting pistol. Now you have to wait for the outcome of the race, which means writing fairly complicated code to handle every possible outcome, code that most developers have never written before.
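To show the shape of that waiting code, here is a Python sketch built around a *hypothetical* `HypotheticalWorkflow` class. It is not the WorkflowApplication API, only a stand-in with one callback per lane. `run()` returns immediately and one callback fires later on another thread. The host has to wire up every lane, share a signal across all of them, and then block on that signal with a timeout.

```python
import threading
from typing import Callable, Optional

class HypotheticalWorkflow:
    """Stand-in for a workflow host (not a real WF API): run() returns
    immediately and exactly one of four callbacks fires later on another thread."""

    def __init__(self, outcome: str) -> None:
        self.outcome = outcome  # which lane wins; fixed here only for the demo
        self.on_idle: Optional[Callable[[], None]] = None
        self.on_completed: Optional[Callable[[], None]] = None
        self.on_faulted: Optional[Callable[[], None]] = None
        self.on_canceled: Optional[Callable[[], None]] = None

    def run(self) -> None:
        lanes = {"Idle": self.on_idle, "Completed": self.on_completed,
                 "Faulted": self.on_faulted, "Canceled": self.on_canceled}
        # The starting pistol: fire the winning callback on a background thread.
        threading.Thread(target=lanes[self.outcome]).start()

def run_and_wait(wf: HypotheticalWorkflow, timeout: float = 5.0) -> str:
    """Start the workflow and block until one of the four lanes finishes."""
    finished = threading.Event()
    result = {}

    def lane(name: str) -> Callable[[], None]:
        def callback() -> None:
            result["lane"] = name  # record the winner before signalling
            finished.set()
        return callback

    # Every lane must be wired up before run(); miss one and the host
    # waits forever (or, here, until the timeout).
    wf.on_idle = lane("Idle")
    wf.on_completed = lane("Completed")
    wf.on_faulted = lane("Faulted")
    wf.on_canceled = lane("Canceled")

    wf.run()
    if not finished.wait(timeout):
        raise TimeoutError("no lane finished before the timeout")
    return result["lane"]

print(run_and_wait(HypotheticalWorkflow("Faulted")))  # Faulted
```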
I’d like to propose that the way to make this easier is to use a Task to represent some of the runners in our race and exceptions to represent the others. For example, we can treat Idle and Completed as the typical outcomes and Faulted and Canceled as exceptional ones. Once we take that step, Task offers some helpful methods:
- If I want a task that completes when the workflow becomes idle or completes, I can call Task.WhenAny(idleTask, completedTask)
- If I want to wait with a timeout, I can call Task.WaitAll() or Task.WaitAny(), both of which have overloads that accept a timeout
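The proposal can be approximated in Python with asyncio, where `asyncio.wait(..., return_when=FIRST_COMPLETED, timeout=...)` plays roughly the role of Task.WhenAny or Task.WaitAny with a timeout. Everything Workflow-specific here is hypothetical: `HypotheticalTaskWorkflow`, `WorkflowFaulted` and `WorkflowCanceled` are invented names, and this is not a promised API. Idle and Completed are awaitable futures. Faulted and Canceled surface as exceptions when you read the result, so the caller needs one wait and one try/except instead of four callbacks.

```python
import asyncio

class WorkflowFaulted(Exception):
    """Hypothetical exception for the Faulted outcome."""

class WorkflowCanceled(Exception):
    """Hypothetical exception for the Canceled outcome."""

class HypotheticalTaskWorkflow:
    """Not a real WF API: exposes Idle and Completed as awaitable futures and
    reports Faulted/Canceled by failing the Completed future with an exception."""

    def __init__(self) -> None:
        loop = asyncio.get_running_loop()
        self.idle = loop.create_future()
        self.completed = loop.create_future()

    def start(self, outcome: str, delay: float = 0.01) -> None:
        # Returns immediately, like Run(); the outcome arrives later.
        asyncio.get_running_loop().call_later(delay, self._finish, outcome)

    def _finish(self, outcome: str) -> None:
        if outcome == "Idle":
            self.idle.set_result("Idle")
        elif outcome == "Completed":
            self.completed.set_result("Completed")
        elif outcome == "Faulted":
            self.completed.set_exception(WorkflowFaulted("unhandled exception"))
        else:
            self.completed.set_exception(WorkflowCanceled("workflow was canceled"))

async def wait_for_idle_or_completion(wf: HypotheticalTaskWorkflow,
                                      timeout: float) -> str:
    # Roughly what Task.WhenAny(idleTask, completedTask) plus a timeout gives you.
    done, _ = await asyncio.wait({wf.idle, wf.completed}, timeout=timeout,
                                 return_when=asyncio.FIRST_COMPLETED)
    if not done:
        raise TimeoutError("workflow neither went idle nor completed in time")
    return done.pop().result()  # re-raises WorkflowFaulted / WorkflowCanceled

async def run_workflow(outcome: str, timeout: float = 1.0) -> str:
    wf = HypotheticalTaskWorkflow()
    wf.start(outcome)
    return await wait_for_idle_or_completion(wf, timeout)

print(asyncio.run(run_workflow("Completed")))  # Completed
```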
Now instead of pushing the problem of waiting onto you, Workflow can give you tools that make it easy to do the right thing.
Once again, let me say I’m thinking out loud here. This is not a promise, just a dream for now…