Concurrency, part one.

Last week, I mentioned I was going to start a series of articles on concurrency, here goes.  I do feel a need to give a caveat: I'm sort-of winging it here - I know where I want to go on this one, but I'm not sure how I'm going to get there :)  Please bear with me, because this is a relatively complicated topic.  Also remember that I'm trying to start this from the ground up - the reality is that a huge number of the issues that show up can't be addressed unless you've covered the fundamentals.

The interesting thing about concurrent programming is that it can be as nightmarishly difficult as you want to make it, or as easy as you want to make it.  The reality is that the more you push the envelope of concurrent programming the harder it is.

On the other hand, with a few relatively simple steps, you can achieve a "reasonable" level of concurrency in your application without too much pain.

At this point, the people who really get high performance, highly scalable computing are cringing.  And they're right to be cringing, high performance scalable concurrent computing is REALLY hard.  There are only a couple of dozen people at Microsoft who know enough to do it and get it right consistently (and I'm absolutely NOT among them).  Getting high performance concurrent computing (the kind of thing that is needed to get your server application scaling evenly to more than 10 or so CPU cores) is HARD.  It literally ranks among the hardest things we know how to do in computing.  There are reams and reams of PhD theses that have been written about how to do it by people way smarter than I am.

So you can think of this series as being a series about "good enough" concurrent programming - it's not designed for high-end systems, it's about making the stuff that runs day-to-day work well enough.

So, with all that said, on with the show.

 

First off, what IS concurrent programming?  It's a big phrase, and it's not clear what's meant by it.  In a nutshell, concurrent programming is about trying to make your application more scalable by making your application do more things at once.  As I mentioned in my post about Herb Sutter's Free Lunch article on CPU speed limitations in Dr. Dobbs, concurrency is going to be more and more important in computing in the future.

How can your application do more than one thing at once?  Well, the reality is that if your computer has only one CPU core, then it can't.  But most machines sold these days come with more than one CPU core (heck, the $700 machine I bought my daughter for Christmas came with a HT P4 CPU).  So the reality is that MP machines are more mainstream than you might think.  And they're going to become more and more mainstream as time goes on.

But even with that, your machine can only do as many things at once as it has CPU cores.  The key thing to realize is that just because your machine can only do one thing at a time, doesn't mean it is actually DOING something most of the time -in fact, 99% of the time, your CPU is literally turned off, especially if your machine has a Celeron CPU core.  Even if your application does a lot of stuff, it STILL idle much of the time - every time you read from a file, you block your process waiting on the disk to return your file's data, every time you allocate virtual memory, you potentially wait until the memory management system flushes dirty pages to disk and maps those pages into your address space.  And all the time that your application is idle, you could potentially be using your CPU for other things.  And if you do have a machine with more than one core), then your machine CAN do more than one thing at once.

Since your machine is idle most of the time, then its useful to identify ways of taking advantage of that idle time - if you can do stuff in the time your app is blocked, that's stuff that doesn't have to slow down the main thread of your application.  Simple examples of this are the spelling check (or background printing) in a word processing application - checking the spelling of words in a document (or formatting the document for printing) don't need to be done in the main thread, so they can be offloaded into a worker thread that would perform the actual work. 

In Win32, there are two major systems for implementing concurrent programming, Threads and Fibers.  I won't touch on Fibers, because I don't have enough experience with Fiber programming to write authoritatively on the issues of programming with Fibers, but I can talk a bit about concurrent programming with threads.  And there's a wealth of content related to concurrent programming with threads so...

Ok, so that's what concurrent programming is, why does everyone keep saying that it's so hard?

The big deal with concurrent program is that it introduces a huge set of issues that you don't see normally.  First off, you need to ensure that your application can take advantage of concurrency - I can't think of the number of times that someone's naively added multiple threads to their application and made their application (or system) slow down.  The other problem is that multithreaded applications introduce a series of failures that aren't seen in single threaded applications.

The basic problem is that many (most) operations are so-called read/alter/write operations.  For example, most (if not all) arithmetic operations in C are R/A/W operations.  For example, in C, b = b + c reads the value of b from memory, reads the value of c from memory, adds those two values and then writes it back to memory at location b.  If multiple threads can access any of those memory locations, that value in memory, then the results are, as they say, "undetermined".  The language doesn't help you because neither the C or C++ language make any guarantees about the behavior of operations in the presence of multiple threads.

So if you're not really careful, you can get truly strange and interesting data corruption issues - even simple operations like incrementing a variable can misbehave in strange and mysterious ways.

 

Ok, having said all that, tomorrow I'll start discussing some relatively simple rules for writing concurrent applications.