CCR vs Task Parallel Library

I was reading this about profiling gotchas and couldn't resist the urge to download the samples from the book and implement a few of them using CCR to compare performance against TPL. I mostly played around with the parallel quick-sort. The first observation is that CCR takes a little more setup than TPL, but most of that is something you do once in your application anyway, so I would consider them similar in readability. Both implementations looked very similar and straightforward. I guess what I like about both TPL and CCR is that your code looks almost like regular single-threaded code, but with the benefits of concurrent execution.
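To give an idea of how close to single-threaded code this gets, here is a minimal sketch of a parallel quick-sort on TPL. It is not the sample from the book: the class name `TplQuickSort`, the `Threshold` value, and the middle-element pivot are assumptions made for illustration. The only TPL call used is `Parallel.Invoke`.

```csharp
using System;
using System.Threading.Tasks;

public static class TplQuickSort
{
    // Hypothetical cutoff: ranges smaller than this are sorted sequentially,
    // because spawning parallel work for tiny ranges costs more than it saves.
    private const int Threshold = 2048;

    public static void Sort(int[] data)
    {
        if (data == null) throw new ArgumentNullException("data");
        Sort(data, 0, data.Length - 1);
    }

    private static void Sort(int[] data, int low, int high)
    {
        if (low >= high) return;

        if (high - low + 1 < Threshold)
        {
            Array.Sort(data, low, high - low + 1);
            return;
        }

        int pivot = Partition(data, low, high);

        // The two halves do not overlap, so they can be sorted concurrently.
        Parallel.Invoke(
            () => Sort(data, low, pivot - 1),
            () => Sort(data, pivot + 1, high));
    }

    // Lomuto partition using the middle element as pivot.
    // Returns the final index of the pivot.
    private static int Partition(int[] data, int low, int high)
    {
        Swap(data, low + (high - low) / 2, high);
        int pivotValue = data[high];
        int store = low;
        for (int i = low; i < high; i++)
        {
            if (data[i] < pivotValue)
            {
                Swap(data, i, store);
                store++;
            }
        }
        Swap(data, store, high);
        return store;
    }

    private static void Swap(int[] data, int a, int b)
    {
        int tmp = data[a];
        data[a] = data[b];
        data[b] = tmp;
    }
}
```

Calling `TplQuickSort.Sort(numbers)` sorts the array in place. Apart from the `Parallel.Invoke` call and the threshold check, this is the textbook recursive algorithm.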

While executing these two variants and tweaking thresholds, I noticed that CCR almost always ended up 10-20% slower than the equivalent TPL implementation. Looking a little closer at what was happening, I think it all made sense. First of all, this was on a dual-core machine, so there are really only two threads running. CCR is built around message passing, and the way I implemented the sort algorithm was to post a message whenever a part of the array was sorted, so that a handler could tell when all parts had been sorted. Since quick-sort puts the pivot in its final place at each step, a lot of these partial sorts completed by putting just a single number in place. That is a lot of messages being sent. The handler also must be a run-once handler that then puts itself back in the queue, so that I don't update the counter of remaining unsorted elements concurrently. Hence relatively a lot of time is spent in the CCR dispatcher just to decide when everything is done rather than sorting. Did that make sense without a code listing?
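In case a listing helps, here is a rough sketch of the completion-counting pattern described above. It is not CCR code and not the original implementation: a `BlockingCollection<int>` stands in for a CCR port, a single consumer loop stands in for the run-once handler that re-registers itself, and the names `MessageCountingQuickSort` and `threshold` are made up for illustration. It returns the number of completion messages so the message overhead becomes visible.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class MessageCountingQuickSort
{
    // Sorts data in place and returns how many completion messages were posted.
    // threshold is a hypothetical cutoff: ranges of at most this many elements
    // are sorted sequentially and reported with a single message.
    public static int Sort(int[] data, int threshold)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (data.Length == 0) return 0;
        int cutoff = Math.Max(1, threshold);

        // Stand-in for a CCR port: each message says how many elements
        // have just reached their final position.
        var completed = new BlockingCollection<int>();
        ThreadPool.QueueUserWorkItem(_ => SortRange(data, 0, data.Length - 1, cutoff, completed));

        // Single consumer, so the counter is never updated concurrently.
        int remaining = data.Length;
        int messages = 0;
        while (remaining > 0)
        {
            remaining -= completed.Take();
            messages++;
        }
        return messages;
    }

    private static void SortRange(int[] data, int low, int high, int cutoff,
                                  BlockingCollection<int> completed)
    {
        if (low > high) return; // empty range, nothing to report

        int length = high - low + 1;
        if (length <= cutoff)
        {
            Array.Sort(data, low, length);
            completed.Add(length); // whole range is now in place
            return;
        }

        int pivot = Partition(data, low, high);
        completed.Add(1); // only the pivot is in its final place

        // Left half goes to the thread pool, right half continues here.
        ThreadPool.QueueUserWorkItem(_ => SortRange(data, low, pivot - 1, cutoff, completed));
        SortRange(data, pivot + 1, high, cutoff, completed);
    }

    // Lomuto partition using the middle element as pivot.
    private static int Partition(int[] data, int low, int high)
    {
        Swap(data, low + (high - low) / 2, high);
        int pivotValue = data[high];
        int store = low;
        for (int i = low; i < high; i++)
        {
            if (data[i] < pivotValue) { Swap(data, i, store); store++; }
        }
        Swap(data, store, high);
        return store;
    }

    private static void Swap(int[] data, int a, int b)
    {
        int tmp = data[a]; data[a] = data[b]; data[b] = tmp;
    }
}
```

With a `threshold` of 1, every message accounts for exactly one element, so sorting n numbers costs n messages through the single consumer. Raising the threshold reduces the message count, which is the knob I was tweaking.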

Anyway... My gut feeling is that TPL is better suited for parallel computations than CCR. The code looks a little cleaner and performance is a little better. However, CCR is better suited for event-based implementations, where TPL is not that big of an advantage. As usual, it is a matter of choosing the right tool for the right problem.

Preemptive comment: I think I could have made my CCR implementation a little more efficient, and I suspect the differences would have been smaller with more cores, but I didn't test that. I'll let you do that!