Rambling … loops and multi-processing. When things literally slow down…
Continuing from “Rambling … why is my dual processor colleague not twice as fast?”, I thought I should have a quick look at some basic looping and at the multi-processor support provided by the Parallel Extensions to the .NET Framework 3.5. Again I developed some simple and probably ‘silly’ code to experiment with loops and the parallel extensions. The code simply increments a long variable many, many times and performs some meaningless maths, just to keep the processor occupied. What I am trying to figure out is whether the parallel extensions bring any simple relief.
With the last test, TestThree(), we went one step further and applied the parallel extensions to the inner loop. This is not normally needed, but as part of the test I decided to go where no-one would normally venture :)
Pop-Quiz
“Pop-Quiz”, which test will be the quickest and which will be problematic? I was surprised …
Analysis
Let us first look at the processor view, using the good old trusted task manager:
With both TestZero() and TestOne() the total processor utilisation remains around 50% and the processing seems to plod along quite sequentially. With TestTwo() and TestThree() you notice a surge in threads, and the total processor utilisation practically maxes out on both processors, as expected.
But who is the quickest and who is the problematic case?
Figure 2 – Results
TestZero() and TestOne() basically arrive at the finish line at the same time. I expected TestOne() to be quicker, but in all my sceptical re-runs it always proved to be slightly slower than TestZero(). I used the profiling tools to get a more detailed picture, and those results do not make me feel 100% happy either: with instrumented code, TestOne() is faster than TestZero(). Using sampling we get a less accurate picture, but with less impact on the application behaviour.
TestTwo() was faster as expected and TestThree() proved to be the champion in terms of speed. It is important to test your application before making extensive use of parallel processing, because if we remove the first two statements of the SillyProcessing() method, the parallel versions actually prove slower than their classic counterparts. TestOne() also sped up as expected when the calculation statement and especially the heavy string statement were removed, as shown:
What is also evident is that one of the tests has a problem … yes, TestThree() seems to suffer from lost increments. Before we start debugging and analysing the anomaly in great detail, we can already point out that the code is not thread safe and that the data counters are simply overwritten by the inner loops competing for the battle zone.
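To make the lost-increment problem a little more concrete, here is a minimal sketch of one common remedy: guarding the shared counter so that each read-modify-write happens as a single unit. It is written in Python as a language-neutral illustration and is not the original test code. The names SafeCounter, inner_loop and run_parallel, as well as the loop counts, are hypothetical. In .NET the same idea would typically be expressed with Interlocked.Increment or a lock statement around the counter update.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

OUTER = 8        # hypothetical number of outer-loop work items
INNER = 10_000   # hypothetical number of increments per work item


class SafeCounter:
    """A shared counter whose increment is protected by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # While the lock is held, the read, add and write-back cannot be
        # interleaved with another thread's update, so no increment is lost.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def inner_loop(counter, iterations):
    """Stand-in for the inner loop: bump the shared counter repeatedly."""
    for _ in range(iterations):
        counter.increment()


def run_parallel(outer=OUTER, inner=INNER, workers=4):
    """Run `outer` inner loops on a thread pool and return the final count."""
    counter = SafeCounter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(inner_loop, counter, inner) for _ in range(outer)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return counter.value


if __name__ == "__main__":
    print(run_parallel(), "of", OUTER * INNER, "increments recorded")
```

The price of the lock is contention: every increment now queues behind the others, which is one more reason to measure before and after rather than assume the parallel version still wins.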
So what, if anything, have we learnt in this chat?
Throwing processors and threads at a problem will not necessarily speed up your application. There are exceptional profiling tools in VSTS 2008 and even better ones in VSTS 2010, which will enable you to profile and analyse your application in a number of environments. Remember that too much of a good thing, i.e. too much parallelism or too much salt, can destroy an exceptional application or an exceptional meal respectively q;-)