This week, I am writing a series of blog posts on the support for parallel programming in .NET 4.0 and Visual Studio 2010.
Let’s start with PLINQ. Parallel LINQ (PLINQ) is a query execution engine that accepts any LINQ-to-Objects or LINQ-to-XML query and automatically utilizes multiple processors or cores for execution when they are available.
The Parallel Framework team at Microsoft has released some great code samples at http://code.msdn.microsoft.com/ParExtSamples. (If you haven’t played with these samples yet, I highly recommend it. They demonstrate the parallel programming functionality in .NET 4.0 very well. Stephen Toub also briefly describes each of the samples here.) Let’s take a look at the Baby Names example, which uses PLINQ.
At a high level, the Baby Names program queries a data set of baby name popularity information using both LINQ and PLINQ, and displays the results as a graph as well as the time taken to run the search for each, so you can compare the performance of LINQ vs. PLINQ.
To run the program, first enter a name and a state; the program then looks up how popular that name was in that state over a fixed range of years. (The popularity data is actually randomly generated, so don’t make baby-naming decisions based on this.) I will enter my name, “Jennifer”, and my home state, “MI”. Then I click the “LINQ” button and wait. The program searches the data, graphs the results, and displays how long the search took. Next, to compare LINQ’s performance with PLINQ’s, I click the “Parallel LINQ” button and wait. The same search runs again, this time using PLINQ, and again a graph of the results and the search time are displayed, along with the speedup of PLINQ over LINQ. On my machine with 2 cores, the speedup is usually around 2x.
Now that you understand what the sample does, let’s look at the relevant code. In Queries.cs, you will find the following:
// SEQUENTIAL QUERY
_sequentialQuery = from b in _babies
                   where b.Name.Equals(_userQuery.Name, StringComparison.InvariantCultureIgnoreCase) &&
                         b.State == _userQuery.State &&
                         b.Year >= YEAR_START && b.Year <= YEAR_END
                   orderby b.Year
                   select b;

// PARALLEL QUERY
_parallelQuery = from b in _babies.AsParallel().WithDegreeOfParallelism(numProcs)
                 where b.Name.Equals(_userQuery.Name, StringComparison.InvariantCultureIgnoreCase) &&
                       b.State == _userQuery.State &&
                       b.Year >= YEAR_START && b.Year <= YEAR_END
                 orderby b.Year
                 select b;
These are the two queries that drive the “LINQ” and “Parallel LINQ” buttons. Note that they are identical except for one thing: the sequential query (using LINQ) queries over _babies, while the parallel query (using PLINQ) queries over _babies.AsParallel().WithDegreeOfParallelism(numProcs).
The WithDegreeOfParallelism call is optional. Without it, PLINQ will by default use as many processors as the machine has (up to a limit of 64).
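To make the optional call concrete, here is a minimal sketch that runs the same PLINQ query once with the default degree of parallelism and once with an explicit cap. The data set, the cap of 2, and the class name are my own assumptions for illustration; they are not taken from the Baby Names sample.

```csharp
using System;
using System.Linq;

class DegreeOfParallelismSketch
{
    static void Main()
    {
        // Hypothetical data: the integers 1..1000 (not from the sample).
        int[] numbers = Enumerable.Range(1, 1000).ToArray();

        // Default: PLINQ decides how many cores to use.
        int defaultCount = numbers.AsParallel()
                                  .Where(n => n % 2 == 0)
                                  .Count();

        // Explicit: use at most 2 concurrently executing tasks.
        int cappedCount = numbers.AsParallel()
                                 .WithDegreeOfParallelism(2)
                                 .Where(n => n % 2 == 0)
                                 .Count();

        // Both queries count the same 500 even numbers.
        Console.WriteLine(defaultCount == cappedCount); // prints "True"
    }
}
```

The degree of parallelism only changes how the work is scheduled, not the result, so it is something to tune after measuring rather than something you need up front.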
Therefore, it’s pretty simple to write a PLINQ query: just add .AsParallel() to your existing LINQ-to-Objects or LINQ-to-XML queries. The syntax is straightforward, and you don’t have to be a threading guru to use it. It can greatly improve the performance of your LINQ queries.
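If you want to try this outside the sample, the following is a small self-contained sketch of the same idea. The Baby class and the four records are hypothetical stand-ins that I made up for illustration; the real sample has its own types and a much larger data set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the sample's record type.
class Baby
{
    public string Name { get; set; }
    public string State { get; set; }
    public int Year { get; set; }
}

class AsParallelSketch
{
    static void Main()
    {
        // Hypothetical data, deliberately out of order by year.
        var babies = new List<Baby>
        {
            new Baby { Name = "Jennifer", State = "MI", Year = 1985 },
            new Baby { Name = "jennifer", State = "MI", Year = 1975 },
            new Baby { Name = "Jennifer", State = "OH", Year = 1980 },
            new Baby { Name = "Stephen",  State = "MI", Year = 1990 },
        };

        // Existing LINQ-to-Objects query.
        var sequential = from b in babies
                         where b.Name.Equals("Jennifer", StringComparison.InvariantCultureIgnoreCase) &&
                               b.State == "MI"
                         orderby b.Year
                         select b.Year;

        // Same query; the only change is AsParallel() on the source.
        var parallel = from b in babies.AsParallel()
                       where b.Name.Equals("Jennifer", StringComparison.InvariantCultureIgnoreCase) &&
                             b.State == "MI"
                       orderby b.Year
                       select b.Year;

        Console.WriteLine(string.Join(", ", sequential)); // prints "1975, 1985"
        Console.WriteLine(string.Join(", ", parallel));   // prints "1975, 1985"
    }
}
```

Because both queries end with an orderby, the results come back in the same order. With only four records the parallel version will not be faster; the point here is just how little the query text changes.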
However, always test your applications (if your LINQ queries have side effects, executing them in parallel can introduce bugs) and measure their performance (parallelism has some overhead, so in some scenarios the sequential query is actually faster). There is a good writeup of scenarios in which you probably shouldn’t use PLINQ at http://msdn.microsoft.com/en-us/library/dd997403.aspx.
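As a rough way to follow that advice, here is a sketch that times a sequential and a parallel version of the same query with Stopwatch and checks that they agree. The IsPrime helper and the input size are assumptions I picked to get a CPU-bound workload; the timings you see will depend entirely on your machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class MeasureSketch
{
    // Hypothetical CPU-bound work: trial-division primality test.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; (long)i * i <= n; i++)
        {
            if (n % i == 0) return false;
        }
        return true;
    }

    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 2000000).ToArray();

        // Avoid side effects such as incrementing a shared counter inside
        // the lambda: with AsParallel() that would be a race condition.
        // Let the query operator (Count) produce the result instead.
        var stopwatch = Stopwatch.StartNew();
        int sequentialCount = numbers.Count(n => IsPrime(n));
        stopwatch.Stop();
        Console.WriteLine("LINQ:  {0} ms", stopwatch.ElapsedMilliseconds);

        stopwatch.Restart();
        int parallelCount = numbers.AsParallel().Count(n => IsPrime(n));
        stopwatch.Stop();
        Console.WriteLine("PLINQ: {0} ms", stopwatch.ElapsedMilliseconds);

        // The two versions should always agree on the answer.
        Console.WriteLine(sequentialCount == parallelCount); // prints "True"
    }
}
```

If the PLINQ time is not clearly lower for your real query, that is a sign the per-element work is too cheap to offset the overhead of parallelism.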
Stay tuned for tomorrow’s post, when we will look at the static Parallel class.