MPI.NET – MPI for Managed Code

In this post, I would like to introduce you to MPI.NET from Indiana University. You can download MPI.NET here, including its runtime, SDK and source code, and then follow along with this post. You do not need a Windows cluster or even a multi-core/multi-processor workstation to develop MPI programs: any desktop machine running Windows XP, Vista or Windows Server 200x can be used to develop MPI programs with MPI.NET.

For development, you will also need Visual Studio 2005/2008 and MS-MPI. Why MS-MPI? Because MPI.NET only works with MS-MPI, which you can get freely from the Microsoft HPC Pack 2008 SDK (or the Microsoft Compute Cluster Pack SDK). I prefer the HPC Pack 2008 SDK because it includes a newer version of MS-MPI (version 2). To run your MPI.NET programs on a cluster, the MPI.NET runtime must be installed on every cluster node.
 
Now let's look at a C# program that calculates PI using the following algorithm:

Inscribe a unit circle within a unit square, and then randomly throw darts within the unit square. The ratio of the number of darts that land within the circle to the number of darts that land within the square is the same as the ratio of the area of the circle to the area of the square, and can therefore be used to compute PI.

Using this principle, the following sequential program computes an approximation of PI: 

using System;

class SequentialPi
{
    static void Main(string[] args)
    {
        int dartsPerProcessor = 10000;
        Random random = new Random();
        int dartsInCircle = 0;

        // Throw darts at random points in the square and count how many
        // land inside the inscribed circle.
        for (int i = 0; i < dartsPerProcessor; ++i)
        {
            double x = (random.NextDouble() - 0.5) * 2;
            double y = (random.NextDouble() - 0.5) * 2;
            if (x * x + y * y <= 1.0)
                ++dartsInCircle;
        }

        Console.WriteLine("Pi is approximately {0:F15}.",
            4 * (double)dartsInCircle / (double)dartsPerProcessor);
    }
}

What Can MPI Do to Parallelize the Code?

In a message-passing system like MPI, different concurrently-executing “processes” communicate by sending and receiving messages over a network. Each MPI process has its own local program state that cannot be observed or modified by any other process except in response to a message. Most MPI programs are written in the Single Program, Multiple Data (SPMD) parallel model, where each process runs the same program but works on a different part of the data. MPI allows us to launch the same program across many nodes with a single command. Initially, the processes are identical, with one distinguishing characteristic: each process is assigned a rank, which uniquely identifies it. The ranks of MPI processes are integer values from 0 to P-1, where P is the number of processes launched as part of the MPI program. MPI processes can query their rank, allowing different processes in the MPI program to behave differently, and can exchange messages with other processes in the same job via their ranks.
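To make the rank concept concrete, here is a minimal sketch (not part of the PI example; the class name HelloRanks is just an illustrative choice) in which every process runs the same program but prints its own rank and the total number of processes:

using System;
using MPI;

class HelloRanks
{
    static void Main(string[] args)
    {
        using (new MPI.Environment(ref args))
        {
            Intracommunicator world = Communicator.world;
            // Each process runs the same program (SPMD) but reports a different rank.
            Console.WriteLine("I am rank {0} of {1} processes.", world.Rank, world.Size);
        }
    }
}

Launched with 4 processes, this would print four lines, one per rank, in an arbitrary order.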

Looking at the sequential PI program, the more darts we throw, the better the approximation of PI. To parallelize this program, we'll use MPI to run several processes, each of which throws darts independently. Once all of the processes have finished, we'll sum up the results (the total number of darts that landed inside the circle on all processes) to compute PI. The complete code for the parallel calculation of PI is as follows:

using System;
using MPI;  // add a reference to the MPI.NET assembly (MPI.dll) before adding this line

class ParallelPI
{
    static void Main(string[] args)
    {
        int dartsPerProcessor = 10000;
        using (new MPI.Environment(ref args))
        {
            if (args.Length > 0)
                dartsPerProcessor = Convert.ToInt32(args[0]);

            Intracommunicator world = Communicator.world;

            // Seed each process differently so every rank throws different darts.
            Random random = new Random(5 * world.Rank);
            int dartsInCircle = 0;
            for (int i = 0; i < dartsPerProcessor; ++i)
            {
                double x = (random.NextDouble() - 0.5) * 2;
                double y = (random.NextDouble() - 0.5) * 2;
                if (x * x + y * y <= 1.0)
                    ++dartsInCircle;
            }

            if (world.Rank == 0)
            {
                // The root process receives the combined count and prints the result.
                int totalDartsInCircle = world.Reduce<int>(dartsInCircle, Operation<int>.Add, 0);
                Console.WriteLine("Pi is approximately {0:F15}.",
                    4 * (double)totalDartsInCircle / (world.Size * (double)dartsPerProcessor));
            }
            else
            {
                // Non-root processes only contribute their local counts to the reduction.
                world.Reduce<int>(dartsInCircle, Operation<int>.Add, 0);
            }
        }
    }
}

The first step in an MPI program is to initialize the MPI environment by creating a new instance of MPI.Environment within the Main method, passing the new object a reference to our command-line arguments. The entire MPI program should be contained within the using statement, which guarantees that the MPI environment is properly initialized and disposed. All valid MPI programs must both initialize and finalize the MPI environment. The reference to the command-line arguments, args, is required because some MPI implementations are permitted to use special arguments to pass state information in to the MPI initialization routines.

After the MPI environment has been initialized, we can obtain a Communicator object. MPI communicators are the fundamental abstraction that permits communication among different MPI processes, and every non-trivial MPI program makes use of some communicator. Each communicator represents a self-contained communication space for some set of MPI processes. Any process in that communicator can exchange messages with any other process in the same communicator, without fear of those messages colliding with messages being transmitted on a different communicator.
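Communicators are used not only for collectives such as Reduce but also for point-to-point messages. As a side illustration (not part of the PI program), the fragment below sketches how two ranks could exchange a string over Communicator.world using MPI.NET's Send and Receive methods; the message text and the tag value 0 are arbitrary choices:

using System;
using MPI;

class PointToPoint
{
    static void Main(string[] args)
    {
        using (new MPI.Environment(ref args))
        {
            Intracommunicator world = Communicator.world;
            if (world.Rank == 0)
            {
                // Rank 0 sends a string to rank 1 with message tag 0.
                world.Send("hello from rank 0", 1, 0);
            }
            else if (world.Rank == 1)
            {
                // Rank 1 receives the string sent by rank 0 with tag 0.
                string message = world.Receive<string>(0, 0);
                Console.WriteLine("Rank 1 received: {0}", message);
            }
        }
    }
}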

The PI program above is quite simple because it uses the basic skeleton of an MPI program to separate the process with rank 0 from the other ranks. This skeleton, which gives one of the processes (often called the "root", typically rank 0) a slightly different code path than all of the other processes, is relatively common in MPI programs, which often need to perform some coordination or interaction with the user. There are two properties of communicators used in our PI program: the rank of the process within the communicator, which identifies that process (world.Rank), and the size of the communicator, which gives the number of processes in the communicator (world.Size).

Intracommunicator world = Communicator.world;
if (world.Rank == 0)
{
    // program for rank 0 (root process)
}
else // not rank 0
{
    // program for all other ranks
}

As mentioned before, to parallelize the PI program we used MPI to run several processes, each of which throws darts independently. Once all of the processes have finished, we need to sum up the results. MPI provides several parallel reduction operations that combine the values provided by each of the processes into a single value that somehow sums up the results. The most basic reduction operation is the Reduce collective, which combines the values provided by each of the processes and returns the result at the designated root process. We used world.Reduce() in our PI program, which is the Reduce collective, to combine the values stored in each process into a single value available at the root process (rank 0):

world.Reduce<int>(dartsInCircle, Operation<int>.Add, 0);
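Note that Reduce delivers the combined value only at the root process; the other ranks call it only to contribute their local counts, which is why the return value is discarded on non-root ranks in the PI program. If every process needed the total, a related collective, Allreduce, could be used instead. A minimal sketch of that variant, assuming MPI.NET's Allreduce<T>(value, op) overload:

// Every rank receives the combined count, so any of them could print the result.
int totalDartsInCircle = world.Allreduce<int>(dartsInCircle, Operation<int>.Add);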

Now we can compile the PI program and run it using the MPIEXEC launcher (included with HPC Pack 2008 or MPICH2). For example, to run the code in 10 processes, just type:

mpiexec -n 10 ParallelPI.exe

Cheers – RAM