Debugging, Profiling and Analyzing Parallel Applications

Any time a new programming model is introduced, developers need robust tooling for learning, writing, debugging, and optimizing code that uses it.  This is particularly true for parallel programming, which adds new variables to the equation, such as race conditions, deadlocks, and message ordering.


Visual Studio 2010 has made great strides in the parallel debugging experience.  Many features are also available as add-ins for Visual Studio 2008.  Here’s a brief tour of the parallel programming, debugging, and diagnostic features available in Visual Studio 2008 and upcoming in Visual Studio 2010.



Although Visual Studio 2005 had a simple built-in debugger for MPI programs, it did not provide a full “F5” experience.  The new add-in for Visual Studio 2008, which is also integrated into Visual Studio 2010, lets you select a cluster head node, choose how many cores you want, and press F5 to debug your MPI program.


Debugging MPI programs


In addition to the core work that the debugger team has done, Allinea, a leader in parallel debugging technologies, has ported their environment to Visual Studio.  Allinea’s add-in streamlines MPI-specific debugging even further, with rank-based context switching; group-wise step, pause, and run; a parallel stack view; and lamination.  Below is Allinea’s MPI debugging environment:




Service Oriented Architecture Debugging

One of the key new programming models introduced in Windows HPC Server 2008 was Cluster SOA, built on WCF with advanced scheduling and load balancing provided by HPC’s scheduler/broker.  Until now, debugging Cluster SOA was limited to basic WCF/.NET-style debugging with no cluster integration.  In Visual Studio 2010, an add-in for Cluster SOA enables the SOA Settings tab, allowing you to choose a head node, debug nodes and services, deploy runtime libraries, and clean up automatically.  Here’s a peek at the new SOA debugger in Visual Studio 2010:


SOA Debugging 



Integrated MPI-aware profiling was not available in Windows Server HPC 1.0.  With Windows HPC Server 2008, tools such as XPerf enabled MPI profiling as well as system-level profiling and troubleshooting.  But even XPerf really didn’t know much about the details of MPI message traffic, and no message traffic viewers existed.  Since then, Vampir, the premier MPI message traffic viewer, has been ported to Windows and fully integrated with ETW.  Vampir allows you to troubleshoot message ordering and delays.  Various open source HPC tools are available as well, such as JumpShot, a free Java-based MPI message viewer.
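
To make "troubleshooting message ordering and delays" concrete, here is a minimal sketch in Python of the kind of calculation a message traffic viewer performs.  It does not use Vampir, JumpShot, or ETW; the Event record and its fields are hypothetical stand-ins for whatever a real trace contains, and times are assumed to be integer microseconds.

```python
from collections import namedtuple

# Hypothetical trace record (not a real Vampir or ETW format).
# kind is "send" or "recv"; time is in integer microseconds.
Event = namedtuple("Event", "kind src dst tag time")


def message_delays(events):
    """Pair each send with its matching receive.

    Returns a list of (src, dst, tag, delay) tuples in order of receive time.
    """
    pending = {}  # (src, dst, tag) -> send times still waiting for a receive
    delays = []
    for ev in sorted(events, key=lambda e: e.time):
        key = (ev.src, ev.dst, ev.tag)
        if ev.kind == "send":
            pending.setdefault(key, []).append(ev.time)
        elif ev.kind == "recv" and pending.get(key):
            # Matching messages between the same pair of ranks do not
            # overtake each other in MPI, so pair them first-in, first-out.
            sent = pending[key].pop(0)
            delays.append((ev.src, ev.dst, ev.tag, ev.time - sent))
    return delays


# Made-up trace: rank 1's message to rank 0 is received very late.
trace = [
    Event("send", 0, 1, 7, 100),
    Event("recv", 0, 1, 7, 160),
    Event("send", 1, 0, 7, 120),
    Event("recv", 1, 0, 7, 900),
]

delays = message_delays(trace)
slowest = max(delays, key=lambda d: d[3])
print("slowest message:", slowest)
```

A long delay like the one from rank 1 to rank 0 usually means the receiver was busy or blocked elsewhere, which is the kind of pattern a timeline view makes visible at a glance.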


Often, the built-in VS Profiler can offer insight into performance issues.  In Visual Studio 2010, this capability has been fully integrated with the HPC job scheduler to help analyze the behavior of a particular MPI rank or node.  The Visual Studio MPI profiler shows line-level profile information, including a temperature view of execution, side by side with the source view:


Visual Studio MPI Profiler 


The profiler also shows a comparison report across multiple runs or builds so you can easily see the effect of your changes.


 Comparison Report
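
The idea behind a comparison report can be sketched in a few lines of Python.  This is only an illustration of the concept, not the Visual Studio profiler's format or API; the function names and millisecond figures below are invented for the example.

```python
def compare_profiles(baseline, candidate):
    """Compare two {function name: milliseconds} profiles.

    Returns rows of (name, baseline_ms, candidate_ms, delta_ms),
    with the largest regression first.
    """
    rows = []
    for name in sorted(set(baseline) | set(candidate)):
        before = baseline.get(name, 0)  # 0 if the function is new
        after = candidate.get(name, 0)  # 0 if the function was removed
        rows.append((name, before, after, after - before))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows


# Hypothetical per-function times from two runs of the same MPI rank.
run_before = {"exchange_halo": 420, "compute_step": 900, "reduce_norm": 80}
run_after = {"exchange_halo": 250, "compute_step": 910, "reduce_norm": 80}

report = compare_profiles(run_before, run_after)
for name, before, after, delta in report:
    print(f"{name:15} {before:6} {after:6} {delta:+6}")
```

Sorting by the change rather than by absolute time puts regressions at the top and improvements at the bottom, so the effect of a code change stands out.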


MPI Runtime Analysis

Beyond debuggers and profilers, sometimes you need specialized analysis tools to help with the complexities of large-scale parallel programs.  HLRS/ZIH at Stuttgart, a leading institute in Germany, has ported Marmot, their dedicated MPI analysis tool, to Visual Studio 2008.  Marmot can check the validity of parameters passed to MPI calls and detect irreproducibility, deadlocks, and incorrect management of resources.  Below is Marmot in action:



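As an illustration of what deadlock detection involves, here is a minimal Python sketch of one common approach: looking for a cycle among ranks that are each blocked waiting on another rank.  This is not Marmot's actual algorithm or interface, and the snapshot of blocked ranks is a made-up example.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the ranks that form a wait-for cycle, or None if there is none.

    waits_for maps a rank blocked in a receive to the rank it is waiting on.
    Ranks that are not blocked do not appear as keys.
    """
    for start in waits_for:
        path: List[int] = []
        seen_at: Dict[int, int] = {}
        rank = start
        # Follow the chain of blocked ranks until it ends or repeats.
        while rank in waits_for and rank not in seen_at:
            seen_at[rank] = len(path)
            path.append(rank)
            rank = waits_for[rank]
        if rank in seen_at:
            # The chain came back to a rank already on the path: a cycle.
            return path[seen_at[rank]:]
    return None


# Hypothetical snapshot: ranks 0 and 1 each posted a blocking receive
# from the other before sending, and rank 2 is waiting on rank 0.
blocked = {0: 1, 1: 0, 2: 0}
cycle = find_wait_cycle(blocked)
print("deadlocked ranks:", cycle)
```

Rank 2 is stuck as well, but only ranks 0 and 1 form the cycle, so the fix belongs in their send and receive ordering.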

From Printf to Integrated Profiling and Debugging

Not long ago, printf-style debugging was the norm.  State-of-the-art debugging and profiling tools have since taken a major step forward.


From within Visual Studio, you can debug and profile native as well as high-performance MPI and Cluster SOA applications that scale from hundreds to thousands of cores.  You can use XPerf and ETW to get a truly holistic view of the application in the context of the whole system.  The new multi-core profiling and debugging tools introduced in Visual Studio 2010 can also be used effectively on a cluster at the node level.


Visual Studio is becoming a rich and productive environment for writing parallel programs of all types.  To find out more about Windows HPC programming models, visit the Windows HPC Server Developer Resource Center.  You can find a suite of samples that use various parallel programming models on the CodePlex Parallel Dwarfs site.



Comments (8)

  1. rammohan says:

    images are not getting downloaded

  2. phuff says:

    Rammohan, can you be more specific? The images aren’t showing up in your browser?  In your RSS feed?

    Polita Paulus

    Developer Division


  3. Goutam says:

    The images are not getting downloaded in the browser except the profiling image. I am using IE 6.0 SP2

  4. Thejas says:

    Installed the new MPI add-in for Visual Studio 2008. For some reason it didn’t work for me. I wanted to uninstall this add-in and, guess what, I had to uninstall the entire VS 2008 itself. There is no uninstall provision. For some reason the Add-In Manager didn’t show it, and there is no good documentation on MSDN that explains how to fix or remove an add-in.

  5. Hi Thejas,

    There are two easy ways to uninstall the feature:

    1. From the Control Panel, under Programs and Features, click the "Microsoft MPI Cluster Debugger Launch" item and an "Uninstall" button will appear above the list.  Click "Uninstall" to remove the feature.

    2. Run the installer again.  This will detect the presence of the MPI Cluster Debugger Launch feature and offer to repair or remove the feature.  Select the option to remove and press continue to remove the feature.

    If you tried the above and they did not work for you, I would love to understand your setup in greater detail so we can get this fixed.

    Best regards,

    Robert Palmer

  6. phuff says:

    Goutam, this post has lots of large images, so it’s more likely to take a while to download or timeout on image download over slow connections than other posts.  The profiling image is the smallest of the images, so this seems consistent.  Are you connected via a fast connection?  Do you have bandwidth limits?

    Polita Paulus

    Developer Division


  7. calcium says:

    Beowulf systems, and other proprietary approaches, are placing systems with four or more CPUs in the hands of many researchers and commercial users. In the near future, systems with hundreds of CPUs will become commonly available, with some programmers dealing with tens of thousands of CPUs. The debugging methods used on these systems are a combination of the traditional methods used for debugging single processes and ad-hoc methods to help the user cope with the multitudes of processes. Programmers are usually familiar with a single-process debugger and would like to use it (with minimal user-visible extensions) to debug their distributed program.