So what does supercomputing on Windows look like?

If you're the more visual type, check out the case-study video at https://www.microsoft.com/hpc (direct WMV download link: https://www.rocketscientism.com/MS_NCSA/NCSA_FINAL.wmv).

We're very excited to share the news of a major achievement for Microsoft's HPC program. The HPC Performance Team recently completed a Top500 run on the fastest Linux HPC cluster at the National Center for Supercomputing Applications (NCSA). This cluster had previously posted a Top500 entry of 62.7 TFlops at 70% efficiency, ranking #14 on the November 2007 Top500 list. NCSA offered to submit a Windows benchmark entry using the same cluster for the June 2008 Top500 list.

The HPC Performance Team worked with the NCSA staff to deploy the April CTP build of Windows HPC Server 2008 on 1184 nodes and achieved a Linpack benchmark result of 68.48 TFlops at 77.7% efficiency on 9472 cores. This blows away many expectations and catapults Windows HPC Server 2008 into a new realm of commodity platform supercomputing. The result places the pre-release build of HPC Server 2008 at #23 in the June 2008 Top500, a substantial improvement over the November 2007 Linux entry on the same cluster.
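As a quick sanity check on those numbers: Linpack efficiency is simply the measured result (Rmax) divided by the cluster's theoretical peak (Rpeak), so the per-core peak can be derived from the figures above. Here's a minimal sketch of that arithmetic (the per-core peak is inferred, not quoted, and assumes the reported efficiency is Rmax/Rpeak):

```python
# Back-of-the-envelope on the NCSA Linpack result; the inputs come
# from this post, everything else is derived for illustration.

rmax_tflops = 68.48        # measured Linpack result (Rmax)
efficiency = 0.777         # reported efficiency (Rmax / Rpeak)
cores = 9472

rpeak_tflops = rmax_tflops / efficiency              # ~88.1 TFlops theoretical peak
peak_per_core_gflops = rpeak_tflops * 1e3 / cores    # ~9.3 GFlops per core

# Relative gain over the 70%-efficiency Linux run on the same hardware:
gain_vs_linux = (efficiency - 0.70) / 0.70           # ~11%, i.e. the "10% more" below

print(f"Rpeak ≈ {rpeak_tflops:.1f} TFlops")
print(f"peak ≈ {peak_per_core_gflops:.2f} GFlops/core")
print(f"gain over the Linux run ≈ {gain_vs_linux:.0%}")
```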

In a separate, independent benchmark activity, Aachen University deployed a 262-node Windows HPC Server 2008 cluster and achieved 18.81 TFlops at 76.5% efficiency, placing Aachen at #100 on the Top500 list.

Even more exciting are the capabilities that Windows HPC Server 2008 demonstrated during these runs:

  • The 1184-node cluster at NCSA was deployed in 4 hours from bare metal to running Linpack! Meanwhile, our internal 256-node Rainier test cluster is deployed daily from bare metal in under 1 hour.
  • The job scheduler took just 35 seconds to allocate resources, authenticate the user, deploy binaries, and start the Linpack run on 9472 cores. The feedback from Aachen was that "it takes >5 minutes to start a job on Linux on 2000 cores".
  • MPI over the new Network Direct RDMA technology achieved 77.7% efficiency on the InfiniBand-connected NCSA cluster, compared to 70% on Linux, allowing us to deliver 10% more effective computational performance.
  • During the NCSA run, 831,816 unique process pairs were created, 921 million messages were exchanged, and 53 TB of data moved without a single failure! (See the quick arithmetic after this list.)
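
To put those MPI traffic figures in perspective, here is the same kind of back-of-the-envelope arithmetic (a sketch only; the real message-size distribution in a Linpack run is far from uniform, so these are averages, nothing more):

```python
# Rough averages from the NCSA run's MPI traffic figures quoted above.
messages = 921e6        # messages exchanged
data_bytes = 53e12      # 53 TB of payload
pairs = 831_816         # unique process pairs created

avg_msg_kb = data_bytes / messages / 1e3    # ~57.5 KB per message on average
msgs_per_pair = messages / pairs            # ~1,107 messages per pair on average

print(f"average message ≈ {avg_msg_kb:.1f} KB")
print(f"≈ {msgs_per_pair:,.0f} messages per communicating pair")
```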

Watch for a heat-map video of the NCSA Top500 benchmark run coming soon on The HPC Show.