Congratulations to Alex Szalay and his amazing team at JHU for winning the SC’08 Storage Challenge with the entry "GrayWulf: Scalable Clustered Architecture for Data Intensive Computing." GrayWulf is implemented with SQL Server 2008.
Data-intensive computing presents a significant challenge for traditional supercomputing architectures that maximize FLOPS, because CPU speed has outpaced the I/O capabilities of HPC systems and Beowulf clusters. We present the architecture of a three-tier cluster built from commodity components and designed for a range of data-intensive computations on petascale data sets. The design goal is a system balanced in I/O performance and memory size according to Amdahl’s laws. The name GrayWulf pays tribute to Jim Gray, who inspired the system and its design. The hardware currently installed at JHU exceeds one petabyte of storage and provides 0.5 bytes/s of I/O bandwidth and 1 byte of memory for each CPU cycle/s of processing. GrayWulf provides almost an order of magnitude better balance than existing systems. Our benchmarks are based on data from the petascale Pan-STARRS project, which is building the largest sky survey to date, and involve sequential searches over hundreds of terabytes.
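The balance figures quoted above are simple ratios of a machine's I/O bandwidth and memory size to its processing rate. The sketch below shows that arithmetic for a single node. It assumes a hypothetical `NodeSpec` description with made-up numbers, chosen only so that the ratios come out to the 0.5 and 1.0 quoted in the abstract. These numbers are not the actual GrayWulf hardware.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical server specification; the numbers used below are illustrative only."""
    cores: int
    clock_hz: float          # CPU cycles per second, per core
    io_bytes_per_s: float    # aggregate sequential disk I/O bandwidth
    memory_bytes: float      # installed RAM

    def cycles_per_s(self) -> float:
        # Total processing rate across all cores.
        return self.cores * self.clock_hz


def io_ratio(node: NodeSpec) -> float:
    """Bytes/s of I/O available for each CPU cycle/s."""
    return node.io_bytes_per_s / node.cycles_per_s()


def memory_ratio(node: NodeSpec) -> float:
    """Bytes of memory available for each CPU cycle/s."""
    return node.memory_bytes / node.cycles_per_s()


if __name__ == "__main__":
    # Made-up node (not GrayWulf's real configuration), picked to give the quoted ratios.
    node = NodeSpec(cores=8, clock_hz=2.0e9, io_bytes_per_s=8.0e9, memory_bytes=16.0e9)
    print(f"I/O ratio:    {io_ratio(node):.2f} bytes/s per cycle/s")
    print(f"memory ratio: {memory_ratio(node):.2f} bytes per cycle/s")
```

Because both ratios are normalized by the processing rate, they can be compared directly across machines of different sizes. A cluster whose CPUs are fast but whose disks are slow shows up immediately as a small I/O ratio.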