Microsoft Research Team wins the MinuteSort Challenge

05/25/2012

Check this out if you need a cool last-minute paper topic for high school classes.

A Microsoft Research team has developed a new approach to sorting data over a network. To see the results, go to the Sort Benchmark Home Page.

This could affect search results in Bing, since the resulting code may well end up in Bing, although I have no confirmation from anyone that this is the case.

From a quick review of the Distributed Systems papers, I think this is the paper that describes the likely process:

John R. Douceur, Jeremy Elson, Jon Howell, and Jacob R. Lorch, The Utility Coprocessor: Massively Parallel Computation from the Coffee Shop, in Proceedings of the 2010 USENIX Annual Technical Conference, Association for Computing Machinery, Inc., 22 June 2010

https://research.microsoft.com/en-us/um/siliconvalley/projects/sortbenchmark/dmsort.pdf

I could be wrong; I am just guessing. But seriously, why would you expect any better? Let me know if you figure out something else.

The rules for this contest are:

All the sort benchmarks share the following ground rules:

Must sort to and from operating system files on secondary storage.
No raw disk usage allowed since we are trying to test the IO subsystem.
File or device striping (RAID 0) is allowed (and encouraged) to increase bandwidth. If file striping is used, the concatenated files must form a sorted file.
The output file must be created as part of the sort.
Time includes the launching of the sort program.
The sort input records must be 100 bytes in length, with the first 10 bytes being a random key.
Use the gensort record generator to create the input records.
The sort output file must be validated for correct key order and checksum.
The hardware used should be commercially available (off-the-shelf), and unmodified (e.g. no processor over or under clocking).
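The record format and validation rules above can be sketched in a few lines of Python. This is a minimal sketch, not the benchmark's official tooling: `generate_records` is a hypothetical stand-in for gensort that simply emits random bytes, and the checksum (a sum of per-record CRC32 values) is an assumption chosen because it does not depend on record order, so a correct sort leaves it unchanged. It also works in memory, whereas a real entry must read and write operating system files.

```python
import random
import zlib

RECORD_LEN = 100  # each record is exactly 100 bytes
KEY_LEN = 10      # the first 10 bytes of a record are its sort key


def generate_records(n: int, seed: int = 0) -> bytes:
    """Hypothetical stand-in for gensort: n records of random bytes."""
    rng = random.Random(seed)
    return b"".join(rng.randbytes(RECORD_LEN) for _ in range(n))


def split_records(data: bytes) -> list[bytes]:
    """Cut a buffer into 100-byte records, rejecting partial records."""
    if len(data) % RECORD_LEN != 0:
        raise ValueError("data length is not a multiple of the record size")
    return [data[i:i + RECORD_LEN] for i in range(0, len(data), RECORD_LEN)]


def sort_records(data: bytes) -> bytes:
    """Sort records by their 10-byte key (bytes compare as unsigned values)."""
    records = split_records(data)
    records.sort(key=lambda rec: rec[:KEY_LEN])
    return b"".join(records)


def checksum(data: bytes) -> int:
    """Assumed order-independent checksum: sum of per-record CRC32 values."""
    return sum(zlib.crc32(rec) for rec in split_records(data))


def validate(output: bytes, input_checksum: int) -> bool:
    """Check key order and that no record was lost, duplicated, or altered."""
    keys = [rec[:KEY_LEN] for rec in split_records(output)]
    in_order = all(a <= b for a, b in zip(keys, keys[1:]))
    return in_order and checksum(output) == input_checksum


def stripe(data: bytes, parts: int) -> list[bytes]:
    """Split sorted output into contiguous, record-aligned stripes."""
    records = split_records(data)
    if not records:
        return []
    size = -(-len(records) // parts)  # ceiling division
    return [b"".join(records[i:i + size]) for i in range(0, len(records), size)]


if __name__ == "__main__":
    data = generate_records(1000, seed=42)
    out = sort_records(data)
    # Striping rule: the concatenated stripes must still form a sorted file.
    print("valid:", validate(b"".join(stripe(out, 4)), checksum(data)))
```

Because the stripes are contiguous runs of the already sorted output, concatenating them in order reproduces the sorted file, which is the property the striping rule asks for.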

