This is a guest post by Sriram Rao, Principal Scientist Lead in the Cloud and Information Services Lab (CISL) at Microsoft. His current research interests are in building storage and compute infrastructure for processing big data. Some of the results of his research have been released as part of two open source projects that he started: (1) Sailfish, a compute infrastructure for big data, and (2) Kosmos file system (KFS), a distributed file system for storing massive amounts of data. KFS is currently deployed in a production setting, on the scale of 1000’s of nodes, where it is used to manage multiple petabytes of data.
Here is his post:
About six years ago, long before "big data" was a common term, I was fortunate to be involved in the conception, design, and implementation of the Kosmos file system (KFS). Originally conceived and developed at Kosmix (now Walmart Labs) in mid-2007, I took the lead in releasing the system we had built as an open source project. I then joined Quantcast to begin the journey towards getting KFS production-ready. We adopted KFS in 2008 as the file system to store their treasure trove of logs. Over the next couple of years, we made substantial improvements to KFS in terms of both functionality and performance. KFS slowly became the backbone of the massive amount of data processing that was happening at Quantcast. Starting at 100TB per day in 2008, the daily processing volume grew to about 600TB per day in mid-2010 (much of the data was flowing through KFS). In keeping with the spirit of open source, the changes we made to KFS were committed back to the KFS repository. Quantcast continued to extend the codebase and build a file system reliable enough for its main data processing. In 2011, they fully committed to it by migrating all their production data and turning off HDFS. The volume of data it’s handling now—to the tune of 20PB/day—is truly staggering.
Rechristened the Quantcast File System (QFS), KFS has now been released an an open source project. I am thrilled that the file system the Quantcast team and I built has reached this level of maturity. As we discovered, achieving production reliability at a multi-petabyte scale on thousands of nodes is a long, difficult effort. It is very satisfying to be part of a project that adds this much value and has shining track record in a tough production environment. I’m delighted the broader community will be able to benefit and continue to evolve it.
On a personal note, the KFS journey has been very rewarding experience. A system that we started with a modest objective of running on a 32-node cluster and then subsequently scaling it to run on a 1000-node cluster while handling multiple petabytes of storage is far beyond my wildest dreams for the project. I would like to thank Kosmix Corp. for giving me the opportunity to start KFS and release it as an open source project. I would also like to thank Quantcast for adopting KFS and encouraging us to substantially scale the system. It is great that someone has taken my work and built on it. I am even more excited that they have chosen to share the results of their work with all involved.
For more information about QFS, see the following articles: