Dealing with Data: Defining the Components to Tune

I've been reading a fascinating article about the Large Hadron Collider (LHC), a scientific research facility that houses a particle collider and generates an incredible amount of data. The original plan was to stream the data to tape, then send it to "islands" closer to the users, offloading the network as quickly as possible. But they found that the network could handle the streaming better than they expected - so they now stream the data directly to the users, saturating the network. It's a new way of thinking about moving data around.

Another interesting data concept is that they filter the data before they store it. We're not talking trivial reductions here - they filter a petabyte (PB) of data per second down to a gigabyte (GB) per second, a reduction of roughly a million to one! That's incredible. In fact, the overwhelming majority of the CPU power there doesn't go to the number crunching in the scientific exercises themselves - it's used to filter the data.

Most of us concern ourselves with data storage. We fret over space, the cost of drives, and backups. The LHC staff deals with that as well, but they are more concerned with network and CPU. To be sure, their data profile is different from yours or mine - but there are still things we can learn from their efforts. You can read the whole article yourself here:
