Knowing Your Data

One of my favorite things about working here at Microsoft is that from time to time we get to hear presentations from customers about what they are doing with our software. Yesterday we heard from a lead astronomer from Johns Hopkins University, Alexander Szalay, who talked about the various "Peta" projects they have running on SQL Server 2008. Some of the custom designs they have on SQL Server are storing a petabyte of data a year - that's a staggering number! They worked with Jim Gray on this project, and it was extremely interesting to see what they are doing with the data there. And it's open to the public - you can even aid in the astronomical research - you can read more about it here: 

He also spoke about the medical side of the University. They are storing petabytes of data there as well, and allowing the cancer researchers to manipulate huge sets of data on the servers in real time.

He said something very intriguing during his talk that stuck with me. He was discussing how the science he works in has progressed, and how the scientists work. He said: "It isn't enough any more for the scientists just to know their science - they have to know their data." He said this because the field has moved away from just gathering data and refining observational methods, and toward understanding the patterns in the data.

