Time To Mine the Data

Or is it “mind” the data? Last week I read a blog post by Stacey Armstrong (CS News – Video Game Data Mining) that linked to an article (Video Game Data Mining) about how game companies collect and use data while people play their games online. This is going to be a discussion topic for his computer classes this year, and a great one it is. As we discuss ethical issues in computing, how companies collect and use data is an important topic.

I already knew that game developers collect and use gameplay data during the testing and development of games. This has been a topic at a number of conferences I have attended (Foundations of Digital Games (FDG) Conference Series). By looking at the data, developers can determine whether part of a game is too difficult or too easy. It turns out that both upset gamers. They can evaluate mazes, characters, and pretty much any other aspect of gameplay, and this helps make games better. I hadn’t heard about studying data once a game was released, though. It does make sense, of course.

There are also privacy concerns, as there are with almost all data collection these days. The data can be collected without saving personally identifiable information and stored safely to protect the users, but this has to be done deliberately and carefully. Not long ago, for example, Google came under serious criticism for collecting too much, and too detailed, information about wireless access points (Google: Oops, we spied on your Wi-Fi). Even with good intentions, too much data can become a real problem, especially in the wrong hands or without proper protections and security.
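For a classroom discussion, a small Python sketch can make both ideas concrete: dropping identifying fields before storage, and then using what is left to spot a level that may be too hard. Everything here is made up for illustration: the event fields, the `SECRET_KEY` value, and the function names are assumptions, not anything taken from the linked article or from a real game's telemetry. Note also that replacing an ID with a keyed hash is pseudonymization, not full anonymization, so this is a starting point for the conversation rather than a complete privacy solution.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; in practice it would be stored apart from the analytics data.
SECRET_KEY = b"example-key-kept-separate-from-the-data"


def pseudonymize(player_id: str) -> str:
    """Replace a player ID with a keyed hash so the raw ID is never stored."""
    digest = hmac.new(SECRET_KEY, player_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_identity(event: dict) -> dict:
    """Keep only the fields needed for analysis; everything else (e.g. IP) is dropped."""
    return {
        "player": pseudonymize(event["player_id"]),
        "level": event["level"],
        "completed": event["completed"],
    }


def failure_rate_by_level(events: list) -> dict:
    """Return the fraction of attempts at each level that ended in failure."""
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for event in events:
        attempts[event["level"]] += 1
        if not event["completed"]:
            failures[event["level"]] += 1
    return {level: failures[level] / attempts[level] for level in attempts}


# Invented sample data: two attempts at level 1, four attempts at level 2.
raw_events = [
    {"player_id": "alice@example.com", "ip": "203.0.113.7", "level": 1, "completed": True},
    {"player_id": "bob@example.com", "ip": "203.0.113.8", "level": 1, "completed": True},
    {"player_id": "alice@example.com", "ip": "203.0.113.7", "level": 2, "completed": False},
    {"player_id": "alice@example.com", "ip": "203.0.113.7", "level": 2, "completed": False},
    {"player_id": "bob@example.com", "ip": "203.0.113.8", "level": 2, "completed": False},
    {"player_id": "bob@example.com", "ip": "203.0.113.8", "level": 2, "completed": True},
]

safe_events = [strip_identity(event) for event in raw_events]
rates = failure_rate_by_level(safe_events)
print(rates)
```

A good question to pose alongside it: even with the email and IP address removed, what could someone still learn about a player from the remaining fields?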

As a society we are collecting more and more data all the time. Terms like “data mining” and “business intelligence” are becoming part of the vocabulary of business schools, marketing courses, MBA programs, and pretty much all of industry. Scientists in all fields are also swimming in huge data sets with amazing potential. Computer scientists are the ones who will make it possible to make sense of all this data. We’re going to have to understand the technical aspects of all this, for sure. But we also need to make sure that the computer scientists working on all this data are aware of the ethical considerations as well. Ethics is not something people can outsource or depend on others to decide for them. Often, understanding what is possible takes more technical background than non-computer scientists can be expected to have. Computer scientists will often have to explain the complexities and risks of various data collections. Likewise, computer scientists will sometimes depend on experts in other fields for a better understanding of what specific pieces of data really mean.

For all of this to work, though, there has to be a common ethics vocabulary. This is just one reason that all fields need to have some ethics discussion and training these days. And it is why that training has to start young.