What is a data scientist?

01/31/2013

Is data science the new rock and roll? What does a data scientist actually do? We asked our own friendly data scientist Kenji Takeda to break it down for us.

There are many definitions of a ‘data scientist’. But it’s clear that it’s something to do with ‘big data’ and that they’re cool. Harvard Business Review called it the “sexiest job of the 21st century” and McKinsey reckons the USA alone needs 140,000-190,000 of them by 2019.

Data has always been at the heart of IT. (Of course! Where else?) But in the past few years, it’s been exploding. The world’s per-capita capacity to store information has doubled roughly every 40 months since the 1980s.

But it’s not just volume that is increasing. So are the variety of data types, including unstructured data, and the velocity of data acquisition.

For example, social networking sites see millions of interactions every minute. Businesses track website visitor and sales metrics in microscopic detail. Aircraft engine manufacturers track hundreds of parameters every second in real time on all their engines to spot potential problems before they happen. The Large Hadron Collider collects untold terabytes of data every day.

So companies need to do a lot with their data: gather, collate, store, transform, clean, analyse, explore, visualise, share and discover. The people who help organisations do this are data scientists. They turn data into products, insights and stories by adding value to raw information.

However, what they deliver isn’t the same as what they do. Kenji says that the role is still protean. “I worry that the Data Scientist role is like the mythical ‘webmaster’ of the 90s: master of all trades,” says Aaron Kimball, CTO at Wibidata. Back then webmasters were the guys (and they were mostly guys) who installed the software, built the website, wrote the copy and marketed the site. They were high priests of a black art.

Just as the webmaster did a number of different things, so a data scientist has different functions. According to Kenji, they include:

Data engineer. Operating at a low level close to the data, these are the people who write the code that handles data and moves it around. They may have some machine learning background. Large companies may have teams of them in-house or they may look to third-party specialists to do the work.
Data analyst. This is someone who knows statistics. They may know programming or they may be an Excel wizard. Either way, they can build models based on low-level data. They eat and drink numbers that are related to their work but they’re not interested in data as an abstract concept. Most importantly, they know which questions to ask of the data. Every company will have lots of these.
Data steward. These people think about managing and preserving data. They are information specialists, archivists, librarians and compliance officers. This is an important role. If data has value, you want someone to manage it, make it discoverable, look after it and make sure it remains usable.

Over time the job of the webmaster gave way to web teams with differentiated, clearly defined roles. Kenji expects the same thing to happen with data scientists.

However the role evolves, there’s no question that ‘big data’ will be very disruptive, bringing new business opportunities and new competition. “We can swim in the data and find lots of hypotheses,” says Kenji. But the swimmers, the pool builders and the lifeguards are all going to be data scientists.

Dr Kenji Takeda is Solutions Architect and Technical Manager for the Microsoft Research Connections EMEA team.
