Learning Data Science


Updated: 8 February 2018

It's been over a year since I wrote the original version of this article, and much has changed in the world of Data Science. I've decided to update the information from time to time, since it's the most popular article I've written - there is clearly a lot of demand for these topics. I've changed the format to a table of the areas that a Data Scientist should know, with an example of a way to learn each one.

I've also been asked for an article on actually getting a job in Data Science. I thought about putting that information here, but I will write a separate article for that, as I think it's important to focus on your learning journey, even while searching for a Data Science job.

With a format for learning to be an Amateur Data Scientist established and a firm understanding of how you learn, it’s time to focus on what to learn.

There is no shortage of Internet posts, magazine articles, or college syllabi describing what a “Data Scientist” should know. I originally thought the term was still up for debate, but there are “real” Data Scientists who have formal degrees and years of experience with that official title (my team is full of them, save yours truly). In my case, I’m building this knowledge outside of a formal degree. Since I have to start somewhere, I’ll extrapolate from these other references to lay out the knowledge path I need to follow. Feel free to modify it to your liking.

NOTE: There’s an absolutely wonderful visual representation of what a Data Scientist should know that you can find here: http://nirvacana.com/thoughts/becoming-a-data-scientist/ by Swami Chandrasekaran, and I would encourage you to look over his work. What I show here is independent of that grouping, but similar. Of course, he’s using several tools from IBM and I’m using the ones from Microsoft. Pick your stack and learn it well. Want to use Open Source only? Knock yourself out.

ALSO NOTE: I have never liked a “tools approach” to learning. Yes, you’ll need to learn several tools and yes, I often use a tool to learn a thing (like using R to learn Statistics) but I focus on the concepts, not just how the work is done. First learn why you do something, and then shawarma after. So learning concepts first and then choosing a tool is the route I’ll follow here.

Or, you can simply follow a complete course, online. There are several really good ones:

Among many, many others. See the comments below as well for even more.

Of course, if you want to “assemble your own”….

Note: I have biased this list towards things that we've published at Microsoft, although I've included some resources that aren't. Keep in mind the "Asset" column is simply one of the many places you can go to learn these topics, and I would caution you against using this list as your only stop for this information. There are lots of fine resources out there, with more being created every day, so I encourage you to do a web search on the "Technology/Concept" items as well as the "Topic" items. In any case, this list will serve you well in researching and learning more about the craft of Data Science.

 

| Order | Technology/Concept | Topic | Asset | Type |
| --- | --- | --- | --- | --- |
| 1 | Math - Linear Algebra and College-level Statistics | | | |
| | | Linear and Matrix Algebra | Linear Algebra with Matrix Transforms | Course |
| | | Core Statistics | Statistics and Probability; Essential Statistics for Data Analysis using Excel | Course |
| | | Bayesian Methodologies in Modeling | Bayesian Networks | Book |
| | | General AI Mathematics | Essential Mathematics for AI | Course |
| 2 | Team Software Development | | | |
| | | Agile | Agile Methods and Practices | Self-Guided |
| | | The Team Data Science Process | Primary Documentation; Partner Resource | Self-Guided |
| | | Source Control | Version Control | Self-Guided |
| 3 | IDEs | | | |
| | | Visual Studio Code | Visual Studio Code Site | Self-Guided |
| | | PyCharm | PyCharm Getting Started | Course |
| | | RStudio | RStudio Learning | Course |
| 4 | Data Constructs and Data Programming (SQL, Graphs) | | | |
| | | Algorithms and Data Structures | Algorithms and Data Structures | Course |
| | | Data Modeling | Introduction to Data Modeling | Course |
| | | Data programming with SQL | Learn SQL; Querying Data with Transact-SQL | Course |
| | | Graph Database Programming | Graph programming with the Gremlin API | Self-Guided |
| | | NoSQL Systems | Introduction to NoSQL Data Solutions | Course |
| 5 | Exploratory Data Analytics | | | |
| | | EDA Methods | Exploratory Data Analysis | Book |
| 6 | Advanced Analytics and Business Analytics | | | |
| | | Data Analytics and Business Intelligence | MCSE in Business Intelligence | Course |
| 7 | Programming Languages used in Data Science (R, Python) | | | |
| | | R Programming | Introduction to R for Data Science; Programming with R for Data Science | Course |
| | | Python | Introduction to Python; Introduction to Python for Data Science; Programming with Python for Data Science | Course |
| 8 | Big Data Processing Technologies | | | |
| | | Hadoop/Spark | Introduction to Big Data; HDInsight Developer Guide; Processing Big Data on Azure; Spark on HDInsight; Implementing Real-Time Analytics with Hadoop; Implementing Predictive Analytics with Spark | Course |
| 9 | Research methods (including hypothesis definition and testing) | | | |
| | | Research Methods Overview | Research Methods Overview | |
| | | Hypothesis Testing | Hypothesis Testing: Methodology and Limitations | Book |
| 10 | Data Science Algorithms and Data Analysis Techniques | | | |
| | | Data Science | Data Science Essentials | Course |
| | | Machine Learning | Introduction to Machine Learning | Overview |
| | | Algorithms | Choosing the right Estimator; How to choose algorithms for Microsoft Azure Machine Learning; Reinforcement Learning Explained | Self-Guided; Course |
| | | Deep Learning | Deep Learning Explained | Course |
| | | Artificial Intelligence | Introduction to Artificial Intelligence | Course |
| 11 | AI Model Management and Operationalization | | | |
| | | Operationalization | Building Your Azure Skills Toolkit; Developing Intelligent Apps and Bots; Operationalize analytics with Machine Learning Server | Course; Self-Guided |
| | | Model Management | Machine Learning Model Management | Resource |
| 12 | Domain Expertise (Various Industry Verticals) | (Various Industry Verticals) | | |

References

Here are a few other views on what a Data Scientist should know:

