There are two defining differences between Data Science (or Data Mining, for that matter) and other types of data analysis: the first is how far back you push the data analysis, and the second is the multiple processes and tools you’ll use within the analysis. In this post I’ll explain the first difference. In a…

## Can Data Science Cure Creeping Determinism?

Hindsight, it is said, has 20/20 vision. We seem to be able to predict the past flawlessly – or can we? The answer, surprisingly, is “no”. “Creeping Determinism” is a phrase from 1970s psychology. It’s the effect of thinking that something was predictable, but only after it happens. We look back and say – “Ah –…

## Knowing Which Statistical Formula to Use

In a previous Notebook entry, I showed you where you can learn Statistics. It’s one of the base skills you need if you’re going to work with Data Science. But students often know how to apply a statistical formula (more accurately called a “test”) without knowing when they should use that formula – or formulas….

## Learning Statistics

This is a “Catalog” entry, which is used in Scientific Notebooks to list out a species name and data about that species. In this Notebook, I’ll use that to show a list entry of things you need to know. This entry is about Statistics, and where you can go to learn it. I’ll start with…

## Reducing (although sadly not eliminating) bias in sample gathering

To obtain the data a Data Scientist needs for an analysis, there are two options: you can get all the data (called a population or “X”) or a subset of the data (called a sample, or “x”). Most of the time the information you need to perform analysis is too large to…
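As a rough illustration of the population-versus-sample distinction, here is a minimal Python sketch. The population is made-up data generated only for this example, and the names `population` and `sample` are hypothetical. It draws a simple random sample, where every member has an equal chance of being picked, and compares the sample mean with the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up measurements (illustration only)
population = [random.gauss(50, 10) for _ in range(10_000)]

# Simple random sample without replacement:
# every member of the population is equally likely to be chosen
sample = random.sample(population, k=200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Random selection like this is only one way to reduce bias in how a sample is gathered. It does nothing about bias in how the underlying data was collected in the first place.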

## Microsoft R

What, Why, How: One of the most distinctive features of Data Science, as opposed to working with databases, Business Intelligence or other data professions, is its heavy use of statistical methods. When computing science first appeared, programs and algorithms were created to handle the large number of calculations required in statistics. One of…

## Descriptive Statistics – Initial Evaluation of the Data

The most important part of data analysis is a thorough understanding of the data we’re looking at. Once we’ve verified what the source of the data actually means and that we can trust it, we need to do some simple visualizations and calculations to see what it means. I find that using even basic descriptions is…
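To give a sense of the kind of simple calculations meant here, the sketch below computes a few basic descriptive statistics with Python's standard library. The `values` list is hypothetical data invented for this example, not anything taken from the original entry.

```python
import statistics

# Hypothetical measurements (made up for illustration); note the one large value
values = [12, 15, 11, 19, 15, 14, 98, 13, 15, 16]

summary = {
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Even this small summary is informative: the mean (22.8) sits well above the median (15), which hints that a single large value (98) is pulling the average up and deserves a closer look.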