Data Wrangling – ELT not ETL

In Data Science, we most often use Extract, Load and Transform (ELT) rather than the Extract, Transform and Load (ETL) approach that is more common in Business Intelligence (BI). There are a couple of reasons for this. First, in many BI solutions you have a few data sources that you are integrating into a historical…
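
To make the ordering concrete, here is a minimal sketch of the ELT pattern, not a definitive implementation. It assumes an in-memory SQLite database stands in for the data store; the table name `raw_sales`, the sample rows, and both function names are hypothetical and exist only for illustration. The raw data is loaded untouched first, and the cleanup happens afterwards inside the store.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
raw_rows = [
    ("2024-01-05", " Widget ", "19.99"),
    ("2024-01-05", "gadget", "5.00"),
    ("2024-01-06", "WIDGET", "20.01"),
]


def extract_load(conn, rows):
    """Extract + Load: store the data as-is in a staging table."""
    conn.execute(
        "CREATE TABLE raw_sales (sale_date TEXT, product TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform: clean and aggregate inside the store, after loading."""
    query = """
        SELECT lower(trim(product)) AS product,
               round(sum(CAST(amount AS REAL)), 2) AS total
        FROM raw_sales
        GROUP BY lower(trim(product))
        ORDER BY product
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
extract_load(conn, raw_rows)   # E and L happen first
totals = transform(conn)       # T happens last, on the loaded data
print(totals)
```

Because the untouched rows stay in `raw_sales`, a different transformation can be run later without going back to the source. In an ETL flow, the trimming and casting would instead happen before the insert, and only the cleaned rows would be stored.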

But *Why* Do You Trust Your Data?

At the beginning of every data project is the data. While we spend a great deal of time figuring out how to move it, store it, compute it and evaluate it, the most important step is often given short shrift – sourcing the data properly. And that involves two things: Finding authoritative data and knowing…

Databas(ics)

Data science begins with data. Data are things that you know about, well, other things, so it makes sense to ensure you have a firm grasp on handling that data. Note: I know this seems really basic, but stick with me – it gets deep quickly, and it’s essential to understand this…

Public Data Sources – finding them and using them

A quick post here on some valuable data sources you can use with HDInsight, Microsoft Excel, SQL Server, APS, and other products to enrich your data. Sometimes it’s helpful just to peruse various sources to see what you can put together to gain more insight and answers. If you know of other sources,…

The danger of predictive analytics

As a technical professional working in the data field, I implement small and large-scale data systems every day. Some of those systems are used to record transactional data for immediate processing; others record the data in large collections for later historical analysis. And many of the systems I now design have the goal of predictive…

The Data Science Laboratory Series is Complete

I wrote a series of articles on creating a Data Science Laboratory over on Simple-Talk – you can find the complete list of articles below. The series covers installing various software tools and packages on a Virtual Machine running the Windows operating system. I think there’s no substitute for installing, configuring and experimenting with various…

Data Science and the Cloud

More than perhaps any other computing discipline, Data Science lends itself to Cloud Computing in general, and Windows Azure in particular. That’s a big claim, but before I offer some evidence, I need to explain what I mean by “Data Science”. I’ve written before on Data Science (http://blogs.msdn.com/b/buckwoody/archive/2012/10/16/is-data-science-science.aspx and https://www.simple-talk.com/cloud/data-science/data-science-laboratory-system—keyvalue-pair-systems/), but since it’s an…

How Does the Cloud Change a Developer’s Job?

I’ve recently posted a blog entry on how cloud computing would change the Systems Architect’s role in an organization, another on how the cloud changes a Database Administrator’s job, and the last post dealt with the Systems Administrator. In this post I’ll cover the changes facing the Software Developer when using the cloud. The software developer…

How Does the Cloud Change a Database Administrator’s Job?

I recently posted a blog entry on how cloud computing would change the Systems Architect’s role in an organization. In a way, the Systems Architect has the easiest transition to a new way of using computing technologies. In fact, that’s actually part of the job description. I mentioned that a Systems Architect has three primary vectors…

How Does the Cloud Change a Systems Architect’s Job?

I know – I said I didn’t like the “cloud” term, but my better-phrased “Distributed Systems” moniker just never took off like I had hoped. So I’ll stick with the “c” word for now, at least until the search engines catch up with my more accurate term. I thought I might spend a little time…
