DevOps for Data Science – DevOps Maturity

In this series on DevOps for Data Science, I’ve explained what DevOps is and given you lots of resources to go learn more about it. Now we can get to the details of implementing DevOps in your Data Science Projects. Consider that the standard Software Development Lifecycle (SDLC) with Data Science algorithms or APIs added…

The Keys to Effective Data Science Projects – Part 10: Project Close-Out with the TDSP

Data Science projects have a lot in common with other IT projects in general, and with Business Intelligence in particular. There are differences, however, and I’ve covered those for you here in this series on The Keys to Effective Data Science Projects. One of those areas where general projects and Data Science projects are similar…

Making a Data-Driven Decision

We make decisions all the time. Most are simple – diet or regular, paper or plastic, beach or mountains. But some decisions are complicated – and not because of the choice – but because of all the things you choose against when you make that choice. And in fact, the more choices we have, the…

I can’t hear you over the sound of how small your fonts are

I’ve had it. I sat through *another* presentation where the screen fonts and icons were so small I couldn’t tell what was going on. No, it wasn’t a Microsoft presentation, it was on a Linux box. But presenters from colleges to conferences routinely get poor marks because THEY DON’T MAKE THE SCREEN ELEMENTS BIG ENOUGH…

Public Data Sources – finding them and using them

A quick post here on some valuable data sources you can use in your HDInsight, Microsoft Excel, SQL Server, APS, and other products to enrich your data. Sometimes it’s helpful just to peruse various sources to see what you can put together to gain more insight and answers. If you know of other sources,…

The danger of predictive analytics

As a technical professional working in the data field, I implement small and large-scale data systems every day. Some of those systems are used to record transactional data for immediate processing, others record the data in large collections for later historical analysis. And many of the systems I now design have the goal of predictive…

A Writer’s Toolkit

Even in a technical role, communication is paramount. If you don’t read widely and communicate well, your progression suffers. And one of the primary modes of communication is writing. I’ve written several books, articles and blogs (see “Publications” on this page), but it isn’t just writing books that requires clear communication skills. You should learn to write well for something as trivial…

The Case for Moving From TPC to Database Throughput Units in Database Performance Comparisons

Scientific testing is based on controls, transparency, and repeatability. Whenever we as technical professionals want to test the performance of a database system, we search for a series of tests that show the system’s metrics against a standard. But the scientific basis for using the most common standard, the Transaction Processing Performance Council (TPC) measurements (http://www.tpc.org/),…

New Content for Microsoft Azure’s HDInsight for March 2014

New HDInsight Content released in March:
Avro SDK – http://msdn.microsoft.com/en-us/library/dn469975.aspx
Getting started with HDInsight (YouTube video) – https://www.youtube.com/playlist?list=PLDrz-Fkcb9WWdY-Yp6D4fTC1ll_3lU-QS
Monitor HDInsight clusters using the Ambari API – http://azure.microsoft.com/en-us/documentation/articles/hdinsight-monitor-use-ambari-api/
Analyze flight delay data using HDInsight – http://azure.microsoft.com/en-us/documentation/articles/hdinsight-analyze-flight-delay-data/
Analyze Twitter data using HDInsight – http://azure.microsoft.com/en-us/documentation/articles/hdinsight-analyze-twitter-data/
Use Oozie with HDInsight – http://azure.microsoft.com/en-us/documentation/articles/hdinsight-use-oozie/
Use time-based Oozie coordinator with HDInsight -…

The Data Science Laboratory Series is Complete

I wrote a series of articles on creating a Data Science Laboratory over on Simple-Talk – you can find the complete list of articles below. The series covers installing various software tools and packages on a Virtual Machine running the Windows operating system. I think there’s no substitute for installing, configuring and experimenting with various…
