SSIS Catalog and Project Deployment with PowerShell

This may be my shortest blog post ever as I get ready to sign off from work for the next three weeks. But before I do, I wanted to share a quick script to automate deployment for SSIS 2012 (and 2014). I can’t take full credit for this script as the foundation was taken from…

Introduction to Apache Storm

The Apache Storm project delivers a platform for real-time, distributed (complex event) processing across extremely large-volume, high-velocity data sets. By providing a simple, easy-to-use abstraction, Storm enables real-time analytics, online machine learning and operational/ETL scenarios that were previously non-trivial to implement. In this post we will familiarize ourselves with the Storm platform, its…

Using #PolyBase in #SQLServer2016

It’s been a few weeks since the numerous Build and Ignite announcements ushered in the latest and greatest, SQL Server 2016. After having some time to soak it up (aka I’ve been too busy to blog), we will dive into some of the features and capabilities I find most interesting. Poly-what? While there are many new…

Ooooh I’m Telling: Doing Swear Word Analysis with Storm on HDInsight

As promised, this is the first of three (maybe more) posts that will present an end-to-end example to showcase the distributed streaming capabilities of the Apache Storm project. This first post will provide an introduction to the project and an overview of all the moving pieces. Please note that I will not be getting into…

Something’s Brewing with Azure Data Factory – Part 3

In the first two parts of this blog series (HERE and HERE), we used Azure Data Factory to load beer review data from an Azure SQL Database to an Azure Blob Storage account. We then processed that data using HDInsight and the Mahout Machine Learning Library to generate user-based recommendations. In this final post, we…

Indexes & Views in #Hive

In my last Hive post, we introduced partitions and bucketing, both of which allow you to horizontally slice data to make it more manageable and easier to query. Staying the course, in this post we will introduce two more techniques to improve your experience in Hive: indexes and views. Indexes In…

Building an Azure ML SSIS Task

In several previous blog posts (HERE and HERE), I’ve introduced and discussed the Azure Machine Learning service, its features, benefits and general capabilities. Since that time I have been toying with the idea of building a custom SSIS Task to integrate Azure ML into SSIS. My vision of the project is pretty simple and…

Partitions & Buckets in #Hive

In my previous post, we discussed the map, array and struct data types and their implementation in Hive. Continuing on the Hive theme, this post will introduce partitioning and bucketing as methods for segmenting large data sets to improve query performance. Partitions If you have previous experience working in the relational database world then inevitably…

Programmatically Executing SSIS Packages

While working on the next iteration of my SSIS ETL Framework, I’ve discovered that the capabilities of the out-of-the-box Execute Package task are quite lacking. Luckily, with SQL Server 2012, it has never been easier to execute SSIS packages programmatically. In this post, we will look at two different options for executing SSIS packages from…

Automating Update of Azure-Powershell

Just a quick post to share a useful script. The PowerShell script below will download and update the Azure-PowerShell cmdlets to the latest and greatest version. It even does a slick little version compare. I’ll put the disclaimer out there that I am not the original author of this script and unfortunately I’ve lost the…