Data Sources for Business Analysts

I am often asked where to find data. For most Business Analysts, the best data sources are found within the enterprise: operational data from operational databases, ideally transformed for easier, more consistent consumption in a data mart or data warehouse, and exhaust data housed in a data lake. Locating these data can be tricky but…

Split a Large Row-Formatted Text File using PowerShell

I move a lot of large, row-formatted text files into Azure Storage for work I do with HDInsight and other technologies. I land these files as block blobs, which means that individual files must stay below the 200 GB block blob size limit. Azure Data Lake Store does not have this limitation and most of…

Configuration of HBase on Azure HDInsight as a Drill Data Source

NOTE This post is part of a series on a deployment of Apache Drill on the Azure cloud. In a previous post, I showed how to connect my Azure-deployed Drill cluster to an Azure HDInsight (Hadoop) cluster via Hive.  Azure HDInsight also supports HBase.  In this post, I want to tackle how to get the…

Configuration of Hive on Azure HDInsight as a Drill Data Source

NOTE This post is part of a series on a deployment of Apache Drill on the Azure cloud. With my Drill cluster deployed to the Azure cloud, another potential source of data is Azure HDInsight, Microsoft’s managed Hadoop offering.  HDInsight makes use of WASB storage so that if the structure of my data was pretty…

Enabling SSL Encryption on the Drill Web Console

NOTE This is a continuation of my series on the deployment of Apache Drill on Azure. That said, there is nothing in this post that is specific to an Azure deployment though I will assume you are familiar with my topology when I reference servers by name. In a previous post, I walked through the…

Setting up Basic User Authentication in Drill

NOTE This is a continuation of my series on the deployment of Apache Drill on Azure. That said, there is nothing in this post that is specific to an Azure deployment though I will assume you are familiar with my topology when I reference servers by name. With my Drill cluster deployed, I want to…

A Script for Replicating Database Backup Files to the Azure Cloud

Let’s say I have a backup process that creates backup files on a file server. For disaster recovery purposes, I need to get these files offsite in a timely manner. I have decent bandwidth out of my data centers, so I have decided to put these files into low-cost Azure storage. NOTE This script is provided for…

Connecting to the Drill Cluster from a Client App

NOTE This post is part of a series on a deployment of Apache Drill on the Azure cloud. Drill supports two kinds of client connections: the Direct Drillbit Connection and the ZooKeeper Quorum Connection. When I read the Drill documentation, it seems the ZooKeeper Quorum Connection (aka a Random Drillbit Connection) is the preferred connection type. With this connection type,…

Connect Drill to an Azure SQL Database

NOTE This post is part of a series on a deployment of Apache Drill on the Azure cloud. The intent of Apache Drill is to make it easy for you to query across a wide range of relational and NoSQL data stores. If you are running Drill in Azure, you are likely leveraging Azure SQL Database…

Configuration of Azure Blob Storage (aka WASB) as a Drill Data Source

NOTE This post is part of a series on a deployment of Apache Drill on the Azure cloud. Azure Storage Blobs – aka WASB, after the WASB driver Hadoop uses to access them – provide a low-cost means to store files in Azure. As Drill is commonly used to query data residing in file systems, it would…
