Working with the HBase Import and Export Utility

As mentioned in a couple of other posts, I am working with a customer to move data between two Hadoop clusters. This includes data in several HBase tables, which has led me to make use of the HBase Import and Export utilities. To help others who may have a similar need, I’m going to use this…
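The core of the approach is a pair of MapReduce-driven commands that ship with HBase. A minimal sketch, assuming a table named my_table and a staging directory of /staging/my_table (both placeholders), run on a node with the hbase client installed:

```shell
# On the source cluster: dump the table as sequence files to HDFS
hbase org.apache.hadoop.hbase.mapreduce.Export "my_table" /staging/my_table

# On the destination cluster (the target table must already exist
# with a matching schema):
hbase org.apache.hadoop.hbase.mapreduce.Import "my_table" /staging/my_table
```

Both commands launch MapReduce jobs, so they need a running cluster; the staged directory can be moved between clusters with your copy tool of choice.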

Exploring the Hive IMPORT and EXPORT Commands

In a recent post, I mentioned I am working with a customer to migrate from a native Hortonworks (HDP) cluster to Azure HDInsight. As part of this work, I’ve needed to use the Hive IMPORT and EXPORT commands to migrate tables from one cluster to another. The Hive IMPORT and…
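For reference, the two statements take a table name and an HDFS path to a staging directory; the names below are placeholders:

```sql
-- On the source cluster: write the table's data plus its metadata
-- to a staging directory
EXPORT TABLE my_table TO '/staging/my_table';

-- On the destination cluster: recreate the table from the staged copy
IMPORT TABLE my_table FROM '/staging/my_table';
```

Because EXPORT captures the table metadata alongside the data, the IMPORT side does not need a pre-existing table definition.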

Pushing Data from a Hortonworks Cluster to an Azure HDInsight Cluster

I have a scenario where a customer wishes to explore a move from an existing Hortonworks (HDP) cluster to an Azure HDInsight (HDI) cluster. The customer is interested in the lower administrative overhead of HDInsight’s Platform-as-a-Service offering as well as the ability to scale out and scale back cluster resources to match demand, something that’s challenging to…

Goofing around with the Cognitive Services Translator API

Recently, I was asked to get familiar with the Translator API in Azure Cognitive Services. Not being a developer, I was a little leery of how I might approach this, but I was surprised at how easy it was to tap into this functionality. I thought I would share some of what I learned here in hopes that someone…
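The shape of a Translator Text API v3 call is simple enough to sketch without any SDK. The endpoint, the api-version parameter, and the Ocp-Apim-Subscription-Key header come from the public API documentation; the helper function, the key value, and the target language below are placeholders of my own:

```python
import json

# Base endpoint for the Translator Text API, version 3.0
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def build_translate_request(text, to_lang, subscription_key):
    """Assemble the (url, headers, body) triple for one translation call.

    The API expects a JSON array of {"Text": ...} objects in the body and
    the subscription key in the Ocp-Apim-Subscription-Key header.
    """
    url = f"{ENDPOINT}?api-version=3.0&to={to_lang}"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/json",
    }
    body = json.dumps([{"Text": text}])
    return url, headers, body

# Build (but don't send) a request translating "Hello" to French
url, headers, body = build_translate_request("Hello", "fr", "<your-key>")
```

From there, any HTTP client can POST the body to the URL with those headers and read the translations back out of the JSON response.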

A Fixed-Width Extractor for Azure Data Lake Analytics

I have a fixed-width text file I’d like to use with Azure Data Lake Analytics (ADLA). ADLA reads files using extractors. As of today, ADLA comes out of the box with three extractors: one for comma-delimited text, another for tab-delimited text, and a general-purpose extractor for delimited text. Examples on GitHub demonstrate the mechanics for writing custom ADLA extractors and include…
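A real ADLA extractor is written in C# against the U-SQL extensibility interfaces, but the slicing logic at its heart is easy to prototype. A minimal Python sketch, with hypothetical field names and widths, of the fixed-width parsing such an extractor performs:

```python
# Hypothetical column layout: (field name, width in characters)
FIELDS = [("id", 5), ("name", 10), ("amount", 8)]

def parse_fixed_width(line, fields=FIELDS):
    """Slice one fixed-width record into a dict of stripped field values."""
    record, offset = {}, 0
    for name, width in fields:
        record[name] = line[offset:offset + width].strip()
        offset += width
    return record

# A 23-character record: 5 chars of id, 10 of name, 8 of amount
row = parse_fixed_width("00042Jane Doe  00019.99")
```

The extractor version does the same per-row slicing, just inside the Extract override and emitting rows to U-SQL instead of returning dicts.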

Data Sources for Business Analysts

I commonly get asked where to find data. For most Business Analysts, your best data sources are found within your enterprise: operational data in operational databases (ideally transformed for easier, more consistent consumption in a data mart or data warehouse) and exhaust data housed in a data lake. Locating these data can be tricky but…

Split a Large Row-Formatted Text File using PowerShell

I move a lot of large, row-formatted text files into Azure Storage for work I do with HDInsight and other technologies. I land these files as block blobs, which means that individual files must stay below the 200 GB block blob size limit. Azure Data Lake Store does not have this limitation, and most of…
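The post itself walks through a PowerShell approach; as a language-neutral illustration of the same idea, here is a minimal Python sketch that writes successive part files, breaking only on line boundaries so no row is ever split, and keeping each part under a byte budget:

```python
# A sketch of splitting a row-formatted text file into numbered parts
# (prefix.000, prefix.001, ...) that each stay under max_bytes, unless a
# single line by itself exceeds the budget.
def split_file(path, max_bytes, out_prefix):
    part, written, out = 0, 0, None
    with open(path, "rb") as src:
        for line in src:
            # Start a new part before the first line, or whenever adding
            # this line would push the current part over the budget.
            if out is None or (written and written + len(line) > max_bytes):
                if out:
                    out.close()
                out = open(f"{out_prefix}.{part:03d}", "wb")
                part, written = part + 1, 0
            out.write(line)
            written += len(line)
    if out:
        out.close()
    return part  # number of part files written
```

Reading and writing in binary mode keeps byte counts honest and avoids any newline translation along the way.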

Configuration of HBase on Azure HDInsight as a Drill Data Source

NOTE This post is part of a series on a deployment of Apache Drill on the Azure cloud. In a previous post, I showed how to connect my Azure-deployed Drill cluster to an Azure HDInsight (Hadoop) cluster via Hive.  Azure HDInsight also supports HBase.  In this post, I want to tackle how to get the…
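For context, Drill talks to HBase through a storage plugin registered via the Drill web console. A minimal sketch of the plugin definition, assuming hypothetical ZooKeeper hostnames for the HDInsight cluster:

```json
{
  "type": "hbase",
  "config": {
    "hbase.zookeeper.quorum": "zk0-myhdi,zk1-myhdi,zk2-myhdi",
    "hbase.zookeeper.property.clientPort": "2181"
  },
  "size.calculator.enabled": false,
  "enabled": true
}
```

The quorum setting is the important one: Drill locates the HBase region servers through ZooKeeper, so those hostnames must be reachable from the Drill nodes.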

Configuration of Hive on Azure HDInsight as a Drill Data Source

NOTE This post is part of a series on a deployment of Apache Drill on the Azure cloud. With my Drill cluster deployed to the Azure cloud, another potential source of data is Azure HDInsight, Microsoft’s managed Hadoop offering.  HDInsight makes use of WASB storage so that if the structure of my data was pretty…
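For context, the Hive connection is made through a Drill storage plugin pointed at the cluster’s metastore. A minimal sketch of the plugin definition, with hypothetical head-node, container, and storage-account names:

```json
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://hn0-myhdi:9083",
    "fs.default.name": "wasb://mycontainer@myaccount.blob.core.windows.net/",
    "hive.metastore.sasl.enabled": "false"
  }
}
```

With the plugin enabled, Hive tables surface in Drill under the hive schema and can be queried alongside other registered sources.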

Enabling SSL Encryption on the Drill Web Console

NOTE This is a continuation of my series on the deployment of Apache Drill on Azure. That said, there is nothing in this post that is specific to an Azure deployment, though I will assume you are familiar with my topology when I reference servers by name. In a previous post, I walked through the…
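The switch itself lives in drill-override.conf on each Drillbit; a minimal sketch (the keystore holding the certificate is supplied separately, for example through the standard javax.net.ssl.keyStore Java system properties):

```
drill.exec: {
  http: {
    ssl_enabled: true
  }
}
```

Restart the Drillbits after the change so the web console starts listening over HTTPS instead of plain HTTP.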
