Loading data into HBase tables on HDInsight using the built-in ImportTsv utility

Apache HBase can give random access to very large tables – billions of rows by millions of columns. But how do you load that kind of data into HBase tables in the first place? HBase includes several methods of loading data into tables. The most straightforward method is to either use the…


Some Commonly Used YARN Memory Settings

We were recently working on an out-of-memory issue that was occurring with certain workloads on HDInsight clusters, so it seemed like a good time to write about this topic while the experience of troubleshooting these memory issues is fresh. There are a few memory settings that can be tuned to suit your specific…
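The post goes into the details, but as a rough sketch of the kind of knobs involved, here is how the standard Hadoop 2.x per-job memory properties can be set from Java (the property names are stock MapReduce/YARN settings; the values and the job itself are placeholders, not recommendations):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MemoryTunedJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Container sizes (in MB) requested from YARN for map and reduce tasks.
        conf.set("mapreduce.map.memory.mb", "2048");
        conf.set("mapreduce.reduce.memory.mb", "4096");

        // JVM heap for the tasks; keep it below the container size to leave
        // headroom for non-heap memory, otherwise YARN may kill the container.
        conf.set("mapreduce.map.java.opts", "-Xmx1638m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");

        // Memory (in MB) for the MapReduce ApplicationMaster container.
        conf.set("yarn.app.mapreduce.am.resource.mb", "2048");

        Job job = Job.getInstance(conf, "memory-tuned-job");
        // ... configure mapper, reducer, input and output paths as usual ...
    }
}
```

Cluster-wide limits such as yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb are configured in yarn-site.xml on the cluster itself rather than per job.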


How to use HBase Java API with HDInsight HBase cluster, part 1

Recently we worked with a customer who was trying to use the HBase Java API to interact with an HDInsight HBase cluster. Having worked with the customer and tried to follow our existing documentation here and here, we realized that it may be helpful to clarify a few things around HBase Java API connectivity to…
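For context, a connection from the Java client (using the HBase 0.98-era API) looks roughly like the sketch below; the ZooKeeper host names and the table and column names are placeholder assumptions, and the post covers the HDInsight-specific connectivity details:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSample {
    public static void main(String[] args) throws Exception {
        // Start from hbase-site.xml on the classpath, then point at the
        // cluster's ZooKeeper quorum (placeholder host names below).
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "zookeeper0,zookeeper1,zookeeper2");

        HTable table = new HTable(config, "sampletable");
        try {
            // Write one cell.
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
            table.put(put);

            // Read it back.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col1"))));
        } finally {
            table.close();
        }
    }
}
```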


How to use parameter substitution with Pig Latin and PowerShell

When running Pig in a production environment, you’ll likely have one or more Pig Latin scripts that run on a recurring basis (daily, weekly, monthly, etc.) that need to locate their input data based on when or where they are run. For example, you may have a Pig job that performs daily log ingestion by…


HDInsight: Creating, Deploying and Executing a Pig UDF

As a developer, I have always looked for ways to customize (write my own processing) when the functionality I need is not available in the programming language. That thought came up again when I was working on Apache Pig in HDInsight, so I started researching the topic and thought it would be good to share. In this…
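To give a flavor of where the post ends up, an eval UDF for Pig is just a small Java class that extends EvalFunc; the sketch below is the classic upper-casing example from the Pig documentation (the package and jar names are illustrative):

```java
package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// A simple eval UDF that upper-cases its string input.
public class UPPER extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        try {
            String str = (String) input.get(0);
            return str.toUpperCase();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row", e);
        }
    }
}
```

Once compiled and packaged, the UDF is made available to a script with REGISTER myudfs.jar; and is then called like any built-in function, for example FOREACH students GENERATE myudfs.UPPER(name);.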


How to use a Custom JSON SerDe with Microsoft Azure HDInsight

I recently needed to parse JSON files using Hive. There were a couple of options that I could use. One is to use native Hive JSON functions such as get_json_object, and the other is to use a JSON SerDe, which can parse JSON objects containing nested elements with less code. I decided to go with the…


Some Frequently Asked Questions on Microsoft Azure HDInsight

We have seen some common questions about HDInsight when interacting with customers and partners. In this blog post, we are going to help answer some of those common questions. 1. What is Microsoft Azure HDInsight? HDInsight is a Hadoop-based service from Microsoft that brings a 100 percent Apache Hadoop solution to the cloud. Through deep…


HDInsight News – New Videos to watch – HDInsight Provisioning demonstrations

Check out these two recent video demos regarding HDInsight provisioning. These videos complement the product documentation outlined at http://azure.microsoft.com/en-us/documentation/articles/hdinsight-get-started/#provision. HDInsight is the name given to the Microsoft Azure service (in the Microsoft cloud data centers) running the Hortonworks Data Platform distribution of Apache Hadoop on Microsoft Windows. Provisioning is the word we use to describe the…


HDInsight: backup and restore a Hive table

Introduction My name is Sudhir Rawat and I work on the Microsoft HDInsight support team. In this blog I am going to explain the options for backing up and restoring a Hive table on HDInsight. The general recommendation is to store Hive metadata in SQL Azure when provisioning the cluster. Sometimes, we may have many…


Start using Flume with HDInsight by installing HDP 2.0 on a Windows Azure Virtual Machine

After reading Greg’s article Using Apache Flume with HDInsight, I wanted to start learning more about Flume, but my Linux skills are nonexistent and currently Flume is not included in HDInsight. For more information on HDInsight, see Windows Azure HDInsight. For more information on Apache Flume, see Apache Flume. So, I decided to…
