HDFS gets full in Azure HDInsight with many Hive temporary files

When a VM in an Azure HDInsight cluster is restarted while Hive is using temporary files, those files can become orphaned and consume space. In Azure HDInsight, those temp files live in the HDFS file system, which is distributed across the local disks in the worker nodes. This is a different…

HDInsight Name Node can stay in Safe mode after a Scale Down

This week we worked on an HDInsight cluster where the Name Node had gone into Safe mode and didn’t leave that mode on its own. It’s not very common, but I wanted to share why it happened, and how to get out of the situation, in case it prevents a headache for someone else. HDInsight…

HDInsight Hive Metastore fails when the database name has dashes or hyphens

Working in Azure HDInsight support today, we see a failure when trying to run a Hive query on a freshly created HDInsight cluster. It's brand new and fails on the first try, so what could be wrong? Our Hive client app fails with this kind of error. Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to…

Encoding 101 – Exporting from SQL Server into flat files, to create a Hive external table

Today in Microsoft Big Data Support we faced the issue of how to correctly move Unicode data from SQL Server into Hive via flat text files. The main issue faced was encoding special Unicode characters from the source database, such as the degree sign (Unicode 00B0) and other complex Unicode characters outside of A-Z 0-9….

Encoding the Hive query file in Azure HDInsight

Today at Microsoft we were using Azure Data Factory to run Hive Activities in Azure HDInsight on a schedule. Things were working fine for a while, but then we got an error that was hard to understand. I’ve simplified the scenario to illustrate the key points. The key is that Hive did not like the…

Troubleshooting Hive query performance in HDInsight Hadoop cluster

One of the common support requests we get from customers using Apache Hive is: "my Hive query is running slow and I would like the job/query to complete much faster" or, in more quantifiable terms, "my Hive query is taking 8 hours to complete and my SLA is 2 hours." Improving or tuning Hive…

How to access Hive using JDBC on HDInsight

While following up on a customer question recently on this topic, I realized that we have seen the same question come up from other users a few times, and thought I would share a simple example here of how to connect to HiveServer2 on Azure HDInsight using JDBC. For background, please review the Apache wiki and the…

How to use a Custom JSON Serde with Microsoft Azure HDInsight

I recently needed to parse JSON files using Hive. There were a couple of options that I could use. One is to use a native Hive JSON function such as get_json_object, and the other is to use a JSON Serde to parse JSON objects containing nested elements with less code. I decided to go with the…


Some Frequently Asked Questions on Microsoft Azure HDInsight

We have seen some common questions on HDInsight when interacting with customers and partners. In this blog post, we are going to help answer some of those common questions. 1. What is Microsoft Azure HDInsight? HDInsight is a Hadoop-based service from Microsoft that brings a 100 percent Apache Hadoop solution to the cloud. Through deep…


HDInsight: – backup and restore hive table

Introduction My name is Sudhir Rawat and I work on the Microsoft HDInsight support team. In this blog I am going to explain the options for backing up and restoring a Hive table on HDInsight. The general recommendation is to store Hive metadata on SQL Azure when provisioning the cluster. Sometimes, we may have many…
