HDFS gets full in Azure HDInsight with many Hive temporary files

When a VM in an Azure HDInsight cluster is restarted while Hive is using temporary files, those files can become orphaned and consume space. In Azure HDInsight, those temp files live in the HDFS file system, which is distributed across the local disks of the worker nodes. This is a different…

How to Lock a Resource Group to prevent accidental deletion of resources like HDInsight

Did you know it is possible to prevent accidental deletion of resources in Azure? This could apply to any number of resources: HDInsight, Stream Analytics jobs, Data Factories, DocumentDB accounts, etc. We can add a lock to the resource group to prevent resources from being removed inadvertently. I found out the hard way when someone…


HDInsight Name Node can stay in Safe mode after a Scale Down

This week we worked on an HDInsight cluster where the Name Node had gone into Safe mode and did not leave that mode on its own. It’s not very common, but I wanted to share why it happened and how to get out of the situation, in case it prevents a headache for someone else. HDInsight…

HDInsight Hive Metastore fails when the database name has dashes or hyphens

Working in Azure HDInsight support today, we saw a failure when trying to run a Hive query on a freshly created HDInsight cluster. It’s brand new and fails on the first try, so what could be wrong? Our Hive client app fails with this kind of error: Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to…

Encoding 101 – Exporting from SQL Server into flat files, to create a Hive external table

Today in Microsoft Big Data Support we faced the issue of how to correctly move Unicode data from SQL Server into Hive via flat text files. The main issue faced was encoding special Unicode characters from the source database, such as the degree sign (Unicode 00B0) and other complex Unicode characters outside of A-Z 0-9….

Encoding the Hive query file in Azure HDInsight

Today at Microsoft we were using Azure Data Factory to run Hive Activities in Azure HDInsight on a schedule. Things were working fine for a while, but then we got an error that was hard to understand. I’ve simplified the scenario to illustrate the key points. The key is that Hive did not like the…

How to allow Spark to access Microsoft SQL Server

Today we will look at configuring Spark to access Microsoft SQL Server through JDBC. On HDInsight the Microsoft SQL Server JDBC jar is already installed. On Linux the path is /usr/hdp/2.2.7.1-10/hive/lib/sqljdbc4.jar. If you need more information or want to download the driver, you can start here: Microsoft SQL Server JDBC. Spark needs to know the…

Multi-Stream support in SCP.NET Storm Topology

Streams are at the core of Apache Storm. In most cases topologies are based on a single input stream; however, there are situations when one may need to start the topology with two or more input streams. SCP supports user code that emits to or receives from distinct streams at the same time. To…

Using Azure SDK for Python

Python is a great scripting tool with a large user base. In a recent support case I needed a way to continuously generate files with some random data in Windows Azure storage (wasb) in order to process them with Spark on HDInsight. Python, the Azure SDK for Python and a few lines of code…
