HDInsight Name Node can stay in Safe mode after a Scale Down

This week we worked on an HDInsight cluster where the Name Node had gone into Safe mode and didn't leave that mode on its own. It's not very common, but I wanted to share why it happened and how to get out of the situation, in case it saves someone else a headache. HDInsight…

HDInsight Hive Metastore fails when the database name has dashes or hyphens

Working in Azure HDInsight support today, we saw a failure when trying to run a Hive query on a freshly created HDInsight cluster. It's brand new and fails on the first try, so what could be wrong? Our Hive client app fails with this kind of error: Exception in thread “main” java.lang.RuntimeException: java.lang.RuntimeException: Unable to…

How to allow Spark to access Microsoft SQL Server

Today we will look at configuring Spark to access Microsoft SQL Server through JDBC. On HDInsight the Microsoft SQL Server JDBC jar is already installed; on Linux the path is /usr/hdp/2.2.7.1-10/hive/lib/sqljdbc4.jar. If you need more information or want to download the driver, you can start here: Microsoft SQL Server JDBC. Spark needs to know the…
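
To make the idea in this post a bit more concrete, here is a minimal Spark 1.x-style sketch (in Scala) of reading a SQL Server table over JDBC. The server name, database, credentials, and table are placeholders rather than values from the post, and the driver jar mentioned above has to be on the classpath, for example via --jars when launching spark-shell or spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SqlServerJdbcExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SqlServerJdbcExample"))
    val sqlContext = new SQLContext(sc)

    // Placeholder connection details -- replace with your own server, database and credentials.
    val jdbcUrl = "jdbc:sqlserver://myserver.example.com:1433;database=mydb;user=myuser;password=mypassword"

    // Read a table through Spark's JDBC data source. sqljdbc4.jar must be visible to the
    // driver and executors (for example, passed with --jars at submit time).
    val df = sqlContext.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "dbo.MyTable")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load()

    df.printSchema()
    println(s"Row count: ${df.count()}")
    sc.stop()
  }
}
```

A launch command would then look roughly like spark-submit --jars /usr/hdp/2.2.7.1-10/hive/lib/sqljdbc4.jar …, using the jar path called out in the excerpt.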

Multi-Stream support in SCP.NET Storm Topology

Streams are at the core of Apache Storm. In most cases topologies are based on a single input stream; however, there are situations where one may need to start the topology with two or more input streams. SCP supports user code that emits to or receives from distinct streams at the same time. To…

A KMeans example for Spark MLlib on HDInsight

Today we will take a look at Spark's built-in machine learning library, MLlib (see the Spark MLlib Guide). KMeans is a popular clustering method. Clustering methods are used when there is no class to be predicted; instead, instances are divided into groups or clusters. The clusters hopefully will represent some mechanism…
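
As a hedged sketch of what a KMeans job with the RDD-based MLlib API might look like in Scala: the input path, cluster count, and iteration count below are illustrative assumptions, not values taken from the post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("KMeansExample"))

    // Hypothetical input: one data point per line, features separated by whitespace.
    val data = sc.textFile("wasb:///example/data/kmeans_data.txt")
    val points = data.map(line => Vectors.dense(line.trim.split("\\s+").map(_.toDouble))).cache()

    // Group the points into 3 clusters using 20 iterations (both values chosen arbitrarily here).
    val model = KMeans.train(points, 3, 20)

    // Within Set Sum of Squared Errors: a rough measure of how tight the clusters are.
    println(s"WSSSE = ${model.computeCost(points)}")
    model.clusterCenters.zipWithIndex.foreach { case (center, i) =>
      println(s"Cluster $i center: $center")
    }

    sc.stop()
  }
}
```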

Understanding Spark’s SparkConf, SparkContext, SQLContext and HiveContext

The first step of any Spark driver application is to create a SparkContext. The SparkContext allows your Spark driver application to access the cluster through a resource manager. The resource manager can be YARN or Spark's own cluster manager. In order to create a SparkContext you should first create a SparkConf. The SparkConf stores configuration…
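
Since the excerpt describes how these objects fit together, here is a minimal Scala sketch of a Spark 1.x driver creating them; the application name, the executor-memory setting, and the SHOW TABLES query are illustrative placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

object ContextsExample {
  def main(args: Array[String]): Unit = {
    // SparkConf holds the key/value configuration for the application.
    val conf = new SparkConf()
      .setAppName("ContextsExample")
      .set("spark.executor.memory", "2g") // illustrative setting

    // SparkContext connects the driver to the cluster through the resource manager (e.g. YARN).
    val sc = new SparkContext(conf)

    // SQLContext adds DataFrame and SQL support on top of the SparkContext.
    val sqlContext = new SQLContext(sc)

    // HiveContext extends SQLContext with HiveQL support and access to the Hive metastore.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SHOW TABLES").show() // placeholder query

    sc.stop()
  }
}
```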

Collecting logs from Apache Storm cluster in HDInsight

While running an Apache Storm topology in a multi-node Storm cluster, different components of the topology log to different files saved on different nodes in the cluster, depending on where each component is running. Today in this blog I will discuss the log files that are available in a Storm cluster and…

Troubleshooting Oozie or other Hadoop errors with DEBUG logging

In troubleshooting Hadoop issues, we often need to review the logging of a specific Hadoop component. By default, the logging level is set to INFO or WARN for many Hadoop components like Oozie, Hive, etc., and in many cases this level of logging is sufficient to trace the issue. However, in certain cases, INFO or…

Some things to consider for your Spark on HDInsight workload

When it comes time to provision your Spark cluster on HDInsight, we all want our workloads to execute fast. The Spark community has made some strong claims of better performance compared to MapReduce jobs. In this post I want to discuss two topics to consider when deploying your Spark application on an HDInsight cluster. …

Troubleshooting Hive query performance in HDInsight Hadoop cluster

One of the common support requests we get from customers using Apache Hive is "my Hive query is running slow and I would like the job/query to complete much faster," or in more quantifiable terms, "my Hive query is taking 8 hours to complete and my SLA is 2 hours." Improving or tuning Hive…
