How WebHCat Works and How to Debug (Part 2)

Link to Part 1. 2. How to debug WebHCat 2.1. Bad Gateway (HTTP status code 502) This is a very generic message from the gateway nodes. We will cover some common cases and possible mitigations. These are the most common Templeton problems customers are seeing right now. 2.1.1. WebHCat service down This happens in case the WebHCat server on…
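
A quick first check when the gateway returns 502 is to probe the WebHCat server directly on the head node. Below is a minimal sketch, assuming WebHCat listens on its default port 50111; the host name and user are placeholders for your cluster's values:

    import scala.io.Source

    // Probe WebHCat's status endpoint directly (default port 50111).
    // A healthy server answers {"status":"ok","version":"v1"}; if this call
    // fails while the gateway returns 502, the service itself is likely down.
    object WebHCatStatusCheck {
      def main(args: Array[String]): Unit = {
        val url = "http://headnodehost:50111/templeton/v1/status?user.name=admin" // placeholder host/user
        println(Source.fromURL(url).mkString)
      }
    }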


How WebHCat Works and How to Debug (Part 1)

1. Overview and Goals One of the common questions our customers face is: why are my Hive, Pig, or Sqoop job submissions failing? Most likely something is wrong with your WebHCat service. In this article, we will try to answer some of the common questions, like: what is WebHCat, sometimes also referred to as…


Saving Spark Resilient Distributed Dataset (RDD) To PowerBI

The sample Jupyter Scala notebook described in this blog can be downloaded from https://github.com/hdinsight/spark-jupyter-notebooks/blob/master/Scala/SparkRDDToPowerBI.ipynb. Spark PowerBI connector source code is available at https://github.com/hdinsight/spark-powerbi-connector. This blog is a follow-up to our previous blog, published at https://blogs.msdn.microsoft.com/azuredatalake/2016/03/09/saving-spark-dataframe-to-powerbi/, which showed how to save a Spark DataFrame to PowerBI using the Spark PowerBI connector. In this blog we describe another…


Saving Spark Distributed Data Frame (DDF) To PowerBI

The sample Jupyter Scala notebook described in this blog can be downloaded from https://github.com/hdinsight/spark-jupyter-notebooks/blob/master/Scala/SparkDataFrameToPowerBI.ipynb. Spark PowerBI connector source code is available at https://github.com/hdinsight/spark-powerbi-connector. Data visualization is often the most important part of data processing, as it can surface patterns and trends in data that are otherwise not easily perceptible to humans. PowerBI (https://powerbi.microsoft.com/en-us/)…


Extending Spark with Extension Methods in Scala: Fun with Implicits

The sample Jupyter Scala notebook described in this blog can be downloaded from https://github.com/hdinsight/spark-jupyter-notebooks/blob/master/Scala/ScalaExtensionMethod.ipynb. Extension methods are programming language constructs that enable extending an object with additional methods after the original object has already been compiled. They are useful when a developer wants to add capabilities to an existing object when only the compiled object…
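
As a flavor of the technique (a minimal sketch with illustrative names, not the notebook's code), the following implicit class adds a word-count method to the already-compiled String type:

    // StringExtensions, RichString, and wordCount are illustrative names.
    object StringExtensions {
      implicit class RichString(val underlying: String) extends AnyVal {
        // Adds a method to String without modifying or recompiling String itself.
        def wordCount: Int = underlying.trim.split("\\s+").count(_.nonEmpty)
      }
    }

    object Demo {
      import StringExtensions._
      def main(args: Array[String]): Unit = {
        println("extension methods are fun".wordCount) // prints 4
      }
    }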


PySpark: Appending columns to DataFrame when DataFrame.withColumn cannot be used

The sample Jupyter Python notebook described in this blog can be downloaded from https://github.com/hdinsight/spark-jupyter-notebooks/blob/master/Python/AppendDataFrameColumn.ipynb. In many Spark applications, there are common use cases in which columns derived from one or more existing columns in a DataFrame are appended during the data preparation or data transformation stages. DataFrame provides a convenient method of the form DataFrame.withColumn([string] columnName,…
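
The notebook itself is in Python; as a sketch of the general idea, here is one common workaround in Scala Spark for when the new column does not derive from the DataFrame's own columns: pair rows and values by position, then join. All names and data below are illustrative:

    import org.apache.spark.sql.SparkSession

    // Append an independently computed column by pairing rows on their index.
    object AppendColumn {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("AppendColumn").getOrCreate()
        import spark.implicits._

        val df    = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "letter")
        val extra = spark.sparkContext.parallelize(Seq("x", "y", "z"))

        // zipWithIndex gives each row and each new value a stable position to join on.
        val indexedRows = df.rdd.zipWithIndex.map { case (row, i) => (i, row) }
        val indexedVals = extra.zipWithIndex.map { case (v, i) => (i, v) }

        val appended = indexedRows.join(indexedVals)
          .sortByKey()
          .map { case (_, (row, v)) => (row.getInt(0), row.getString(1), v) }
          .toDF("id", "letter", "extra")

        appended.show()
      }
    }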


Organize and discover your big data in the Azure Data Lake with Azure Data Catalog

Enterprise data is growing at a remarkable pace today. A large portion of this growth is coming from a wide variety of unstructured and semi-structured sources such as sensors, social media, clickstreams, and machine-generated logs. Businesses of all sizes face the challenge of storing, organizing and utilizing their data in an…


How To: Increase number of reducers in your Hive/MapReduce job

Our customers often use storage and compression technologies like ORC and Snappy that compress data and offer high performance. The expectation is that since the data is compressed, the job should run faster. However, more often than not, the job still takes a long time to run. The main cause of this is that Hive often…
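
Hive estimates the reducer count from the size of the input data on disk, so heavily compressed input yields few reducers even when the decompressed data is large. A back-of-the-envelope illustration (values are hypothetical; 256 MB matches the hive.exec.reducers.bytes.per.reducer default in Hive 0.14+, older releases default to 1 GB):

    // reducers = ceil(input bytes on disk / hive.exec.reducers.bytes.per.reducer)
    object ReducerEstimate {
      def main(args: Array[String]): Unit = {
        val bytesPerReducer = 256L * 1024 * 1024        // hive.exec.reducers.bytes.per.reducer
        val rawInput        = 10L * 1024 * 1024 * 1024  // 10 GB of raw data
        val compressedInput = rawInput / 8              // ~8x compression from ORC + Snappy

        def reducers(inputBytes: Long): Long =
          math.max(1L, math.ceil(inputBytes.toDouble / bytesPerReducer).toLong)

        println(s"Reducers on raw input:        ${reducers(rawInput)}")        // 40
        println(s"Reducers on compressed input: ${reducers(compressedInput)}") // 5
      }
    }

Lowering hive.exec.reducers.bytes.per.reducer, or setting mapreduce.job.reduces explicitly, raises the reducer count accordingly.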


How To: output file as a CSV using Hive in Azure HDInsight

One of the common questions our team gets is how to output a Hive table to CSV. Hive does not provide a direct way in the query language to dump a table to a CSV file. Using the command INSERT OVERWRITE will output the table as TSV. We then have to manually convert it to…
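
As a flavor of that manual conversion step, here is a minimal sketch (file paths are placeholders, and it assumes field values contain no embedded tabs, commas, or quotes; real data may need proper CSV escaping):

    import scala.io.Source
    import java.io.PrintWriter

    // Convert tab-separated Hive output to comma-separated output.
    object TsvToCsv {
      def main(args: Array[String]): Unit = {
        val in  = Source.fromFile("hive_output.tsv") // placeholder path
        val out = new PrintWriter("hive_output.csv") // placeholder path
        try {
          for (line <- in.getLines())
            out.println(line.split("\t", -1).mkString(","))
        } finally {
          in.close()
          out.close()
        }
      }
    }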
