Hive Metastore in HDInsight: Tips, Tricks & Best Practices

When you create a Hive table, the table definition (column names, data types, comments, etc.) is stored in the Hive Metastore. The Hive Metastore is a critical part of the Hadoop architecture because it acts as a central schema repository that can be used by other access tools like Spark, Interactive Hive (LLAP), Presto, Pig and many other…


How to use BigDL on Apache Spark for Azure HDInsight

Deep learning is having an impact on everything from healthcare and transportation to manufacturing and more. Companies are turning to deep learning to solve hard problems like image classification, speech recognition, object recognition, and machine translation. This blog post describes how to enable Intel’s BigDL Deep Learning Library for Apache Spark on Microsoft’s Azure HDInsight platform. In 2016, Intel released…


Azure Data Lake U-SQL March 9 2017 Updates: Deprecations turn into errors, PIVOT/UNPIVOT, cross ADLS account U-SQL catalog sharing, NuGet packages and more!

After mostly internal service updates following our general availability, we released several new U-SQL features last week. Note that these updates are now available in all regions, including the new Europe North region. Here are the March 9, 2017 updates for Azure Data Lake U-SQL and developer tooling! The main takeaway…

Nodes in HDInsight

Knowing the types and functions of nodes in HDInsight is key to taking full advantage of the service. This article is aimed at users who are familiar with big data concepts but are newer to HDInsight. Feel free to read the article and send me feedback even if you are beyond the target audience for…


Using Custom Python Libraries with U-SQL

The Python extensions for Azure Data Lake Analytics ship with the standard Python libraries and include pandas and NumPy. We’ve been getting a lot of questions about how to use custom libraries with the Python extensions. The good news is that this is simple. Introducing zipimport: first, let’s talk about “zipimport”. Thanks to the adoption…
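To illustrate the underlying Python mechanism only (the excerpt does not show how the post deploys the archive to Azure Data Lake Analytics), here is a minimal local sketch of the standard-library zipimport behavior: once a .zip file is on `sys.path`, Python can import modules straight out of it. The module name `mylib` and its `greet` function are hypothetical.

```python
import os
import sys
import tempfile
import zipfile

# Build a small archive holding a hypothetical package "mylib".
# This is a local illustration; deployment to a U-SQL job is not shown.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "mylib.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mylib/__init__.py", "def greet(name):\n    return 'hello, ' + name\n")

# Putting the archive on sys.path lets Python's built-in zipimport
# machinery resolve packages stored inside it.
sys.path.insert(0, zip_path)

import mylib  # loaded from inside mylib.zip

print(mylib.greet("U-SQL"))
print(mylib.__file__)  # path points inside the .zip archive
```

The same idea is what makes zipped custom libraries usable: the code never needs to be unpacked onto disk first.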

Analyze your data in ADLS with more assurance using the recently GA’d Power BI Desktop connector

As you know, many Azure Data Lake Store (ADLS) customers analyze and view data stored in ADLS directly using Power BI Desktop and PBI.com. We have been providing this support since late CY2015 with the Power BI connector for ADLS. However, this connector was marked as “Beta” while awaiting the GA of ADLS and feedback from customers…

How WebHCat Works and How to Debug (Part 2)

Link to Part 1
2. How to debug WebHCat
2.1. Bad Gateway (HTTP status code 502)
This is a very generic message from the gateway nodes. We will cover some common cases and possible mitigations. This is the most common Templeton problem customers are seeing right now.
2.1.1. WebHCat service down
This happens in case the WebHCat server on…
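A quick first check when you see a 502 is whether WebHCat answers its status endpoint at all. The sketch below is an assumption-laden helper, not the post's procedure: it calls `GET /templeton/v1/status` through the HDInsight gateway (`https://CLUSTERNAME.azurehdinsight.net`) with basic authentication. The function names `status_url`, `parse_status`, and `check_webhcat` are hypothetical.

```python
import base64
import json
import urllib.error
import urllib.request


def status_url(cluster_name):
    # HDInsight routes WebHCat (Templeton) through the cluster gateway.
    return f"https://{cluster_name}.azurehdinsight.net/templeton/v1/status"


def parse_status(body):
    # A healthy WebHCat server replies with JSON such as {"status": "ok", "version": "v1"}.
    return json.loads(body).get("status") == "ok"


def check_webhcat(cluster_name, user, password, timeout=30):
    # Basic auth with the cluster login (HTTP) credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(
        status_url(cluster_name),
        headers={"Authorization": f"Basic {token}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_status(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # A 502 here is the generic gateway error; WebHCat may be down or unreachable.
        print(f"HTTP {err.code}: {err.reason}")
        return False
```

For example, `check_webhcat("mycluster", "admin", password)` (with a hypothetical cluster name) returns `True` only if the gateway reached WebHCat and it reported `"ok"`.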

How WebHCat Works and How to Debug (Part 1)

1. Overview and Goals
One of the common questions our customers face is: why are my Hive, Pig, or Sqoop job submissions failing? Most likely something is wrong with your WebHCat service. In this article, we will try to answer some of the common questions, such as: What is WebHCat or sometimes referred to also as…

Azure Data Lake Tools for VSCode (Preview) – March Update

We are continuing our journey with Azure Data Lake Tools for VSCode to provide better cross-platform support, meet developers where they are on Mac, Linux, and Windows, and deliver a first-class, lightweight code editor experience for U-SQL. We are pleased to announce our March release, which includes a few important features.

Garbage Collection and its performance impact

Hadoop is a beautiful abstraction that allows us to deal with the numerous complexities of data without delving into the details of the infrastructure. But once in a while, to see why your applications' performance has stalled, one has to look under the hood and find ways to extract performance or find why…