How to use a Custom JSON SerDe with Microsoft Azure HDInsight

I recently needed to parse JSON files using Hive, and there were a couple of options. One is to use native Hive JSON functions such as get_json_object. The other is to use a JSON SerDe, which parses JSON objects containing nested elements with less code. I decided to go with the…


Sliding Window Data Partitioning on Microsoft Azure HDInsight

HCatalog is a table and storage management layer for Hadoop that enables users of different data processing tools, such as Pig, MapReduce, Hive, and Oozie, to read and write data. HCatalog's table abstraction presents these tools and users with a relational view of the data in the cluster. HCatalog integration was made available starting with Apache Oozie…


How to add custom Hive UDFs to HDInsight

I recently needed to add a UDF to Hive on HDInsight, and I thought it would be good to share that experience in a blog post. Hive provides a library of built-in functions that cover the most common needs. The cool thing is that it also provides the framework to create your own…


Get Started with Hive on HDInsight

Hi, my name is Dharshana and I work on the Big Data Support Team at Microsoft. As covered in the earlier post by Dan from our team, HDInsight provides a very easy-to-use interface for provisioning a Hadoop cluster with a few clicks and interacting with the cluster programmatically. In this blog post, we…