Big Data Support

This is the team blog for the Big Data Analytics & NoSQL Support team at Microsoft. We support HDInsight, which is Hadoop running on Azure, as well as other big data analytics features.

Rerunning many slices and activities in Azure Data Factory

Today someone asked me how to run all the data slices in their data factory on-demand in an ad-hoc...

Author: JasonHowell Date: 08/31/2016

Capture Microsoft Azure Stream Analytics logs

Microsoft Azure Stream Analytics makes building real-time solutions very easy. Developers can build...

Author: sudhirblog Date: 08/24/2016

HDFS gets full in Azure HDInsight with many Hive temporary files

Sometimes when Hive is using temporary files, and a VM is restarted in an HDInsight cluster in...

Author: JasonHowell Date: 08/15/2016

How to Find and Kill a running Yarn Application Master in HDInsight with and without SSH access

Today we faced a challenge in HDInsight not knowing the SSH user password to terminal into the...

Author: JasonHowell Date: 06/11/2016

How to Lock a Resource Group to prevent accidental deletion of resources like HDInsight

Did you know it is possible to prevent accidental deletion of resources in Azure? This could apply...

Author: JasonHowell Date: 05/16/2016

HDInsight Name Node can stay in Safe mode after a Scale Down

This week we worked on an HDInsight cluster where the Name Node had gone into Safe mode and didn't...

Author: JasonHowell Date: 03/16/2016

HDInsight Hive Metastore fails when the database name has dashes or hyphens

Working in Azure HDInsight support today, we see a failure when trying to run a Hive query on a...

Author: JasonHowell Date: 02/24/2016

How to call an Azure Machine Learning Web Service from NodeJS

Azure machine learning allows data scientists and developers to embed predictive analytics into...

Author: carrollwp Date: 02/18/2016

Encoding 101 - Exporting from SQL Server into flat files, to create a Hive external table

Today in Microsoft Big Data Support we faced the issue of how to correctly move Unicode data from...

Author: JasonHowell Date: 02/05/2016

Encoding the Hive query file in Azure HDInsight

Today at Microsoft we were using Azure Data Factory to run Hive Activities in Azure HDInsight on a...

Author: JasonHowell Date: 02/05/2016

Incremental data load from Azure Table Storage to Azure SQL using Azure Data Factory

Azure Data Factory is a cloud-based data integration service. The service not only helps to move...

Author: sudhirblog Date: 01/23/2016

How to allow Spark to access Microsoft SQL Server

Today we will look at configuring Spark to access Microsoft SQL Server through JDBC. On HDInsight...

Author: carrollwp Date: 10/22/2015

Using Azure SDK for Python

Python is a great scripting tool with a large user base. In a recent support case I needed a way to...

Author: carrollwp Date: 10/02/2015

A KMeans example for Spark MLlib on HDInsight

Today we will take a look at Spark's module for MLlib or its built-in machine learning library...

Author: carrollwp Date: 09/24/2015

Dealing with RequestRateTooLarge errors in Azure DocumentDB and testing performance

In Azure DocumentDB support, one of the most common errors we have seen as reported by our customers...

Author: Azim Uddin Date: 09/02/2015

How to configure Hortonworks HDP to access Azure Windows Storage

Recently I was asked how to configure a Hortonworks HDP 2.3 cluster to access Azure Windows Storage....

Author: carrollwp Date: 09/01/2015

Troubleshooting Oozie or other Hadoop errors with DEBUG logging

In troubleshooting Hadoop issues, we often need to review the logging of a specific Hadoop...

Author: Azim Uddin Date: 08/21/2015

Some things to consider for your Spark on HDInsight workload

When it comes time to provision your Spark cluster on HDInsight we all want our workloads to execute...

Author: carrollwp Date: 08/19/2015

How to Access HDInsight Linux Web UI's using SSH Dynamic Tunneling

Scenario: One of the most important features of Azure HDInsight Linux (currently in preview) is the...

Author: Meer Al - MSFT Date: 08/12/2015

Why is my spark application running out of disk space?

In your Zeppelin notebook you have Scala code that loads Parquet data from two folders that is...

Author: carrollwp Date: 08/12/2015

Using cross/outer apply in Azure Stream Analytics

Recently I got involved in working with a problem where JSON data events contain an array of values....

Author: sudhirblog Date: 08/05/2015

Azure Data Factory JSON Changes in July 2015

Azure Data Factory factories are designed with a series of fairly simple JSON documents and uploaded...

Author: JasonHowell Date: 07/21/2015

Spark on Azure HDInsight is available

Spark on Azure HDInsight (public preview) is now available! The following components are included as...

Author: carrollwp Date: 07/14/2015

How to access Hive using JDBC on HDInsight

While following up on a customer question recently on this topic, I realized that we have seen the...

Author: Azim Uddin Date: 06/09/2015

How to install Splunk on HDINSIGHT with a custom action script

Recently I worked with a customer that wanted to use Splunk Enterprise and Splunk Forwarder to...

Author: carrollwp Date: 06/01/2015

Why are the Hadoop services disabled on my HDInsight cluster

I came across this question while working with a few customers recently and thought I would share a...

Author: Azim Uddin Date: 05/31/2015

Understanding HDInsight Custom Node VM Sizes

With the 02/18/2015 update to HDInsight and Azure PowerShell 0.8.14 we introduced a lot more...

Author: Rick_H Date: 05/11/2015

Azure PowerShell 0.8.14 Released, fixes problems with pipelining HDInsight configuration cmdlets

We recently pushed out the 0.8.14 release of Azure PowerShell. This release includes some updates to...

Author: Rick_H Date: 02/16/2015

Problems When Using a Shared Default Storage Container with Multiple HDInsight Clusters

We have seen several cases come in to Microsoft Support that ended up being caused by having...

Author: Rick_H Date: 02/12/2015

Some Commonly Used Yarn Memory Settings

We were recently working on an out of memory issue that was occurring with certain workloads on...

Author: Dharshana_Bharadwaj Date: 11/11/2014

How to use parameter substitution with Pig Latin and PowerShell

When running Pig in a production environment, you'll likely have one or more Pig Latin scripts that...

Author: Dan (MSFT) Date: 08/12/2014

HDInsight: - Creating, Deploying and Executing Pig UDF

During my developer experience, I always look for how customization (write my own processing) can be...

Author: sudhirblog Date: 07/07/2014

How to use a Custom JSON Serde with Microsoft Azure HDInsight

I had a recent need to parse JSON files using Hive. There were a couple of options that I could use....

Author: Dharshana_Bharadwaj Date: 06/18/2014

Some Frequently Asked Questions on Microsoft Azure HDInsight

We have seen some common questions on HDInsight when interacting with customers and partners. On...

Author: Dharshana_Bharadwaj Date: 05/22/2014

HDInsight News - New Videos to watch - HDInsight Provisioning demonstrations

Check out these two recent video demos regarding HDInsight provisioning. These videos complement the...

Author: JasonHowell Date: 05/09/2014

HDInsight: - backup and restore hive table

Introduction: My name is Sudhir Rawat and I work on the Microsoft HDInsight support team. In this...

Author: sudhirblog Date: 05/01/2014

Sliding Window Data Partitioning on Microsoft Azure HDInsight

HCatalog is a table and storage management layer for Hadoop that enables users with different data...

Author: Dharshana_Bharadwaj Date: 04/23/2014

Querying HDInsight Job Status with WebHCat via Native PowerShell or Node.js

One of the great things about HDInsight is that under the covers, it has the same capabilities as...

Author: Rick_H Date: 04/22/2014

Customizing HDInsight Cluster provisioning

In my last blog, I discussed how we can specify Hadoop configurations for a job on an HDInsight...

Author: Azim Uddin Date: 04/15/2014

Using Apache Flume with HDInsight

Gregory Suarez – 03/18/2014 (This blog posting assumes some basic knowledge of Apache Flume)...

Author: Gregory Suarez - MSFT Date: 03/18/2014
