Azure Data Lake & Azure HDInsight Blog

The official blog for the Azure Data Lake services - Azure Data Lake Analytics, Azure Data Lake Store and Azure HDInsight

Azure Data Lake Tools for Visual Studio Code (VSCode) General Availability

Azure Data Lake Tools for Visual Studio Code (VSCode) gives developers a light but powerful code...

Author: JennyJiang Date: 05/12/2017

Using Jupyter notebooks and Pandas with Azure Data Lake Store

This blog post describes how to use Jupyter notebooks and Pandas with Azure Data Lake Store. Using...

Author: Amit R. Kulkarni Date: 05/05/2017

SCP.Net with HDInsight Linux Storm clusters

SCP.Net is now available on HDInsight Linux clusters 3.4 and above. Versions Note: HDInsight Storm...

Author: Ravi Peri (MSFT) Date: 05/03/2017

HDInsight tools for IntelliJ & Eclipse April Updates

We are pleased to announce the April updates of HDInsight Tools for IntelliJ & Eclipse. This is...

Author: JennyJiang Date: 04/27/2017

Azure Data Lake U-SQL April 25 2017 Updates: Introducing Packages, UNPIVOT INCLUDE NULLS, fast file set preview flag, R extension returns dataframes, exporting your cluster database with sample data to your local run and more!

We have concluded the rollout of our April 2017 refresh to all the regions today. Here are the April...

Author: MRys Date: 04/27/2017

Exposing Hive!

I sat down with Justin Scott (Application Development Manager at Microsoft working with our top...

Author: AshishThapliyal Date: 04/25/2017

Cloudera clusters now run with Azure Data Lake Store

We are excited to announce that with today's release of Cloudera Enterprise 5.11 you can now run...

Author: CP_MSFT Date: 04/18/2017

Use H2O.ai on Azure HDInsight

We're hosting an upcoming webinar to show you how to use H2O on HDInsight and to answer your...

Author: Xiaoyong Zhu (MSFT) Date: 04/11/2017

Azure Data Factory makes it even easier and more convenient to uncover insights from data when using Data Lake Store with SQL Data Warehouse

Earlier in February 2017 we announced availability of SQL Data Warehouse (SQLDW) PolyBase support...

Author: Sachin C Sheth Date: 04/08/2017

Azure HDInsight 3.6 - Five things that will make a data developer happy

Working with Hive, I regularly find myself staring at csv/tsv/json files wondering where to...

Author: AshishThapliyal Date: 04/06/2017

Hive Metastore in HDInsight – Tips, Tricks & Best Practices

When you create a Hive table, the table definition (column names, data types, comments, etc.) are...

Author: AshishThapliyal Date: 03/24/2017

How to use BigDL on Apache Spark for Azure HDInsight

Deep learning is impacting everything from healthcare, transportation, manufacturing, and more....

Author: Xiaoyong Zhu (MSFT) Date: 03/17/2017

Azure Data Lake U-SQL March 9 2017 Updates: Deprecations turn into errors, PIVOT/UNPIVOT, cross ADLS account U-SQL catalog sharing, nuget packages and more!

After mainly internal service updates after our general availability, we released several new U-SQL...

Author: MRys Date: 03/16/2017

Using Custom Python Libraries with U-SQL

The U-SQL/Python extensions for Azure Data Lake Analytics ship with the standard Python libraries...

Author: Saveen Reddy Date: 03/10/2017

Analyze your data in ADLS with more assurance with the recently GA'd Power BI Desktop connector

As you know, Azure Data Lake Store (ADLS) has customers, who analyze/view data stored in ADLS...

Author: Sachin C Sheth Date: 03/10/2017

How WebHCat Works and How to Debug (Part 2)

Link to Part 1 2. How to debug WebHCat 2.1. BadGateway (HTTP status code 502) This is a very generic...

Author: jiangmouren Date: 03/08/2017

How WebHCat Works and How to Debug (Part 1)

1. Overview and Goals One of the common scenarios our customers face is: why my Hive, Pig, or...

Author: jiangmouren Date: 03/08/2017

Azure Data Lake Tools for VSCode (Preview) - March Update

Continue our journey to launch Azure Data Lake Tools for VSCode for better cross-platform support,...

Author: JennyJiang Date: 03/07/2017

Garbage Collection and its performance impact

Hadoop is a beautiful abstraction that allows us to deal with the numerous complexities of data...

Author: Ranjan Banerjee Date: 03/06/2017

Wiring your older Hadoop clusters to access Azure Data Lake Store

This blog post describes how to connect older Hadoop clusters, those with version lower than 3.0, to...

Author: Amit R. Kulkarni Date: 02/27/2017

Restarting Storm EventHub Topology on a new cluster

Azure EventHub is a popular highly scalable data streaming platform. More about Azure EventHub can...

Author: Ranjan Banerjee Date: 02/24/2017

Using Oozie SLA on HDInsight clusters

Introduction Often we have several jobs running on our HDInsight clusters that have tight timelines...

Author: Bharath Venkatesh Date: 02/24/2017

Ingest data into Azure Data Lake Store with StreamSets Data Collector

Today, I want to give a shout out to one of our partners who has a great offering for Azure Data...

Author: CP_MSFT Date: 02/23/2017

Making Azure Data Lake Store the default file system for Hadoop

Here's an article that explains how to make Azure Data Lake Store the default file system for...

Author: Amit R. Kulkarni Date: 02/21/2017

Enabling U-SQL Advanced Analytics for Local Execution

After we announced the ability for U-SQL to massively distribute Python code in the Azure Data...

Author: Saveen Reddy Date: 02/20/2017

Connecting your own Hadoop or Spark to Azure Data Lake Store

A frequent question we get is how do I connect my Hadoop or Spark cluster to Azure Data Lake Store....

Author: Amit R. Kulkarni Date: 02/17/2017

Building advanced analytical solutions faster using Dataiku DSS on HDInsight

The Azure HDInsight Application Platform allows users to use applications that span a variety of use...

Author: Bharath Sreenivas Date: 02/16/2017

HDInsight - How to perform Bulk Load with Phoenix?

Apache HBase is an open source NoSQL Hadoop database, a distributed, scalable, big data store. It...

Author: Anunay Tiwari Date: 02/14/2017

Uncover insights rapidly from petabytes of data in Azure Data Lake Store with SQL Data Warehouse PolyBase support

Most common patterns using Azure Data Lake Store (ADLS) involve customers ingesting and storing raw...

Author: Sachin C Sheth Date: 02/06/2017

Distributed Deep Learning on HDInsight with Caffe on Spark

Introduction Deep learning is impacting everything from healthcare to transportation to...

Author: Xiaoyong Zhu (MSFT) Date: 02/02/2017

U-SQL Deprecation Update: Migration of Data Source Credentials and Removal of CREATE CREDENTIAL, ALTER CREDENTIAL and DROP CREDENTIAL

Back in October, we announced that we simplified the U-SQL Credentials by merging the password...

Author: MRys Date: 01/24/2017

U-SQL Deprecation notice: PARTITION BY BUCKET will be removed

Hi all In the upcoming refresh, we are removing the deprecated syntax PARTITION BY BUCKET and will...

Author: MRys Date: 01/23/2017

Introducing: Microsoft Azure Data Lake Tools for Visual Studio Code

Welcome to the Microsoft Azure Data Lake Tools preview for Visual Studio Code, an extension for...

Author: JennyJiang Date: 01/20/2017

Microsoft Azure Data Lake Tools for Visual Studio Code

Welcome to the Microsoft Azure Data Lake Tools preview for Visual Studio Code, an extension for...

Author: JennyJiang Date: 01/20/2017

HDInsight tools for IntelliJ & Eclipse December Updates

We are pleased to announce the December updates of HDInsight Tools for IntelliJ & Eclipse. The...

Author: JennyJiang Date: 01/20/2017

Spark Job Submission on HDInsight 101

This article is part two of the Spark Debugging 101 series we initiated a few weeks ago. Here we...

Author: Bharath Venkatesh Date: 01/06/2017

Cornell Lab of Ornithology Improves Machine Learning Workflow with Azure HDInsight

For the last 14 years, the Cornell Lab of Ornithology has been collecting millions of bird...

Author: Rashim Gupta Date: 12/28/2016

Introducing: Interactive Hive cluster using LLAP (Live Long and Process)

Earlier in the Fall, we announced the public preview of Hive LLAP (Live Long and Process) in the...

Author: Rashim Gupta Date: 12/28/2016

Spark Debugging on HDInsight 101

Apache Spark is an open source processing framework that runs large-scale data analytics...

Author: Abdullah Al Mahmood Date: 12/19/2016

Problems with new File Set (update 2016-12-14 - 16:30 PST)

In the latest push we enabled the new faster file set feature per default. Unfortunately that caused...

Author: MRys Date: 12/14/2016

Introducing Python SDKs for Data Lake Store & Analytics

We are committed to "meeting developers where they are" and part of that means letting developers...

Author: Saveen Reddy Date: 11/28/2016

U-SQL Advanced Analytics: Introducing Cognitive scenarios for Text and Imaging

Yesterday we introduced you to U-SQL Advanced Analytics and showed how Python can be used with...

Author: Saveen Reddy Date: 11/22/2016

U-SQL Advanced Analytics: Introducing Python Extensions for U-SQL

Last week at Microsoft's Connect 2016 conference, we announced the General Availability of Azure...

Author: Saveen Reddy Date: 11/22/2016

Apache HBase/Phoenix - Tips, Tricks & Best Practices in HDInsight

We will keep this page updated with HDInsight HBase/ Phoenix related commonly asked questions. You...

Author: AshishThapliyal Date: 11/19/2016

Azure Data Lake Store is now generally available

Today we announced general availability of Azure Data Lake services including Azure Data Lake Store...

Author: Amit R. Kulkarni Date: 11/17/2016

Preview: Azure Data Lake Tools for Visual Studio Code

We are pleased to announce the Public Preview of the Azure Data Lake (ADL) Tools for VSCode. The...

Author: Saveen Reddy Date: 11/17/2016

Executing Spark SQL Queries using dotnet ODBC driver

Introduction HDInsight provides numerous ways of executing Spark applications on your cluster. This...

Author: Bharath Venkatesh Date: 10/26/2016

OozieBot: Automated Oozie Workflow and Coordinator Generation

Introducing OozieBot - a tool to help customers automate Oozie job creation. Learn how to use...

Author: Bharath Venkatesh Date: 10/20/2016

Azure Data Lake U-SQL October 2016 Updates: Deprecations turn into errors, sampling is live, sharing catalog objects across ADLA accounts, outputting headers and more!

We seem to be just cranking out new stuff :). Here are the October 2016 Updates for Azure Data Lake...

Author: MRys Date: 10/17/2016
