
Azure Data Lake & Azure HDInsight Blog

The official blog for the Azure Data Lake services: Azure Data Lake Analytics, Azure Data Lake Store and Azure HDInsight

Using Cask Data Application Platform on Azure HDInsight

Recently, CDAP (Cask Data Application Platform) by Cask was added to the set of applications that...

Author: Bharath Sreenivas Date: 10/17/2016

Azure Data Lake U-SQL September 2016 Updates: OUTER UNION, Set operations by name, FILE and PARTITION intrinsic functions and more!

I finally found the time to publish the release notes of the September refresh. My apologies for the...

Author: MRys Date: 10/13/2016

Understanding the Data Lake Analytics Unit

Developers often ask us: "What is an Azure Data Lake Analytics Unit? How does it affect my U-SQL...

Author: Saveen Reddy Date: 10/12/2016

Azure Data Lake Analytics: More Compute and More Control

Customers have been telling us that they want access to more computational horsepower for running...

Author: Saveen Reddy Date: 10/12/2016

Experience Updates to the Azure Data Lake Store and Analytics Portal

In this month's refresh of the Azure Data Lake Store and Azure Data Lake Analytics portal, we've...

Author: Saveen Reddy Date: 10/09/2016

HDInsight HBase: How to Improve HBase cluster restart time by Flushing tables?

This blog is written by Nitin Verma, Sr. Software Engineer, HDInsight. Do you restart or re-create...

Author: AshishThapliyal Date: 09/19/2016

Getting started with Azure Data Lake Analytics and Store has never been faster!

We’re happy to announce that we’ve made it much faster to get started with the Data Lake Store and...

Author: Saveen Reddy Date: 09/09/2016

HDInsight HBase: 9 things you must do to get great HBase performance

HBase is a fantastic high-end NoSQL big data machine that gives you many options to get great...

Author: AshishThapliyal Date: 09/02/2016

HDInsight - New self-paced trainings and labs

This week Microsoft Learning Experiences released/updated 3 HDInsight courses (these are free, $49...

Author: AshishThapliyal Date: 08/28/2016

How to register U-SQL Assemblies in your U-SQL Catalog

U-SQL's extensibility model heavily depends on your ability to add your own custom code. Currently,...

Author: MRys Date: 08/26/2016

HDInsight: Attach additional Azure storage accounts to the cluster

This blog is discontinued in favor of updated HDInsight documentation on MSDN...

Author: AshishThapliyal Date: 08/26/2016

Introducing Image Processing in U-SQL

Rukmani Gopalan - Senior Program Manager; Apostolos "Toli" Lerios - Entrepreneur in Residence and...

Author: Rukmani G Date: 08/18/2016

Rapid Big Data Prototyping with Microsoft R Server on Apache Spark: Context Switching & Spark Tuning

Max Kaznady – Data Scientist; Jason Zhang – Senior Software Engineer; Arijit Tarafdar – Senior...

Author: Saveen Reddy Date: 08/09/2016

Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight

This session was presented by Nitin Verma (Sr. Software Engineer) and Pravin Mittal (Principal...

Author: AshishThapliyal Date: 08/04/2016

Azure Data Lake U-SQL August 1st 2016 Updates: ACLs on Databases, Skipping Header Rows, Sampling and more!

As part of the Azure Data Lake Analytics and U-SQL August 1st refresh, we released a couple of new,...

Author: MRys Date: 08/03/2016

Introducing File and Folder ACLs for Azure Data Lake Store

Overview: We’re excited today to announce the availability of File and Folder ACLs for the Azure Data...

Author: Amit R. Kulkarni Date: 07/31/2016

HDInsight - How to use the Spark-HBase connector?

Apache Spark is an open-source parallel processing framework that supports in-memory processing to...

Author: Anunay Tiwari Date: 07/25/2016

Azure Data Lake U-SQL July Updates

As part of the Azure Data Lake Analytics and U-SQL July refresh released earlier this month, we...

Author: MRys Date: 07/18/2016

Partial Caching of DataFrame by Vertical and Horizontal Partitioning

The sample Jupyter Scala notebook described in this blog can be downloaded from...

Author: ArijitT Date: 07/08/2016

HDInsight tool in Azure Toolkit for Eclipse is GA!

Today, we are pleased to announce that the HDInsight tool in Azure Toolkit for Eclipse is generally...

Author: Xiaoyong Zhu (MSFT) Date: 07/04/2016

How do I combine overlapping ranges using U-SQL? Introducing U-SQL Reducer UDOs

The problem statement: A few weeks ago, a customer on Stack Overflow asked for a solution for the...

Author: MRys Date: 06/27/2016

Azure Data Lake Analytics: Greater flexibility with assigning Parallelism to U-SQL Jobs

The Data Lake team is always receiving very useful and thoughtful feedback from users on how we can...

Author: Saveen Reddy Date: 06/17/2016

Appending an Index Column to Distributed DataFrame based on another Column with Non-unique Entries

The sample Jupyter Scala notebook described in this blog can be downloaded from...

Author: ArijitT Date: 06/09/2016

HDInsight Tool for IntelliJ is GA!

We are excited to announce that the HDInsight Tool for IntelliJ is now GA. The HDInsight Tool for...

Author: Saveen Reddy Date: 06/06/2016

Leveraging Azure Data Lake Partitioning to Recalculate Previously Processed Days

Many data flows will require partial reloading of U-SQL tables due to the need to recalculate a...

Author: brimit Date: 05/03/2016

April Updates to Azure Data Lake Analytics and Azure Data Lake Store

Hello everyone, the Azure Data Lake engineering team has been working hard on refining the services...

Author: Saveen Reddy Date: 04/24/2016

Debugging U-SQL Error E_RUNTIME_USER_EXTRACT_UNEXPECTED_NUMBER_COLUMNS: Unexpected number of columns in input record

Did you run into an error that said E_RUNTIME_USER_EXTRACT_UNEXPECTED_NUMBER_COLUMNS with a...

Author: Rukmani G Date: 04/23/2016

HDInsight jobs troubleshooting

WebHCat is a REST interface for remote job (Hive, Pig, Sqoop, MapReduce) execution. WebHCat...

Author: kolli.kiran Date: 04/21/2016

HDInsight Hive job workload

Typically, Hive queries are developed using the Hive console or through interactive experiences like...

Author: kolli.kiran Date: 04/19/2016

HDInsight Hive workload under covers

The "HDInsight under covers" post gave an overview of cluster creation/set-up. Apache Hive is the most...

Author: kolli.kiran Date: 04/06/2016

HDInsight under covers

Azure HDInsight provisions and manages Apache Hadoop clusters in the Azure cloud. HDInsight uses...

Author: kolli.kiran Date: 04/04/2016

Saving Spark Streaming Metrics to PowerBI

The sample Jupyter Scala notebook described in this blog can be downloaded from...

Author: ArijitT Date: 04/01/2016

Saving Spark Resilient Distributed Dataset (RDD) To PowerBI

The sample Jupyter Scala notebook described in this blog can be downloaded from...

Author: ArijitT Date: 03/22/2016

U-SQL Programming Improvements to Azure Data Lake Analytics for March 2016

Hello everyone! The Azure Data Lake team is pleased to announce additional enhancements to U-SQL...

Author: Saveen Reddy Date: 03/15/2016

Saving Spark Distributed Data Frame (DDF) To PowerBI

The sample Jupyter Scala notebook described in this blog can be downloaded from...

Author: ArijitT Date: 03/09/2016

Extending Spark with Extension Methods in Scala: Fun with Implicits

The sample Jupyter Scala notebook described in this blog can be downloaded from...

Author: ArijitT Date: 03/01/2016

PySpark: Appending columns to DataFrame when DataFrame.withColumn cannot be used

The sample Jupyter Python notebook described in this blog can be downloaded from...

Author: ArijitT Date: 02/10/2016

Copy data easily from Azure Storage Blobs to Azure Data Lake Store

The Azure Data Lake team has just released a capability that helps users to jumpstart their usage of...

Author: Sachin C Sheth Date: 12/15/2015

Organize and discover your big data in the Azure Data Lake with Azure Data Catalog

Enterprise data is growing at a remarkable pace today. A large portion of the growth in data is...

Author: Amit R. Kulkarni Date: 12/10/2015

How To: Increase number of reducers in your Hive/MapReduce job

Our customers often use compression technologies like ORC and Snappy that can compress data and...

Author: Rashim Gupta Date: 12/08/2015

How To: output file as a CSV using Hive in Azure HDInsight

One of the common questions our team gets is how to output a Hive table to CSV. Hive does not...

Author: Rashim Gupta Date: 11/23/2015

Hello world!

Welcome to the official blog of the Azure Data Lake Engineering team. On this blog we will cover the...

Author: Rashim Gupta Date: 11/18/2015