Spark Job Submission on HDInsight 101

This article is part two of the Spark Debugging 101 series we initiated a few weeks ago. Here we discuss ways in which Spark jobs can be submitted on HDInsight clusters and some common troubleshooting guidelines. So here goes. Livy Batch Job Submission: Livy is an open source REST interface for interacting with Apache Spark remotely from…


Cornell Lab of Ornithology Improves Machine Learning Workflow with Azure HDInsight

For the last 14 years, the Cornell Lab of Ornithology has been collecting millions of bird observations through a citizen science project called eBird. This data can be used to model and understand the distribution, abundance and movements of birds across large geographic areas and over long periods of time, which yields priorities for broad-scale…


Introducing: Interactive Hive cluster using LLAP (Long Live and Process)

Earlier in the Fall, we announced the public preview of Hive LLAP (Long Live and Process) in the Azure HDInsight service. LLAP is a new feature in Hive 2.0 that allows in-memory caching, making Hive queries much more interactive and faster. This makes HDInsight one of the world’s most performant, flexible and open Big Data solution…


Spark Debugging on HDInsight 101

Apache Spark is an open source processing framework that runs large-scale data analytics applications. Built on an in-memory compute engine, Spark enables high-performance querying on big data. It leverages a parallel data processing framework that persists data in memory and, if needed, on disk. This article details common ways of submitting Spark applications on our HDInsight…


Problems with new File Set (update 2016-12-14 – 16:30 PST)

In the latest push we enabled the new, faster file set feature by default. Unfortunately, that caused a regression that can manifest itself in one of the following ways: (1) Your EXTRACT statement extracts 0 rows, even though there should be more data. (2) You run into the following error message: E_RUNTIME_SYSTEM_CODEGENFAILURE — Internal error! The SV_Extract_Combine…


Introducing Python SDKs for Data Lake Store & Analytics

We are committed to “meeting developers where they are” and part of that means letting developers use the programming languages they want on the operating systems they already use. For example, just last week we showed you how to use Python inside a U-SQL script. This week, we are announcing even more support for Python. As of today…


U-SQL Advanced Analytics: Introducing Cognitive scenarios for Text and Imaging

Yesterday we introduced you to U-SQL Advanced Analytics and showed how Python can be used with U-SQL. Today, we’ll show U-SQL’s built-in support for Cognitive scenarios for images and text. Currently, U-SQL supports these cognitive scenarios: detecting objects in images (tagging); detecting emotion in faces in images; detecting text in images (OCR); text key phrase extraction; text sentiment analysis…


U-SQL Advanced Analytics: Introducing Python Extensions for U-SQL

Last week at Microsoft’s Connect 2016 conference, we announced the General Availability of Azure Data Lake Analytics. As part of the announcement we revealed that U-SQL now includes built-in support for Advanced Analytics scenarios. This includes: the ability to perform massively distributed analytics using Python; the ability to perform massively distributed analytics using R; built-in Cognitive capabilities (such as…


HDInsight HBase/Phoenix FAQ

We will keep this page updated with commonly asked questions related to HDInsight HBase/Phoenix. You can leave comments or questions on this blog. Also, the official channel to provide HDInsight related feedback and make feature requests is here. What is the advantage of using HBase in Azure HDInsight? Azure HDInsight HBase – A NoSQL database like no other …


SCP.Net with HDInsight Linux Storm clusters

SCP.Net is now available on HDInsight Linux clusters 3.4 and above. Versions. Note: The HDInsight Storm team recommends HDI 3.5 clusters for users looking to migrate their SCP.Net topologies from Windows to Linux. Development of SCP.Net Topology. Pre-Steps. Azure Datalake Tools for Visual Studio: HDInsight tools for Visual Studio does not support submission of…