Announcing Microsoft Machine Learning Library for Apache Spark

This post is authored by Roope Astala, Senior Program Manager, and Sudarshan Raghunathan, Principal Software Engineering Manager, at Microsoft. This is a cross-post; the original appears on the Cortana Intelligence and Machine Learning Blog. We’re excited to announce the Microsoft Machine Learning library for Apache Spark – a library designed to make data scientists…


Exposing Hive!

I sat down with Justin Scott (Application Development Manager at Microsoft, working with our top customers) to talk about Apache Hive and where it’s heading. You can listen to the Channel 9 podcast now.


Azure HDInsight 3.6 – Five things that will make a data developer happy

Working with Hive, I regularly find myself staring at CSV/TSV/JSON files wondering where to start… Hive View 2.0 is a new web experience in HDInsight 3.6 that greatly simplifies many common Hive tasks and makes it easy to author and debug Hive queries. In this post, we will look into 5 key features that…


Nodes in HDInsight

Knowing the types and functions of nodes in HDInsight is key to taking full advantage of the service. This article is aimed at users who are familiar with big data concepts but are newer to HDInsight. Please feel free to read the article and provide me feedback even if you’re beyond the target audience for…


HDInsight tools for IntelliJ & Eclipse December Updates

We are pleased to announce the December updates of HDInsight Tools for IntelliJ & Eclipse. The HDInsight Tools for IntelliJ & Eclipse serve the open source community and will be of interest to HDInsight Spark developers. The tools run smoothly on Linux, Mac and Windows. The recent release focuses on users’ feedback to ensure…

Spark Job Submission on HDInsight 101

This article is part two of the Spark Debugging 101 series we initiated a few weeks ago. Here we discuss ways in which Spark jobs can be submitted on HDInsight clusters and some common troubleshooting guidelines. So here goes. Livy Batch Job Submission: Livy is an open source REST interface for interacting with Apache Spark remotely from…


Spark Debugging on HDInsight 101

Apache Spark is an open source processing framework that runs large-scale data analytics applications. Built on an in-memory compute engine, Spark enables high-performance querying on big data. It leverages a parallel data processing framework that persists data in memory and on disk if needed. This article details common ways of submitting Spark applications on our HDInsight…


Executing Spark SQL Queries using dotnet ODBC driver

Introduction: HDInsight provides numerous ways of executing Spark applications on your cluster. This blog post outlines how to run Spark SQL queries on your cluster remotely from Visual Studio using C#. The examples below are intended to serve as a framework that you can extend to build your custom Spark SQL queries. Prerequisite…


HDInsight – How to use the Spark-HBase connector?

Apache Spark is an open-source parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications. Azure HDInsight offers a fully managed Spark service with many benefits. Apache HBase is an open-source NoSQL Hadoop database, a distributed, scalable, big data store. It provides real-time read/write access to large datasets…
