How to allow Spark to access Microsoft SQL Server

  Today we will look at configuring Spark to access Microsoft SQL Server through JDBC. On HDInsight the Microsoft SQL Server JDBC jar is already installed; on Linux the path is /usr/hdp/2.2.7.1-10/hive/lib/sqljdbc4.jar. If you need more information or want to download the driver, you can start with the Microsoft SQL Server JDBC page. Spark needs to know the…
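
  As a rough sketch of the kind of setup the post describes, the snippet below reads a SQL Server table into a DataFrame over JDBC. The server, database, credentials, and table name are placeholders, and the DataFrameReader syntax shown assumes Spark 1.4 or later (on Spark 1.3 you would pass an options map to sqlContext.load instead).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("SqlServerOverJdbc")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// The sqljdbc4.jar must be on the driver and executor classpaths, for example
// via --jars /usr/hdp/2.2.7.1-10/hive/lib/sqljdbc4.jar when calling spark-submit.
// <server>, <db>, <user>, <password>, and dbo.SomeTable are placeholders.
val jdbcUrl = "jdbc:sqlserver://<server>:1433;database=<db>;user=<user>;password=<password>"

val df = sqlContext.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "dbo.SomeTable")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load()

df.show()
```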

A KMeans example for Spark MLlib on HDInsight

  Today we will take a look at Spark's MLlib, its built-in machine learning library (see the Spark MLlib Guide). KMeans is a popular clustering method. Clustering methods are used when there is no class to be predicted; instead, instances are divided into groups, or clusters. The clusters will hopefully represent some mechanism…
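
  A minimal KMeans sketch along the lines of the MLlib guide is shown below. It assumes a spark-shell or notebook session where sc is already defined, and the input path is a placeholder pointing at a file of space-separated numeric vectors.

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data: one space-separated numeric vector per line.
// The path is a placeholder.
val data = sc.textFile("wasb:///example/data/kmeans_data.txt")
val parsed = data.map(line => Vectors.dense(line.split(' ').map(_.toDouble))).cache()

// Cluster the data into two groups over at most 20 iterations.
val numClusters = 2
val numIterations = 20
val model = KMeans.train(parsed, numClusters, numIterations)

// Within Set Sum of Squared Errors: a rough measure of how tight the clusters are.
val wssse = model.computeCost(parsed)
println(s"Within Set Sum of Squared Errors = $wssse")
```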

Understanding Spark’s SparkConf, SparkContext, SQLContext and HiveContext

  The first step of any Spark driver application is to create a SparkContext. The SparkContext allows your Spark driver application to access the cluster through a resource manager. The resource manager can be YARN or Spark's cluster manager. In order to create a SparkContext, you should first create a SparkConf. The SparkConf stores configuration…
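
  A minimal sketch of that sequence, with an illustrative app name and executor-memory setting:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

// SparkConf holds the key/value settings for the application.
val conf = new SparkConf()
  .setAppName("ContextExample")
  .set("spark.executor.memory", "2g")

// SparkContext connects the driver to the cluster through the resource manager (e.g. YARN).
val sc = new SparkContext(conf)

// SQLContext adds DataFrame and SQL capabilities on top of the SparkContext.
val sqlContext = new SQLContext(sc)

// HiveContext is a superset of SQLContext that also supports HiveQL and the Hive metastore.
val hiveContext = new HiveContext(sc)
```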

Some things to consider for your Spark on HDInsight workload

  When it comes time to provision your Spark cluster on HDInsight, we all want our workloads to execute quickly. The Spark community has made some strong claims of better performance compared to MapReduce jobs. In this post I want to discuss two topics to consider when deploying your Spark application on an HDInsight cluster. …

Why is my spark application running out of disk space?

  In your Zeppelin notebook you have Scala code that loads Snappy-compressed Parquet data from two folders. You use Spark SQL to register one table named shutdown and another named census. You then use the SQLContext to join the two tables in a query and show the output. Below is the Zeppelin…
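
  A rough sketch of that scenario is below; the folder paths and the join key are placeholders, and the DataFrameReader syntax assumes Spark 1.4 or later (earlier releases use sqlContext.parquetFile). In a Zeppelin notebook the sqlContext is already defined.

```scala
// Placeholder folder paths; Snappy-compressed Parquet is read transparently.
val shutdownDF = sqlContext.read.parquet("wasb:///data/shutdown")
val censusDF = sqlContext.read.parquet("wasb:///data/census")

// Register the DataFrames so they can be queried by name.
shutdownDF.registerTempTable("shutdown")
censusDF.registerTempTable("census")

// Join the two tables; "id" is a placeholder join key.
val joined = sqlContext.sql(
  """SELECT s.*, c.*
    |FROM shutdown s
    |JOIN census c ON s.id = c.id""".stripMargin)

joined.show()
```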

Spark or Hadoop

  Spark is the most active Apache project and has received a lot of press in the big data world. So how do you know whether Spark is right for your project, and what is the difference between Spark and Hadoop when run on HDInsight? I'll cover some of the differences between Spark and Hadoop…

Spark on Azure HDInsight is available

  Spark on Azure HDInsight (public preview) is now available! The following components are included as part of a Spark cluster on Azure HDInsight: Spark 1.3.1, which comes with Spark Core, Spark SQL, the Spark Streaming APIs, GraphX, and MLlib; Anaconda, a collection of powerful packages for Python; and Spark Job Server, which allows you to submit jars…
