Why is my Spark application running out of disk space?

  In your Zeppelin notebook you have Scala code that loads Snappy-compressed Parquet data from two folders. You use Spark SQL to register one table named shutdown and another named census. You then use the SQLContext to join the two tables in a query and show the output. Below is the Zeppelin…

Spark or Hadoop

  Spark is the most active Apache project and gets a lot of press in the big data world. So how do you know whether Spark is right for your project, and what is the difference between Spark and Hadoop when run on HDInsight? I’ll cover some of the differences between Spark and Hadoop…

Spark on Azure HDInsight is available

  Spark on Azure HDInsight (public preview) is now available! The following components are included as part of a Spark cluster on Azure HDInsight: Spark 1.3.1, which comes with Spark Core, Spark SQL, Spark Streaming APIs, GraphX, and MLlib; Anaconda, a collection of powerful packages for Python; Spark Job Server, which allows you to submit jars…
