A KMeans example for Spark MLlib on HDInsight

Today we will take a look at Spark’s MLlib module, its built-in machine learning library (see the Spark MLlib Guide). KMeans is a popular clustering method. Clustering methods are used when there is no class to be predicted and the instances are instead divided into groups, or clusters. The clusters hopefully will represent some mechanism…
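
To make the idea of dividing unlabeled instances into clusters concrete, here is a minimal sketch of the classic KMeans loop (Lloyd's algorithm) in plain Python. It is not Spark MLlib code and is not taken from the post; the function name `kmeans`, its parameters, and the sample points are assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points with Lloyd's algorithm; returns (centers, labels).

    Hypothetical helper for illustration only, not the MLlib API.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k of the input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, labels


# Made-up sample data: two well-separated groups of three points each.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(sample_points, k=2)
```

No class labels are supplied anywhere; the grouping comes only from the distances between points, which is the property the excerpt describes.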

Understanding Spark’s SparkConf, SparkContext, SQLContext and HiveContext

The first step of any Spark driver application is to create a SparkContext. The SparkContext allows your Spark driver application to access the cluster through a resource manager. The resource manager can be YARN or Spark’s own cluster manager. To create a SparkContext, you should first create a SparkConf. The SparkConf stores configuration…

Dealing with RequestRateTooLarge errors in Azure DocumentDB and testing performance

In Azure DocumentDB support, one of the most common errors reported by our customers is RequestRateTooLargeException, or HTTP status code 429. For example, an application using the DocumentDB .NET SDK may report an error like this: System.AggregateException: One or more errors occurred. ---> Microsoft.Azure.Documents.DocumentClientException: Exception: Microsoft.Azure.Documents.RequestRateTooLargeException, message: {"Errors":["Request rate…
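
A common way for a client to cope with a 429 response is to wait and then retry. The sketch below shows that pattern in plain Python; it does not use the DocumentDB SDK and is not taken from the post. The `RequestRateTooLarge` class, the `call_with_retry` helper, and the `retry_after_ms` attribute are hypothetical names introduced for illustration.

```python
import time


class RequestRateTooLarge(Exception):
    """Hypothetical stand-in for a throttled (HTTP 429) response."""

    def __init__(self, retry_after_ms):
        super().__init__("request rate too large (429)")
        self.retry_after_ms = retry_after_ms


def call_with_retry(operation, max_attempts=5, sleep=time.sleep):
    """Run operation(), waiting and retrying while it is throttled."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RequestRateTooLarge as err:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            # Wait for the interval the (simulated) service asked for.
            sleep(err.retry_after_ms / 1000.0)


# Usage with a fake operation that is throttled twice, then succeeds.
calls = {"count": 0}
waits = []


def flaky_operation():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RequestRateTooLarge(retry_after_ms=10)
    return "ok"


result = call_with_retry(flaky_operation, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; in a real client the default `time.sleep` would apply.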

How to configure Hortonworks HDP to access Azure Windows Storage

Recently I was asked how to configure a Hortonworks HDP 2.3 cluster to access Azure Windows Storage. In this post we will go through the steps to accomplish this. The first step is to create an Azure Storage account from the Azure portal. My storage account is named clouddatalake. I chose the “local redundant”…
