The ultimate showdown of NoSQL destiny!

Sharks and bees and… fast Italians?! If you’ve been following this blog recently, you’ll have noticed that I’m having a blast trying out different data products on Azure. I recently managed to get Spark / Spark SQL (Shark’s replacement) running on Azure in the same way, but rather than dedicate a post…

HDInsight working with different storage accounts

Storage accounts – configured and otherwise

When you create an HDInsight cluster, whether through the Azure portal or programmatically through e.g. PowerShell, you get a chance to add extra storage accounts to the cluster. In PowerShell it looks something like this:

$clusterConfig = New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4
$allAccounts = @('myfirstaccount','mysecondaccount','mythirdaccount')
$clusterConfig = Set-AzureHDInsightDefaultStorage $clusterConfig `
    -StorageAccountName…


WASB back stories: Masquerading a key-value store

There are already a few excellent articles out there that introduce Azure Blob Storage and how it’s accessed from HDInsight. In this post, though, I want to start giving a backstage look at some of the decisions we made while exposing blobs to HDInsight, hopefully answering a tiny part of the oft-asked…


Merging small files on HDInsight

The situation

The following is a dramatically enhanced story inspired by true events.

It’s weird how we can see our signal in the daytime. I guess that’s why Microsoft chose the Seattle area: all these clouds provide a great surface to project on. But here it was: the ominously cute elephant on the faint blue…

Analyzing Azure Table Storage data with HDInsight

HDInsight was optimized from the start to quickly analyze data on Azure’s blob storage service with Hadoop, using the WASB file system to expose that data as a native Hadoop file system. But the spirit of Hadoop has always been to be able to analyze data wherever it is, so…
