The real confusion bug

This is the story of a recent debugging session that ended with the discovery of a bug I love because it’s one I can explain to my mother (I think; I haven’t tried, and no offense to my brilliant-pathologist-but-slightly-techphobic mother): the computer got confused because all these teams have such similar…


Fight for your right to monkey

The gentle start to a rant… In any software system, especially but certainly not only in the cloud, things can and will go horribly wrong in a variety of ways given enough time and opportunity. Some of those failures are unanticipated disasters: bugs, floods, hurricanes, etc. But a good number of them are just tradeoffs that are consciously…


The ultimate showdown of NoSQL destiny!

Sharks and bees and… fast Italians?! If you’ve been following this blog recently, you’ll have noticed that I’m having a blast trying out different data products on Azure and playing with them. I recently managed to get Spark / Spark SQL (Shark’s replacement) running on Azure in the same way, but rather than dedicate a post…


HDInsight working with different storage accounts

Storage accounts – configured and otherwise When you create an HDInsight cluster, whether through the Azure portal or programmatically through e.g. PowerShell, you get a chance to add extra storage accounts to the cluster. In PowerShell it looks something like this:

$clusterConfig = New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4
$allAccounts = @('myfirstaccount','mysecondaccount','mythirdaccount')
$clusterConfig = Set-AzureHDInsightDefaultStorage $clusterConfig `
    -StorageAccountName…


More Blue Coffee – Presto on Azure

Presto! The Facebook team recently open-sourced a very cool distributed query engine that they concisely called Presto. Unlike, say, HBase, Presto doesn’t store its own data; instead it can plug in data from a variety of sources (e.g. Cassandra), and it offers an ANSI SQL query engine that distributes the query processing across many nodes. Since I’m…


Come in – have some Blue Coffee

Initial sips There’s an explosion of awesome OSS projects happening in the big data analysis space now. A big chunk of them follow a similar pattern: they’re released as Apache projects under the Apache Software Foundation, they are typically written in Java or at least a JVM language like Scala, and even though the JVM…


Analyzing Azure Table Storage data with HDInsight

HDInsight was optimized from the start to quickly analyze data on Azure’s blob storage service with Hadoop, using the WASB file system to expose that data as a native Hadoop file system. But the spirit of Hadoop has always been to be able to analyze data wherever it is, so…
