The real confusion bug

This is the story of a recent debugging session that ended with the discovery of a bug I just love, because it’s one I can explain to my mother (I think, haven’t tried – also no offense to my brilliant-pathologist-but-slightly-techphobic mother): the computer got confused because all these teams have such similar…


Fight for your right to monkey

The gentle start to a rant… In any software system, especially but certainly not just in the cloud, things can and will go horribly wrong in a variety of ways given enough time/opportunity. Some of those are unanticipated disasters: bugs, floods, hurricanes, etc. But a good number of those are just tradeoffs that are consciously…


The ultimate showdown of NoSQL destiny!

Sharks and bees and… fast Italians?! If you’ve been following this blog recently, you’d have noticed that I’m having a blast trying different data products on Azure and playing with them. I recently managed to get Spark / Spark SQL (Shark’s replacement) running on Azure in the same way, but rather than dedicate a post…


HDInsight working with different storage accounts

Storage accounts – configured and otherwise When you create an HDInsight cluster, whether through the Azure portal or programmatically through e.g. PowerShell, you get a chance to add extra storage accounts to the cluster. In PowerShell it looks something like this: $clusterConfig = New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4 $allAccounts = @(‘myfirstaccount’,’mysecondaccount’,’mythirdaccount’) $clusterConfig = Set-AzureHDInsightDefaultStorage $clusterConfig ` -StorageAccountName…


More Blue Coffee – Presto on Azure

Presto! The Facebook team recently open-sourced a very cool distributed query engine that they concisely called Presto. Unlike, say, HBase, Presto doesn’t store its own data; instead it can plug in data from a variety of sources (e.g. Cassandra) and offers an ANSI SQL query engine that distributes the query processing across many nodes. Since I’m…


Living on the edge – testing without mocking (Part 2)

The (micro-) epic continues… In the first part of this N-part series (where N may or may not equal 2), I thoroughly convinced you that while testing against mock systems is wise and all, it’s always cool to also live dangerously and write/run unit tests against live systems at least every once in a while….


Come in – have some Blue Coffee

Initial sips There’s an explosion of awesome OSS projects happening in the big data analysis space now. A big chunk of them follow a similar pattern: they’re released as Apache projects under the Apache Software Foundation, they are typically written in Java or at least a JVM language like Scala, and even though the JVM…


Living on the edge – testing without mocking (Part 1)

You must be mocking me In this day & age of SaaS and everything cloud, a lot of us rarely write code in isolation anymore. Our code is always getting a live Twitter feed, using Skynet’s machine learning service to analyze it for dangerous human rebellions, and reacting to the results by calling into the…


Misadventures in immutability

My infatuation Starting, I think, a few years ago, I began to heavily drink the immutability kool-aid. I started getting oddly satisfied whenever I saw a class like this: public sealed class WeatherState { private readonly float _temperature; // In Celsius of course private readonly float _windSpeed; // km/h … } With this kind of…


WASB back stories: Masquerading a key-value store

There are a few excellent articles out there already that introduce the concept of Azure Blob Storage and how it’s accessed from HDInsight. In this post, though, I wanted to start giving some backstage looks at some of the decisions we made while exposing blobs to HDInsight, hopefully answering a tiny part of the oft-asked…