It seems like everything is “HD” these days. Your laptop has a “high density” plastic case, your car has a “heavy duty” battery, and your TV has a “high definition” picture. So I guess it’s only to be expected that your data analysis tools will be “highly distributed”. And at last we’re “happily done” with our guide to HDInsight.
Yep, after several months fighting with Microsoft’s Big Data solution, we’ve shipped the first version of our guide to Windows Azure HDInsight, based on the current preview release. It’s been one of the most troublesome guides in terms of figuring out the structure and content boundaries, and of interfacing with the exciting world of open source technologies (as I ruminated about just a few weeks ago). But we got it all together in the end.
Our guide contains the obligatory “What is Big Data” section, as well as a description of how HDInsight integrates with the rest of the Microsoft data platform. Then there’s a chapter about loading data, one about performing queries and transformations, and one about consuming the data. From there on, the guide discusses automating the whole data analysis process, plus the usual management and monitoring topics.
What’s likely to be of most interest to developers, however, are the scenario chapters and associated code examples that show how you can use HDInsight in four distinct ways: as an experimental platform for investigating interesting data, as an extract/transform/load (ETL) mechanism for data validation and cleansing, as a data warehouse that you can turn on and off on demand, and as a data source for your existing enterprise business intelligence (BI) systems.
Here’s our tube map for the guide:
Unlike most other books and guides, we’ve concentrated on integrating HDInsight with your existing business processes and on combining it with data analysis and visualization tools such as Power View and GeoFlow as part of an end-to-end solution. Yes, it’s Microsoft-centric, but, hey, that’s who pays my wages…