HDInsight: Jiving about Hadoop and Hive with CAT

Tomorrow I will be talking about Hive as part of Pragmatic Work's Women in Technology (WIT) month of webcasts. I am proud to be part of this lineup with all these stellar WITs! I encourage my fellow WITs to get more involved in your data community and if you don't already do so start tweeting, blogging, and speaking. I am happy to coach you through your first speaking engagement if you are interested. Get out there and start showing the world what you can do!

Thursday's talk is going to be HDInsight: Jiving about Hadoop and Hive with CAT. Let's break that title down.

HDInsight is Microsoft's distribution of Hadoop. As part of the HDInsight project we have checked code back into the core Apache Hadoop source code to make the core code runs great on Windows. We are also adding functionality and features such as JavaScript and Azure Storage Vault that make the product more robust and enterprise friendly. This week the HDInsight Service Preview on Azure became available to those with an Azure subscription.

Hadoop is a scale out methodology that allows businesses to quickly consume and analyze data in ways they haven't been able to before. This can lead to faster, better business insights and business actions.

Hive is a way to impose metadata and structure on the loosely structured (unstructured, multi-structured, semi-structured) data that resides in Hadoop's HDFS file system. With Hive and the Hive ODBC driver you can make Hadoop data look like any other data source to your familiar BI tools such as Excel. PowerPivot can connect to Hive data, mash that data up with existing data sources such as SQL Azure, SQL Server, and OData, and allow you to visualize it with Power View. I have an end to end demo of this: Hurricane Sandy Mash-Up: Hive, SQL Server, PowerPivot & Power View.

CAT is my team at Microsoft. The Customer Advisory Team (CAT) works with customers who are doing new, unusual, and interesting things that push the boundaries of technology. We share what we learn with the community so you can do your jobs better and we take what we learn from you to the product team to help improve the product.

My slides are attached at the bottom of this post. I believe a recording of the talk will be posted by Pragmatic Works on their site.

I look forward to "seeing" you all at my talk tomorrow and would love to see your tweets or hear directly from you afterwards.

I hope you’ve enjoyed this small bite of big data! Look for more blog posts soon on the samples and other activities.

Note: the CTP and TAP programs are available for a limited time. Details of the usage and the availability of the CTP may change rapidly.

Digg This

PragmaticWorksHDInsightJivingAboutHadoopAndHiveWithCATMar202013.pptx