Got some time to learn big data technologies? How about starting with Hive, widely considered the de facto standard for SQL queries on Hadoop?
We just released the HDInsight labs used in the BUILD conference code challenge. You will need two things to run these labs:
1- An HDInsight cluster: How to create one
2- Step-by-step instructions: Lab Instructions
Apache Hive is a data warehouse system for Hadoop. Hive enables data summarization, querying, and analysis. Hive queries are written in HiveQL, a query language similar to SQL.
Hive allows you to project structure on largely unstructured data. After you define the structure, you can use HiveQL to query the data without knowledge of Java or MapReduce.
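The idea of projecting structure onto raw data at read time ("schema-on-read") can be sketched in a few lines of Python. This is a toy illustration of the concept, not how Hive is implemented. The schema `LOG_SCHEMA`, the helpers `project`, `count_by`, and `server_errors`, and the tab-separated sample lines are all hypothetical. The raw lines are never rewritten; a column list is applied only when the data is read, roughly what a Hive table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'` does over text files. Fields that are missing or fail to convert come back as `None`, much as Hive returns NULL for such fields.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

# A hypothetical schema: (column name, converter) pairs, playing the role
# of the column list in a Hive table definition.
Schema = List[Tuple[str, Callable[[str], Any]]]
Row = Dict[str, Optional[Any]]

LOG_SCHEMA: Schema = [
    ("ts", str),
    ("level", str),
    ("status", int),
    ("bytes", int),
]

# Hypothetical raw, tab-separated log lines. Note the unparseable
# "n/a" value and the truncated last line.
SAMPLE_LINES = [
    "10:00:00\tINFO\t200\t512",
    "10:00:01\tERROR\t500\t0",
    "10:00:02\tERROR\t503\tn/a",
    "10:00:03\tWARN",
]


def project(lines: Iterable[str], schema: Schema, sep: str = "\t") -> List[Row]:
    """Apply a schema to raw text lines at read time (schema-on-read).

    The input lines are not modified. Extra fields are ignored; missing
    fields or fields that fail to convert become None.
    """
    rows: List[Row] = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        row: Row = {}
        for i, (name, convert) in enumerate(schema):
            if i >= len(fields):
                row[name] = None  # field absent on this line
                continue
            try:
                row[name] = convert(fields[i])
            except ValueError:
                row[name] = None  # value does not match the declared type
        rows.append(row)
    return rows


def count_by(
    rows: Iterable[Row], key: str, where: Callable[[Row], bool] = lambda r: True
) -> Dict[Any, int]:
    """Count rows per value of `key` among rows that satisfy `where`."""
    counts: Dict[Any, int] = {}
    for row in rows:
        if where(row):
            counts[row[key]] = counts.get(row[key], 0) + 1
    return counts


def server_errors(row: Row) -> bool:
    # A None status never matches, like a comparison against NULL in SQL.
    return row["status"] is not None and row["status"] >= 500


if __name__ == "__main__":
    rows = project(SAMPLE_LINES, LOG_SCHEMA)
    for row in rows:
        print(row)
    print(count_by(rows, "level", where=server_errors))
```

The final call corresponds loosely to `SELECT level, COUNT(*) FROM logs WHERE status >= 500 GROUP BY level` and prints `{'ERROR': 2}` for the sample lines. The point of the design is that the structure lives in the schema, not in the data, so the same raw lines could be read later under a different column list.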
HDInsight provides several cluster types, which are tuned for specific workloads. The following cluster types are most often used for Hive queries:
- Interactive Hive: A Hadoop cluster that provides Low Latency Analytical Processing (LLAP) functionality to improve response times for interactive queries. For more information, see the Start with Interactive Hive in HDInsight document.
- Hadoop: A Hadoop cluster that is tuned for batch processing workloads. For more information, see the Start with Hadoop in HDInsight document.
- Spark: Apache Spark has built-in functionality for working with Hive. For more information, see the Start with Spark on HDInsight document.
- HBase: HiveQL can be used to query data stored in HBase. For more information, see the Start with HBase on HDInsight document.