Ask Learn
Preview
Ask Learn is an AI assistant that can answer questions, clarify concepts, and define terms using trusted Microsoft documentation.
Please sign in to use Ask Learn.
Sign inThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
When you create a Hive table, the table definition (column names, data types, comments, etc.) are stored in the Hive Metastore. Hive Metastore is critical part of Hadoop architecture as it acts as a central schema repository which can be used by other access tools like Spark, Interactive Hive (LLAP), Presto, Pig and many other Big Data engines.
Image – HDInsight Architecture and Hive Metastore
In HDInsight, we use Azure SQL database as Hive Metastore. Azure SQL DB is relational database-as-a-service (DBaaS) hosted in the Azure and give availability SLA of 99.99.
There are two ways you can setup Metastore for your HDInsight clusters
HDInsight default Metastore – If you don’t provide a custom Metastore option, HDInsight will provision
Metastore with every cluster type. Here are some key considerations with default Metastore
This is generally good option for relatively simple workload where you don’t have multiple clusters and don’t need metadata preserved beyond the life cycle of the cluster.
Custom Metastore – HDInsight lets you pick custom Metastore. It’s a recommended approach for production clusters due to number reasons such as
Image – Typical shared custom Metastore scenario in HDInsight
Here are general HDinsight Hive Metastore best practices that you should consider
How to select a custom Metastore during cluster creation?
You can easily point your cluster to a pre-created Azure SQL DB during cluster creation as well as after cluster is created. The option is under storage --> Metastore settings while creating a new Hadoop , Spark or Intractive Hive cluster from Azure portal
Additionally, You can add additional clusters to the Custom Metastore for Azure Portal as well as from Ambari configurations ( Hive -->Advanced)
As discussed above Hive Metastore is critical component of Hadoop and Spark architecture and picking up right Metastore strategy will certainly help you with right Architecture and user experience.
Ask Learn is an AI assistant that can answer questions, clarify concepts, and define terms using trusted Microsoft documentation.
Please sign in to use Ask Learn.
Sign in