Apache HBase/Phoenix - Tips, Tricks & Best Practices in Azure HDInsight

We will keep this page updated with commonly asked questions about HDInsight HBase/Phoenix. You can leave comments or questions on this blog. The official channel for HDInsight feedback and feature requests is here.

What is the advantage of using HBase in Azure HDInsight?

https://blogs.msdn.microsoft.com/ashish/2016/07/08/azure-hdinsight-hbase-a-nosql-database-like-no-other/


Can't wait, give me a quick link to deploy an HBase cluster in HDInsight?

How can I deploy OpenTSDB with HDInsight HBase?

Sure, check this out.

So, I just got HBase up and running in HDInsight and want to test its performance without writing any code. How can I "take HBase for a spin"?

SSH into your cluster and type:
hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1

Here sequentialWrite is the test to run and 1 is the number of clients. The PerformanceEvaluation tool takes a number of options and commands. Type hbase org.apache.hadoop.hbase.PerformanceEvaluation on its own to list them all.

Now open the HBase shell and type list. You will see a new table (named TestTable by default), and you can experiment with many more options.
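If you want to script several runs, remember the argument order: options come first, then the test name, then the number of clients. The Python sketch below only assembles the command line and does not run it. `pe_command` is a hypothetical helper. The `--nomapred` and `--rows` options are assumptions based on the tool's usage text, so check the list it prints on your HBase version.

```python
import shlex

# Main class of the HBase performance tool, as used in this post.
PE_CLASS = "org.apache.hadoop.hbase.PerformanceEvaluation"


def pe_command(test="sequentialWrite", clients=1, rows=None, nomapred=False):
    """Return the argv for one PerformanceEvaluation run (hypothetical helper).

    Options come first, then the test name, then the number of clients.
    rows     -> --rows=N, rows each client writes/reads (assumed option)
    nomapred -> --nomapred, run clients as threads instead of MapReduce (assumed option)
    """
    if clients < 1:
        raise ValueError("clients must be at least 1")
    argv = ["hbase", PE_CLASS]
    if nomapred:
        argv.append("--nomapred")
    if rows is not None:
        argv.append(f"--rows={rows}")
    argv += [test, str(clients)]
    return argv


# The command from this post.
print(shlex.join(pe_command()))
# -> hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1

# A follow-up random-read test with 4 threaded clients.
print(shlex.join(pe_command("randomRead", clients=4, rows=10000, nomapred=True)))
```

You can paste the printed line into your SSH session, or pass the list directly to `subprocess.run` on a cluster node. Keeping the command as a list avoids shell-quoting surprises.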

Are there free online training courses on HDInsight and HBase?

Yes, check this out

Great! I am an enterprise customer and want to secure the cluster inside a virtual network. How can I do that?

Please follow the article here.

I really need to lock down the VNET. Which IP addresses and ports does Azure need to operate the service?

If you install HDInsight into a secured virtual network, you must allow inbound access on port 443 from the following IP addresses, which Azure uses to manage the HDInsight cluster:
168.61.49.99 23.99.5.239 168.61.48.131 138.91.141.162
Once inbound access on port 443 is allowed from these addresses, HDInsight can be installed into the secured virtual network successfully.
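If the virtual network is locked down with a network security group, one way to express this rule is a single Azure CLI `az network nsg rule create` call. The Python sketch below builds that call from the four addresses above and validates them first. The resource group name, NSG name, rule name and priority are hypothetical placeholders.

```python
import ipaddress
import shlex

# Management addresses listed in this post (inbound, TCP 443).
HDINSIGHT_MGMT_IPS = ("168.61.49.99", "23.99.5.239", "168.61.48.131", "138.91.141.162")


def nsg_rule_command(resource_group, nsg_name, rule_name="AllowHDInsightMgmt443",
                     priority=300, ips=HDINSIGHT_MGMT_IPS):
    """Return argv for an `az` call allowing inbound TCP 443 from `ips`.

    rule_name and priority are hypothetical defaults; pick values that do
    not clash with rules already in your NSG.
    """
    for ip in ips:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    if not 100 <= priority <= 4096:
        raise ValueError("NSG rule priority must be between 100 and 4096")
    return [
        "az", "network", "nsg", "rule", "create",
        "--resource-group", resource_group,
        "--nsg-name", nsg_name,
        "--name", rule_name,
        "--priority", str(priority),
        "--direction", "Inbound",
        "--access", "Allow",
        "--protocol", "Tcp",
        "--source-address-prefixes", *ips,
        "--destination-port-ranges", "443",
    ]


# Hypothetical resource group and NSG names.
print(shlex.join(nsg_rule_command("my-hdinsight-rg", "my-hdinsight-nsg")))
```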

Enough playing, give me a few best practices for great HBase performance.

Read-
https://blogs.msdn.microsoft.com/ashish/2016/09/02/hdinsight-hbase-9-things-you-must-do-to-get-great-hbase-performance/

Watch-

Running “hbase hbck” reports some inconsistencies. When I then try to fix them with “sudo -u hbase (or hdfs) hbase hbck -repair”, it reports access denied on the folders in Azure Data Lake Store. What can I do?

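Try adding “-ignorePreCheckPermission” to the command, which makes hbck skip its file system permission pre-check:
hbase hbck -ignorePreCheckPermission
To repair, add it to the repair command as well:
sudo -u hbase hbase hbck -ignorePreCheckPermission -repair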
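If you run this check from a script, it helps to keep the report-only run and the repair run as two distinct command lines. The Python sketch below builds both. It assumes, as the answer above suggests, that the permission pre-check is the step failing on Azure Data Lake Store. `hbck_command` is a hypothetical helper.

```python
import shlex


def hbck_command(repair=False, ignore_precheck=True, run_as=None):
    """Return argv for an hbck run (hypothetical helper).

    ignore_precheck -> -ignorePreCheckPermission: skip hbck's file system
                       permission pre-check.
    repair          -> -repair: attempt to fix the inconsistencies found.
    run_as          -> wrap the call in `sudo -u <user>` (hbase or hdfs above).
    """
    argv = ["sudo", "-u", run_as] if run_as else []
    argv += ["hbase", "hbck"]
    if ignore_precheck:
        argv.append("-ignorePreCheckPermission")
    if repair:
        argv.append("-repair")
    return argv


# Report only first, then repair once the report looks as expected.
print(shlex.join(hbck_command()))
print(shlex.join(hbck_command(repair=True, run_as="hbase")))
```

Running the report first and reviewing it before adding `-repair` is a deliberate choice, since a repair changes cluster metadata.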


I have Hive and HBase clusters in the same VNET. How can I access an HBase table from Hive?

In the example below, the HBase table 'TestTable' is mapped to the Hive table 'hive_table'.

Step 1 - Open the Hive shell with the HBase cluster's ZooKeeper quorum and znode parent:
hive --hiveconf hbase.zookeeper.quorum=zk0-xxxx.xxxxxxxxxxxxxxxxxxxxxxx.cx.internal.cloudapp.net,zk1-xxxx.xxxxxxxxxxxxxxxxxxxxxxx.cx.internal.cloudapp.net,zk2-xxxx.xxxxxxxxxxxxxxxxxxxxxxx.cx.internal.cloudapp.net --hiveconf zookeeper.znode.parent=/hbase-unsecure
Step 2 - Map the Hive table to the HBase table:
hive> CREATE EXTERNAL TABLE hive_table(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:0") TBLPROPERTIES ("hbase.table.name" = "TestTable");
Step 3 - Query the data:
hive> select * from hive_table;
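The ZooKeeper host list in Step 1 is long and easy to mistype. The Python sketch below builds the Step 1 command line from a list of hosts and generates the Step 2 DDL from column pairs. `hive_cli_args` and `hbase_external_table_ddl` are hypothetical helpers, and the `zk*-example` host names are placeholders for your cluster's real ZooKeeper nodes.

```python
import shlex


def hive_cli_args(zk_hosts, znode_parent="/hbase-unsecure"):
    """argv that starts the Hive shell pointed at the HBase cluster's ZooKeeper."""
    if not zk_hosts:
        raise ValueError("need at least one ZooKeeper host")
    return [
        "hive",
        "--hiveconf", "hbase.zookeeper.quorum=" + ",".join(zk_hosts),
        "--hiveconf", "zookeeper.znode.parent=" + znode_parent,
    ]


def hbase_external_table_ddl(hive_table, hbase_table, columns, mapping):
    """CREATE EXTERNAL TABLE statement mapping a Hive table onto an HBase table.

    columns -- list of (name, hive_type) pairs, in order
    mapping -- HBase columns in the same order; ':key' is the row key
    """
    if len(columns) != len(mapping):
        raise ValueError("each Hive column needs exactly one HBase mapping entry")
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table}({cols}) "
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
        f"WITH SERDEPROPERTIES (\"hbase.columns.mapping\" = \"{','.join(mapping)}\") "
        f"TBLPROPERTIES (\"hbase.table.name\" = \"{hbase_table}\");"
    )


# Hypothetical host names; use your cluster's ZooKeeper nodes.
zk = ["zk0-example.internal.cloudapp.net",
      "zk1-example.internal.cloudapp.net",
      "zk2-example.internal.cloudapp.net"]
print(shlex.join(hive_cli_args(zk)))
print(hbase_external_table_ddl("hive_table", "TestTable",
                               [("key", "int"), ("value", "string")],
                               [":key", "info:0"]))
```

With these arguments the second call reproduces the Step 2 statement exactly. Adding a column means adding one pair to `columns` and one `family:qualifier` entry to `mapping`.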

Running hbase hbck shows multiple regions not assigned and holes in the region chain

Symptom: the HBase Master UI shows that regions are not balanced across the nodes, and running hbck reports multiple unassigned regions and holes in the region chain. To fix it:

1. Run hbase zkcli.
2. rmr /hbase/regions-in-transition (or rmr /hbase-unsecure/regions-in-transition if the cluster's znode parent is /hbase-unsecure)
3. Exit hbase zkcli.
4. Restart the active HMaster from Ambari.
5. Run hbase hbck again to confirm that the issue is fixed (no unassigned regions and no holes).
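Two details of these steps can be automated. The path in step 2 depends on the cluster's zookeeper.znode.parent, and step 5 comes down to reading hbck's summary. The Python sketch below covers both. It assumes the hbck summary ends with a line such as `Status: OK` or `Status: INCONSISTENT`, so verify that against the output on your HBase version. Both function names are hypothetical.

```python
def rit_znode(znode_parent="/hbase"):
    """Path to remove in step 2, given the cluster's zookeeper.znode.parent."""
    return znode_parent.rstrip("/") + "/regions-in-transition"


def hbck_is_clean(output):
    """Step 5 check: True if the last 'Status:' line in hbck output says OK.

    Assumes the summary line format 'Status: OK' / 'Status: INCONSISTENT'.
    """
    for line in reversed(output.splitlines()):
        line = line.strip()
        if line.startswith("Status:"):
            return line.split(":", 1)[1].strip() == "OK"
    raise ValueError("no 'Status:' line found in hbck output")


print(rit_znode("/hbase-unsecure"))
# Illustrative summary text, not captured from a real cluster.
print(hbck_is_clean("0 inconsistencies detected.\nStatus: OK\n"))
```

Scanning from the end, so that only the final `Status:` line counts, is a deliberate choice. It keeps the check from matching anything earlier in a long, detailed report.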