HDInsight HBase: How to improve HBase cluster restart time by flushing tables

This blog is written by Nitin Verma, Sr. Software Engineer, HDInsight.

Do you restart or re-create your HDInsight HBase clusters often, and wish the restart and re-create times were faster? If so, please read on.

This blog introduces a new script for the HDInsight HBase service that conveniently flushes the MemStore of all HBase tables. The script can significantly reduce HBase service restart time by avoiding WAL recovery for region edits that have already been flushed.

How HBase avoids recovery for already flushed edits:

When a flush 'table' operation is triggered, all the regions belonging to that table flush independently. Once the HFile for a region has been written, it records the maximum sequence ID in its metadata, and the WAL of the corresponding region server is notified. The WAL maintains a mapping of regions to their flushed sequence IDs. When the HBase cluster restarts, the HMaster distributes the flushed sequence ID of each region to the recovery threads that split the WAL, so that they can skip the edits which have already been persisted in HFiles.
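The skipping rule described above can be illustrated with a toy model. This is only a sketch, not HBase code: the function name filter_wal_edits, the two-column "region sequence-ID" text format, and the region names are all made up for illustration. It keeps a WAL edit for replay only when its sequence ID is greater than the flushed sequence ID recorded for its region, or when no flush has been recorded for that region.

```shell
#!/usr/bin/env bash
# Toy model of WAL replay filtering (illustrative only, not HBase internals).
#
# Usage: filter_wal_edits FLUSHED_MAP_FILE < WAL_EDITS
#   FLUSHED_MAP_FILE: lines of "<region> <max flushed sequence ID>"
#   WAL_EDITS (stdin): lines of "<region> <sequence ID>"
# Prints only the edits that would still need to be replayed.
filter_wal_edits() {
  awk -v map="$1" '
    BEGIN {
      # Load the per-region flushed sequence IDs into a lookup table.
      while ((getline line < map) > 0) {
        split(line, f, " ")
        flushed[f[1]] = f[2]
      }
      close(map)
    }
    # Keep an edit if its region has no recorded flush, or if the edit
    # is newer than the last flushed sequence ID for that region.
    !($1 in flushed) || ($2 + 0) > (flushed[$1] + 0) { print }
  '
}
```

For example, with a map file containing "regionA 120", the edit "regionA 100" is dropped while "regionA 121" is kept. The more edits a flush has already persisted, the fewer remain to be replayed, which is why flushing before a restart shortens recovery.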

How to run the script:

Below are the two ways to run the script.

1. Inside the cluster.

SSH to the head node of the cluster.
wget https://raw.githubusercontent.com/Azure/hbase-utils/master/scripts/flush_all_tables.sh
bash ./flush_all_tables.sh

2. From the Azure portal, using an HDInsight script action:

a. Log in to https://ms.portal.azure.com
b. Select the desired HBase cluster.
c. Click the "Script Actions" button.
d. Click the "+ Submit New" button.
e. Give it a short, meaningful name (for example: "Flushing all hbase tables").
f. Enter the Bash Script URI as
https://raw.githubusercontent.com/Azure/hbase-utils/master/scripts/flush_all_tables.sh
g. Select only Head, and deselect the Region and Zookeeper nodes.
h. Enter hn1 as the parameter, so that the script executes on the idle head node.
i. Click the "Create" button.

The progress of the script can be monitored from the Ambari UI by clicking the "ops" button, which shows the active operation count in blue.
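If you only want to flush a few tables by hand, the same idea can be sketched in a few lines of shell. This is not the contents of flush_all_tables.sh; it is a minimal sketch that assumes the hbase command is on the PATH of the node where it runs. The helper names build_flush_commands and flush_tables are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (not the actual flush_all_tables.sh).

# Turn table names (one per line on stdin) into HBase shell flush commands.
build_flush_commands() {
  local table
  while IFS= read -r table || [ -n "$table" ]; do
    # Skip blank lines so no empty flush command is generated.
    if [ -n "$table" ]; then
      printf "flush '%s'\n" "$table"
    fi
  done
}

# Feed the generated commands to the HBase shell in non-interactive mode.
# Assumes the hbase command is available on this node.
flush_tables() {
  build_flush_commands | hbase shell -n
}
```

For example, printf 'table1\ntable2\n' | flush_tables would issue a flush for two tables (the table names here are placeholders). Running build_flush_commands on its own first lets you review the generated commands before anything is sent to the cluster.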