How to use HBase Java API with HDInsight HBase cluster, part 1


Recently we worked with a customer who was trying to use the HBase Java API to interact with an HDInsight HBase cluster. While working with the customer and following our existing documentation here and here, we realized that it would help to clarify a few things about HBase Java API connectivity to an HBase cluster and to show a simpler way of running a Java client application that uses the HBase Java API. In this blog, we explain the recommended steps for using the HBase Java API to interact with an HDInsight HBase cluster.

The Background:

Our existing documentation here does a nice job of explaining how to use Maven to develop a Java application that uses the HBase Java API to interact with an HDInsight HBase cluster. However, one may wonder why it packages the HBase Java client code as a MapReduce JAR and runs the JAR as a MapReduce job. This part needs a little more clarity. Remember that the HBase Java API uses RPC (Remote Procedure Call) to communicate with an HBase cluster, which means that the client application running the HBase Java API code and the HBase cluster need to be in the same network and subnet. In the absence of an Azure Virtual Network, aka VNet (I imagine the documentation was written before we introduced the capability of installing an HBase cluster in a virtual network), the example packages the HBase client code as a MapReduce JAR and submits it as a MapReduce job via WebHCat/Templeton. With this approach, the client JAR (containing the HBase Java API calls) runs on one of the worker nodes in the HBase cluster, so it runs successfully.

However, now that an HDInsight HBase cluster can be provisioned in a virtual network, as shown in this documentation, we feel that a more realistic and better approach is to provision the HDInsight HBase cluster in a VNet, provision the client machine/VM in the same VNet, and then run the HBase Java API client on that VM. This is shown in the diagram below.

We will touch on each of these steps below –

Provision HDInsight HBase cluster in a VNet:

You can follow our HDInsight HBase documentation which has very detailed steps on how we can do this either via Azure Portal or Azure PowerShell.

Provision a Microsoft Azure VM in the same VNet and subnet:

Following the same documentation above, provision a Microsoft Azure virtual machine in the same VNet and subnet as the HDInsight HBase cluster. A standard Windows Server 2012 image with a small VM size should be sufficient. Since the JDK must be installed on the VM in order to use the HBase Java API, we have found it convenient to use an Oracle JDK image from the gallery for our testing (this is not required, though, and may have special pricing), like below:

If you choose a standard Windows Server VM (one that does not have a JDK installed), you can install a JDK from Zulu.

Get the DNS Suffix to build FQDN of ZooKeeper nodes:

When using the HBase Java API to connect to an HBase cluster remotely, we must use the fully qualified domain name (FQDN). To determine this, we need the connection-specific DNS suffix of the HDInsight HBase cluster. The documentation shows multiple ways to get it. The simplest way is to RDP into the HDInsight HBase cluster, run ipconfig /all, and copy the connection-specific DNS suffix for the Ethernet adapter, as shown in the screenshot below:

So, in my cluster as shown above, the connection-specific DNS suffix is AzimHbaseTest.d3.internal.cloudapp.net. Please make a note of the value from your cluster; we will use it to build the ZooKeeper FQDNs in the next section. To verify that the virtual machine can communicate with the HBase cluster, run ping headnode0.<dns suffix> from the virtual machine, as shown below:

C:\Users\DBAdmin>ping headnode0.AzimHbaseTest.d3.internal.cloudapp.net
Pinging headnode0.AzimHbaseTest.d3.internal.cloudapp.net [10.0.0.6] with 32 bytes of data:
Reply from 10.0.0.6: bytes=32 time=3ms TTL=128
……
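
Ping only confirms name resolution and ICMP reachability, while the HBase client reaches ZooKeeper over TCP. As a complementary check, here is a minimal Java sketch that tries to open a TCP connection to a host and port. The host name zookeeper0 and the port 2181 (the usual ZooKeeper client port) are assumptions for illustration and are not taken from this cluster; substitute the values that apply to yours.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch only: checks whether a TCP connection to host:port can be opened.
final class PortCheck {
    private PortCheck() {}

    /** Returns true if a TCP connection succeeds within timeoutMillis, false otherwise. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers unresolved host names, refused connections and timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: host name and port below are placeholders, not verified values.
        String host = args.length > 0 ? args[0] : "zookeeper0.AzimHbaseTest.d3.internal.cloudapp.net";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 3000));
    }
}
```

A result of false does not say which layer failed (DNS, routing, or the service itself), so it is best used together with the ping test above.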
 

Develop/Test HBase JAVA API Client on the Azure VM:

Our documentation has detailed steps on how to use Maven to develop a Java client using the HBase Java API, and we don’t want to repeat all the steps here. Instead, we would like to share our own experience and show a few different ways to use Maven for developing the Java client. A few options (the list is not exhaustive) are:

  1. Use Maven command line and a JAVA IDE (like Eclipse, IntelliJ etc)
  2. Use a JAVA IDE (that comes integrated with Maven) like Eclipse to develop the JAVA client

Using Maven command line and a Java IDE:

Note: I am using IntelliJ as an example – you can use your preferred JAVA IDE. Also, the steps below assume that you have installed IntelliJ on the Azure VM (client).

  1. From the command-line on your Azure VM, go to the folder where you wish to create the project. For example, cd C:\Maven\MavenProjects
  2. Use the mvn command, which is installed with Maven, to generate the project template, as shown below –

    mvn archetype:generate -DgroupId=com.microsoft.css -DartifactId=HBaseJavaApiTest -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false 

    This will create the src directory and pom.xml in the directory HBaseJavaApiTest (same as the artifactId).

  3. Start the JAVA IDE IntelliJ and select ‘Import Project’ and point to the POM.xml created in the last step, as shown below –

     

  4. On the next window, in addition to the default settings, also select ‘Import Maven Projects automatically’ and the options to automatically download sources and documentation, as shown below –

     

  5. Select the default options on the next windows and the project will open inside IntelliJ. Add the necessary Java source files and remove the ‘test’ folder if you don’t plan to use it. In our case, we just tested the CreateTable.java from the above documentation page and it looks something like this –

     

  6. Modify the POM.xml file as shown in the documentation, something like this –
  7. Create a new directory named conf in the HBaseJavaApiTest directory. In the conf directory, create a new file named hbase-site.xml and use the ZooKeeper FQDNs built from the DNS suffix you got previously, as shown below:
  8. Open a command prompt and change directories to the HBaseJavaApiTest directory. Use the following command to build a JAR containing the application:

    mvn clean package 

    This will clean any previous build artifacts, download any dependencies that have not already been installed, and then build and package the application. The command creates the JAR file HBaseJavaApiTest-1.0-SNAPSHOT.jar in the directory HBaseJavaApiTest\target.
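
For step 7, the following Java sketch shows one way to build the ZooKeeper quorum string from the DNS suffix and render it as hbase-site.xml content. It rests on assumptions that this post does not confirm: the ZooKeeper hosts are taken to be named zookeeper0, zookeeper1 and zookeeper2, and the three properties written are the standard HBase client settings (hbase.cluster.distributed, hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort with the usual port 2181). Check the actual host names and the documentation's hbase-site.xml before relying on it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: renders a client-side hbase-site.xml from a DNS suffix.
final class HBaseSiteXml {
    private HBaseSiteXml() {}

    // ASSUMPTION: hosts are named <hostPrefix>0, <hostPrefix>1, ...; verify on your cluster.
    static String buildQuorum(String hostPrefix, int nodeCount, String dnsSuffix) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be >= 1");
        }
        String suffix = dnsSuffix.trim();
        if (suffix.startsWith(".")) {
            suffix = suffix.substring(1); // tolerate a leading dot in the pasted suffix
        }
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            hosts.add(hostPrefix + i + "." + suffix);
        }
        return String.join(",", hosts);
    }

    // Escapes the characters that are not allowed in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Renders the properties in insertion order as a Hadoop-style configuration file.
    static String render(Map<String, String> properties) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escape(e.getKey())).append("</name>\n")
              .append("    <value>").append(escape(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        sb.append("</configuration>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String suffix = args.length > 0 ? args[0] : "AzimHbaseTest.d3.internal.cloudapp.net";
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hbase.cluster.distributed", "true");
        props.put("hbase.zookeeper.quorum", buildQuorum("zookeeper", 3, suffix)); // assumed host names
        props.put("hbase.zookeeper.property.clientPort", "2181");                 // assumed default port
        System.out.print(render(props));
    }
}
```

After compiling the class, its output can be redirected into conf\hbase-site.xml, passing your own DNS suffix as the first argument. Writing the file by hand works just as well; the sketch only makes the structure of the quorum value explicit.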

 

Using Eclipse IDE to develop and build the HBase JAVA client application:

You can use the same steps as above for generating the project template using Maven and then import the project (POM.xml) in Eclipse. Alternatively, you can use the Eclipse IDE itself (without using Maven command line) to create the Maven project, as shown below-

1. Install a recent package of ‘Eclipse IDE for Java EE Developers’, such as Kepler SR2 or Luna SR1

2. Open Eclipse IDE and select File -> New -> Project -> Maven -> New Maven Project. Leave the default options and enter the GroupId and ArtifactId

This will create a vanilla Maven project. You can then add dependencies to pom.xml and add or modify source code such as CreateTable.java. When loading the project in Eclipse IDE, you may notice errors such as “Missing artifact jdk.tools:jdk.tools.jar:1.7”. This can be fixed either by modifying eclipse.ini to add the -vm argument pointing to the JDK\bin directory, or by including the following dependency within pom.xml –

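    <dependency>
        <groupId>jdk.tools</groupId>
        <artifactId>jdk.tools</artifactId>
        <version>${java.version}</version>
        <scope>system</scope>
        <!-- Maven documents environment variables as ${env.NAME}; try ${env.JAVA_HOME} if ${JAVA_HOME} does not resolve -->
        <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
    </dependency>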

This is due to a limitation with Maven support on Eclipse IDE. It is documented here. Once the above changes are done, you can build the project and run it from within Eclipse IDE and debug as needed.

Running the HBase JAVA API Client on Azure VM:

If you have made your JAR an executable one using a Maven build plugin (see the POM.xml file above) like this –

<plugin>
    <!-- Build an executable JAR -->
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <version>2.4</version>
    <configuration>
        <archive>
            <manifest>
                <addClasspath>true</addClasspath>
                <classpathPrefix>lib/</classpathPrefix>
                <mainClass>com.microsoft.css.CreateTable</mainClass>
            </manifest>
        </archive>
    </configuration>
</plugin>
 

You can run the executable JAR from a command line. Change directory to HBaseJavaApiTest\target and run the following command –

java -jar HBaseJavaApiTest-1.0-SNAPSHOT.jar 
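
The java -jar command relies on the Main-Class entry that the maven-jar-plugin configuration writes into the JAR manifest. If the command complains about a missing main manifest attribute, a small sketch like the one below can show what the built JAR actually contains. The class name JarMainClass is hypothetical and exists only for this illustration; the code uses the standard java.util.jar API.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Sketch only: prints the Main-Class attribute of a JAR's manifest.
final class JarMainClass {
    private JarMainClass() {}

    /** Returns the Main-Class manifest attribute, or null if the JAR has none. */
    static String read(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null; // JAR was built without a manifest
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the JAR name produced by the build in this post.
        String path = args.length > 0 ? args[0] : "HBaseJavaApiTest-1.0-SNAPSHOT.jar";
        System.out.println("Main-Class: " + read(path));
    }
}
```

With the plugin configuration shown earlier, the value should match the mainClass element, com.microsoft.css.CreateTable.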

Alternatively, you can test and debug the code within the IDE itself, by setting a breakpoint and stepping through the code, as shown in the screenshot below-

 

I hope you find this blog helpful in using the HBase Java API to interact with an HDInsight HBase cluster. We would love to hear your feedback! In part 2, we will discuss some troubleshooting tools you can use for an HBase Java API client application.

Thanks Farooq for reviewing this!

– Azim Uddin and Dharshana Kumar
