NOTE This post is part of a series on a deployment of Apache Drill on the Azure cloud.
The intent of Apache Drill is to make it easy for you to query across a wide-range of relational and NoSQL data stores. If you are running Drill in Azure, you are likely leveraging Azure SQL Database for some portion of your relational data storage. In this post, I want to document the steps for setting this up. The documentation on this topic in the official Apache Drill documentation is actually really good so that this post is presented here just for the sake of being complete.
Assuming we already have an Azure SQL Database up and running, our high-level steps for configuring Drill are as follows:
- Copy the SQL JDBC Driver to the Drill Servers
- Restart the Drill Cluster
- Create a Storage Plugin for the Azure SQL Database
- Test Connectivity between Drill and the Azure SQL Database
Copy the SQL JDBC Driver to the Drill Servers
On each Drill VM, all I need to do is SSH into the VM and download the latest SQL Server JDBC driver to /drill/current/jars/3rdparty. Links to the latest SQL Server JDBC driver can be found here:
sudo wget https://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/sqljdbc_6.0.7130.100_enu.tar.gz
sudo tar -xzvf sqljdbc_6.0.7130.100_enu.tar.gz
sudo cp /drill/sqljdbc_6.0/enu/sql*.jar /drill/current/jars/3rdparty/
sudo rm -r /drill/sqljdbc_6.0/
sudo rm /drill/sqljdbc*.gz
Restart the Drill Cluster
In working through the steps in this post, I went directly from the installation of the JDBC drivers in the pervious step into the creation of the Storage Plugin in the next step. When I attempted to create the Storage Plugin, I kept receiving an error message stating that Drill was "unable to create/update storage."
Reviewing the Drill documentation, my first thought was that Drill might be scanning the JARs in the 3rdparty directory at start up and therefore wouldn't see the newly added JARs until each server was restarted. So I went about the process of restarting each Drill VM, one at a time. Once all the Drill VMs were restarted, I attempted to create the Storage Plugin again but received the exact same message.
After verifying everything was in the right place and re-reading the documentation, I decided to shutdown all the Drill VMs in the cluster before starting them back up again. This did the trick so that I would recommend that you shutdown all your Drill VMs and only restart them once all the VMs in the Drill cluster are offline before proceeding to the next step.
Create a Storage Plugin for the Azure SQL Database
With the Drill cluster shutdown and re-started, I now need to configure the Storage Plugin. As before, I will connect to the Drill Web Console at http://dr004.westus.cloudapp.azure.com:8047. Once connected, I navigate to the Storage page and, under New Storage Plugin, enter a friendly name for the database connection before clicking the Create button.
NOTE Do not use sql or one of the other Drill reserved words as the name of your Storage Plugin.
In the resulting Configuration page, I replace null with the following JSON block, applying the appropriate substitutions:
With the server name, database name, user name and password all updated, I click Create to create the plugin.
Test Connectivity between Drill and the Azure SQL Database
NOTE In my post on configuring a Storage Plugin to work with WASB, I recommended restarting your Drill cluster at this point. In working through the actions documented here, I found that I could immediately move to the Query page in the Drill Web Console and issue queries without another restart. At this point, I'll just say your mileage may vary so attempt another restart if you receive any error messages in this last set of steps.
On the Query page of the Drill Web Console, enter a simple SELECT statement in the format SELECT * FROM storageplugin.schema.name;. For example, in this query, I am querying the sys.objects table in the database targeted by the Storage Plugin named drilldb:
SELECT * FROM drilldb.sys.objects;
Clicking the Submit button should return a set of results from your query, demonstrating you have connectivity to the Azure SQL Database.
A Footnote Regarding the Azure SQL Database Firewall
By default, an Azure SQL Database permits connection attempts to it from applications running in Azure. In other words, you do not need to mess with the default firewall on your database or the logical SQL Server on which it is deployed in order for Drill to attempt a connection. For more info on Azure SQL Database firewalls, check out this document.