SSH tunnel to endpoints in Azure VNet from Windows

When you deploy virtual machines on Azure, a good practice is to set up Azure Network Security Groups (NSGs) to minimize the exposure of endpoints and limit access to those endpoints to known IPs on the Internet. To access the rest of the endpoints in your Virtual Network (VNet) on Azure, you can…

Backup and Restore Cassandra on Azure

When you run Cassandra on virtual machines on Azure, one way to back up and restore data is to rsync Cassandra snapshots to data disks attached to each VM.  The data disks in Azure are backed by Azure blob storage and automatically benefit from the durability that Azure storage brings.  You can also copy the…

Backup Cloudera data to Azure Storage

Azure Blob Storage supports an HDFS interface that HDFS clients can access using the wasb:// syntax. The hadoop-azure module, which implements this interface, is distributed with Apache Hadoop but is not configured out of the box in Cloudera. In this blog, we will provide instructions on how to back up Cloudera data to Azure…

Run Jupyter Notebook on Cloudera

In a previous blog post, we demonstrated how to enable Hue Spark notebook with Livy on CDH. Here we will provide instructions on how to run a Jupyter notebook on a CDH cluster. These steps have been verified on a default deployment of a Cloudera CDH cluster on Azure. At the time of this writing, the…

Enable Kerberos on Cloudera with Azure AD Domain Service

In a previous blog series, we documented how to integrate Active Directory deployed in virtual machines on Azure with Cloudera. In that scenario, we needed to deploy and maintain the domain controller VMs ourselves. In this article, we will use Azure Active Directory Domain Services (AADDS) to integrate Kerberos and single sign-on with Cloudera. AADDS is a managed service…

Run Hue Spark Notebook on Cloudera

When you deploy a CDH cluster using Cloudera Manager, you can use the Hue web UI to run, for example, Hive and Impala queries. The Spark notebook, however, is not configured out of the box. It turns out that installing and configuring Spark notebooks on CDH isn't as straightforward as the existing documentation describes. In this blog,…

Integrating Cloudera cluster with Active Directory (Part 3/3)

In Part 1 and Part 2 of this blog, we covered the first 5 steps. Here we will describe the remaining Cloudera-specific steps to enable Kerberos and Single-Sign-On for web consoles.
1. Deploy Active Directory with HA in Azure
2. Deploy Linux VMs for the Cloudera cluster
3. Enable Active Directory DNS on the Linux VMs
4. Sync…

Integrating Cloudera cluster with Active Directory (Part 2/3)

In Part 1 of this blog, we covered the first 4 steps. Here we will describe how to join the Linux VMs to AD.
1. Deploy Active Directory with HA in Azure
2. Deploy Linux VMs for the Cloudera cluster
3. Enable Active Directory DNS on the Linux VMs
4. Sync Linux VMs to Active Directory time service
5. Join…

Integrating Cloudera cluster with Active Directory (Part 1/3)

[Update 8/2017: With Cloudera Director support on Azure, you can now automate the whole process of enabling Kerberos on a Cloudera cluster. See this GitHub repo for instructions and scripts.] In this blog post, we will show you step by step how to secure a Cloudera cluster by enabling DNS, Single-Sign-On (SSO) and Kerberos with…

Real Time Analytics with Azure Event Hubs, Cloudera, and Azure SQL

In this blog post, I will demonstrate how to ingest data from Azure Event Hubs to Spark Streaming running on Cloudera EDH, process the data in real time using Spark SQL, and write the results to Azure SQL database.  Alternatively, data processing can also be done using Impala.  This example uses the same data generator as described in…
