Back up Cloudera data to Azure Storage

Azure Blob Storage supports an HDFS interface that HDFS clients can access using the wasb:// URI scheme. The hadoop-azure module, which implements this interface, is distributed with Apache Hadoop but is not configured out of the box in Cloudera. In this blog, we will provide instructions on how to back up Cloudera data to Azure…
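As a minimal sketch (not taken from the original post), a wasb:// URI has the form wasb://<container>@<account>.blob.core.windows.net/<path>, with wasbs:// used for TLS. The hypothetical helper below only assembles that string. It assumes authentication is configured separately, typically through an account key in core-site.xml under fs.azure.account.key.<account>.blob.core.windows.net. The account and container names in the example are placeholders.

```python
def wasb_uri(account: str, container: str, path: str = "", secure: bool = False) -> str:
    """Build a hadoop-azure URI of the form
    wasb[s]://<container>@<account>.blob.core.windows.net/<path>.

    Hypothetical helper for illustration; it does not check that the
    account or container exists.
    """
    # wasbs:// is the TLS variant of the wasb:// scheme
    scheme = "wasbs" if secure else "wasb"
    # Drop any leading slash so the path joins cleanly after the host
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Placeholder names: storage account "mystorageacct", container "backup"
    print(wasb_uri("mystorageacct", "backup", "/warehouse"))
```

The resulting URI could then be used as a DistCp target, for example `hadoop distcp /user/hive/warehouse wasb://backup@mystorageacct.blob.core.windows.net/warehouse`.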

Run Jupyter Notebook on Cloudera

In a previous blog, we demonstrated how to enable the Hue Spark notebook with Livy on CDH. Here we will provide instructions on how to run a Jupyter notebook on a CDH cluster. These steps have been verified on a default deployment of a Cloudera CDH cluster on Azure. At the time of this writing, the…

Enable Kerberos on Cloudera with Azure AD Domain Services

In a previous blog series, we documented how to integrate Active Directory deployed in virtual machines on Azure with Cloudera. In that scenario, we need to deploy and maintain the domain controller VMs ourselves. In this article, we will use Azure Active Directory Domain Services (AADDS) to integrate Kerberos and single sign-on with Cloudera. AADDS is a managed service…

Run Hue Spark Notebook on Cloudera

When you deploy a CDH cluster using Cloudera Manager, you can use the Hue web UI to run, for example, Hive and Impala queries. However, the Spark notebook is not configured out of the box, and it turns out that installing and configuring Spark notebooks on CDH isn't as straightforward as the existing documentation suggests. In this blog,…

Integrating Cloudera cluster with Active Directory (Part 3/3)

In Part 1 and Part 2 of this blog, we covered the first 5 steps. Here we will describe the remaining Cloudera-specific steps to enable Kerberos and single sign-on (SSO) for web consoles. The full sequence is:
1. Deploy Active Directory with HA in Azure
2. Deploy Linux VMs for the Cloudera cluster
3. Enable Active Directory DNS on the Linux VMs
4. Sync…

Integrating Cloudera cluster with Active Directory (Part 2/3)

In Part 1 of this blog, we covered the first 4 steps. Here we will describe how to join the Linux VMs to AD. The full sequence is:
1. Deploy Active Directory with HA in Azure
2. Deploy Linux VMs for the Cloudera cluster
3. Enable Active Directory DNS on the Linux VMs
4. Sync Linux VMs to the Active Directory time service
5. Join…

Integrating Cloudera cluster with Active Directory (Part 1/3)

[Update 8/2017: With Cloudera Director support on Azure, you can now automate the whole process of enabling Kerberos on a Cloudera cluster. See this GitHub repo for instructions and scripts.] In this blog post, we will show you step by step how to secure a Cloudera cluster by enabling DNS, single sign-on (SSO) and Kerberos with…

Real Time Analytics with Azure Event Hubs, Cloudera, and Azure SQL

In this blog post, I will demonstrate how to ingest data from Azure Event Hubs into Spark Streaming running on Cloudera EDH, process the data in real time using Spark SQL, and write the results to Azure SQL Database. Alternatively, the processing can be done with Impala. This example uses the same data generator as described in…

Connect Cloudera to Azure ML Hive Reader

Azure Machine Learning supports Hive as a data source through the WebHCat API. In this post, I will show you how to configure Cloudera to connect to Azure ML through WebHCat. These steps have been verified on a Cloudera cluster created from Azure Marketplace. If you don't already have a cluster, you can follow this blog post to deploy one. We…

0