- Deploy Active Directory with HA in Azure
- Deploy Linux VMs for the Cloudera cluster
- Enable Active Directory DNS on the Linux VMs
- Sync Linux VMs to Active Directory time service
- Join the Linux VMs to Active Directory and enable Single-Sign-On
- Install Cloudera
- Enable Kerberos on Cloudera
- Enable Single-Sign-On for Cloudera web consoles
Step 6: Install Cloudera
- adminUserName: this can be the AD sudo user if the default user created with the VM has been disabled.
Step 7: Enable Kerberos on Cloudera
- Add the "Active Directory Certificate Services" server role with the "Certification Authority" role service to the PDC. After installation, complete the configuration with the default options.
- Run mmc on the PDC and add the "Certificates" snap-in for Computer Account -> Local Computer. Then, under Personal -> Certificates, choose All Tasks -> Request New Certificate to request a certificate for Kerberos authentication.
sudo su hdfs
hdfs dfs -ls /
# should display a security error
7. Create the Hadoop superuser hdfs in AD in the same NIS domain. SSH in as hdfs@<domain name> and run the above command again; this time it should succeed.
8. Create Hadoop users in AD in the same NIS domain and create their home directories as hdfs. Then log in as one of the Hadoop users and run a MapReduce job.
# log in as hdfs, create a home directory for each Hadoop user
hdfs dfs -mkdir /user/alice
hdfs dfs -chown alice /user/alice
# log in as one of those Hadoop users (alice in this example), then run MapReduce
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000
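When there are many Hadoop users, the home directory commands above can be wrapped in a small helper. The following is only a sketch under some assumptions: it is run while logged in as the hdfs AD user, the MIT Kerberos `klist` is installed, and the names `create_hdfs_homes`, `HDFS_CMD`, and `KLIST_CMD` are made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: create HDFS home directories for several AD users in one pass.
# Assumption: the caller is logged in as hdfs@<domain name>.
# HDFS_CMD and KLIST_CMD are hypothetical override hooks, useful for dry runs.

HDFS_CMD="${HDFS_CMD:-hdfs}"
KLIST_CMD="${KLIST_CMD:-klist}"

create_hdfs_homes() {
  local user
  # MIT klist -s prints nothing and exits non-zero without a valid ticket
  if ! "$KLIST_CMD" -s; then
    echo "no valid Kerberos ticket; log in as the hdfs AD user or run kinit" >&2
    return 1
  fi
  for user in "$@"; do
    # same two commands as in the step above, repeated per user
    "$HDFS_CMD" dfs -mkdir -p "/user/${user}" || return 1
    "$HDFS_CMD" dfs -chown "${user}" "/user/${user}" || return 1
  done
}
```

For example, `create_hdfs_homes alice bob` would create and chown `/user/alice` and `/user/bob`. Setting `HDFS_CMD=echo` prints the arguments that would be passed to `hdfs` instead of running them, which is a cheap way to review the commands first.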
Step 8: Enable Single-Sign-On for Cloudera web consoles
- Follow the Cloudera documentation to enable Single-Sign-On using AD credentials. Once enabled, the browser prompts for user credentials when we open, for example, the YARN ResourceManager web UI. Provide the fully qualified user name, for example, firstname.lastname@example.org.
- To enable AD authentication for the Cloudera Manager console, configure External Authentication.
Note that users must be explicitly added to the AD groups specified in that configuration.
Restart the Cloudera Manager server:
service cloudera-scm-server restart
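The restart takes a little while before the console answers again. As a rough sketch, the wait can be scripted by polling the admin console. This assumes the console is served over plain HTTP on the default port 7180 and that `curl` is available; the function names, the `PROBE` hook, and the retry values are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: poll until the Cloudera Manager admin console answers again.
# Assumptions: plain HTTP on port 7180 (adjust if TLS is enabled) and
# curl on the PATH. PROBE is a hypothetical hook to swap the probe command.

probe_cm() {
  # exits 0 as soon as any HTTP response comes back from host:port
  curl --silent --output /dev/null --max-time 5 "http://$1:$2/"
}

wait_for_cm() {
  local host="$1" port="${2:-7180}" attempts="${3:-30}" delay="${4:-10}"
  local probe="${PROBE:-probe_cm}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$probe" "$host" "$port"; then
      echo "Cloudera Manager is reachable on ${host}:${port}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Cloudera Manager did not come up on ${host}:${port}" >&2
  return 1
}
```

After the restart command, `wait_for_cm <cm host>` would then block until the login page can be loaded, or give up after the configured number of attempts.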
All done. In summary, we started from scratch, created an AD forest, deployed a Cloudera cluster, enabled AD DNS on the cluster VMs, and joined them to AD. Finally, we enabled authentication for Cloudera and its web consoles using the credentials managed by AD.