Introducing File and Folder ACLs for Azure Data Lake Store


Overview

We’re excited today to announce the availability of File and Folder ACLs for Azure Data Lake Store. Many of you have been eagerly awaiting this feature because it is critical to securing your big data.

When we launched the preview of Data Lake Store in October 2015, filesystem security was controlled by a single ACL at the root of the store that applied to all files and folders underneath it.

Starting today, ACLs can be set on any file or folder within the store, not just the root folder.

The Access Control Model used by Data Lake Store

We’ve emphasized that Azure Data Lake Store is compatible with WebHDFS. Now that ACLs are fully available, it’s important to understand the ACL model in WebHDFS/HDFS, because these are POSIX-style ACLs, not Windows-style ACLs. Before we dive deep into the details of the ACL model, here are the key points to remember.

  • POSIX-STYLE ACLs DO NOT ALLOW INHERITANCE. For those of you familiar with POSIX ACLs, this is not a surprise. For those coming from a Windows background, this is very important to keep in mind. For example, if Alice can read files in folder /foo, it does not mean that she can read files in /foo/bar. She must be granted explicit permission to /foo/bar (see the sketch after this list). The POSIX ACL model differs in some other interesting ways, but this lack of inheritance is the most important thing to keep in mind.
  • ADDING A NEW USER TO DATA LAKE ANALYTICS REQUIRES A FEW NEW STEPS. Fortunately, a portal wizard automates the most difficult steps for you.
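
To make the "no inheritance" point concrete, here is a minimal sketch using the Azure PowerShell cmdlets for Data Lake Store; the account name "myadls", the folder paths, and Alice's object ID are placeholder values, and the documentation linked below remains the authoritative reference.

    # Alice's Azure AD object ID (placeholder)
    $aliceId = "00000000-0000-0000-0000-000000000000"

    # Grant Alice read + execute on /foo ...
    Set-AzureRmDataLakeStoreItemAclEntry -Account "myadls" -Path "/foo" `
        -AceType User -Id $aliceId -Permissions ReadExecute

    # ... but POSIX-style ACLs do not inherit, so this grants her nothing on /foo/bar.
    # She needs an explicit entry on each subfolder (and file) she should be able to read.
    Set-AzureRmDataLakeStoreItemAclEntry -Account "myadls" -Path "/foo/bar" `
        -AceType User -Id $aliceId -Permissions ReadExecute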

The full description of the Access Control model is here: https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-access-control/

Adding a New Data Lake Analytics User

If you want a new user to run U-SQL jobs in ADLA, the overall steps are shown below (a rough PowerShell sketch of steps 1, 4, and 5 follows the list):

  1. Assign the user to a role in the Azure Data Lake Analytics account (using Azure RBAC)
  2. *Optional* Assign the user to a role in the Azure Data Lake Store account (using Azure RBAC)
  3. Run the ADLA “Add User Wizard” for the user
  4. Give the user R-X access, applied recursively, on every folder (and its subfolders) from which U-SQL jobs must read data
  5. Give the user RWX access, applied recursively, on every folder (and its subfolders) to which U-SQL jobs must write data
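
Here is a rough PowerShell sketch of steps 1, 4, and 5, assuming the AzureRM cmdlets. Every name below (the user, the subscription and account identifiers, the role, and the folder paths) is a placeholder, and the portal wizard plus the detailed instructions linked below handle the recursive application for you.

    # Look up the new user's Azure AD object ID (the UPN is a placeholder)
    $user = Get-AzureRmADUser -UserPrincipalName "alice@contoso.com"

    # Step 1: assign an RBAC role on the Data Lake Analytics account
    New-AzureRmRoleAssignment -ObjectId $user.Id `
        -RoleDefinitionName "Data Lake Analytics Developer" `
        -Scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.DataLakeAnalytics/accounts/<adla-account>"

    # Step 4: R-X where U-SQL jobs read data (repeat for each folder; because the
    # ACLs do not inherit, the entry must also be applied to every subfolder and file)
    Set-AzureRmDataLakeStoreItemAclEntry -Account "<adls-account>" -Path "/input" `
        -AceType User -Id $user.Id -Permissions ReadExecute

    # Step 5: RWX (the "All" permission) where U-SQL jobs write data
    Set-AzureRmDataLakeStoreItemAclEntry -Account "<adls-account>" -Path "/output" `
        -AceType User -Id $user.Id -Permissions All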

Detailed Instructions can be found here: https://1drv.ms/w/s!AvdZLquGMt47gzohZ69Ob47k-P_y

Adding a New Data Lake Store User

Detailed Instructions can be found here: https://1drv.ms/w/s!AvdZLquGMt47gzyviEyNrAn8kAqS

Giving an HDInsight Cluster Access to Data Lake Store

Detailed Instructions can be found here: https://1drv.ms/w/s!AvdZLquGMt47gz3ks4YwQRMXGi3j
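
An HDInsight cluster reaches Data Lake Store through its Azure AD service principal, so that principal needs ACL entries just as a user does. The sketch below shows the general shape only; the object ID, account name, and folder paths are placeholders, and note that with POSIX semantics the principal also needs execute (--x) on every ancestor folder in order to traverse down to its data.

    # Object ID of the cluster's Azure AD service principal (placeholder)
    $clusterSpId = "00000000-0000-0000-0000-000000000000"

    # Execute on the root and intermediate folders lets the cluster traverse the path
    Set-AzureRmDataLakeStoreItemAclEntry -Account "<adls-account>" -Path "/" `
        -AceType User -Id $clusterSpId -Permissions Execute
    Set-AzureRmDataLakeStoreItemAclEntry -Account "<adls-account>" -Path "/clusters" `
        -AceType User -Id $clusterSpId -Permissions Execute

    # Full access on the folder tree the cluster actually uses (applied to each
    # subfolder and file as well, since ACLs do not inherit)
    Set-AzureRmDataLakeStoreItemAclEntry -Account "<adls-account>" -Path "/clusters/mycluster" `
        -AceType User -Id $clusterSpId -Permissions All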

ProTip: Leverage the power of Active Directory Security groups

Repeating manual steps is both irritating and prone to error. It’s easier if you use Active Directory security groups.

First, give the needed permissions to the security group. Afterwards, adding a new user is simple: just add them to the security group. This will dramatically simplify maintaining and securing your Data Lake.
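
A rough sketch of that pattern, assuming the AzureAD PowerShell module for the directory operations and the AzureRM cmdlets for the store (the group name, account, path, and user below are placeholders):

    # Create a security group (or look up an existing one) and note its object ID
    $group = New-AzureADGroup -DisplayName "BigDataReaders" `
        -MailEnabled $false -SecurityEnabled $true -MailNickName "BigDataReaders"

    # Grant the GROUP read + execute on the folder; the optional -Default entry means
    # items created under /data in the future start out with the same entry
    # (existing items are unaffected)
    Set-AzureRmDataLakeStoreItemAclEntry -Account "<adls-account>" -Path "/data" `
        -AceType Group -Id $group.ObjectId -Permissions ReadExecute
    Set-AzureRmDataLakeStoreItemAclEntry -Account "<adls-account>" -Path "/data" `
        -AceType Group -Id $group.ObjectId -Permissions ReadExecute -Default

    # From now on, onboarding a user is just a group membership change
    $user = Get-AzureADUser -ObjectId "alice@contoso.com"
    Add-AzureADGroupMember -ObjectId $group.ObjectId -RefObjectId $user.ObjectId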

Comments (3)

  1. Jorg Klein says:

    Great news, thanks!

    The word documents need some corrections:
    Doc: Understanding AC - on page 8, PowerShell scripts are mentioned; they can be found in the other doc, while the current doc is mentioned.
    Doc: Add new User - the link to Github is broken: Download the “Add-AdlaJobUser.ps1” PowerShell script from our Github.

    1. Saveen Reddy says:

      Thanks for catching that, Jorg!

      - We've fixed the Understanding AC doc to remove the reference to PowerShell. Instead, the other docs linked in the blog post contain the PowerShell information.
      - The link to the script is also now fixed.
      - The blog post now has two additional links: (1) a doc on adding users to ADL Store and (2) a doc on letting an HDInsight cluster use ADL Store.

  2. Roy.Kim says:

    When creating my HDInsight Hadoop cluster, I am finding that granting my HDInsight Azure AD service principal access to a subfolder within my Azure Data Lake Store takes a long time (around 5 minutes) since there are 2500 files. Any guidance to make this faster, or should I keep my file count (i.e. json, csv) lower (but the files larger)? I intend to work in a scenario where I reach terabytes of data through both large file sizes and a large quantity of files. Appreciate any guidance.
