We are often asked: "What is an Azure Data Lake Analytics Unit? How does it affect my U-SQL job? How many do I need for my U-SQL job?" You will find the answers to these questions in this post.
An Azure Data Lake Analytics Unit, or AU, is a unit of computation resources made available to your U-SQL job. Each AU gives your job access to a set of underlying resources like CPU and memory. Currently, an AU is the equivalent of 2 CPU cores and 6 GB of RAM. As we see how people want to use the service, we may change the definition of an AU or add more options for controlling CPU and memory usage.
How AUs are used during U-SQL Query Execution
When you submit a U-SQL script for execution, the U-SQL compiler parallelizes the U-SQL script into hundreds or even thousands of tasks called vertices. Each vertex is allocated to one AU. The AU is dynamically allocated to the task and released once that particular task is completed.
When submitting a U-SQL job, you can specify the number of AUs you want the job to run with; this number is also called the job's parallelism. The ADLA service will reserve that many AUs for the exclusive use of your job for its entire duration. You can easily scale the number of AUs up or down for your different jobs. So, what happens if you allocate more or fewer AUs than the number of tasks or vertices that your job can be compiled down to?
- If there are more tasks than AUs, the tasks will wait in the queue for the next available AU. You are utilizing your AUs efficiently, but your job will likely take longer to complete.
- If there are more AUs than tasks that are ready to run, the remaining AUs will sit idle, waiting for the next task. You would be under-utilizing your AUs, but your job is likely to run as fast as it possibly can.
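The two cases above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: every vertex is ready to run at the start (real jobs have dependencies between vertices), each vertex occupies exactly one AU, and the function name `simulate_makespan` and the vertex durations are hypothetical. It is not how the ADLA scheduler is implemented.

```python
import heapq


def simulate_makespan(vertex_seconds, au_count):
    """Toy model: run each vertex on the AU that becomes free first.

    Returns the total job time in seconds. Assumes all vertices are
    ready at time 0, which real U-SQL jobs generally are not.
    """
    if au_count < 1:
        raise ValueError("au_count must be at least 1")
    # Each heap entry is the time at which one AU becomes free.
    free_at = [0.0] * au_count
    heapq.heapify(free_at)
    finish = 0.0
    for duration in vertex_seconds:
        start = heapq.heappop(free_at)   # earliest available AU
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)     # AU is released when the vertex ends
    return finish


if __name__ == "__main__":
    vertices = [10] * 8  # eight hypothetical vertices of 10 seconds each
    for aus in (2, 4, 8, 16):
        makespan = simulate_makespan(vertices, aus)
        utilization = sum(vertices) / (aus * makespan)
        print(f"{aus:>2} AUs: {makespan:.0f} s, utilization {utilization:.0%}")
```

In this toy run, going from 2 to 8 AUs shortens the job while keeping every AU busy, whereas going from 8 to 16 AUs does not shorten it further and leaves half of the reserved AUs idle.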
We will go into more detail about the impact of AUs on job performance in a little bit. But first, let’s talk about how resource consumption by a job is actually measured. You will be charged based on this – so, it is kind of important.
What is an AU Second?
An “AU Second” is the basic unit of measurement for the compute resources (AUs) requested and reserved for a job over its entire execution time. It is calculated as the product of the number of AUs assigned to the job and the total job execution time in seconds. The AU seconds consumed by a job determine how much the job will cost.
The current price for a given AU second and related options are provided here.
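As a minimal sketch of the calculation described above, the following computes AU seconds as the product of AUs and execution time. The function names and the price value are hypothetical; the price is a placeholder for illustration only, not the actual rate.

```python
def au_seconds(au_count, execution_seconds):
    """AU seconds = number of AUs assigned * total execution time in seconds."""
    return au_count * execution_seconds


def job_cost(au_count, execution_seconds, price_per_au_second):
    """Cost of a job, given a price per AU second supplied by the caller."""
    return au_seconds(au_count, execution_seconds) * price_per_au_second


if __name__ == "__main__":
    # A job that reserves 100 AUs for one hour (3,600 seconds).
    print(au_seconds(100, 3600))  # 360000

    # PLACEHOLDER_PRICE is a made-up number, NOT the real ADLA price.
    PLACEHOLDER_PRICE = 0.001
    print(job_cost(100, 3600, PLACEHOLDER_PRICE))
```

Note that the calculation uses the AUs reserved for the job, not the AUs that happened to be busy, so idle AUs still count toward the total.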
Now that you know how resource consumption is measured, let’s go back to how the number of AUs can impact job performance.
Will your job run faster if you assign more AUs to your job?
The shortest answer is: Maybe.
Increasing the number of AUs makes more compute resources available to a job, so the job could run faster. However, depending on your job’s characteristics (e.g., how parallelizable it is, how much data it is processing, etc.), you may not always see a proportional reduction in job execution time.
How should you decide the right number of AUs to assign to your job?
In order to decide upon the right number of AUs to assign to your U-SQL job, you need to consider the following:
- The characteristics of your job – Will your job benefit from the additional AUs? This may not be easy to determine when you run the job for the first time. For smaller data sets, we recommend starting by allocating 1 AU per 1 GB of input data. Besides input data, the complexity of your computation also affects how many AUs the job can be parallelized across. For this, we provide rich tools to help you understand and fine-tune the number of AUs. In the next blog post, we will walk you through how to use Azure Data Lake Tools for Visual Studio to choose an optimal number of AUs.
- Business requirements and budget - If your job can benefit from additional AUs then you need to consider your business scenarios and costs. Is your business willing to pay more for this job in order to have it run faster?
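The starting point of 1 AU per 1 GB of input data can be written as a small helper. This is a rough sketch only: the function name `starting_au_estimate`, rounding up to a whole AU, and the optional `max_aus` budget cap are assumptions added for illustration, and the result is a first guess to refine with the tools mentioned above, not a recommendation for every job.

```python
import math


def starting_au_estimate(input_gb, max_aus=None):
    """First-guess AU count: 1 AU per 1 GB of input, at least 1 AU.

    max_aus is a hypothetical cap reflecting a budget limit.
    """
    if input_gb < 0:
        raise ValueError("input_gb must be non-negative")
    estimate = max(1, math.ceil(input_gb))  # rounding up is an assumption
    if max_aus is not None:
        estimate = min(estimate, max_aus)
    return estimate


if __name__ == "__main__":
    print(starting_au_estimate(0.2))               # 1
    print(starting_au_estimate(12.5))              # 13
    print(starting_au_estimate(500, max_aus=100))  # 100
```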
As an example, let’s consider the following scenarios:
- You run one job with 100 AUs and it lasts 1 hour. Your job will cost you the equivalent of 100*60 = 6,000 AU minutes.
- You run the same job with 1,000 AUs and, depending on its characteristics, it takes 6 minutes to complete (10X faster than before). In this case, you still pay for the same number of AU minutes, i.e. 1,000*6 = 6,000 AU minutes. This seems like a good deal and you should consider increasing the AUs for this job.
- It is also possible that when you run this job with 1,000 AUs, it takes 12 minutes to complete (5X faster than before). In that case, the cost of the job has doubled to 1,000*12 = 12,000 AU minutes. You now have to decide whether having this job run 5X faster is worth the 2X increase in cost.
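The arithmetic in these scenarios can be checked with a short script. This is an illustrative sketch; the function name `compare` and the tuple layout `(au_count, minutes)` are hypothetical choices, and the numbers are the ones from the scenarios above.

```python
def compare(baseline, candidate):
    """Compare two runs of the same job, each given as (au_count, minutes)."""
    base_cost = baseline[0] * baseline[1]    # AU minutes for the baseline run
    cand_cost = candidate[0] * candidate[1]  # AU minutes for the candidate run
    return {
        "au_minutes": cand_cost,
        "speedup": baseline[1] / candidate[1],
        "cost_ratio": cand_cost / base_cost,
    }


if __name__ == "__main__":
    baseline = (100, 60)  # 100 AUs for 1 hour = 6,000 AU minutes

    print(compare(baseline, (1000, 6)))
    # {'au_minutes': 6000, 'speedup': 10.0, 'cost_ratio': 1.0}

    print(compare(baseline, (1000, 12)))
    # {'au_minutes': 12000, 'speedup': 5.0, 'cost_ratio': 2.0}
```

A cost ratio of 1.0 means the faster run is free in AU minutes; anything above 1.0 is the premium you pay for the speedup.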
We hope that this blog answers some of your basic questions about AUs, how they impact jobs and how to decide on the number of AUs to assign to your job. In the very near future, we will post on more advanced topics like job scheduling, resource usage diagnosis with ADL tools for Visual Studio, etc.
In the meantime, please reach out to us at @azuredatalake on Twitter.