Get more out of your U-SQL scripts — Reuse them by adding script parameters with Azure PowerShell


Do you submit U-SQL jobs to Data Lake Analytics using Azure PowerShell? Do you find yourself making lots of minor changes to your U-SQL script between job submissions? Make life easier by parameterizing your U-SQL script and passing in parameter values when submitting the job through Azure PowerShell.

You need two things to make this work with your Data Lake Analytics account:

  1. A U-SQL script that references U-SQL variables.
  2. A set of values for these U-SQL variables.

Use the job submission cmdlet, Submit-AzureRmDataLakeAnalyticsJob (or Submit-AdlJob), to submit the job and read in the values for the U-SQL variables.

Here's how to use this feature.

Prepare your script

Let’s say you’re starting out with the following script:

@searchlog = EXTRACT UserId int, 
                     Start DateTime, 
                     Department string, 
                     Query string, 
                     Duration int, 
                     Urls string, 
                     ClickedUrls string 
               FROM "/Input/input.tsv"
               USING Extractors.Tsv();

@searchlogFiltered = SELECT *
                       FROM @searchlog
                       WHERE Department == "Finance"
                       AND Duration >= 5;

OUTPUT @searchlogFiltered
  TO "/Output/output.tsv"
  USING Outputters.Tsv();

You’ll first parameterize your script by referencing U-SQL variables. Here's an example of the same script with variables @Department, @Duration, and @StartDateTime, along with default values:

DECLARE EXTERNAL @Department string = "Finance";
DECLARE EXTERNAL @Duration int = 2;
DECLARE EXTERNAL @StartDateTime DateTime = new DateTime(2017, 10, 1, 0, 0, 0, 0);

@searchlog = EXTRACT UserId int, 
                     Start DateTime, 
                     Department string, 
                     Query string, 
                     Duration int, 
                     Urls string, 
                     ClickedUrls string 
               FROM "/Input/input.tsv"
               USING Extractors.Tsv();

@searchlogFiltered = SELECT *
                       FROM @searchlog
                       WHERE Department == @Department
                         AND Duration >= @Duration
                         AND Start >= @StartDateTime;

OUTPUT @searchlogFiltered
  TO "/Output/output.tsv"
  USING Outputters.Tsv();

If you’d like to use a script that can run with and without script parameters, make sure you have a default value for each U-SQL variable in use by adding DECLARE EXTERNAL statements at the top, as shown above.

Submit your script with parameter values

You can use this feature either via Azure Cloud Shell or on a Windows machine with Azure PowerShell installed.

When submitting, simply construct a hashtable of U-SQL variable names to values and pass it in using the -ScriptParameter cmdlet parameter. The .NET type of each value in the hashtable is used when defining the variable in U-SQL.

Supported types include:

byte, sbyte, int, uint (or uint32), long, ulong (or uint64), float, double, decimal, short (or int16), ushort (or uint16), char, string, DateTime, bool, Guid, or byte[]

Here's how you can add script parameters to your job submission:

$parameters = [ordered]@{}

$parameters["Department"] = "Sales"
$parameters["Duration"] = 10
$parameters["StartDateTime"] = (Get-Date).AddDays(-14)

Submit-AdlJob -Account "contosoadla" -Name "New Job" `
   -ScriptPath $scriptPath -ScriptParameter $parameters

The cmdlet prepends U-SQL variable declarations above the main script contents, and it overwrites the default parameter value for @Department. Note: The [ordered] dictionary type used above is not necessary in this case, but it ensures that the ordering of the DECLARE statements below match the order of the parameters you specified.

DECLARE @Department string = "Sales";
DECLARE @Duration int = 10;
DECLARE @StartDateTime DateTime = new DateTime(2017, 12, 6, 0, 0, 0, 0);
DECLARE EXTERNAL @Department string = "Finance";
DECLARE EXTERNAL @Duration int = 2;
DECLARE EXTERNAL @StartDateTime DateTime = new DateTime(2017, 10, 1, 0, 0, 0, 0);

@searchlog = EXTRACT UserId int, 
                     Start DateTime, 
                     Department string, 
                     Query string, 
                     Duration int, 
                     Urls string, 
                     ClickedUrls string 
               FROM "/Input/input.tsv"
               USING Extractors.Tsv();

@searchlogFiltered = SELECT *
                       FROM @searchlog
                       WHERE Department == @Department
                         AND Duration >= @Duration
                         AND Start >= @StartDateTime;

OUTPUT @searchlogFiltered
  TO "/Output/output.tsv"
  USING Outputters.Tsv();

This makes it easy to submit multiple versions of the same U-SQL script with slight variations. As an example, let's say your U-SQL script accepts an integer variable @NumRecords. You can submit the script with values for @NumRecords from 1 to 10 like this:

(1..10) | %{ Submit-AdlJob -Account "contosoadla" -Name "Job $_" `
                -ScriptPath $scriptPath -ScriptParameter @{"NumRecords"=$_ }

Note: To avoid collisions when submitting lots of jobs at once, make sure the jobs don't write to the same output file by incorporating a script parameter when determining output paths.

Try this feature today!

Give this feature a try and let us know your feedback in the comments.

Are you looking for any other samples, features, or improvements? Let us know and vote for them on our UserVoice.


Comments (5)

  1. Soum says:

    Hi Matt,

    This is a nice and clean way of parameterizing the USQL query.

    Is it possible to define a column name as a variable (@Col) and use it in the SELECT section? For EXAMPLE …

    DECLARE EXTERNAL @Col string = “UserId”; //Define a column name as variable

    DECLARE EXTERNAL @Department string = “Finance”;
    DECLARE EXTERNAL @Duration int = 2;
    DECLARE EXTERNAL @StartDateTime DateTime = new DateTime(2017, 10, 1, 0, 0, 0, 0);

    @searchlog = EXTRACT UserId int,
    Start DateTime,
    Department string,
    Query string,
    Duration int,
    Urls string,
    ClickedUrls string
    FROM “/Input/input.tsv”
    USING Extractors.Tsv();

    @searchlogFiltered = SELECT sl.@Col, sl.Query //Select 2 columns UserId and Query
    FROM @searchlog sl
    WHERE sl.Department == @Department
    AND sl.Duration >= @Duration
    AND sl.Start >= @StartDateTime;

    OUTPUT @searchlogFiltered
    TO “/Output/output.tsv”
    USING Outputters.Tsv();

    Thanks,
    Soum

  2. Soum says:

    There seems to be some syntax issues in the below section:

    (1..10) | %{ Submit-AdlJob -Account “contosoadla” -Name “Job $_” `
    -ScriptPath $scriptPath -ScriptParameter @{“NumRecords”=$_ }

    -Soum

  3. Soum says:

    Hi Matt,

    This is a nice and clean way of parameterizing the USQL query.

    Is it possible to define a column name as a variable (@Col) and use it in the SELECT section? For EXAMPLE …

    DECLARE EXTERNAL @Col string = “UserId”; //Define a column name as variable

    DECLARE EXTERNAL @Department string = “Finance”;
    DECLARE EXTERNAL @Duration int = 2;
    DECLARE EXTERNAL @StartDateTime DateTime = new DateTime(2017, 10, 1, 0, 0, 0, 0);

    @searchlog = EXTRACT UserId int,
    Start DateTime,
    Department string,
    Query string,
    Duration int,
    Urls string,
    ClickedUrls string
    FROM “/Input/input.tsv”
    USING Extractors.Tsv();

    @searchlogFiltered = SELECT sl.@Col, sl.Query //Select 2 columns UserId and Query
    FROM @searchlog sl
    WHERE sl.Department == @Department
    AND sl.Duration >= @Duration
    AND sl.Start >= @StartDateTime;

    OUTPUT @searchlogFiltered
    TO “/Output/output.tsv”
    USING Outputters.Tsv();

    Thanks,
    Soum

  4. Soum says:

    Hi Matt,

    Thanks for the post!
    This is a nice and clean way of parameterizing the USQL query.

    Is it possible to parameterize a SELECT column i.e. define a column name as a variable (@Col) and use it in the SELECT section? For EXAMPLE …

    DECLARE EXTERNAL @Col string = “UserId”; //Define a column name as variable

    DECLARE EXTERNAL @Department string = “Finance”;
    DECLARE EXTERNAL @Duration int = 2;
    DECLARE EXTERNAL @StartDateTime DateTime = new DateTime(2017, 10, 1, 0, 0, 0, 0);

    @searchlog = EXTRACT UserId int,
    Start DateTime,
    Department string,
    Query string,
    Duration int,
    Urls string,
    ClickedUrls string
    FROM “/Input/input.tsv”
    USING Extractors.Tsv();

    @searchlogFiltered = SELECT sl.@Col, sl.Query //Select 2 columns UserId and Query
    FROM @searchlog sl
    WHERE sl.Department == @Department
    AND sl.Duration >= @Duration
    AND sl.Start >= @StartDateTime;

    OUTPUT @searchlogFiltered
    TO “/Output/output.tsv”
    USING Outputters.Tsv();

    Thanks,
    Soum

  5. shaun says:

    This is cool but doesn’t it muddy the water with U-SQL procedures? Is there a reason why either would be preferable over the other?

Skip to main content