Azure Data Factory: Detecting and Re-Running failed ADF Slices


 

Recently I came across  a scenario where I need to detect failed slices of all Datasets in Azure Data Factory, in my case I need to detect for last 3 months and the number of slices was around 600+, they failed due to validation error as the source data wasn’t present and after a number of re-try slices were marked as failed.

In such cases its difficult to perform re-run from the  Portal as you need to right click on each slice and run it explicitly.

Solution: I wrote a following PowerShell Script, this script will detect all failed slices in a given Azure Data Factory and re-run same with your consent.

You can use same script not only for failed slices but for any status, you just need to change the Dataset status in filtering of slices, shown in following script.

I am also planning to write a solution which will run as a service in a worker role and automatically detect failed slices in a given time and re-run same.

Question can be asked, that in ADF you already have re-run logic they why you need to go through the hassles of writing and running script.

Yes we do have but after x number of re-runs a slices is marked as failed and only way is to run is through portal or programmatically.

So, here is my contribution to ADF Community.

Pre-requisite – Azure Resource Manager PowerShell (https://azure.microsoft.com/en-us/blog/azps-1-0-pre/)

Copy following code in text file and save it as file.ps1

You can also download the script and save it as PS1 – Click Here

#Begin Script

Login-AzureRmAccount

$slices= @()
$tableName=@()
$failedSlices= @()
$failedSlicesCount= @()
$tableNames=@()

$Subscription="Provide Subscription ID" 
 
  Select-AzureRMSubscription -SubscriptionId  $Subscription   
$DataFactoryName="Provide Data Factory Name"
$resourceGroupName ="Porvide Resource Group Name for Data Factory"
 
$startDateTime ="2015-05-01" #Start Date for Slices
$endDateTime="2015-08-01" # End Date for Slices

#Get Dataset names in Data Factory - you can explicitly give a table name using $tableName variable if you like to run only for an individual tablename
$tableNames = Get-AzureRMDataFactoryDataset -DataFactoryName $DataFactoryName -ResourceGroupName $resourceGroupName | ForEach {$_.DatasetName}

$tableNames #lists tablenames

foreach ($tableName in $tableNames)
{
    $slices += Get-AzureRMDataFactorySlice -DataFactoryName $DataFactoryName -DatasetName $tableName -StartDateTime $startDateTime -EndDateTime $endDateTime -ResourceGroupName $resourceGroupName -ErrorAction Stop
}

$failedSlices = $slices | Where {$_.Status -eq 'Failed'}

$failedSlicesCount = @($failedSlices).Count

if ( $failedSlicesCount -gt 0 )
{

    write-host "Total number of slices Failed:$failedSlicesCount"
    $Prompt = Read-host "Do you want to Rerun these failed slices? (Y | N)"
    if ( $Prompt -eq "Y" -Or $Prompt -eq "y" )
    {

        foreach ($failed in $failedSlices)
        {
               write-host "Rerunning slice of Dataset "$($failed.DatasetName)" with StartDateTime "$($failed.Start)" and EndDateTime "$($failed.End)""
            Set-AzureRMDataFactorySliceStatus -UpdateType UpstreamInPipeline -Status Waiting -DataFactoryName $($failed.DataFactoryName) -DatasetName $($failed.DatasetName) -ResourceGroupName $resourceGroupName -StartDateTime "$($failed.Start)" -EndDateTime "$($failed.End)"
            $failed.DatasetName

        }
    }
           
}
else
{
    write-host "There are no Failed slices in the given time period."
}

#End Script


Comments (2)

  1. Azure_Geek says:

    BTW, where did you plagiarise this code from ? I remember seeing this same exact content else where on the web.

Skip to main content