I’m happy to announce that we have released a new version of the SSIS Azure Feature Pack (AFP), whose highlight is upgraded HDInsight support. The download links are as follows:
Since HDInsight support was introduced in AFP, the HDInsight service on Azure has changed profoundly. This new release aims to support these changes.
- Change from the classic deployment model to the Azure Resource Manager (ARM) model. This is an ongoing effort for all Azure resources, not just HDInsight, and it changes the way Azure resources are managed. More details can be found in the Azure Resource Manager documentation. For AFP, this affects the creation and deletion of HDInsight clusters, which are done by the Azure HDInsight Create Cluster Task and the Azure HDInsight Delete Cluster Task, respectively. In previous versions, these tasks used the classic certificate-based Azure Subscription Connection Manager for resource-management authentication. This release introduces a new Azure Resource Manager Connection Manager for this purpose, and the two control flow tasks now use it in place of the original one. Currently, the new connection manager supports only service principal authentication.
- Change from Windows-based clusters to Linux-based ones. In the beginning, only Windows-based clusters were supported. Now, however, Windows-based clusters are being deprecated, and Linux-based ones are taking their place. As explained by the HDInsight team: “For continued investment on the open source big data technologies, future releases of HDInsight will be available only on Linux OS. There will not be any future release of HDInsight on Windows OS. The last release of HDInsight on Windows was HDI 3.3. The support for HDI 3.3 expired on 06/27/2016 and it will be deprecated on 07/31/2017.” Following this trend, starting with this release the Azure HDInsight Create Cluster Task creates Linux-based clusters instead of the Windows-based ones created by previous versions. Compared to Windows-based clusters, Linux-based ones require two extra properties in the Azure HDInsight Create Cluster Task, namely SshUserName and SshPassword, which are used to connect to the clusters remotely via SSH.
- Introduction of the new Azure HDInsight Connection Manager. The Azure HDInsight Hive Task and the Azure HDInsight Pig Task use this connection manager to identify the target HDInsight cluster on which to run the script and to supply authentication information.
- Extra properties in the Azure HDInsight Hive Task and the Azure HDInsight Pig Task to fetch the script execution outcome and error logs. These properties are an Azure Storage Connection Manager and a blob container name, which specify the default storage account and container associated with the cluster, respectively.
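
For readers unfamiliar with the service principal authentication mentioned in the first item, here is a minimal Python sketch of the kind of OAuth 2.0 client-credentials token request that such authentication generally relies on. It is only an illustration and not how the Azure Resource Manager Connection Manager is implemented, which this post does not describe. The function name `build_token_request` and the placeholder tenant ID, application ID, and secret are hypothetical; the sketch builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Resource identifier for the Azure Resource Manager endpoint.
ARM_RESOURCE = "https://management.azure.com/"


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request.

    tenant_id, client_id and client_secret identify the service principal;
    the values passed below are placeholders, not real credentials.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": ARM_RESOURCE,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical placeholder values for illustration only.
req = build_token_request("my-tenant-id", "my-app-id", "my-secret")
print(req.get_method(), req.full_url)
```

Sending such a request with real credentials would return a JSON document containing an access token, which is then presented as a bearer token on resource-management calls such as cluster creation and deletion.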