Elastic processing with PowerShell in Windows Azure HDInsight

Azure HDInsight is Hadoop in the cloud, built to handle any amount of relational or non-relational data, scaling from terabytes to petabytes on demand. HDInsight allows customers to deploy a variety of cluster types for different data analytics workloads. Each cluster type consists of a set of nodes, and customers are billed for the usage of those nodes for the duration of the cluster's life. Billing starts once a cluster is created and stops when the cluster is deleted. To reduce cost, it is therefore important to automate cluster management in HDInsight: create the cluster when a job needs it and delete it as soon as the job is done.

PowerShell is a very useful tool for implementing this kind of elastic processing. Here I want to share some useful PowerShell scripts with you.

First, we need to set up the variables used to create an HDInsight cluster:
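The original listing is not reproduced here, so the following is only a minimal sketch of what such a variable block could look like. Every name and value (`mystorageaccount`, `mycluster`, `East US`, the node count) is a hypothetical placeholder, not a value from the original script.

```powershell
# Hypothetical placeholder values; replace them with your own.
$storageAccountName = "mystorageaccount"   # storage account that backs the cluster
$containerName      = "mycontainer"        # default blob container for the cluster
$clusterName        = "mycluster"          # name of the HDInsight cluster
$location           = "East US"            # region for the storage account and cluster
$clusterNodes       = 4                    # number of data nodes

# The classic HDInsight cmdlets expect the storage account as a fully
# qualified blob endpoint, so build that string once here.
$storageAccountFqdn = "$storageAccountName.blob.core.windows.net"

# Local log file and the blob path it will be uploaded to (both assumed).
$localLogFile = "C:\Temp\sample.log"
$blobLogPath  = "example/data/sample.log"
```

Keeping all settings in one place makes it easy to rerun the same script against a different cluster name or region.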

Make sure the storage account exists:
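A minimal sketch of this check, assuming the classic Azure Service Management cmdlets (`Get-AzureStorageAccount`, `New-AzureStorageAccount`, `Get-AzureStorageKey`) and an already selected subscription. The function name `Confirm-StorageAccount` is made up for illustration and is not part of the original script.

```powershell
# Hypothetical helper: creates the storage account if it is missing and
# returns its primary access key.
function Confirm-StorageAccount {
    param(
        [Parameter(Mandatory)] [string] $StorageAccountName,
        [Parameter(Mandatory)] [string] $Location
    )

    # Suppress the "not found" error and test the result for $null instead.
    $account = Get-AzureStorageAccount -StorageAccountName $StorageAccountName -ErrorAction SilentlyContinue
    if (-not $account) {
        Write-Host "Storage account '$StorageAccountName' not found, creating it..."
        New-AzureStorageAccount -StorageAccountName $StorageAccountName -Location $Location | Out-Null
    }

    # The primary key is needed later to create the cluster and upload data.
    (Get-AzureStorageKey -StorageAccountName $StorageAccountName).Primary
}
```

It could be called as `$storageAccountKey = Confirm-StorageAccount -StorageAccountName "mystorageaccount" -Location "East US"`.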

Remove the cluster if it already exists:
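One way this step might look, again assuming the classic `Get-AzureHDInsightCluster` and `Remove-AzureHDInsightCluster` cmdlets. The wrapper name `Remove-ClusterIfPresent` is hypothetical.

```powershell
# Hypothetical helper: deletes the cluster when it exists.
# Returns $true if a cluster was removed, $false otherwise.
function Remove-ClusterIfPresent {
    param(
        [Parameter(Mandatory)] [string] $ClusterName
    )

    $existing = Get-AzureHDInsightCluster -Name $ClusterName -ErrorAction SilentlyContinue
    if ($existing) {
        Write-Host "Cluster '$ClusterName' already exists, removing it..."
        Remove-AzureHDInsightCluster -Name $ClusterName | Out-Null
        return $true
    }
    return $false
}
```

Starting from a clean state means a leftover cluster from a failed earlier run does not keep accruing charges or block the new deployment.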

Create the cluster:
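A sketch of the creation step with the classic `New-AzureHDInsightCluster` cmdlet. The function name `New-DemoCluster`, the default of four nodes, and the assumption that the blob container already exists are mine, not the original author's.

```powershell
# Hypothetical wrapper around New-AzureHDInsightCluster.
function New-DemoCluster {
    param(
        [Parameter(Mandatory)] [string] $ClusterName,
        [Parameter(Mandatory)] [string] $Location,
        [Parameter(Mandatory)] [string] $StorageAccountName,
        [Parameter(Mandatory)] [string] $StorageAccountKey,
        [Parameter(Mandatory)] [string] $ContainerName,
        [Parameter(Mandatory)] [pscredential] $Credential,
        [int] $Nodes = 4
    )

    # Splatting keeps the long parameter list readable.
    $params = @{
        Name                        = $ClusterName
        Location                    = $Location
        DefaultStorageAccountName   = "$StorageAccountName.blob.core.windows.net"
        DefaultStorageAccountKey    = $StorageAccountKey
        DefaultStorageContainerName = $ContainerName
        ClusterSizeInNodes          = $Nodes
        Credential                  = $Credential
    }
    New-AzureHDInsightCluster @params
}
```

The credential can be collected interactively with `Get-Credential`. Provisioning takes a while, and billing starts as soon as the cluster exists.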

After creating the cluster, we'll use it to analyse the file 'sample.log'.

The file looks like this:

Then we need to upload the log from the local workstation to the blob container:
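The upload could be sketched as follows, assuming the classic storage cmdlets `New-AzureStorageContext` and `Set-AzureStorageBlobContent`. The function name `Send-LogToBlob` and its parameter names are hypothetical.

```powershell
# Hypothetical helper: uploads a local file to a blob container.
function Send-LogToBlob {
    param(
        [Parameter(Mandatory)] [string] $LocalFile,
        [Parameter(Mandatory)] [string] $StorageAccountName,
        [Parameter(Mandatory)] [string] $StorageAccountKey,
        [Parameter(Mandatory)] [string] $ContainerName,
        [Parameter(Mandatory)] [string] $BlobName
    )

    # The storage context carries the account name and key for the upload.
    $context = New-AzureStorageContext -StorageAccountName $StorageAccountName -StorageAccountKey $StorageAccountKey

    # -Force overwrites an existing blob with the same name without prompting.
    Set-AzureStorageBlobContent -File $LocalFile -Container $ContainerName -Blob $BlobName -Context $context -Force
}
```

For example, `Send-LogToBlob -LocalFile "C:\Temp\sample.log" -StorageAccountName "mystorageaccount" -StorageAccountKey $storageAccountKey -ContainerName "mycontainer" -BlobName "example/data/sample.log"`, where the paths are placeholders.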

Set up the Pig job and the query:
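The original query is not shown, so the sketch below makes an assumption: that sample.log is a log4j-style log and that the goal is to count entries per log level. It builds the Pig Latin text and passes it to the classic `New-AzureHDInsightPigJobDefinition` cmdlet. The function name `New-LogLevelPigJob` and the relation names are hypothetical.

```powershell
# Hypothetical helper: builds a Pig query that counts log levels and wraps
# it in a Pig job definition.
function New-LogLevelPigJob {
    param(
        [Parameter(Mandatory)] [string] $ContainerName,
        [Parameter(Mandatory)] [string] $StorageAccountName,
        [Parameter(Mandatory)] [string] $BlobName
    )

    # wasb:// address of the uploaded log inside the cluster's storage.
    $logPath = "wasb://$ContainerName@$StorageAccountName.blob.core.windows.net/$BlobName"

    # Single-quoted strings keep Pig's positional reference $0 literal;
    # inside double quotes PowerShell would expand it to an empty string.
    $statements = @(
        "LOGS = LOAD '$logPath';",
        'LEVELS = FOREACH LOGS GENERATE REGEX_EXTRACT($0, ''(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)'', 1) AS LOGLEVEL;',
        'FILTEREDLEVELS = FILTER LEVELS BY LOGLEVEL IS NOT NULL;',
        'GROUPEDLEVELS = GROUP FILTEREDLEVELS BY LOGLEVEL;',
        'FREQUENCIES = FOREACH GROUPEDLEVELS GENERATE group AS LOGLEVEL, COUNT(FILTEREDLEVELS.LOGLEVEL) AS TOTAL;',
        'RESULT = ORDER FREQUENCIES BY TOTAL DESC;',
        'DUMP RESULT;'
    )
    $query = $statements -join ' '

    New-AzureHDInsightPigJobDefinition -Query $query
}
```

`DUMP` writes the result to the job's standard output, which is what makes it retrievable from PowerShell after the job finishes.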

Submit the Pig job, download the result, and display it in PowerShell:
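A minimal sketch of this step using the classic `Start-AzureHDInsightJob`, `Wait-AzureHDInsightJob` and `Get-AzureHDInsightJobOutput` cmdlets. The function name `Invoke-PigJob` and the one-hour default timeout are assumptions.

```powershell
# Hypothetical helper: submits a job, waits for it, and returns its
# standard output.
function Invoke-PigJob {
    param(
        [Parameter(Mandatory)] [string] $ClusterName,
        [Parameter(Mandatory)] $JobDefinition,
        [int] $TimeoutSeconds = 3600
    )

    # Submit the job; the returned object carries the job ID.
    $job = Start-AzureHDInsightJob -Cluster $ClusterName -JobDefinition $JobDefinition

    # Block until the job finishes or the timeout expires.
    Wait-AzureHDInsightJob -Job $job -WaitTimeoutInSeconds $TimeoutSeconds | Out-Null

    # Fetch what the job wrote to standard output (the DUMP result for Pig).
    Get-AzureHDInsightJobOutput -Cluster $ClusterName -JobId $job.JobId -StandardOutput
}
```

Calling `Invoke-PigJob -ClusterName "mycluster" -JobDefinition $pigJobDefinition` prints the result in the console, where `$pigJobDefinition` is the definition created in the previous step.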

The result should look like this:

Remove the cluster after we complete our job:
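Deleting the cluster is a single call to `Remove-AzureHDInsightCluster -Name <cluster name>`. Because billing only stops on deletion, it may be worth guaranteeing that the call runs even when the job fails. The sketch below does this with `try`/`finally`; the wrapper `Invoke-AndRemoveCluster` is a hypothetical addition, not part of the original script.

```powershell
# Hypothetical helper: runs the given work and always deletes the cluster
# afterwards, even if the work throws.
function Invoke-AndRemoveCluster {
    param(
        [Parameter(Mandatory)] [string] $ClusterName,
        [Parameter(Mandatory)] [scriptblock] $Work
    )

    try {
        # Run the job steps; their output becomes the function's output.
        & $Work
    }
    finally {
        Write-Host "Removing cluster '$ClusterName'..."
        Remove-AzureHDInsightCluster -Name $ClusterName | Out-Null
    }
}
```

The job submission step can then be passed in as the script block, so that a failed query does not leave a billed cluster behind.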

With that, we have completed our HDInsight job using PowerShell.

You can download the script from the attachment. I hope this demo is helpful to you.

