Pushing Hadoop Cluster Configuration Changes using PowerShell

In my previous post I talked about Implementing and Deploying Rack Awareness using PowerShell. However, PowerShell is a great tool not only for managing things like Rack Awareness but also for installing and managing the Hadoop cluster, especially for managing configuration changes, the focus of this post. All the files relating to this post can be found…

3

Deploying Hadoop Rack Awareness with PowerShell

In a previous post I talked about Implementing Hadoop Rack Awareness with PowerShell. One thing I skimmed over in that post was how to deploy the necessary files to the cluster and make the configuration file changes. Once again PowerShell is your friend. Deploying this solution involves two processes. Firstly, copying the necessary files to…

2

Implementing Hadoop Rack Awareness with PowerShell

This post walks through building a PowerShell script for enabling Rack Awareness in Hadoop. While several example scripts can be found online for Linux, samples for building a script for Windows are less common. Hadoop divides the data into multiple file blocks and stores them on different machines. By default all machines are deemed to be on…

0

Managing Your HDInsight Cluster using PowerShell – Update

Since writing my last post, Managing Your HDInsight Cluster and .Net Job Submissions using PowerShell, there have been some useful modifications to the Azure PowerShell Tools. The HDInsight cmdlets no longer exist separately, as these have now been integrated into the latest release of the Windows Azure PowerShell Tools. This integration means: You don’t need to…

0

Managing Your HDInsight Cluster and .Net Job Submissions using PowerShell

This post explains how best to manage an HDInsight cluster using a management console and Windows PowerShell. The goal is to outline how to create a simple cluster, provide a mechanism for managing an elastic service, and demonstrate how to customize the cluster creation. Before provisioning a cluster one needs to ensure the Azure subscription…

3

Managing Hive Job Submissions With PowerShell

In my previous post, I talked about “Managing Your HDInsight Cluster with PowerShell”. In that post I made no mention of using Hive. I hope to redress this balance by specifically talking about how you can submit Hive jobs from the same local management console. As before, all the scripts mentioned in this and the…

3

Managing Your HDInsight Cluster with PowerShell

An updated version of this post can be found here. This blog post provides a mechanism for managing an HDInsight cluster using a local management console through the use of Windows PowerShell. The goal is to outline how to configure the local management console, create a simple cluster, submit jobs using MRRunner, and finally provide…

0

Submitting Hadoop MapReduce Jobs using PowerShell

As always, here is a link to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code. In all the samples I have shown so far I have always used the command-line consoles. However, this does not need to be the case; PowerShell can be used. The Console application which is used to submit…

0

Hive and XML File Processing

When I put together the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code, one of the goals was to support XML file processing. This was achieved by the creation of a modified Mahout document reader where one can specify the XML node to be presented for processing. But what if one wants to…

7

Implementing a MapReduce Join with Hadoop and the .Net Framework

I have often been asked how one implements a Join whilst writing MapReduce code. As such, I thought it would be useful to add an additional sample demonstrating how this is achieved. There are multiple mechanisms one can employ to perform a Join operation, and the one to be discussed will be a Reduce…

1