Submitting Hadoop MapReduce Jobs using PowerShell

As always, here is a link to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code.

In all the samples I have shown so far I have used the command-line consoles. However, this does not need to be the case; PowerShell can be used instead. The console application used to submit the MapReduce jobs calls a .Net Submissions API, so one can call the same .Net API directly from within PowerShell, as I will now demonstrate.

The key types one needs to be concerned with are:

  • MSDN.Hadoop.Submission.Api.SubmissionContext – The type containing the job submission options
  • MSDN.Hadoop.Submission.Api.SubmissionApi – The type used for submitting the job

To use the .Net API one first has to load the assembly and create the two required objects:

$SubmitterApi = $BasePath + "\Release\MSDN.Hadoop.Submission.Api.dll"
Add-Type -Path $SubmitterApi
$context = New-Object -TypeName MSDN.Hadoop.Submission.Api.SubmissionContext
$submitter = New-Object -TypeName MSDN.Hadoop.Submission.Api.SubmissionApi
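
If the assembly path is wrong, Add-Type fails with a fairly generic error, so it can help to check the path first. The following is only a minimal sketch: the function name Import-SubmissionApi is hypothetical and not part of the framework, and it assumes the same "Release" folder layout used above.

```powershell
# Hypothetical helper (not part of the framework): loads the submission
# assembly from the given base path, failing early if the file is missing.
function Import-SubmissionApi {
    param(
        [Parameter(Mandatory = $true)]
        [string]$BasePath
    )

    # Assumes the same "Release" folder layout as the snippet above.
    $assemblyPath = Join-Path $BasePath "Release\MSDN.Hadoop.Submission.Api.dll"

    if (-not (Test-Path -Path $assemblyPath -PathType Leaf)) {
        throw "Submission API assembly not found at '$assemblyPath'."
    }

    Add-Type -Path $assemblyPath
}

# Usage, with $BasePath set as in the rest of this post:
# Import-SubmissionApi -BasePath $BasePath
```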

After this one just has to define the context with the necessary job submission properties:

[string[]]$inputs = @("mobile/data")
[string[]]$files = @($BasePath + "\Sample\MSDN.Hadoop.MapReduceCSharp.dll")

$config = New-Object -TypeName 'Tuple[string,string]' -ArgumentList "DictionaryCapacity", "1000"
$configs = @($config)

$context.InputPaths = $inputs
$context.OutputPath = "mobile/querytimes"
$context.MapperType = "MSDN.Hadoop.MapReduceCSharp.MobilePhoneRangeMapper, MSDN.Hadoop.MapReduceCSharp"
$context.ReducerType = "MSDN.Hadoop.MapReduceCSharp.MobilePhoneRangeReducer, MSDN.Hadoop.MapReduceCSharp"
$context.Files = $files
$context.ExeConfigurations = $configs

One just has to remember that the input and files specifications are defined as string arrays.

In a recent build I added support for adding user-defined key-value pairs to the application configuration file. The ExeConfigurations property expects an array of Tuple<string, string> types, hence the object definition for the $config value.
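
When more than one setting is needed, the tuples can be built from a hashtable rather than one at a time. This is a minimal sketch only: the $settings table and the MaxRange key are hypothetical and included purely for illustration; only DictionaryCapacity comes from the sample above.

```powershell
# Settings to pass to the job. MaxRange is a hypothetical key used only
# for illustration; DictionaryCapacity is the one from the sample above.
$settings = @{
    DictionaryCapacity = "1000"
    MaxRange           = "50"
}

# Create one Tuple<string, string> per entry and collect them into a
# strongly typed array, which is what ExeConfigurations expects.
[Tuple[string,string][]]$configs = foreach ($entry in $settings.GetEnumerator()) {
    New-Object -TypeName 'Tuple[string,string]' -ArgumentList $entry.Key, $entry.Value
}

# The result is then assigned exactly as before:
# $context.ExeConfigurations = $configs
```

A plain hashtable does not preserve insertion order, which should not matter here as the entries are independent key-value pairs.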

Optionally one can also set the Data and Output format types:

$context.DataFormat = [MSDN.Hadoop.Submission.Api.DataFormat]::Text
$context.OutputFormat = [MSDN.Hadoop.Submission.Api.OutputFormat]::Text

However, this is not necessary if one is using the default Text values.
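
To see which other values the two format enumerations offer, one can ask .Net for the names rather than guessing. The sketch below wraps the standard [Enum]::GetNames call; the function name Get-EnumValueNames is hypothetical, and the usage lines assume the submission assembly has already been loaded with Add-Type.

```powershell
# Hypothetical helper: returns the value names defined by an enum type.
function Get-EnumValueNames {
    param(
        [Parameter(Mandatory = $true)]
        [type]$EnumType
    )

    if (-not $EnumType.IsEnum) {
        throw "$($EnumType.FullName) is not an enum type."
    }

    [Enum]::GetNames($EnumType)
}

# Usage, once the submission assembly has been loaded:
# Get-EnumValueNames ([MSDN.Hadoop.Submission.Api.DataFormat])
# Get-EnumValueNames ([MSDN.Hadoop.Submission.Api.OutputFormat])
```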

Once the context has been defined one just has to run the job:

$submitter.RunContext($context)

To call the PowerShell script from the Hadoop command-line one can use:

powershell -ExecutionPolicy Unrestricted -File %BASEPATH%\SampleScripts\hadoopcstextrangesubmit.ps1

All in all a simple process.