More Hadoop+F# Goodness

More F# Hadoop goodness from Carl Nolan. This time it is a composable framework for submitting F# and C# MapReduce jobs to Azure/Hadoop.

 

Framework for Composing and Submitting .Net Hadoop MapReduce Jobs

If you have been following my blog you will have seen that I have been putting together samples for writing .Net Hadoop MapReduce jobs using Hadoop Streaming. One thing that became apparent along the way is that the samples could be restructured into a composable framework, so that one can submit .Net based MapReduce jobs whilst writing only Mapper and Reducer types.

To this end I have put together a framework that allows one to submit MapReduce jobs using the following command line syntax:

MSDN.Hadoop.Submission.Console.exe -input "mobile/data/debug/sampledata.txt" -output "mobile/querytimes/debug" -mapper "MSDN.Hadoop.MapReduceFSharp.MobilePhoneQueryMapper,MSDN.Hadoop.MapReduceFSharp" -reducer "MSDN.Hadoop.MapReduceFSharp.MobilePhoneQueryReducer,MSDN.Hadoop.MapReduceFSharp" -file "%HOMEPATH%\MSDN.Hadoop.MapReduce\bin\Release\MSDN.Hadoop.MapReduceFSharp.dll"

Here the mapper and reducer options (more on a combiner option later) each name a .Net type, given as "TypeName,AssemblyName", that derives from the corresponding abstract Map or Reduce base class. The input, output, and file options are analogous to those of a standard Hadoop Streaming submission.
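To make the "derive from an abstract base class" idea concrete, here is a minimal F# sketch. The base classes SketchMapperBase and SketchReducerBase, the two derived types, the runLocally helper and the tab-separated input format are all assumptions made up for illustration; the framework's real base classes and signatures may well differ.

```fsharp
// Hypothetical base classes for illustration only; not the framework's actual types.
[<AbstractClass>]
type SketchMapperBase() =
    /// Turns one input line into zero or more key/value pairs.
    abstract member Map : line: string -> seq<string * string>

[<AbstractClass>]
type SketchReducerBase() =
    /// Folds all values seen for one key into zero or more output pairs.
    abstract member Reduce : key: string * values: seq<string> -> seq<string * string>

// Assumed input format: "deviceType<TAB>queryTimeInWholeSeconds".
type QueryTimeMapper() =
    inherit SketchMapperBase()
    override this.Map(line: string) =
        match line.Split('\t') with
        | [| device; seconds |] -> Seq.singleton (device, seconds)
        | _ -> Seq.empty // skip malformed lines

// Keeps the longest query time per device type.
type MaxQueryTimeReducer() =
    inherit SketchReducerBase()
    override this.Reduce(key: string, values: seq<string>) =
        let longest = values |> Seq.map int |> Seq.max
        Seq.singleton (key, string longest)

/// Local stand-in for a job run: map every line, group by key, reduce each group.
let runLocally (mapper: SketchMapperBase) (reducer: SketchReducerBase) (lines: seq<string>) =
    lines
    |> Seq.collect (fun line -> mapper.Map line)
    |> Seq.groupBy fst
    |> Seq.collect (fun (key, pairs) -> reducer.Reduce(key, Seq.map snd pairs))
    |> Seq.toList
```

With types shaped like this, the only job-specific code is the two small derived classes; everything else (input handling, grouping, output) can live in shared code, which is the appeal of the composable approach.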

Under the covers standard Hadoop Streaming is being used: controlling executables handle the StdIn and StdOut operations and activate the required .Net types. The “file” parameter is required to specify the DLL containing the .Net types to be loaded at runtime, in addition to any other required files.
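As a rough illustration of what such a controlling executable might do, the F# sketch below resolves a type from a "TypeName,AssemblyName" string, instantiates it, and then pumps lines from an input reader to tab-separated key/value lines on an output writer. SketchMapperBase, WordMapper, activateMapper and runMapper are hypothetical names invented for this sketch and are not the framework's actual code.

```fsharp
open System
open System.IO

// Hypothetical base class; stands in for the framework's abstract mapper type.
[<AbstractClass>]
type SketchMapperBase() =
    abstract member Map : line: string -> seq<string * string>

// Example mapper so the sketch runs on its own: emits (word, "1") for each word.
type WordMapper() =
    inherit SketchMapperBase()
    override this.Map(line: string) =
        line.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries)
        |> Seq.map (fun word -> (word, "1"))

/// Resolves a type name and instantiates it, as a controlling executable
/// might do with the value passed to -mapper.
let activateMapper (typeName: string) : SketchMapperBase =
    match Type.GetType(typeName, false) with
    | null -> failwithf "Type not found: %s" typeName
    | mapperType -> Activator.CreateInstance(mapperType) :?> SketchMapperBase

/// Streaming contract: read input lines, write "key<TAB>value" lines.
let runMapper (mapper: SketchMapperBase) (input: TextReader) (output: TextWriter) =
    let mutable line = input.ReadLine()
    while not (isNull line) do
        for (key, value) in mapper.Map line do
            output.WriteLine(key + "\t" + value)
        line <- input.ReadLine()
```

In a real streaming task the last step would be something like runMapper (activateMapper typeName) Console.In Console.Out. Because the type is resolved by name at runtime, the assembly that defines it has to be present on the node running the task, which is presumably why the DLL must be shipped with the file option.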

As an aside, the framework and base classes are all written in F#, with sample Mappers and Reducers, and abstract base classes, being provided in both C# and F#. The code is based on the F# Streaming samples in my previous blog posts. I will cover more of the semantics of the code in a later post, but for now I wanted to provide some usage samples.

As always the source can be downloaded from:

code.msdn.microsoft.com/Framework-for-Composing-af656ef7

  ....