Hadoop for .NET Developers: Unit-Testing with the .NET SDK

NOTE This post is one in a series on Hadoop for .NET Developers.

Data can be problematic, and code doesn’t always work as we anticipated. Before running a potentially large MapReduce job on our cluster, we may want to test it on a subset of the data. But even before that, it is best to unit test our code by stepping through it in Visual Studio. The .NET SDK provides this functionality through the StreamingUnit object.

To use the StreamingUnit, we write our Mapper and Reducer classes just like before. But instead of submitting a job to a cluster, we execute the StreamingUnit, configured for our Mapper and Reducer, against an enumerable collection of string data that we define within our program.

To see this in action, pull up your code from the previous exercise. Modify the Main method to reflect the following code:

static void Main(string[] args)
{
    //test data (tab-delimited fields, matching the input format from the previous exercise)
    string[] myData = {
        "19980710\t19980721\tBrick, NJ\tsphere\t5 minutes\tOn the evening of July 10, 1998, I was walking near my home ...",
        "19980711\t19980721\tBrick, NJ\tfireball\t5 minutes\tOn the evening of July 10, 1998, I was walking near my home ...",
        "19980712\t19980721\tBrick, NJ\tfireball\t5 minutes\tOn the evening of July 10, 1998, I was walking near my home ...",
        "19970710\t19980721\tBrick, NJ\tsphere\t5 minutes\tOn the evening of July 10, 1998, I was walking near my home ..."
    };

    //execute mapreduce job on test data
    StreamingUnitOutput output =
        StreamingUnit.Execute<MyUfoMapper, MyUfoReducer>(myData);

    //inspect job output
    Console.ReadLine();
}

With this code in place, you can now step through your map and reduce functions from within Visual Studio.
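Besides stepping through the code in the debugger, it can help to write the job output to the console. The following is a minimal sketch, not a definitive implementation. It assumes that StreamingUnitOutput exposes MapperResult and ReducerResult as collections of strings. Verify this against the version of the SDK you have installed. The helper class name StreamingUnitOutputPrinter is hypothetical and not part of the SDK.

```csharp
using System;
using Microsoft.Hadoop.MapReduce; // .NET SDK for Hadoop (installed via NuGet)

// Hypothetical helper for illustration; not part of the SDK.
static class StreamingUnitOutputPrinter
{
    // Assumption: MapperResult and ReducerResult are string collections
    // on StreamingUnitOutput. Check your SDK version before relying on this.
    public static void Print(StreamingUnitOutput output)
    {
        // key/value lines emitted by the map phase
        Console.WriteLine("Mapper output:");
        foreach (string line in output.MapperResult)
        {
            Console.WriteLine(line);
        }

        // key/value lines emitted by the reduce phase
        Console.WriteLine("Reducer output:");
        foreach (string line in output.ReducerResult)
        {
            Console.WriteLine(line);
        }
    }
}
```

To use it, call StreamingUnitOutputPrinter.Print(output); in the Main method just before Console.ReadLine().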

For more info on using the StreamingUnit object, check out Andy Cross's post on this topic.