Hadoop for the .NET Developer: Troubleshooting with the MapReduce Job Logs

NOTE This post is one in a series on Hadoop for .NET Developers. Despite your best efforts, you will occasionally have to deal with failed jobs.  To troubleshoot such a job, it helps to understand how to use the logs available to you on the Hadoop cluster.  We’ll focus on how to access these logs…
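
Before we go digging through the logs themselves, the job result returned by the .NET SDK gives us the first signal that something went wrong. A minimal sketch, assuming the SDK's MapReduceResult exposes the streaming process's exit code via its Info property (the paths and the MySimpleMapper/MySimpleReducer classes, sketched under the simple-job entry further down this page, are hypothetical):

```csharp
using System;
using Microsoft.Hadoop.MapReduce;

class FailedJobCheck
{
    static void Main()
    {
        HadoopJobConfiguration config = new HadoopJobConfiguration();
        config.InputPath = "/demo/simple/in";     // hypothetical HDFS paths
        config.OutputFolder = "/demo/simple/out";

        // MySimpleMapper and MySimpleReducer are hypothetical classes,
        // sketched under the simple-job entry further down this page.
        MapReduceResult result = Hadoop.Connect()
            .MapReduceJob.Execute<MySimpleMapper, MySimpleReducer>(config);

        // The exit code only tells us that something failed; the why
        // lives in the task logs kept on the cluster.
        Console.WriteLine("Exit code: " + result.Info.ExitCode);
    }
}
```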

Hadoop for .NET Developers: Unit-Testing with the .NET SDK

NOTE This post is one in a series on Hadoop for .NET Developers. Data are problematic and code doesn't always work as we anticipated. Before running a potentially large MapReduce job on our cluster, we may want to perform a test on a subset of data. But even before that, it would be best…
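
The idea can be sketched with the SDK's local test helper: run the mapper and reducer entirely in memory, no cluster required. A minimal sketch, assuming the StreamingUnit helper and its MapperResult/ReducerResult collections, and reusing the hypothetical MySimpleMapper/MySimpleReducer classes sketched below:

```csharp
using System;
using Microsoft.Hadoop.MapReduce;

class UnitTestSketch
{
    static void Main()
    {
        // A small in-memory stand-in for the real input data.
        string[] input = { "1", "2", "3", "4", "5" };

        // StreamingUnit runs the mapper and reducer locally, with no
        // cluster involved, so their output can be inspected directly.
        StreamingUnitOutput output =
            StreamingUnit.Execute<MySimpleMapper, MySimpleReducer>(input);

        foreach (string line in output.MapperResult)
            Console.WriteLine("map:    " + line);
        foreach (string line in output.ReducerResult)
            Console.WriteLine("reduce: " + line);
    }
}
```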

Hadoop for .NET Developers: Implementing a (Slightly) More Complex MapReduce Job

NOTE This post is one in a series on Hadoop for .NET Developers. In our first MapReduce exercise, we implemented a purposefully simple MapReduce job using the .NET SDK against our local development cluster.  In this exercise, we'll implement a slightly more complex MapReduce job using the same SDK but against our remote Azure-based cluster…
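
Structurally, the main difference from the local exercise is the connection: instead of a parameterless Hadoop.Connect(), we point the SDK at the remote cluster and its backing Azure storage. A hedged sketch with placeholder credentials, assuming a Connect overload taking the cluster URI, user names, password, and storage details:

```csharp
using System;
using Microsoft.Hadoop.MapReduce;

class RemoteJobSketch
{
    static void Main()
    {
        // All values below are hypothetical placeholders.
        IHadoop hadoop = Hadoop.Connect(
            new Uri("https://myclustername.azurehdinsight.net"),
            "myuser",                                  // cluster login
            "hadoopuser",                              // hadoop user
            "mypassword",
            "mystorageaccount.blob.core.windows.net",  // default storage
            "mystoragekey",
            "mycontainer",
            false);                                    // don't create container

        HadoopJobConfiguration config = new HadoopJobConfiguration();
        config.InputPath = "/demo/complex/in";         // hypothetical paths
        config.OutputFolder = "/demo/complex/out";

        // Reuses the hypothetical classes sketched in the simple-job entry below.
        hadoop.MapReduceJob.Execute<MySimpleMapper, MySimpleReducer>(config);
    }
}
```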

Hadoop for .NET Developers: Understanding Hadoop Streaming

NOTE This post is one in a series on Hadoop for .NET Developers. In the last post, we built a simple MapReduce job using C#.  But Hadoop is a Java-based platform.  So how is it we can execute a MapReduce job using a .NET language?  The answer is Hadoop Streaming. In a nutshell, Hadoop Streaming…
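
Hadoop Streaming runs any executable that reads input records line by line from standard input and writes tab-delimited key/value pairs to standard output; the .NET SDK ultimately packages our mapper and reducer classes behind just such an executable. As a minimal illustration of that contract (a hypothetical word-count mapper written as a plain console app, not the SDK's generated wrapper):

```csharp
using System;

// A stand-alone executable usable as a Hadoop Streaming mapper:
// the framework pipes input records to stdin, one per line, and
// reads tab-delimited key/value pairs back from stdout.
class WordCountMapper
{
    static void Main()
    {
        string line;
        while ((line = Console.ReadLine()) != null)
        {
            foreach (string word in line.Split(new[] { ' ' },
                         StringSplitOptions.RemoveEmptyEntries))
            {
                Console.WriteLine("{0}\t1", word.ToLowerInvariant());
            }
        }
    }
}
```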

Hadoop for .NET Developers: Implementing a Simple MapReduce Job

NOTE This post is one in a series on Hadoop for .NET Developers. In this exercise, we will write and execute a very simple MapReduce job using C# and the .NET SDK.  The purpose of this exercise is to illustrate the most basic concepts behind MapReduce. The job we will create will operate off the…
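
With the .NET SDK, the mapper and reducer are small classes derived from the SDK's base types. A minimal sketch, assuming the MapperBase and ReducerCombinerBase signatures from the Microsoft.Hadoop.MapReduce namespace (the even/odd logic is a hypothetical stand-in, not necessarily the job built in the post):

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

// Hypothetical mapper: key each input integer as "even" or "odd".
public class MySimpleMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        int value = int.Parse(inputLine);
        string key = (value % 2 == 0) ? "even" : "odd";
        context.EmitKeyValue(key, value.ToString());
    }
}

// Hypothetical reducer: count the values received for each key.
public class MySimpleReducer : ReducerCombinerBase
{
    public override void Reduce(
        string key, IEnumerable<string> values, ReducerCombinerContext context)
    {
        context.EmitKeyValue(key, values.Count().ToString());
    }
}
```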

Hadoop for .NET Developers: Understanding MapReduce

NOTE This post is one in a series on Hadoop for .NET Developers. In Hadoop, data processing is tackled through MapReduce jobs. A job consists of basic configuration information, e.g. paths to input files and an output folder, and is executed by Hadoop's MapReduce layer as a series of tasks.  These tasks have responsibility for…
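
To make the shape of a job concrete: a hedged sketch of configuring and submitting a job through the .NET SDK, assuming a HadoopJobConfiguration type with InputPath and OutputFolder properties, and reusing the hypothetical mapper and reducer classes sketched above:

```csharp
using Microsoft.Hadoop.MapReduce;

class JobSketch
{
    static void Main()
    {
        // The configuration part of the job: input files and output folder.
        HadoopJobConfiguration config = new HadoopJobConfiguration();
        config.InputPath = "/demo/simple/in";     // hypothetical HDFS paths
        config.OutputFolder = "/demo/simple/out";

        // Submission: Hadoop's MapReduce layer breaks the job into
        // map and reduce tasks and schedules them across the cluster.
        Hadoop.Connect()
              .MapReduceJob
              .Execute<MySimpleMapper, MySimpleReducer>(config);
    }
}
```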
