Hadoop for .NET Developers: Working with HDInsight on Azure GA

This last week, HDInsight on Azure became generally available. This is great, but with the rollout come a few changes: 1. HTTPS communications with the Hadoop cluster are now on the standard HTTPS port, 443. So instead of communicating over HTTP port 563, just identify your remote cluster using https://clustername or append a…

Hadoop for the .NET Developer: Troubleshooting with the MapReduce Job Logs

NOTE This post is one in a series on Hadoop for .NET Developers. Despite your best efforts, you will occasionally have to deal with failed jobs.  To troubleshoot such a job, it helps to understand how to use the logs available to you on the Hadoop cluster.  We’ll focus on how to access these logs…

Hadoop for .NET Developers: Unit-Testing with the .NET SDK

NOTE This post is one in a series on Hadoop for .NET Developers. Data are problematic and code doesn’t always work as we had anticipated. Before running a potentially large MapReduce job on our cluster, we may want to perform a test on a subset of data. But even before that, it would be best…

Hadoop for .NET Developers: Implementing a (Slightly) More Complex MapReduce Job

NOTE This post is one in a series on Hadoop for .NET Developers. In our first MapReduce exercise, we implemented a purposefully simple MapReduce job using the .NET SDK against our local development cluster. In this exercise, we’ll implement a slightly more complex MapReduce job using the same SDK but against our remote Azure-based cluster…

Hadoop for .NET Developers: Understanding Hadoop Streaming

NOTE This post is one in a series on Hadoop for .NET Developers. In the last post, we built a simple MapReduce job using C#.  But Hadoop is a Java-based platform.  So how is it we can execute a MapReduce job using a .NET language?  The answer is Hadoop Streaming. In a nutshell, Hadoop Streaming…

Hadoop for .NET Developers: Implementing a Simple MapReduce Job

NOTE This post is one in a series on Hadoop for .NET Developers. In this exercise, we will write and execute a very simple MapReduce job using C# and the .NET SDK.  The purpose of this exercise is to illustrate the most basic concepts behind MapReduce. The job we will create will operate off the…

Hadoop for .NET Developers: Understanding MapReduce

NOTE This post is one in a series on Hadoop for .NET Developers. In Hadoop, data processing is tackled through MapReduce jobs. A job consists of basic configuration information, e.g. paths to input files and an output folder, and is executed by Hadoop’s MapReduce layer as a series of tasks. These tasks have responsibility for…

Hadoop for .NET Developers: Programmatically Loading Data to AVS

NOTE This post is one in a series on Hadoop for .NET Developers. As mentioned in an earlier post, the WebHDFS client assumes a Hadoop cluster employs HDFS but can be configured to work with a cluster leveraging AVS. If you are working with a persistent HDInsight in Azure cluster (based on AVS), then the…

Hadoop for .NET Developers: Understanding Azure Vault Storage

NOTE This post is one in a series on Hadoop for .NET Developers. My explanation of Hadoop storage in this blog series has focused on HDFS.  Hadoop abstracts its file system layer so that alternative storage options can be employed.  With HDInsight in Azure, Azure Blob Storage is used as the underlying storage layer. The…

Hadoop for .NET Developers: Programmatically Loading Data to HDFS

NOTE This post is one in a series on Hadoop for .NET Developers. In the last blog post in this series, we discussed how to manually load data to a cluster.  While this is fine for occasional needs, a programmatic approach is more typically preferred.  To enable this, Hadoop presents a REST interface on HTTP port…
