Hadoop for .NET Developers: Programmatically Loading Data to ASV

NOTE This post is one in a series on Hadoop for .NET Developers. As mentioned in an earlier post, the WebHDFS client assumes a Hadoop cluster employs HDFS but can be configured to work with a cluster leveraging ASV. If you are working with a persistent HDInsight cluster in Azure (based on ASV), then the…


Hadoop for .NET Developers: Understanding Azure Storage Vault

NOTE This post is one in a series on Hadoop for .NET Developers. My explanation of Hadoop storage in this blog series has focused on HDFS.  Hadoop abstracts its file system layer so that alternative storage options can be employed.  With HDInsight in Azure, Azure Blob Storage is used as the underlying storage layer. The…


Hadoop for .NET Developers: Programmatically Loading Data to HDFS

NOTE This post is one in a series on Hadoop for .NET Developers. In the last blog post in this series, we discussed how to manually load data to a cluster.  While this is fine for occasional needs, a programmatic approach is typically preferred.  To enable this, Hadoop presents a REST interface on HTTP port…


Hadoop for .NET Developers: Manually Loading Data to Hadoop

NOTE This post is one in a series on Hadoop for .NET Developers. To manually load a file to Hadoop, the file should first be copied to the name node server.  With the file now on the name node, either of two commands can be used at the Hadoop command prompt to load…


Hadoop for .NET Developers: Understanding HDFS

NOTE This post is one in a series on Hadoop for .NET Developers. From a data storage perspective, you can think of Hadoop as simply a big file server.  Through the name node, the Hadoop cluster presents itself as a single file system accepting basic Linux file system commands such as ls, rmr, mkdir, and…


Hadoop for .NET Developers: Obtaining the Sample Data Sets

NOTE This post is one in a series on Hadoop for .NET Developers. In the exercises that follow, we will work with two sample data files.  These files are available as part of a ZIP file associated with this blog post. The first sample data file, integers.txt, contains a simple list of integers from 1 to…


Hadoop for .NET Developers: Setting Up an Azure Cluster

NOTE This post is one in a series on Hadoop for .NET Developers. For rapid provisioning and lack of long-term commitment, the cloud is an excellent place to try your hand with a multi-node Hadoop cluster.  If you are an MSDN subscriber, Microsoft provides you access to cloud services as part of your benefits as described…


Hadoop for .NET Developers: Setting Up a Desktop Development Environment

NOTE This post is one in a series on Hadoop for .NET Developers. If you are a .NET developer, you will want to set up a desktop development environment with the following components: Visual Studio 2010 or 2012; the NuGet Package Installer for Visual Studio; and a local, single-node Hadoop “cluster.” Having these components installed on your…


Hadoop for .NET Developers: Basic Architecture

NOTE This post is one in a series on Hadoop for .NET Developers. Hadoop is implemented as a set of interrelated project components. The core components are MapReduce, which handles job execution, and a storage layer, typically implemented as the Hadoop Distributed File System (HDFS). For the purpose of this post, we will assume HDFS…


Hadoop for .NET Developers: Understanding Hadoop

NOTE This post is one in a series on Hadoop for .NET Developers. Big Data has been a source of excitement in the analytics community for a few years now. For the purpose of this blog series, I’ll loosely define the term to mean an expansion of focus from data originating from core operational systems – the domain…