Well, it’s summer again and time for some new blog entries. This summer, I’ve had some time to dig into Hadoop and want to share some of the basics of storage and job processing from a .NET developer’s perspective.
Hadoop is an open-source platform written in Java. However, thanks to work by Microsoft and Hortonworks, those familiar with C#, VB.NET, or any other .NET language can now leverage the platform.
With this series of posts, my objective is to help you get started with .NET development on Hadoop. The explanations of Hadoop, as well as the samples and demonstrations, are purposefully simple, but they will give you an accessible starting point for your own exploration of the platform.
Please note, I will focus on storage and MapReduce, the core components of the Hadoop platform. In a later series, I will explore Hive and Pig for higher-level interaction with the data sets employed here.
NOTE: Anoop Madhusudanan has written quite a bit about the use of .NET with Hadoop. Check out his blog at http://www.amazedsaint.com/search/label/Hadoop for some very informative content.
- Understanding Hadoop
- Basic Architecture
- Setting Up a Desktop Development Environment
- Setting Up an Azure Cluster
- Obtaining the Sample Data Sets
- Understanding HDFS
- Manually Loading Data to Hadoop
- Programmatically Loading Data to HDFS
- Understanding Azure Storage Vault
- Programmatically Loading Data to ASV
- Understanding MapReduce
- Implementing a Simple MapReduce Job
- Understanding Hadoop Streaming
- Implementing a (Slightly) More Complex MapReduce Job
- Unit-Testing with the .NET SDK
- Troubleshooting with the MapReduce Job Logs