Hadoop for .NET Developers: Basic Architecture

NOTE This post is one in a series on Hadoop for .NET Developers.

Hadoop is implemented as a set of interrelated project components. The core components are MapReduce, which handles job execution, and a storage layer, typically implemented as the Hadoop Distributed File System (HDFS). For the purpose of this post, we will assume HDFS is in use.
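
To make the division of labor concrete, here is a minimal sketch of a MapReduce job, the canonical word count, written against Hadoop's Java MapReduce API. The input and output paths (/user/demo/input and /user/demo/output) are placeholders for illustration.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input and output live in HDFS; these paths are placeholders.
    FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
    FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that the job names only logical input and output paths; where the map and reduce work physically runs is decided by the framework.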

Hadoop components are deployed across a collection of servers referred to as data (or compute) nodes. These nodes are where data are stored and processed.
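
Because files in HDFS are split into blocks and those blocks are spread (and typically replicated) across the data nodes, a client can ask where a given file physically lives. The sketch below assumes Hadoop's Java FileSystem API and a hypothetical file at /user/demo/input/log.txt; it prints the data nodes holding each block.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WhereAreMyBlocks {
  public static void main(String[] args) throws Exception {
    // Uses whatever cluster is configured as the default file system.
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(new Path("/user/demo/input/log.txt"));

    // Each block is stored on one or more data nodes; the hosts printed
    // below are those data nodes.
    for (BlockLocation block
        : fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.println("offset " + block.getOffset() + ": "
          + String.join(", ", block.getHosts()));
    }
    fs.close();
  }
}
```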

A name node server keeps track of the data nodes in the environment and of which data are stored on which node, and it presents those data nodes as a single entity. This singular representation is referred to as a cluster. If you are familiar with the term cluster from RDBMS implementations, please note that the nodes in a Hadoop cluster do not necessarily share storage or any other resources. A Hadoop cluster is purely logical.
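
This logical view is visible in client code: a client is configured with the name node's address only, and the file system it sees spans the entire cluster. Here is a minimal sketch, assuming a hypothetical name node at namenode:8020.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListCluster {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The name node's address is the only endpoint the client needs.
    conf.set("fs.defaultFS", "hdfs://namenode:8020");

    FileSystem fs = FileSystem.get(conf);
    // The listing spans the whole cluster; which data nodes hold which
    // blocks is invisible at this level.
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
    }
    fs.close();
  }
}
```

For metadata operations like this listing, the client talks only to the name node; actual block reads and writes are then negotiated with the individual data nodes.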