Hadoop for .NET Developers: Manually Loading Data to Hadoop

NOTE This post is one in a series on Hadoop for .NET Developers.

To manually load a file to Hadoop, first copy the file to the name node server. With the file on the name node, either of two commands can be used at the Hadoop command prompt to load it into the Hadoop file system. While this is not ideal for most data loading needs, the technique is fine for development exercises and other one-off situations where the data file is small enough to fit on the name node.
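
The post does not name the two commands at this point. In the stock Hadoop file system shell they are most likely -put and -copyFromLocal, which behave the same way for a local source file. The sketch below rests on that assumption. It is written as a POSIX shell script rather than for the Windows-based Hadoop command prompt, and upload_cmd is a hypothetical helper that only prints the command line it would run, so the two forms can be compared without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): prints the upload command for a
# given fs subcommand. Assumes the two commands are -put and -copyFromLocal.
upload_cmd() {
    # $1 = fs subcommand, $2 = local source file, $3 = destination folder
    printf 'hadoop fs -%s "%s" %s\n' "$1" "$2" "$3"
}

# Both lines copy the same local file to the same Hadoop folder.
upload_cmd put 'c:\temp\integers.txt' /demo/simple/in
upload_cmd copyFromLocal 'c:\temp\integers.txt' /demo/simple/in
```

Pasting either printed line into the Hadoop command prompt should have the same effect; the rest of this post uses the -put form.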

To demonstrate, we will load the integers.txt file from the name node server of the desktop development environment; both the file and the environment were created or downloaded in previous posts in this series. Be sure to place the integers.txt file in the C:\Temp folder of the name node server, or alter the statements below accordingly.

NOTE The steps presented here will work whether HDFS or ASV is used as the underlying Hadoop data storage mechanism.

1. From the desktop, launch the Hadoop command prompt:

2. From the Hadoop command prompt, create the /demo/simple/in folder structure within the Hadoop file system by issuing the following command:

hadoop fs -mkdir /demo/simple/in

3. Create an out folder under /demo/simple using the following command:

hadoop fs -mkdir /demo/simple/out

NOTE This folder will be used in later exercises.

4. Load the integers.txt file from the local file system to /demo/simple/in within the Hadoop file system using the following command:

hadoop fs -put "c:\temp\integers.txt" /demo/simple/in

5. Verify the integers.txt file is in the /demo/simple/in folder by issuing the following command:

hadoop fs -ls /demo/simple/in

6. View the contents of the integers.txt file by issuing the following command:

hadoop fs -cat /demo/simple/in/integers.txt 

7. You can now close the command prompt by typing exit and pressing Enter.
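
For repeated development runs, steps 2 through 6 above can be collected into a single script. What follows is a minimal sketch under a few assumptions: it targets a POSIX shell rather than the Windows-based Hadoop command prompt, and the names DRY_RUN, LOCAL_FILE, HDFS_ROOT, run, and load_demo_file are hypothetical and introduced only for illustration. With DRY_RUN left at its default of 1, the script prints each command instead of executing it.

```shell
#!/bin/sh
# Sketch: replays steps 2 through 6 of this post as one script.
# DRY_RUN, LOCAL_FILE and HDFS_ROOT are hypothetical names for this example.
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print commands only
DEFAULT_FILE='c:\temp\integers.txt'          # local path used in the post
LOCAL_FILE="${LOCAL_FILE:-$DEFAULT_FILE}"
HDFS_ROOT=/demo/simple

# Print the command in dry-run mode, otherwise execute it as given.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
# Note: on Hadoop 2.x and later, -mkdir may need -p to create parent folders.
load_demo_file() {
    run hadoop fs -mkdir "$HDFS_ROOT/in" &&
    run hadoop fs -mkdir "$HDFS_ROOT/out" &&
    run hadoop fs -put "$LOCAL_FILE" "$HDFS_ROOT/in" &&
    run hadoop fs -ls "$HDFS_ROOT/in" &&
    run hadoop fs -cat "$HDFS_ROOT/in/integers.txt"
}

load_demo_file
```

Setting DRY_RUN=0 on a machine where the hadoop command is on the path would execute the steps for real. Chaining with && stops the script at the first failure, for example when the folders already exist from an earlier run.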