Using the Pi Estimator Sample on HadoopOnAzure CTP
Small Bites of Big Data
Cindy Gross, SQLCAT PM
UPDATED Jun 2013: HadoopOnAzure CTP has been replaced with HDInsight Preview. See: How to Run the HDInsight Samples http://www.windowsazure.com/en-us/manage/services/hdinsight/howto-run-samples/
Now that you have created your Hadoop on Azure cluster, you can run the sample programs to become familiar with the interface. Click on the “Samples” tile under “Manage your account”.
The gallery of available samples is growing rapidly.
We’ll start with the very simple Pi Estimator sample. When you click on the Pi tile you’ll see some information about the sample (scroll down to see more). You can download the files and review them in as much detail as you want: PiEstimator.java opens in Notepad, and the .jar file is a zip archive containing many other files, so you can unzip it with most compression utilities and explore the contents at your convenience.
The description tells us that the first parameter is the number of maps to create (16 by default) and the second is the number of samples per map (10 million by default). For now, we’ll let the Azure portal hide the complexities of Hadoop and simply click on “Deploy to your cluster”.
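To make the two parameters concrete, here is a minimal sketch of the general idea in Python. It is not the code in PiEstimator.java, which may generate its sample points differently. The sketch assumes the usual Monte Carlo approach: scatter points over a unit square, count how many land inside the quarter circle, and multiply the ratio by 4. The function names and the `seed` argument are made up for illustration.

```python
import random


def count_inside(samples, rng):
    """One 'map': count random points in the unit square that land inside the quarter circle."""
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


def estimate_pi(num_maps, samples_per_map, seed=0):
    """Run num_maps independent counts, then combine them (the 'reduce' step)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    per_map_counts = [count_inside(samples_per_map, rng) for _ in range(num_maps)]
    total_inside = sum(per_map_counts)
    total_samples = num_maps * samples_per_map
    return 4.0 * total_inside / total_samples


if __name__ == "__main__":
    # Small numbers so it finishes quickly on a laptop.
    print(estimate_pi(num_maps=8, samples_per_map=1000))
```

On a real cluster each map would run as a separate task, possibly on a different node. Here they simply run one after another in a loop.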
This brings up a screen to create the Hadoop job. You can modify the job name and parameters if you like. The “final command”, in this case “Hadoop jar hadoop-examples-0.20.203.1-SNAPSHOT.jar pi 16 10000000”, can be copied if you want to run the job from the command line later.
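If you plan to script the job later, the final command breaks down into a launcher, the word "jar", the jar file, the program name "pi", and the two parameters. The helper below is a hypothetical sketch that only assembles those pieces. The function name and the default launcher name "hadoop" are assumptions, so substitute whatever launcher your cluster's command line actually uses.

```python
# Jar name exactly as shown in the portal's "final command".
JAR = "hadoop-examples-0.20.203.1-SNAPSHOT.jar"


def pi_command(num_maps=16, samples_per_map=10_000_000, launcher="hadoop"):
    """Return the Pi Estimator command as a list of arguments.

    `launcher` is an assumption; use the launcher name your environment provides.
    """
    if num_maps < 1 or samples_per_map < 1:
        raise ValueError("both parameters must be positive integers")
    return [launcher, "jar", JAR, "pi", str(num_maps), str(samples_per_map)]


if __name__ == "__main__":
    print(" ".join(pi_command()))
    print(" ".join(pi_command(8, 1000)))
```

Keeping the arguments as a list makes it easy to hand them to a process runner later without worrying about quoting.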
Click on “Execute job”. It may run for a minute or two, or perhaps longer if the CTP system is being heavily used. As it progresses, entries are added to the “Output (stdout)” and “Debug Output (stderr)” sections. Eventually you will see “Status: Completed Successfully” under “Job Info”. In my run the job reported the runtime (“Job Finished in 66.123 seconds”) and the result (“Estimated value of Pi is 3.14159155000000000000”).
You can see that 16 maps were created because that’s the first parameter we passed in to the jar command. If I change it to 8 maps and 1000 samples per map, the generated command becomes “call hadoop.cmd jar hadoop-examples-0.20.203.1-SNAPSHOT.jar pi 8 1000”. With only 8,000 samples in total the estimate is much less precise (fewer significant digits), even though the runtime is nearly the same:
Job Finished in 57.061 seconds
Estimated value of Pi is 3.14100000000000000000
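Why does the smaller run stop at 3.141? A likely explanation, assuming the estimate is computed as 4 × (points inside the circle) / (total points), is that the result can only change in steps of 4 / (total points). The sketch below works through that arithmetic for both runs. The formula is an assumption about how the sample works, although both reported values are consistent with it, and the function names are illustrative.

```python
from fractions import Fraction


def resolution(num_maps, samples_per_map):
    """Smallest step the estimate can take: 4 / (total number of samples)."""
    return Fraction(4, num_maps * samples_per_map)


def implied_inside_count(estimate_text, num_maps, samples_per_map):
    """Back out how many points must have landed inside the circle.

    Assumes estimate = 4 * inside / total, so inside = estimate * total / 4.
    """
    total = num_maps * samples_per_map
    count = Fraction(estimate_text) * total / 4
    if count.denominator != 1:
        raise ValueError("estimate is not a whole number of hits for this sample size")
    return int(count)


if __name__ == "__main__":
    # 8 maps x 1000 samples: steps of 1/2000 = 0.0005
    print(float(resolution(8, 1000)))
    print(implied_inside_count("3.14100000000000000000", 8, 1000))  # 6282
    # 16 maps x 10 million samples: steps of 1/40000000 = 0.000000025
    print(implied_inside_count("3.14159155000000000000", 16, 10_000_000))  # 125663662
```

Under that assumption, 6,282 of the 8,000 points fell inside the circle in the small run, and the result can only move in steps of 0.0005. The default run has 160 million points, so its steps are far finer.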
When you go back to the main portal screen, you now see the “Pi Example” tile. While the job is running the tile shows “…in progress…”; once it finishes, the tile shows “Complete”.
To view job results, click on the “Job History” tile under “Manage your account”.
Clicking on any history row takes you to the same output you see when you watch the job interactively.
I hope you’ve enjoyed this small bite of big data! Look for more blog posts soon on the samples and other activities.
Note: the CTP and TAP programs are available for a limited time. Details of the usage and the availability of the CTP may change rapidly.