Welcome to today’s Article Spotlight!
Check out the full version of the article here:
This blog post is a preview of the content in that article (you’ll find 3-5 times more information in the TNWiki article). The article (and many others about Hadoop) is written by Wesley McSwain, SQL Server technical writer.
Apache Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It consists of two primary components: the Hadoop Distributed File System (HDFS), a reliable, distributed data store, and MapReduce, a parallel and distributed processing system. A Hadoop cluster can range from a single node to thousands of nodes.
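To make the "simple programming model" a little more concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It runs in a single Python process and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all the counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (illustrative, single process)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

The point of the model is that the map and reduce functions only ever see one record or one key at a time, which is what lets a framework spread the work across a cluster.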
HDFS is the primary distributed storage used by Hadoop applications. As you load data into a Hadoop cluster, HDFS splits the data into blocks, creates multiple replicas of each block, and distributes the replicas across the nodes of the cluster to enable reliable and extremely rapid computations.
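The split-and-replicate idea can be sketched in a few lines of Python. This is only a toy model: the block size, replication factor, and node names are illustrative values, the helper names are hypothetical, and the round-robin placement is a simplification that stands in for whatever placement policy a real HDFS cluster applies.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: simple round-robin placement, chosen for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    layout = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=2)
    for block_id, holders in layout.items():
        print(block_id, blocks[block_id], holders)
```

Because every block lives on more than one node, losing a single node does not lose data, and several nodes can work on different blocks at the same time.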
The links in this section provide information on deploying Apache Hadoop to Microsoft Windows Platforms. All these articles are on TechNet Wiki:
|Getting Started with Hadoop-based Services for Windows||An overview of the Getting Started guides currently available.|
|Getting Started a Hadoop cluster on the Elastic Map Reduce Portal.||A walkthrough for provisioning and using a temporary Hadoop cluster on the Elastic Map Reduce (EMR) Portal.|
This section contains information on using Hadoop with other BI technologies. All these articles are on TechNet Wiki:
|How to Connect Excel to Hadoop on Azure via HiveODBC||Explains how to use Excel 2010 to access data in the Hive data warehouse running on Windows Azure by using the Hive ODBC Driver.|
|How to Connect Excel PowerPivot to Hive on Azure via HiveODBC||Explains how to use PowerPivot to access data in the Hive data warehouse running on Windows Azure by using the Hive ODBC Driver.|
This section contains a list of Hadoop-related how-to articles. All these articles are on TechNet Wiki:
|Hadoop-based Services on Windows Azure How-Tos and FAQs||A collection of common How To topics along with FAQs.|
|How to Run a Job on a Provisioned Hadoop on Windows Azure Cluster||Information about creating MapReduce jobs on a cluster that has been provisioned on the Elastic Map Reduce (EMR) portal.|
|How To FTP Data to Hadoop on Windows Azure||A walkthrough for using FTPS to send file data to the cluster.|
|How to create a mapper and reducer in C# (Hadoop Streaming)||A walkthrough for creating a mapper and reducer in C# using Hadoop Streaming.|
|Use SQL Azure database as a Hive metastore||Information about using a SQL Azure database as a Hive metastore.|
Check out the article and add to it here (it’s a lot bigger than the sections I featured in this blog post):
Jump on in. The Wiki is warm!
– User Ed