Leveraging a Hadoop cluster from SQL Server Integration Services (SSIS)

Article
12/19/2012

With the explosion of data, the open source Apache™ Hadoop™ Framework is gaining traction thanks to its huge ecosystem that has arisen around the core functionalities of Hadoop distributed file system (HDFS™) and Hadoop Map Reduce. As of today, being able to have SQL Server working with Hadoop™ becomes increasingly important because the two are indeed complementary. For instance, while petabytes of data can be stored unstructured in Hadoop and take hours to be queried, terabytes of data can be stored in a structured way in the SQL Server platform and queried in seconds. This leads to the need to transfer data between Hadoop and SQL Server.

This white paper explores how SQL Server Integration Services (SSIS), i.e. the SQL Server Extract, Transform and Load (ETL) tool, can be used to automate Hadoop + non Hadoop job executions, and manage data transfers between Hadoop and other sources and destinations.

To review the document, please download the Leveraging a Hadoop cluster from SQL Server Integration Services (SSIS) Word document.

Author : Debarchan(MSFT) SQL Developer Engineer, Microsoft

Reviewed by : Snehadeep(MSFT), SQL Developer Technical Lead , Microsoft

Leveraging a Hadoop cluster from SQL Server Integration Services (SSIS)

Additional resources