Connectors will include:
- Hadoop to SQL Server Parallel Data Warehouse (PDW) for large data volumes.
- Hadoop to SQL Server 2008 R2 or SQL Server code-named ‘Denali’.
Microsoft brings over a decade of Big Data expertise to the market. For instance, Microsoft uses it at Bing to deliver the best search results (over 100 PB of data). Over the years Microsoft has invested steadily in unstructured data, including support for binary files, FILESTREAM in SQL Server, semantic search, FileTable, StreamInsight, and geospatial data types.
Microsoft understands that customers are working with unstructured data in different environments such as Hadoop; we are committed to providing these customers with interoperability to enable them to move data between their Hadoop and SQL Server environments.
The announcement was made on the SQL Server team blog post Parallel Data Warehouse News and Hadoop Interoperability Plans.
The Apache Hadoop software library is a framework that supports distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop is highly scalable and can support petabytes of data. One of its key attractions is cost: by running on commodity servers, Hadoop dramatically reduces the cost of analyzing large data volumes. For example, The New York Times used Hadoop to process 4 TB of images, producing 11 million PDF files in 24 hours at a computational cost of only $240.
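The "simple programming model" the paragraph refers to is MapReduce: a mapper emits key/value pairs, the framework groups them by key, and a reducer folds each group into a result. As a rough, single-machine sketch (not Hadoop's actual Java API), the classic word-count example looks like this:

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    """Reduce phase: fold all counts for one word into a total."""
    return word, sum(counts)

def map_reduce(lines):
    # "Shuffle" phase: group the mapper's output by key. On a real
    # Hadoop cluster, the map, shuffle, and reduce phases are each
    # distributed across many machines; here they run in-process.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(word, counts) for word, counts in groups.items())

counts = map_reduce(["Hadoop scales out", "Hadoop runs on commodity servers"])
print(counts["hadoop"])  # → 2
```

Because each mapper call and each reducer call is independent, the framework can scatter them across thousands of nodes, which is what makes the model scale to petabytes.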
Bruce D. Kyle
ISV Architect Evangelist | Microsoft Corporation