Recently I attended a SQL Server Parallel Data Warehouse (PDW) appliance and FasTrack 3.0 workshop.
Microsoft has three key Data Warehouse offerings including SQL Server 2008 R2, Fast Track Data Warehouse and Parallel Data Warehouse. Together, these offerings enable customers to scale their warehouses from gigabytes to hundreds of terabytes on a SQL Server platform. Fast Track helps customers to scale up to tens of terabytes while Parallel Data Warehouse is ideal for workloads with hundreds of terabytes and very high performance. Finally, Microsoft has a unique Distributed Architecture that supports Data Warehouses deployed on both Fast Track and Parallel Data Warehouse.
- Fast Track Data Warehouse is a set of Reference Architectures that help customers and Partners reduce time to solution and cost. Fast Track also provides significantly increased performance and scale through pre-tested and pre-tuned configurations. Fast Track Data Warehouse 3.0 offers customers:
- Faster performance – delivered via new Fast Track configurations that take advantage of the latest hardware innovations. The latest configurations have double the data throughput of their current counterparts. HP’s latest 8-processor Reference Architecture has peak data throughput of up to 15GB/s, which translates to much faster queries.
- Increased scale than ever before! The latest configurations on HP’s G7 servers offer double the capacity of their current equivalents. Fast Track 3.0 configurations now scale to 80TB on a single 8-processor server!
- Greater choice with configurations from several Partners. Existing Partners - HP, Dell, IBM, Bull, and EMC continue to innovate, while new Partners are being on-boarded e.g. Cisco, Hitachi Data Systems and Fujitsu, with EMC and NetApp being storage providers to a few configurations. We are also pleased to announce that WhereScape is the first ISV to build a solution for Fast Track Data Warehouse! Several System Integrators such Avanade, HP, EMC and Hitachi Consulting Services also offer services for Fast Track Data Warehouse.
- In some cases, Fast Track Data Warehouse is helping customers reduce their DW costs by over 50% without compromising on performance and scalability.
- Fast Track details - http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/fast-track.aspx
- Fast Track 3.0 sizing tool - http://download.microsoft.com/download/5/4/8/54863BA7-DFC4-4AC5-B690-ED72119EDD80/Fast_Track_v3.0_System_Sizing_Tool_01252011.xlsx
- Parallel Datawarehouse
- Parallel Data Warehousing or PDW is an appliance based architecture that provides significant scale out capabilities based on the technology the acquired from Datallegro Corp in 2008. This MPP architecture provides more scalable and predictable performance for significantly greater workloads up into the 100′s of terabytes.
- PDW's implementation is particularly exciting because PDW provides a much lower cost per terabyte as it can be implement it with commodity hardware instead of a proprietary system like Teradata or Neteeza
In symmetric multi-processing (SMP) architecture, query processing occurs entirely within one physical instance of a database. CPU, memory and storage impose physical limits upon speed and scale. A SQL Server 2008 R2 Parallel Data Warehouse MPP data warehouse appliance partitions large tables across multiple physical nodes, each node having dedicated CPU, memory and storage, and each running its own instance of SQL Server, in a parallel shared nothing architecture.
SQL Server 2008 R2 Parallel Data Warehouse delivers improved performance through its unique design as follows:
- PDW Works by controlling several different physical servers each running their own instance of SQL Server 2008 R2. The database and it’s tables are spread across these physical servers but appear as one database and table(s) to the end user.
- The appliance or brain of the PDW manages query execution and the meta data for what is stored and processed on what portion of the PDW.
- In PDW implementation generally the fact data is distributed and dimensions are replicated to all nodes using PDW component itself.
- Massive scalability to hundreds of terabytes. The largest appliance available currently scales to over 500TB of user data capacity and has up to 40 compute nodes
- Enterprise class performance – Parallel Data Warehouse delivers up to 10X data throughput and 100X faster query performance than an SMP Data Warehouse
- First-class integration with Microsoft BI tools such as PowerPivot, SQL Server Analysis Services, Reporting Services and Integration Services
- Reduced risk through pretested configurations. Parallel Data Warehouse is also built on a mature SQL Server platform with ten releases of technology
- PDW has in-built resource governor.
- Save on costs through industry-standard hardware. Low cost with appliances starting from approximately $13K per terabyte. PDW uses latest generation of industry standard hardware with 2-socket servers that use 6-core Intel® Westmere processors
- Take advantage of balanced reference architectures for better performance.
- For PDW details - http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx
All components are balanced against each other to reduce performance bottlenecks, and all server and storage components are mirrored for enterprise-class redundancy.
A Control Node routes queries from applications to all Compute Nodes, then collects and returns the result
Because data is evenly distributed across multiple nodes and processing occurs in parallel, queries can be many times faster than on single SMP database servers
- Fast Track is a Scale up solution while Parallel Data Warehouse is a Scale out solution. PDW is quite expensive compare to Fast Track and page compression is enabled in PDW by default while you need to enable manually in fast track.
Sharing some of the links and resources for PDW and Fasttrack 3.0
- Free e-Learning clinic ‘Clinic 10333: Introduction to Microsoft SQL Server 2008 R2 Parallel Data Warehouse’: https://www.microsoftelearning.com/eLearning/courseDetail.aspx?courseId=187496&tab=overview
- Using SQL Server to Build a Hub-and-Spoke Enterprise Data Warehouse Architecture: http://msdn.microsoft.com/en-us/library/dd458815.aspx
- Detailed spec of HP’s PDW implementation: http://h18004.www1.hp.com/products/quickspecs/13830_na/13830_na.pdf and http://h18004.www1.hp.com/products/quickspecs/13830_na/13830_na.html
- Experience of Microsoft IT for 90TB PDW implementation: http://technet.microsoft.com/en-us/library/gg447063.aspx and http://technet.microsoft.com/en-us/edge/deploying-sql-server-2008-r2-parallel-data-warehouse-at-microsoft.aspx
- Start your learning from http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx and http://www.microsoft.com/global/sqlserver/2008/en/us/RichMedia/Parallel%20Data%20Warehouse/Default.html
- SQL PDW blog : http://www.sqlpdw.com/
- Deep-dive talk about PDW at TechEd: http://www.msteched.com/2010/NorthAmerica/BIE309
- PDW and FastTrack best practices: http://www.msteched.com/2010/NorthAmerica/BIE306