[This article was contributed by the SQL Azure team.]
Earlier this week we published a whitepaper entitled Sharding with SQL Azure to the TechNet wiki. In the paper, Michael Heydt and Michael Thomassy discuss the best practices and patterns to select when using horizontal partitioning and sharding with your applications.
Specific guidance shared in the whitepaper:
- Basic concepts in horizontal partitioning and sharding
- The challenges involved in sharding an application
- Common patterns when implementing shards
- Benefits of using SQL Azure as a sharding infrastructure
- High level design of an ADO.NET based sharding library
- Introduction to SQL Azure Federations
So what is sharding and partitioning, and why is it important?
Often the need arises where an application's data requires both high capacity for many users and support for very large data sets that require lightning performance. Or perhaps you have an application that by design must be elastic in its use of resources such as a social networking application, log processing solution or an application with a very high number of requests per second. These are all use cases where data partitioning across physical tables residing on seperate nodes; sharding or SQL Azure Federations, is capable of providing a performant scale-out solution.
In order to scale-out via sharding, an architect must partition the workload into independent units of data or atomic units. The application then must have logic built into it to understand how to access this data through the use of a custom sharding pattern or through the upcoming release of SQL Azure Federations.
Multi-Master Sharding Archetype
Also introduced by the paper is a multi-master sharding pattern where all shards are considered read/write, there is no inherent replication of data between shards and the model is referred to as a "shared nothing" as no data is shared or replicated between shards.
Use the Multi-Master Pattern if:
Clients of the system need read/write access to data
The application needs the ability for the data to grow continuously
Linear scalability of data access rate is needed as data size increases
Data written to one shard must be immediately accessible to any client
To use sharding with SQL Azure, application architeture must take into account:
Current 50GB resource limit on SQL Azure database size
Multi-Tenant peformance throttling and connection management when adding/removing shards
Currently sharding logic must be written at the application level until SQL Azure Federations is released
Shard balancing is complicated and may require application downtime while shards are created or removed
SQL Azure Federations
SQL Azure Federations will provide the infrastructure to support the dynamic addition of shards and the movement of atomic units of data between shards providing built-in scale-out capabilities. Federation Members are shards that act as the physical container for a range atomic units. Federations will support online repartioning as well as connections, which are established with the root database to maintain continuity and integrity.
Additionally, we have spoken publicly about coming SQL Azure Federations technology at both PDC and PASS this year. Since that time we have published a number of blog posts and whitepapers for your perusal:
- Evaluation of Scale-up vs Scale-out: how to evaluate the scalability options when using SQL Azure
- Intro to SQL Azure Federations: SQL Azure Federations overview
- Scenarios and Applications for SQL Azure Federations: highlighting the power of SQL Azure Federations technology
- Scaling with SQL Azure Federations: quick walkthrough of building and app with SQL Azure Federations