In this post I wanted to focus on connectivity enhancements that come with SQL Azure Federations. SQL Azure Federation comes with a special FILTERING connection option that makes is safe to work with federated data and is critical for the fully-available-repartitioning component of federations. We’ll take an in depth look at this and build towards a great side effect of FILTERING connections; safe model for programming federated data and a great migration utility for multi-tenant applications. Let’s rewind back to the top. First; what sharded applications typically have to do for connectivity today, then look at how federation improve that…
Connecting to Sharded Databases
Today, when developing sharded applications, typically the data access layer will cache the shard directory (distribution of the data to the databases) at the application tier and will construct a connection string based on the shard key instance. Once the connection is established the application is responsible for writing queries that will target the specific shard key instance. Here is a quick ADO.Net sample; Lets assume again a tenant based system with a customers table and application targeting customer with id=55. dbname_postfix below refer to the actual database name you need to connect to that contain customer 55. With all your queries, the app needs to also remember to include customer_id=55 predicate in the WHERE clause to filter to the correct data subset.
1: SqlConnection cn = new SqlConnection(“Server=tcp:servername.db.windows.net;"+2: "Db=salesdb_”+dbname_postfix+”;User ID=username;Password=password;"+3: "Trusted_Connection=False;Encrypt=True”);4: cn.Open();5: …6: SqlCommand cm = new SqlCommand(“SELECT … FROM dbo.customers "+7: WHERE customer_id=55 and …”);8: cm.ExecuteQuery();9: …
Connection Pool Management: One issue that is obvious here is that once you reach to large number of databases, your connection pool will start fragmenting. Connection pooling today creates a connection pool per hash of a connection string. Imagine this, 200 databases in your sharded app… You will end up with 200 connection pools per application instance with mostly each connection pool containing a few connection to the database. This setup typically means that you don’t get much connection reuse. Idle connections after a while will get disconnected you will have to reestablish connections from scratch.
Cache Coherency: Caching the shard map also comes with another problem; cache coherency. As you repartition data, you need to build logic to invalidate the shard directory that is cached and ensure that during repartitioning operation, you are either offline or have the code to handle race conditions around invalidation of the cache and moment of physical movement of data to ensure connection always get routed to the correct database. Imagine a case where you are moving customer 55 to a new database. During the data movement you need to be able to capture all changes to customer 55 and at the moment of the switchover you need to make sure connection routed to the old database needs to be routed to the new database or could suffer from invalid results to queries such as the one above.
Connecting to Federations
Connection String: With SQL Azure Federations, connection string for the application always point to the root database name. Here is what the code would look like to the example above. Imagine in this case that you have a root database called salesdb. Realize there is no dbname_postfix anymore. Instead USE FEDERATION statement does the routing right after connection is opened.1: SqlConnection cn = new SqlConnection(“Server=tcp:servername.db.windows.net;"+2: "Db=salesdb;User ID=username;Password=password;"+3: "Trusted_Connection=False;Encrypt=True”);4: cn.Open();5: …
Safety in Atomic Units: Atomic units refer to instances of federation key, such as all rows in all federated tables of a federation that contain customer_id=55. In federations, atomic units provide the one important guarantee about physical placement of partitioned data. All federation operations guarantee that rows that belong to the same atomic unit are always physically in the same federation member. That is, we never split the atomic unit to multiple federation members. (you can find a good review of the federation concepts such as atomic units and federated tables here).
Large part of federated application workloads target a single atomic unit at a time and depend on this guarantee. In fact the default connection type to federation members scope connections to atomic units through the USE FEDERATION statement such as the one below.1: USE FEDERATION orders_federation(customer_id=55) WITH RESET, FILTERING=ON
What does USE FEDERATION give you? #1 you are guaranteed to land in the correct federation member even if data is being repartitioned. #2 with the FILTERING option set, your connection is scoped to data that is part of customer_id=55 and you no longer have to remember to include the filter in your WHERE clause. Here is the refactored version of the above sample with USE FEDERATION. Realize that, with the FITLERING connection option set below, the query no longer needs to include the customer_id=55 predicate in the where clause. However if the application did include customer_id=55 in the WHERE clause, things would still work as expected.1: SqlCommand cm = new SqlCommand(“USE FEDERATION orders_federation(customer_id=55) ”+2: ”WITH FILTERING=ON”)3: cm.ExecuteNoneQuery();4: SqlCommand cm = new SqlCommand(“SELECT … FROM dbo.customers WHERE …”)5: cm.ExecuteQuery();6: …
With the FILTERING connection option set, the query on customers table “SELECT … FROM dbo.customers” will return only rows that have customer_id=55. That is the query execution engine will insert the predicate, customer_id=55” automagically for you when the target of your query is a federated table. This is true for all types of statements including INSERT, DELETE and UPDATE statements. The connection works just like a constraint. If you try to INSERT or UPDATE a row to a different customer_id instance, you get an error. However DELETE and SELECT statements simply effect no rows if you try to reach outside of the scope of customer_id=55.
Federations also support turning FILTERING off. This is useful when doing global changes to the federation member such as schema changes, modifications to reference tables such as updating data in your zipcode lookup table in a federation member, in cases where you want to do bulk operations or querying over many atomic units at the same time for efficiency.
Migrating to Multi-Tenant Model with Federations: Multi-tenancy is a great way to improve economics of your app by improving density of your tenants. In most classic apps, typically the tenants get a first class database. However with large number of tenants, it is easy to see that database management overhead gets out of control fast. If you can pack more tenants to each database by allowing multiple tenants in a single database, you can reduce the number of databases you need to manage.
Not so obvious at first, but a great side effect of FILTERING connection is, they make migration of business logic to a multi-tenant model much easier… Most classic apps today are built with database-per-tenant model. With single-database-per-tenant approach, your business logic and queries won’t have the tenant filtering in place. That is, you won’t have the customer_id filtering in your database traffic. If you have a large number of queries and stored procedures, it could be a pain to port to a multi-tenant solution and validate your solution. With FILTERING connection option set, it could be much easier to migrate over to a multi-tenant model with SQL Azure Federations. Your application would still have to migrate to use SQL Azure federations, however cost savings would be greatly amplified for some apps.