Shipping Software Fast and Scoping It Right: The Story of Sharding to Federations v1 in 12 Months

We first talked about SQL Azure Federations in November 2010 at PDC. We are getting ready to ship federations before the end of calendar year 2011. It has been an amazing journey, and with over 100 customers in the federations technology preview program, we are fully charged. Thanks for the great deal of interest. You can still apply for access to the early bits through the link in the following post: https://blogs.msdn.com/b/cbiyikoglu/archive/2011/05/12/federations-product-evaluation-program-now-open-for-nominations.aspx

One of the interesting stories that comes up with federations is the scope we picked for the first release. Given that we ship multiple releases a year with SQL Azure, we picked an aggressive scope and chose to iterate on the federations functionality. I get feedback about additional functionality folks would like to see, but I believe we got the v1 scope right for most folks: data-dependent routing through USE FEDERATION and repartitioning operations through ALTER FEDERATION statements.
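
To make that concrete, here is roughly what data-dependent routing looks like with USE FEDERATION. The federation, key, and table names (Orders_Fed, cust_id, dbo.orders) are made up for this sketch:

```sql
-- Connect to the federation root database first, then route this
-- connection to the federation member that owns cust_id = 1001.
-- RESET clears prior session state; FILTERING = OFF scopes the
-- connection to the whole member rather than a single atomic unit.
USE FEDERATION Orders_Fed (cust_id = 1001) WITH RESET, FILTERING = OFF
GO

-- Statements now run against the member that owns cust_id = 1001.
SELECT TOP (10) * FROM dbo.orders WHERE cust_id = 1001
GO

-- Route back to the federation root when done.
USE FEDERATION ROOT WITH RESET
GO
```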

For those of you asking for additional functionality, I am happy to say that we will obviously continue to make improvements to federations after v1. I also would like to explain how we scoped v1. As a team we looked at many customers who are using the sharding pattern in SQL Azure and found many instances of the 80-20 rule:

- Most people wanted ALTER FEDERATION … SPLIT and did not need MERGE: It turns out SQL Azure databases can already provide scale-up elasticity, with options to set the database size from 1GB to 50GB. Federation members can be set to any EDITION and MAXSIZE and take advantage of the pay-as-you-go model. That means a 50GB member that splits into 20GB and 30GB databases continues to pay the same $ amount, since we only charge for the used GBs. In v1 we do not provide MERGE, but we do give you DROP to get rid of members you don't want to keep around (see the SPLIT and DROP sketch after this list).

- Most people want to use the same schema in each federation member: It turns out most folks deploy an identical schema across federation members, and most tools simplify schema management around that assumption, even though it is possible to have completely different schemas across federation members if you want to. An independent schema per federation member also lets big data apps update schema eventually, member by member, as opposed to immediately (a.k.a. in a single transaction).

- Most people wanted to control the distribution of data with federations: This is why we started with RANGE partitioning as opposed to other partitioning styles such as HASH. It turns out RANGE also provides easy ways to simulate HASH or ROUND ROBIN (a sketch follows this list). RANGE also provides a reliable way to keep track of where each atomic unit lands as repartitioning operations take place, and it optimizes better for repartitioning operations such as SPLIT.

- Most people wanted to iterate over atomic units from low to high and not high to low: This is why range partitioning represents a low-inclusive, high-exclusive range for each federation member. A federation member reports its range through the range_low and range_high columns of the sys.federation_member_distributions view. Values of 100 and 200 for these columns indicate that 100 is included in the member's range while 200 resides in the adjacent federation member (a sample query follows the list).
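
Here is the SPLIT and DROP sketch promised above; Orders_Fed and cust_id are again hypothetical names:

```sql
-- Split the member that owns cust_id = 100 into two members:
-- one covering [previous low, 100) and one covering [100, previous high).
ALTER FEDERATION Orders_Fed SPLIT AT (cust_id = 100)
GO

-- v1 has no MERGE; instead you can drop a member you no longer need.
-- LOW drops the member below the boundary value and extends the
-- neighboring member's range to cover it; the dropped member's data
-- is discarded, so this is for retiring data, not consolidating it.
ALTER FEDERATION Orders_Fed DROP AT (LOW cust_id = 100)
GO
```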
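
On simulating HASH over RANGE, one possible approach (purely illustrative, not a prescribed pattern) is to federate on a computed bucket instead of the natural key:

```sql
-- Federate on a BIGINT bucket derived from the natural key rather
-- than on the key itself; SPLIT AT bucket boundaries then spreads
-- the buckets across members.
CREATE FEDERATION Orders_Fed (bucket BIGINT RANGE)
GO

-- The application computes the bucket before issuing USE FEDERATION,
-- e.g. mapping each customer id into one of 1024 buckets:
DECLARE @customer_id INT = 1001
SELECT ABS(CHECKSUM(@customer_id)) % 1024 AS bucket_for_routing
```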
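
And the sample query for member ranges, run from the federation root database:

```sql
-- range_low is inclusive and range_high is exclusive, so a member
-- reporting (100, 200) owns 100 but not 200; 200 belongs to the
-- adjacent member.
SELECT f.name AS federation_name,
       d.member_id,
       d.range_low,
       d.range_high
FROM sys.federations AS f
JOIN sys.federation_member_distributions AS d
    ON f.federation_id = d.federation_id
ORDER BY d.range_low
```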

These are just a few examples, but they give you the idea… Yes, we could have waited to add all of this before shipping v1, but that would have held up the existing feature set, which seems to satisfy quite a few customers out there. Worth a mention that we made sure there isn't any architectural limitation that would prevent us from adding support for the remaining items above; we can add them at any time.

We are mostly done with v1, but it still isn't too late to impact our scope. So keep the feedback coming on the scope of v1 or vNext. Reach me through the blog or email me at cihangib@microsoft.com.

Many Thanks