Introduction to Fan-out Queries for Federations in SQL Azure (Part 1): Scalable Queries over Multiple Federation Members, MapReduce Style!

Happy 2012 to all of you! 2011 has been a great year. We now have federations live in production with SQL Azure. So lets chat about fanout querying.

What is a fan-out query?

Federations provide a model for partitioning parts of your schema over to multiple member databases for harnessing scalability of many nodes. However applications still need for querying all of the data across federation members. Fan-out is a technique for querying data in your federation, across many federation members. Fan-out queries are much like map/reduce in that it is formed in 2 parts;

  • Member query is the piece that is sent over to all members involved in the query and
  • Summary query is the query that is the post processing piece to allow condensing the results from the member query to desired final result-set.

With fan-out queries the member query is always there but summary query may not be needed. For example if you are simply doing DML (we’ll have some examples like data pruning or reference data management etc) or DDL (we’ll look at schema deployment in detail below), fan-out would only have a member query but no summary query is needed. It is only when you need post processing, you need the summary query.

Last I want to say that fan-out queries are not exotic animals. Member query generates a resultset that is fed into the summary query to allow post processing. This kind of staged processing of results is something that many SQL developers already do in their stored procedure logic. This kind of results pipeline is similar to cases where you choose to use temp tables or table variables for staging the results in similar ways. OR if you have used CTEs (common table expressions) or views, you are using them to stage your processing or abstract processing logic to multiple stages. The shape of fan-out queries is a deliberate breakdown of the query processing based on your federation key but have great similarity to other techniques built into TSQL.

In the rest of the post, I’ll give you a tour of a tool and sample code that lets you do fanout processing and detail how to write fan-out queries with examples. Lets drill in.

Fan-out Query Utility

The sample utility is there to show how easy it is to execute fanout queries. Source for the sample is available on the site. You can visit the live version of the tool here; Recommend using the deployment closer to your SQL Azure deployment for efficiency;

Americas Deployment: https://federationsutility-scus.cloudapp.net/
European Deployment: https://federationsutility-weu.cloudapp.net/
Asian Deployment: https://federationsutility-seasia.cloudapp.net/

The basic fan-out query utility page shows you an example of a tool that can do member query. We will cover other parts of the tool in future posts but member queries are dead simple so wanted to show this for those of you who are looking for ways to deploy schema, maintain ref data etc. Simply go under the Home Menu and select the Fan-out Query Utility to access the page;

image

The tool simply take a connection string, federation name a query and executes the query (or batch) in all federation members. You can also get to the source code for the page through the link.

the member query is executed with the button click event for the “Submit Fanout Query” . Lets quickly walk through the code first;

  • First the connection is opened and three federation properties are initialized for constructing the USE FEDERATION statement: federation name, federation key name and the minimum value to take us to the first member.

49: // open connection
50: cn_sqlazure.Open();
51:
52: //get federation properties
53: str_federation_name = txt_federation_name.Text.ToString();
54:
55: //get distribution name
56: cm_federation_key_name.Parameters["@federationname"].Value = str_federation_name;
57: str_federation_key_name = cm_federation_key_name.ExecuteScalar().ToString();
58:
59: cm_first_federation_member_key_value.Parameters["@federationname"].Value = str_federation_name;
60: cm_next_federation_member_key_value.Parameters["@federationname"].Value = str_federation_name;
61:
62: //start from the first member with the absolute minimum value
63: str_next_federation_member_key_value = cm_first_federation_member_key_value.ExecuteScalar().ToString();

  • In the loop, the app constructs and executes the USE FEDERATION routing statement using the above three properties and through each iteration connect to the next member in line.

67: //construct command to route to next member
68: cm_routing.CommandText = string.Format("USE FEDERATION {0}({1}={2}) WITH RESET, FILTERING=OFF", str_federation_name , str_federation_key_name , str_next_federation_member_key_value);
69:
70: //route to the next member
71: cm_routing.ExecuteNonQuery();

  • Once the connection to the member is established to the member, query is executed through the DataAdapter.Fill method. The great thing about the DataAdapter.Fill method is that it automatically appends the rows or merges the rows into the DataSet.DataTables so as we iterate over the members, there is no additional work to do in the DataSet.

76: //get results into dataset
77: da_adhocsql.Fill(ds_adhocsql);

  • Once the execution of the query in the current member is complete, app grabs the range_high of the current federation member. Using this value, in the next iteration the app navigates to the next federation member. The value is discovered through “select cast(range_high as nvarchar) from sys.federation_member_distributions

86: //get the value to navigate to the next member
87: str_next_federation_member_key_value = cm_next_federation_member_key_value.ExecuteScalar().ToString();

  • The condition for the loop is defined at line#71. Simply expresses looping until the range_high value returns NULL.

89: while (str_next_federation_member_key_value != String.Empty);

Fairly simple!

What can you do with Fan-out Query Utility tool?

Here are a few simple examples for member queries;

First, you can use the tool to do schema deployments to federations; To deploy schema to all my federation members, simply put a DDL statement in the query window… And it will run it in all federation members;

CREATE TABLE language_code_tbl(
id bigint primary key,
name nchar(256) not null,
code nchar(256) not null);

I can also maintain reference data in federation members with this tool; Simply do the necessary CRUD to get the data into a new shape or simply delete and reinsert language_codes;

TRUNCATE TABLE language_code_tbl
INSERT INTO language_code_tbl VALUES(1,'US English','EN-US')
INSERT INTO language_code_tbl VALUES(2,'UK English','EN-UK')
INSERT INTO language_code_tbl VALUES(3,'CA English','EN-CA')
INSERT INTO language_code_tbl VALUES(4,'SA English','EN-SA')

Here is how to get the database names and database ids for all my federation members;

SELECT db_name(), db_id()

Here is what the output looks like from the tool;

imageimage

Here are more ways to gather information from all federation members: This will capture information on connections to all my federation members;

SELECT b.member_id, a.*
FROM sys.dm_exec_sessions a CROSS APPLY sys.federation__member_distributions b

…Or I can do maintenance with stored procedures kicked off in all federation members. For example, here is how to update statistics in all my federation members;

EXEC sp_updatestats

For querying user data: Well, I can do queries that ‘Union All’ the results for me. Something like the following query where I get blog_ids and blog_titles for everyone who blogged about ‘Azure’ from my blogs_federation. By the way, you can find the full schema at the bottom of the post under the title ‘Sample Schema”.

SELECT b.blog_id, b.blog_title
FROM blogs_tbl b JOIN blog_entries_tbl be
ON b.blog_id=be.blog_id
WHERE blog_entry_title LIKE '%Azure%'

I can also do aggregations as long as grouping involves the federation key. That way I know all data that belongs to each group is only in one member. For the following query, my federation key is blog_id and I am looking for the count of blog entries about ‘Azure’ per blog.

SELECT b.blog_id, COUNT(be.blog_entry_title), MAX(be.created_date)
FROM blogs_tbl b JOIN blog_entries_tbl be
ON b.blog_id=be.blog_id
WHERE be.blog_entry_title LIKE '%Azure%'
GROUP BY b.blog_id

When do you need to think about a Summary Query?

Fanout queries with a simple member query is quite simple. However there are many cases you will need a summary query when writing fanout queries as well. Here is how you know that you need a summary query: If all you want to do it UNION ALL the results from member query, you don’t need a summary query. If I would like to do post processing such as order the whole resultset, aggregates data and get a grouping (GROUP BY) that does not align with the federation key, you need a summary query.

We’ll cover summary queries in a future “Part 2” post but as a teaser I’ll throw a few examples:

Imagine that I’d like to get the count of blog entries about ‘Azure’ that are created between Aug (8th month) and Dec (12th month) of the years.

SELECT DATEPART(mm, be.created_date), COUNT(be.blog_entry_title)
FROM blogs_tbl b JOIN blog_entries_tbl be
ON b.blog_id=be.blog_id
WHERE DATEPART(mm, be.created_date) between 8 and 12
AND be.blog_entry_title LIKE '%Azure%'
GROUP BY DATEPART(mm, be.created_date)

Here is the query results;

image

What’s wrong with this output? Well I was looking for a single row with full count for all blogs created on Aug 8th but I got a whole bunch of rows from each member.

Lets say I wanted to get the top 5 blog_ids with the busiest comment traffic. Here is TOP 5 query returning many more than 5 rows;

SELECT TOP 5 COUNT(*)
FROM blog_entry_comments_tbl
GROUP BY blog_id
ORDER BY 1

Just to make things exciting, I added the CROSS APPLY to sys.federation_member_distributions. that shows you the results are coming from many members. So the second column in the output below is the RANGE_LOW of the member the result is coming from.

image

Well, I think you get the idea. I’ll cover how we can process such queries using summary queries in Part II of the post.

So to wrap up, fan-out queries is the technique to execute queries across your federation members. With fanout queries, there are 2 parts to consider;

1- First, the part you execute in each federation member, that is the member query and

2- Second, the query part called summary query that is used to collapse the results from all members to a single result-set.

Yes this all sounds familiar: It is like map-reduce! The great thing about fan-out querying is that it can be done completely in parallel. All the member queries are executed by separate databases in SQL Azure. The down side is that you need to now consider your federation key and write 2 queries instead of 1.

Please let me know if you have feedback about that and everything else on this post; just comment on this blog post or contact me through the link that says ‘contact the author’.

Thanks and happy 2012!

Follow @cihangirb

If you like to read more about fan-out queries, here is Part 2 of the article.


*Sample Schema

Here is the schema I used for the sample queries above.

-- Connect to BlogsRUs_DB
CREATE FEDERATION Blogs_Federation(id bigint RANGE)
GO
USE FEDERATION blogs_federation (id=-1) WITH RESET, FILTERING=OFF
GO
CREATE TABLE blogs_tbl(
blog_id bigint not null,
user_id bigint not null,
blog_title nchar(256) not null,
created_date datetimeoffset not null DEFAULT getdate(),
updated_date datetimeoffset not null DEFAULT getdate(),
language_id bigint not null default 1,
primary key (blog_id)
)
FEDERATED ON (id=blog_id)
GO
CREATE TABLE blog_entries_tbl(
blog_id bigint not null,
blog_entry_id bigint not null,
blog_entry_title nchar(256) not null,
blog_entry_text nchar(2000) not null,
created_date datetimeoffset not null DEFAULT getdate(),
updated_date datetimeoffset not null DEFAULT getdate(),
language_id bigint not null default 1,
blog_style bigint null,
primary key (blog_entry_id,blog_id)
)
FEDERATED ON (id=blog_id)
GO
CREATE TABLE blog_entry_comments_tbl(
blog_id bigint not null,
blog_entry_id bigint not null,
blog_comment_id bigint not null,
blog_comment_title nchar(256) not null,
blog_comment_text nchar(2000) not null,
user_id bigint not null,
created_date datetimeoffset not null DEFAULT getdate(),
updated_date datetimeoffset not null DEFAULT getdate(),
language_id bigint not null default 1
primary key (blog_comment_id,blog_entry_id,blog_id)
)
FEDERATED ON (id=blog_id)
GO
CREATE TABLE language_code_tbl(
language_id bigint primary key,
name nchar(256) not null,
code nchar(256) not null
)
GO