First experiments with (new) SQL Data Services

Last week I got my new login to the new SQL Data Services. As a reminder for all readers:

SDS accelerates its plans to offer relational capabilities

May 11, 2009 - Based on customer feedback, SDS has accelerated its plans and will be offering true relational capabilities through SQL Server’s existing network protocol, Tabular Data Stream (TDS) and existing query language Transact-SQL (T-SQL). This will provide customers direct access to the familiar relational model, T-SQL programming language and the existing development and management tools, while continuing to deliver on our key value props of fault tolerance, high availability, friction free provisioning and pay as you grow scaling. For more information, see the SDS product site and the MSDN Library .

 

What I’ve done? After some initial “hello world-ish” tests, I wanted to try something more interesting so I decided to port IssueTracker into SDS.

As you know, IssueTracker was originally designed for SDS’ previous ACE model (Authority, Container, Entity), so my first task was to re-write the data access layer to use SQL Server.

One of my goals in this experiment was to test SDS “impedance match” with on-premises SQL Server. Also, I wanted to develop independently of the availability of SDS. Not that SDS is unreliable, but currently it is available only inside Microsoft’s corporate network. I didn’t want to VPN into corpnet for this when working from home.

So I chose to develop exclusively against my local SQL Express instance first and then make a switch to the real SDS.

Fortunately, the app was designed with a couple of layers that isolated the persistence details, so writing the new data tier was a fairly mechanical process.

This diagram roughly captures the architecture:

clip_image001clip_image002

The repository classes implement a common interface the app uses, the Model is just a collection of rather simple C# objects with no knowledge of the database being used. The Mappers are responsible for the transformations between the application model and the entities that do have knowledge of the database.

In the diagram, classes marked with * are new, the numbers indicate variability points in the implementation, meaning that I can switch between one implementation and the other. Because I used LINQ to SQL, the types in the box labeled as “SQL Model” were generated
automatically by the LINQ to SQL designer.

When my unit tests compiled again, I switched the connection string to point from the “.\SQLEXPRESS” to the SDS instance in our network and…it worked! First attempt! 

image

 

Overall, it was a rather painless and pleasant experience. Of course the data model in the app is simple and I’m not using any advanced queries or any sophisticated features in SQL yet.

 

Things missing and Possible next steps:

The original implementation had 2 requirements that leveraged features in SDS previous ACE model:

1- Multi-tenant isolation: achieved through containers. Each tenant got its own container.

2- Schema flexibility: tenants could customize the application, extending the schema of some core entities. Flexible entities made this very easy, because they are essentially property bags.

 

For #1, I considered two options:

1- Partitioning by tenant

2- Do not partition at all and have all tenants on the same database (single-instance, multi-tenant)

The first option is fairly straight forward. Each tenant gets its own database that is created at provisioning time. The “tenant id” is part of the calling context in the application, so I dynamically connect to each database as needed. Two advantages of this approach: there’s high isolation between tenants (no data from one can leak into another), and the application code is simpler, because from the data perspective, the application is “single-tenant”.

I haven’t implemented the extensibility feature yet, but I’m planning on reusing some techniques we did some research on in the past, probably through extension tables.

 

There’re other interesting areas for research such as:

1- Strategies for partitioning: in discussions with Ryan, he suggested I should consider more sophisticated ways of partitioning the information: by tenant, by tenant + project, etc. and I agree this would be interesting .

2- Unit of Work: currently I’m simply reusing the original ACE implicit UoW that comes with each interaction. This is, each time you called Create, Delete or Update on SDS, the operation was completed in the context of a unit of work. You could not logically group multiple operation (say, 2 creates and 1 delete). This is suboptimal with the SQL implementation, because the new SDS supports transactions and I would like to leverage that.

3- Performance and scalability issues: I haven’t spent any time looking at the application’s “chattiness” with the database that might lead to degraded performance, or any other data access optimizations. This is a whole area in itself, but not very different from “regular” application development. The only exception perhaps is that, in theory at least, the app and the database can be hosted in different datacenters (say the app in Amazon and the data in SDS). I’m not sure that would be a good idea anyway, probably not for this scenario. If the app was hosted in Windows Azure and used SDS, then they would be close in terms of network distance (low latency & high bandwidth).