SQL Data Services goes full relational


A few days ago we announced the big news about SQL Data Services (SDS) switching to being a full relational database on the cloud.

I’ve been a strong supporter of this path for a number of reasons. Relational databases are very well understood and there is a large base of expertise for them in the market. Also, a lot of the existing applications and libraries out there are ready to run against a relational database, so SDS is enabling them to be ported to the cloud with minimum (or perhaps sometimes, no) effort.

With SDS going relational not only you get to reuse all your knowledge and codebase in the cloud, but you also get all the benefits of a cloud-based infrastructure: high availability, piece-of-cake provisioning, pay-as-you-go growth, etc.

One of the concerns I read about is the impact on scalability. My observation is that when you look at most of the storage systems in the cloud, they don’t have some magic formula for scalability, the trick is partitioning. Some systems are smarter than others in how they partition data and how dynamic the partitioning scheme is to adapt to varying system workloads. But in the end, you need to partition your data such that it’s spread across a bunch of nodes; if across your system you never (or rarely) depend on cross-partition operations, then you have a sustainable scalability path. That is independent of the actual organization of the data (e.g. relational, flexible entities, etc.) The ACE model on top of SDS had partitioning embedded in the model through scale units that surfaced as “containers”. In the new SDS world you can just partition your data across nodes, where each node has full relational capabilities. So it’s similar (partitioning), but each node gives you very rich ways of organizing and interacting with your data (full SQL!).

The other concern I heard is around TDS, the SQL Server client-server protocol, and how it would play in the Internet. In many cases the actual application that connects to SDS will be running in Azure as “web” or “worker” roles, and things should go smoothly. For the scenarios where the client is connecting to SDS from across the web, there are two challenges: firewalls and latency.

The server side of TDS by default listens in TCP port 1433, which a lot of firewalls will just block; furthermore, TDS is not HTTP, so a packet-inspecting intermediary could choose not to let the traffic through, regardless of the port number. This could certainly create some trouble that will need to be addressed at some point.

From the latency perspective, the short story is that I think it’s fine. TDS follows a simple request/response model, so interactions between clients and servers are straightforward and not chatty at all (things are more complicated when MARS is enabled, but that’s another story). We have experience tuning TDS for large WANs with high latency and things work out well as long as you optimize for those scenarios (e.g. batch queries together, etc.).

As a final note, there is the question about the SOAP/REST interfaces. In my opinion whenever you’re building the kind of rich applications that needs full SQL, rarely the data in the database can stand alone for direct access by consumers. Most of the time there is code on front (in the form of a middle tier) that manages access control, shaping, and even application-level constraints that don’t belong to the database. If you need a REST head on top of an SDS database, you can add ADO.NET Data Services to the equation, which will let you add all that logic fronting your data.

All in all, I’m really exited to see this happening. This gives Azure a whole spectrum of storage services, from blobs in Azure Blob Storage, to schema-less tables in Azure Table Storage, now to full relational with SQL Data Services.

-pablo

Comments (8)

  1. rogerj says:

    I doubt if any network admin or DBA who remembers the Slammer worm will open port 1433 for TDS at either end of the proposed connection.

    This means REST or SOAP, neither of which are fast, for bulk loading data from an existing data sources over HTTPS.

    Who’s going to write the SSIS package?

    –rj

  2. pabloc says:

    Ah, slammer…yes, I remember getting the call and racing through the weekend the day it hit. I was responsible for SQL interfaces back then.

    To be clear: you need to open a 1433 port *outbound* and to a well-known endpoint, so it’s not the same thing. I’m not arguing for it or anything, but wanted to highlight that these are different scenarios.

    We will see where we land with the whole protocol thing, we need to mature our understanding of it a bit.

    Regarding HTTPS, if you ignore the few initial lines of headers, HTTP/SSL transfers “pure” data with no obverhead over TCP when you send the body in a single block. If you use chunked encoding then you get a bit of extra overhead but get the benefits of chunking. In any case, then the raw data is traveling, it’s no different than any other SSL connection over TCP, including TDS with encryption enabled.

    Bottom line is that I don’t think TDS is the solution for all cases from the protocol perspective, but perhaps some folks are focusing too much on this aspect. Having a relational database in the cloud is great (biased :), we’ll address whatever connectivity issues come up over time. The SDS folks will react to customer feedback just like they did already.

    -pablo

  3. <blockquote>As a final note, there is the question about the SOAP/REST interfaces. In my opinion whenever you’re building the kind of rich applications that needs full SQL, rarely the data in the database can stand alone for direct access by consumers. Most of the time there is code on front (in the form of a middle tier) that manages access control, shaping, and even application-level constraints that don’t belong to the database.</blockquote>

    Well said.

  4. Simon says:

    "SDS world you can just partition your data across nodes" – Does that mean that a logical table cannot be stored across nodes?  Wouldn’t that be similar to sharding with multiple instances.  I probably don’t get the difference between a node and an instance

  5. a.v says:

    If I build an application with the approach that I will use Azure for deploying at a later stage and use SQL Server for my DB, can I am assume that ‘I will be able to migrate my DB to SDS’?

  6. pabloc says:

    @Simon: that’s right, if you need to have a logical table that spans nodes you’ll have to partition it (just like you’d have to do it with the current SDS model where you would have multiple containers).

    I used the word "node" loosely on purpose to avoid using any database or SQL Server specific terminology such as "instance". SDS folks will probably have good terminology once documentation is released to avoid confusion. Also an instance in SQL Server can contain multiple databases and creates a unit of management that spans multiple databases and perhaps applications. That may not be the right model for the cloud where all the physical database management "just happens" and you worry more about the "within the database" aspects.

    -pablo

  7. pabloc says:

    @a.v: the goal is to enable that sort of migration in many scenarios. I can’t say that you’ll always be able to, because there are certain things you can do in SQL Server that you won’t be able to do in SQL Data Services. Some of these limitations will go away over time, whereas others are intrinsic to the fact that this is a service and not a custom software installation.

    -pablo

  8. Random Azure .net data services question says:

    Is there any way to make azure queries that need to do string equals be case insensitive?

    Eg

    {http://myapp.table.core.windows.net/MyEntity()?$filter=tolower(MyEntityColumn) eq ‘somelowercasevalue’}

    Or

    var results = from u in _ServiceContext.Users

    where u.UserName.Equals(userName,StringComparison.CurrentCultureIgnoreCase) select u;

    Neither of these work – I get "The requested operation is not implemented on the specified resource." and I think it is due to "tolower"

    Thanks a bunch for any help you can provide.