I have been playing around with SQL lately, trying to get it to sit and roll over. Sure, I’m familiar with all the normal paradigms; all the usual crud. I can select with the best of them, slicing and dicing data with the efficiency of a ginsu knife. But I’ve never actually been much of a SQL developer, not really big on stored procedures and what not. I know there out there, but have never had the need to build any large systems using them. That’s kind of hard to admit, given my background with data access at Microsoft. But that’s just the thing, I had been so focused on access to the server, API’s, sending queries, receiving data and at best invoking procedures, that I seldom found the need to think about the problem from the other side, building apps from the inside out.
Yet, working on programming languages instead of API’s has given me a different perspective. I now see the server in a different light.
I can illustrate this best by walking through an example I was thinking about many weeks ago. I was working on another translation problem, converting a higher-level abstraction into a database query. In this particular case, what I wanted to do was generate a SQL delete statement. It was the closest match to what I wanted to represent, but I also wanted to make sure that what I was doing was general enough. Two years of designing ObjectSpaces put a lot of funny ideas in my heads; lots of oddball scenarios that unfortunately happen all the time with real data.
Logically, I had information that was one piece like a row of a table, but physically this data was stored in two separate tables. Both of these real tables were related to each other in a one-to-one fashion, each sharing the same primary key. I suppose people do this all the time with O/R layers or even just with views in the database.
So what I wanted to do was delete a logical row from the database, but since there were two base tables involved I would have to actually generate two separate delete statements. So far, not so bad, but then I had this horrible thought. Since I also wanted to retain the set-ness of the SQL delete statement over my logical structure, (I wanted the option to be able to delete many of these things given a constraint), I would have to duplicate this constraint on each actual delete that I generated. This sounded like a good idea at first, but then I realized it could go horribly wrong. If the constraint itself reference information out of both base tables, there would be no way to execute both deletes. Once the first delete executed, the second one would always fail.
Imagine you had a logical view called Customers, which was broken down into two actual tables called Customers1 and Customers2; with things like the name and phone number in the first one and their address stored in the second one.
If you wanted to delete all customers with a specific phone prefix you could write:
delete c2 from Customers1 c1, Customers2 c2 where c1.Id = c2.Id and c1.prefix = @p
delete c1 from Customers1 c1 where c1.prefix = @p
Also, if you wanted to delete all customers from a particular city you could write:
delete c1 from Customers1 c1, Customers2 c2 where c1.Id = c2.Id and c2.city = @p
delete c2 from Customers1 c2 where c2.city = @p
Notice how I had to reverse the order of the deletes here, depending on which columns were used in the constraint. This required me to be particularly clever about what was in the query into order to determine this, but it’s not really the order that was at issue. If I could not figure out the order, I’d be stuck.
I know that this is what triggers were made for, etc, etc, etc. You’ll just have to go with me on this one and believe that a trigger was not a viable solution.
What really struck home was when I took the query to the next logical step. What if I wanted to delete all customers in a specific city with a specific prefix. Now the query actually spans both sides.
delete c2 from Customers1 c1, Customers2 c2 where c1.Id = c2.Id and c1.prefix = @prefix and c2.city = @city
delete c1 from Customers1 c1, Customers2 c2 where c1.Id = c2.Id and c1.prefix = @prefix and c2.city = @city
Now, no matter the order, the pair would never succeed. The first one to execute would get rid of data required to fire the next one. So I had a problem that I needed a solution for. I needed a guaranteed way to have mutually assured destruction in both tables.
I figured there were a few ways to solve this. First, I could always generate a temp table with all the ID’s that needed to be deleted. This would indeed work, but I worried about the overhead of generating that table. What if there were a billion rows?
So I decided the best thing for me to do here was use a cursor. I could create a cursor over the logical view using the constraint and while iterating the cursor delete each one independently. Of course, this would not lead to an atomic delete, but since I was splitting them into two anyway that did not concern me.
declare c cursor for select id from customers1 c1, customers2 c2 where c1.id = c2.id and c1.prefix = @prefix and c2.city = @city
declare @id int
fetch next from c into @id
while @@fetch_status = 0
delete from Customers1 where id = @id
delete from Customers2 where id = @id
fetch next from c into @id
Now, I’m not too sure about the perf of this guy relative to a pair of delete statements, but semantically it did exactly what I wanted.
After I was done, I looked at what I had written and the thought suddenly hit me; this is code. I mean it; this stuff is code like IL instructions are code. I had taken this logical set-based idiom and converted it into a for-loop with a couple statements inside. I was doing step-wise processing now.
Of course, it’s always been there in the server just like this for writing stored procedures and functions. People have doing this kind of stuff long before I’d ever touched a computer. But now I was thinking less about the server as a storage dump for data and now more as a data runtime; a data processor if you will.
What I started out wanting to do was translate a delete operation so I could send it remote to the server. I was thinking that the only semantics interesting to translate were queries and updates. Now I’m thinking the unification between normal programming languages and data services should go much deeper. It should involve all of the language.
Maybe I should just translate the whole program and send it lock stock and barrel to the server.