How Will LINQ Impact Database Development Best Practices?

Ever since I first heard about the LINQ project, I've been wondering how this would fit in with our best practices for database development. Slowly, I think I've arrived at an interim conclusion. Since this posting deals with future developments in best practices, it's entirely speculative, and it may very well turn out that I got it all wrong. In any case, I'll stick my neck out:

For a long time now, the established best practice for accessing SQL Server databases has been to use stored procedures exclusively. Although there are many good reasons for doing this, I've always found one particularly compelling: the ease with which you can secure a database and still keep it functional. You simply create a database role that has execute rights on all the stored procedures in the database, and no other rights (not even read rights to the tables). This very effectively ensures that unless you are the dbo (or some other high-privilege account), you can only access the database through its stored procedures.
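To make the pattern concrete, here's a minimal client-side sketch in C#. The procedure name, column, connection string, and the role setup in the comments are all hypothetical, purely to illustrate the idea:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class StoredProcedureClient
{
    static void Main()
    {
        // Hypothetical one-time setup the dba would run (SQL Server 2005 T-SQL):
        //   CREATE ROLE db_executor;
        //   GRANT EXECUTE TO db_executor;
        //   EXEC sp_addrolemember 'db_executor', 'AppUser';

        using (SqlConnection cn = new SqlConnection(
            "Data Source=.;Initial Catalog=Shop;Integrated Security=SSPI"))
        using (SqlCommand cmd = new SqlCommand("GetOrdersByCustomer", cn))
        {
            // The only access path the role permits: EXECUTE on stored procedures.
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@CustomerId", 42);

            cn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine(reader["OrderId"]);
                }
            }
            // A direct "SELECT * FROM Orders" through the same login would
            // fail with a permissions error; the tables aren't readable.
        }
    }
}
```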

Does this sound familiar? To me, this sounds an awful lot like a service boundary. Let's, for a brief moment, examine how the four tenets of service-orientation map to the exclusive use of stored procedures in a database:

  • Boundaries are explicit. Indeed, if you can only access the database through a set of stored procedures, the boundary for the database becomes very explicit. In a sense, you can say that they create a shell around the innards of the database, protecting its schema and implementation against external clients.
  • Services are autonomous. As long as you don't change the stored procedures, you can version, update and deploy the database independently of its clients.
  • Services share schema and contract, not class. Any client only needs to know the shape and name of every stored procedure defined in the database. How these stored procedures, and the database schema itself, are implemented is unknown to the client. As long as you speak (T-)SQL, you can communicate with the database.
  • Compatibility is based upon policy. Uhm, well, I can't really think of an analogy here, but three out of four isn't bad, I think :)

In many cases, there are lots of good reasons to model your database in this way. Historically, many databases were (and still are) effectively entire applications in their own right, complete with business logic and so on. Against such a database, any client deals with the entire database application as a monolithic entity. Typically, such databases are the core of legacy line-of-business applications, and besides supplying data for a UI, integration with other back-end systems is also implemented at the database level. When the database is the core of the application, it makes a lot of sense to model it as a service.

That, on the other hand, is not really the vision for SQL Server.

Although a very capable database platform, SQL Server is mostly positioned as a key, but equal, component of a distributed architecture. This was true back in the days when we called it Windows DNA, and it is still true in relation to today's SOA hype. For at least ten years, it has been best practice to implement business logic in 'real' programming languages and deploy it in a tier separate from the data tier. This lessens the importance of the database: Although it is still possible to implement application-to-application integration at the database level, doing so would bypass the business logic layer, so it suddenly becomes much less desirable.

This trend is only reinforced by the whole SOA movement, where a core tenet is that the implementation of the service (which includes the database schema) should be protected by the service boundary (this even leads to such anti-patterns as the CRUDy interface).

Today, although virtually all enterprise applications use a database, the tendency is increasingly to model it as a proprietary data persistence store (a bit bucket). This is a natural evolution stemming from the forces outlined above. There's also an increasing adoption of ORM tools, and although some of these allow O/R mapping against stored procedures, many of them work by issuing dynamically created SQL. The interesting part is that most development organisations don't really have a problem with this; in fact, they are happy with their ORM tools.

If you begin to think about databases as bit buckets, this shouldn't be at all surprising.

LINQ fits very well into this trend. With respect to database access, one of the most powerful features of LINQ is deferred execution. What this basically means is that LINQ will not issue the T-SQL query against the database until you actually begin iterating over the query result. This could have some very nice performance implications, because, if used wisely, every query will fetch exactly the data that is needed at that particular place in the application, and nothing more. Obviously, this means that to harvest the full advantages of LINQ, you will have to accept dynamic SQL against your database (at least for read queries).
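To see the deferred semantics in isolation, here's a small self-contained sketch using LINQ to Objects, so it runs without a database; against a database the principle is the same, except that enumeration is the point where the T-SQL is generated and sent to the server. The names are mine, purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredExecutionDemo
{
    // A sequence that announces each element it produces,
    // so we can observe exactly when evaluation happens.
    static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 5; i++)
        {
            Console.WriteLine("  producing {0}", i);
            yield return i;
        }
    }

    static void Main()
    {
        // Defining the query executes nothing yet:
        var evens = from n in Numbers()
                    where n % 2 == 0
                    select n;

        Console.WriteLine("Query defined; no elements produced so far.");

        // Only here, when we iterate, does the query actually run.
        // With LINQ against a database, this is the point where the
        // T-SQL would be issued.
        foreach (int n in evens)
        {
            Console.WriteLine("got {0}", n);
        }
    }
}
```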

When I first heard that, I instantly recognized it as a conflict with the established best practice. Although LINQ blew me away, I knew I had some thinking to do before I could reconcile the two. As should now hopefully be clear, I'm increasingly beginning to think that the established best practice is waning, and that a new best practice will emerge. What exactly that new best practice will be, I don't know yet, but it's going to be interesting to find out!