The Great Debate: Stored Procedures vs. Dynamic SQL

This is one of those topics that always generates a lot of heat among developers, architects, and DBAs. People typically take a very strong stance on one approach over the other.

As with most situations in IT, the answer to this question is: “It depends”. It really depends upon the type of your application and what the application is supposed to do. We are only talking about OLTP applications here; for BI applications where heavy data churning is required, it is always advisable to keep the code in the DB layer so you can use SET-based logic for your processing, or to use an ETL tool for the processing logic.

It depends on whether you are doing things dynamically (lots of ad-hoc queries really aren’t a good match for sprocs). It depends on how your org is run: do you have locked-down DBs and strong DBAs, or do the “enterprise architects” with an interest in ORMs win every argument? It depends on whether the app is read-heavy or write-heavy, and on whether the DB is in strict third normal form or denormalized for reporting reasons. It depends on whether the DB schema makes any sense at all to devs (was it created for this application, or is it some hodge-podge that was created 15 years ago and is now used for 200 different reasons). It depends on a lot of things.

Stored Procedure Advantages

Although you can do most of the things a stored procedure can do with simple ad hoc Transact-SQL statements, stored procedures have a number of advantages over ad hoc queries (a minimal example follows the list), including:

  • Execution plan retention and reuse
  • Query auto-parameterization
  • Encapsulation of business rules and policies
  • Application modularization
  • Sharing of application logic between applications
  • Access to database objects that is both secure and uniform
  • Consistent, safe data modification
  • Network bandwidth conservation
  • Support for automatic execution at system start-up
  • The WITH ENCRYPTION option (to obfuscate the procedure's source text)
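
To make that list concrete, here is a minimal sketch of a parameterized stored procedure; the table, column, and role names are hypothetical:

```sql
-- A minimal parameterized procedure: the execution plan is cached and reused
-- across calls, the underlying table is not exposed directly, and the
-- application role only needs EXECUTE permission.
CREATE PROCEDURE dbo.GetRecentOrders
    @CustomerID INT,
    @FromDate   DATE
AS
BEGIN
    SET NOCOUNT ON;

    SELECT OrderID, OrderDate, TotalDue
    FROM   dbo.Orders
    WHERE  CustomerID = @CustomerID
      AND  OrderDate >= @FromDate;
END
GO

GRANT EXECUTE ON dbo.GetRecentOrders TO AppRole;  -- no direct table access required
```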

Needless to say, there are secondary benefits as well:

  • Syntax checking at creation time (procedures are pseudo-compiled into a kind of normalized query tree, which saves SQL Server resources)
  • Object dependency model
  • Schemas as buckets for grouping related code
  • Consistent monitoring
  • Permissions and limitations
  • Ability to view execution plans
  • You can use a helper such as sp_usage to generate stored procedure usage info
  • Setting environmental options up front (e.g., SET NOCOUNT, SET LOCK_TIMEOUT, user options, SET XACT_ABORT, SET CURSOR_CLOSE_ON_COMMIT, SET TEXTSIZE, SET IMPLICIT_TRANSACTIONS), inspecting state via @@NESTLEVEL and @@TRANCOUNT, and built-in protection such as a maximum nesting depth of 32 levels (see the template after this list)
  • Dashboard reports
  • Temporary procedures (# for the current connection, ## for global)
  • Replacing complex SQL statements with a single stored procedure call
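
As an illustration of the environmental-options bullet above, here is a common procedure template (a sketch, not a mandated standard; all names are hypothetical):

```sql
CREATE PROCEDURE dbo.AdjustInventory
    @ProductID INT,
    @Delta     INT
AS
BEGIN
    SET NOCOUNT ON;     -- suppress "N rows affected" chatter over the wire
    SET XACT_ABORT ON;  -- roll the whole transaction back on any run-time error

    BEGIN TRANSACTION;

    UPDATE dbo.Inventory
    SET    QuantityOnHand = QuantityOnHand + @Delta
    WHERE  ProductID = @ProductID;

    -- @@TRANCOUNT and @@NESTLEVEL are available here for defensive checks;
    -- nesting procedures more than 32 levels deep raises an error.

    COMMIT TRANSACTION;
END
GO
```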

Strategic Advantages

  • A clear choice between a top-heavy and a bottom-heavy architecture
  • Migration of core business logic and IP is easier, because SQL has changed far less over the decades than application languages have (COM to .NET). How many apps still need to be completely rewritten from VB6 to VB.NET?
  • Intentional use of indexes to cover the query
  • Proven, with almost no learning curve (compare to newer technologies like LINQ/ADO.NET Entity Framework)
  • No exposure of the database design schema in applications (encapsulation)
  • Bandwidth savings!! No SQL text over the wire repeatedly.
  • DBA control and optimization: throttling and saving resources, plus increased data security and integrity (one can lock the tables down from direct access and grant privileges only on the stored procedures/packages)
  • By using bind variables (parameterized queries), SQL injection is mitigated (see the sketch after this list)
  • A good fit when you are committed to a single database vendor
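
For the bind-variables point, here is what parameterized dynamic SQL looks like on SQL Server via sp_executesql (a sketch; the table and column names are hypothetical). The user input travels as a typed parameter and is never spliced into the SQL text:

```sql
DECLARE @CustomerName NVARCHAR(100) = N'O''Brien';  -- user input, possibly hostile

-- The value is bound as a parameter, not concatenated, so injection attempts
-- are inert and the cached plan can be reused across different values.
EXEC sp_executesql
    N'SELECT CustomerID, CustomerName
      FROM   dbo.Customer
      WHERE  CustomerName = @name',
    N'@name NVARCHAR(100)',
    @name = @CustomerName;
```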

Let me elaborate on some key points:

Network Bandwidth

Because the procedures/functions are stored in the database, there is no need to transfer the code from the clients to the database server or to transfer intermediate results from the server to the clients. This results in much less network traffic and again improves scalability.

Assume that the application server(s) and the database server(s) are separate machines. Since the source code is actually stored on the database server, the application only needs to send the procedure's name and its parameters, which reduces the amount of data sent to the server. When you use embedded SQL, or SQL generated dynamically through an ORM, the full text of the commands must be transmitted; in a scenario with a lot of concurrent activity and a large number of users, this can very quickly become a performance bottleneck. This can be mitigated in part by judicious use of views as well. The comparison below makes the difference concrete.
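
As a rough illustration (reusing the hypothetical dbo.GetRecentOrders procedure from earlier), compare what actually crosses the wire in each case:

```sql
-- Ad hoc / ORM-generated call: the full statement text travels on every execution.
SELECT o.OrderID, o.OrderDate, o.TotalDue
FROM   dbo.Orders AS o
WHERE  o.CustomerID = 42
  AND  o.OrderDate >= '2011-01-01';

-- Stored procedure call: only the name and the parameter values travel.
EXEC dbo.GetRecentOrders @CustomerID = 42, @FromDate = '2011-01-01';
```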


Abstraction Layer (Encapsulation)

SPs help separate the business logic from the data-manipulation logic. Since the interface to the application remains the same, changes made internally to the stored procedures/packages do not affect the application, which in turn leads to easier deployment of changes. The sketch below shows the idea.
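
Here is a small sketch of that encapsulation (all names are hypothetical): the internal query changes, but the signature and result shape do not, so the application is untouched.

```sql
-- Version 1: reads a single table.
CREATE PROCEDURE dbo.GetCustomerAddress @CustomerID INT
AS
    SELECT Street, City, PostalCode
    FROM   dbo.Customer
    WHERE  CustomerID = @CustomerID;
GO

-- Version 2: the address data has since been normalized into its own table.
-- Same name, same parameters, same result shape: the application never knows.
ALTER PROCEDURE dbo.GetCustomerAddress @CustomerID INT
AS
    SELECT a.Street, a.City, a.PostalCode
    FROM   dbo.Customer AS c
    JOIN   dbo.Address  AS a ON a.AddressID = c.AddressID
    WHERE  c.CustomerID = @CustomerID;
GO
```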


Unit of Work (Batching)

It offers simplified batching of commands. Since stored procedures/packages are meant to do a “Unit of Work”, this procedural logic can be simple to maintain and offers additional advantages, like making use of the rich functionality that the database engines provide. SQL is a SET-based language, and using SET-based logic is the easiest and most performant way of dealing with the data; the sketch below contrasts the two styles. With every new release of Oracle, SQL Server, or DB2 LUW, new features are introduced in PL/SQL, T-SQL, and/or SQL PL that make handling different types of requirements very easy in the DB-layer code.
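
To contrast row-by-row logic with the SET-based style (a sketch; the tables are hypothetical):

```sql
-- Row-by-row approach: one statement per order, with all the per-row overhead.
DECLARE @OrderID INT;
DECLARE order_cursor CURSOR FOR
    SELECT OrderID FROM dbo.StaleOrders;
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderID;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Status = 'Expired' WHERE OrderID = @OrderID;
    FETCH NEXT FROM order_cursor INTO @OrderID;
END
CLOSE order_cursor;
DEALLOCATE order_cursor;

-- SET-based equivalent: one statement, one pass, and the optimizer is free
-- to pick the best join strategy.
UPDATE o
SET    o.Status = 'Expired'
FROM   dbo.Orders AS o
JOIN   dbo.StaleOrders AS s
  ON   s.OrderID = o.OrderID;
```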


Adhoc Work (Temporary SPs)

Temporary procedures are useful when you want to combine the advantages of stored procedures, such as execution plan reuse and improved error handling, with the advantages of ad hoc code. Because you can build and execute a temporary stored procedure at run-time, you get the best of both worlds. For the most part, sp_executesql can obviate the need for temporary procedures, but they're still nice to have around when your needs exceed the capabilities of sp_executesql. A short sketch follows.
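
A connection-scoped temporary procedure looks like this (a sketch; the table is hypothetical):

```sql
-- The # prefix scopes the procedure to the current connection; it gets a
-- cached plan like a regular procedure and vanishes when the connection closes.
CREATE PROCEDURE #OrdersSince @FromDate DATE
AS
    SELECT OrderID, CustomerID, OrderDate
    FROM   dbo.Orders
    WHERE  OrderDate >= @FromDate;
GO

EXEC #OrdersSince @FromDate = '2011-01-01';
EXEC #OrdersSince @FromDate = '2011-06-01';  -- reuses the cached plan

DROP PROCEDURE #OrdersSince;  -- optional; dropped automatically at disconnect
```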


Data-Centric Tracking

Code instrumentation and tracing logic can be built in very easily using stored procedures. This is one thing that we implemented for one of our clients recently. We created a table that listed the DB code being used in the schema, and this table had a trace_flag column in it which could have five different values:

  • 0 (no code instrumentation)
  • 1 (log the sequence of events)
  • 2 (log the sequence of events and the time taken by those SQL statements)
  • 3 (log the sequence of events + the time taken + the execution plan from that point in time, since the execution plan under a load scenario can very easily differ from the one you see when you run the statement separately)
  • 4 (start the trace; for example, starting a 10046 level 12 trace in the case of Oracle)

Using this logic, code instrumentation and troubleshooting production issues became very easy. One could then run reports against the logged data and present them to the end user or the support personnel. Code instrumentation can be done in the application tier as well using the same guidelines (or using logging blocks like the MSFT logging block in .Net), and a good programmer will always instrument their code. For the DB code, however, this instrumentation is a lot easier to implement; the sketch below shows the general shape.
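
Here is a condensed sketch of that scheme (all object names are hypothetical, and only trace levels 0 and 1 are shown):

```sql
CREATE TABLE dbo.CodeTraceConfig
(
    ModuleName SYSNAME NOT NULL PRIMARY KEY,
    TraceFlag  TINYINT NOT NULL DEFAULT 0  -- 0 through 4, as described above
);

CREATE TABLE dbo.CodeTraceLog
(
    LogID      INT IDENTITY  PRIMARY KEY,
    ModuleName SYSNAME       NOT NULL,
    EventText  NVARCHAR(400) NOT NULL,
    LoggedAt   DATETIME2     NOT NULL DEFAULT SYSDATETIME()
);
GO

CREATE PROCEDURE dbo.PurgeOrders @CutOff DATE
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @flag TINYINT =
        (SELECT TraceFlag FROM dbo.CodeTraceConfig WHERE ModuleName = N'PurgeOrders');

    IF @flag >= 1
        INSERT dbo.CodeTraceLog (ModuleName, EventText)
        VALUES (N'PurgeOrders', N'Purge starting for cutoff ' + CONVERT(NVARCHAR(10), @CutOff, 120));

    DELETE FROM dbo.Orders WHERE OrderDate < @CutOff;
    DECLARE @rows INT = @@ROWCOUNT;  -- capture before another statement resets it

    IF @flag >= 1
        INSERT dbo.CodeTraceLog (ModuleName, EventText)
        VALUES (N'PurgeOrders', CONVERT(NVARCHAR(12), @rows) + N' rows purged');
END
GO
```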


Stored Procedure Challenges

  • Applications that involve extensive business logic and processing could place an excessive load on the server if that logic were implemented entirely in stored procedures. Examples of this type of processing include data transfers, data traversals, data transformations, and intensive computational operations. You should move this type of processing to business process or data access logic components, which are a more scalable resource than your database server.


  • Maintenance and the agility of your application become an issue when you must modify business logic in T-SQL. For example, ISV applications that support multiple RDBMSs should not need to maintain separate stored procedures for each system. If your application runs on multiple RDBMSs (say you are a vendor and need to ship a product that runs on Oracle, SQL Server, and DB2 LUW in order to expand your customer base), you then have to code and patch three different code bases. Not only that, you need proper staffing to ensure that the code written is optimal, since the locking and concurrency mechanisms are very different between these RDBMSs. The languages used by the “big 3” are very different as well.


  • Writing and maintaining stored procedures is most often a specialized skill set that not all developers possess. This situation may introduce bottlenecks in the project development schedule. We have seen client shops that offload all of their CRUD operations onto the DB tier. As a result, they end up with one-liner stored procedures: if you have, say, 200 objects, you now have 200 * 4 (select/insert/update/delete) stored procedures, or one procedure per object with a flag parameter to indicate the operation, which then forces the procedure to use dynamic SQL to handle the conditional parameter logic. Maintenance becomes a nightmare in this case. Also, developers/DBAs sometimes get carried away with this approach and forget that SQL is a SET-based language. One example: using this scheme, a client shop was purging data by calling the delete procedure once per ID (surrogate key) value, when they could have purged and archived the data using SET-based logic (see the sketch below). Procedures are supposed to do a unit of work; having one-liner procedures and invoking that many calls in a batch yields no benefit. In addition, each one-liner call unnecessarily incurs the cost of checking permissions and fetching the plan, albeit a very minuscule cost.
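
The purge example from that last point, sketched in T-SQL (names are hypothetical):

```sql
-- Anti-pattern: one procedure call per surrogate key, thousands of times over.
EXEC dbo.Order_Delete @OrderID = 1001;
EXEC dbo.Order_Delete @OrderID = 1002;
EXEC dbo.Order_Delete @OrderID = 1003;
-- ...and so on for every row to purge...

-- SET-based alternative: archive and purge in two statements.
INSERT INTO dbo.OrdersArchive (OrderID, CustomerID, OrderDate, TotalDue)
SELECT OrderID, CustomerID, OrderDate, TotalDue
FROM   dbo.Orders
WHERE  OrderDate < '2009-01-01';

DELETE FROM dbo.Orders
WHERE  OrderDate < '2009-01-01';
```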


  • Parsing of strings is not one of the fortes of the DB languages; that code is better suited to the application tier unless you start using CLR or Java code.


  • Another thing to remember: when people point out that an advantage of stored procedures is that the code is always pre-compiled, that is not always true, and there are scenarios that can lead to re-compilation. Also, if proper bind variables are used for the queries built by an ORM, it serves the same purpose (i.e., it has the same advantage as a parsed/compiled stored procedure query), since the plan for that SQL statement is parsed and compiled. Depending upon a lot of factors (cache sizes, concurrent load, statistics updates, etc.), that plan may or may not still be available the next time the call is made. The query below shows one way to observe this on SQL Server.
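
You can watch plan reuse directly in the SQL Server plan cache with the DMVs below; a parameterized statement with a high usecounts value is being reused exactly the way a stored procedure plan would be:

```sql
-- Which cached plans are being reused, and how often?
SELECT cp.objtype,    -- 'Proc', 'Prepared', 'Adhoc', ...
       cp.usecounts,  -- number of times this cached plan has been looked up
       st.text
FROM   sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;
```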


  • Needless to say, there are also versioning issues, hard limits (a maximum procedure size of 128 MB and a cap of 2,100 parameters per procedure on SQL Server), and a frequent lack of documentation, naming conventions, and headers.

Before we wind up, let us also review the compilation mechanism, as people assume many things here.

First Time Execution Mechanics

When you execute a stored procedure for the first time, it's compiled into an execution plan. This plan is not compiled into machine code or even byte codes, but is pseudo-compiled in order to speed execution. "Pseudo-compiled" means that object references are resolved, join strategies and indexing selections are made, and an efficient plan for executing the work that the procedure is to carry out is rendered by the SQL Server query optimizer. The optimizer compares a number of potential plans for performing the procedure's work and selects the one it thinks will cost the least in terms of total execution time. It bases this decision on a number of factors, including the estimated I/O cost associated with each plan, the CPU cost, the memory requirements, and so on.
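
On SQL Server you can see when a procedure's plan was compiled and cached, and how often it has been reused since (a small sketch using the procedure-stats DMV):

```sql
-- When was each procedure's plan cached, and how many executions reused it?
SELECT OBJECT_NAME(ps.object_id) AS procedure_name,
       ps.cached_time,        -- when the current plan was compiled and cached
       ps.execution_count,    -- executions that reused this cached plan
       ps.total_worker_time   -- cumulative CPU time, in microseconds
FROM   sys.dm_exec_procedure_stats AS ps
WHERE  ps.database_id = DB_ID()
ORDER BY ps.cached_time DESC;
```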


Conclusion

So, the bottom line is that one needs to carefully evaluate which portion(s) of the application really belong in stored procedures/packages. In applications that work on large volumes of data and do bulk data processing, it is always advisable to put that logic in stored procedures/packages on the DB layer, so that one can take advantage of the SET-based approach of the SQL language.

One can also use ORMs (Object Relational Mappers) to shape the data access strategy: one can extend the ORM to call database stored procedures (if so desired in certain situations), externalize the SQL statements, have the ORM prepare dynamic SQL statements, and so on. One just has to make the right decision based on the application and its requirements.

A good mix of an ORM plus DB code is usually the best compromise to get the best of both worlds. Deciding what goes where is the key, and though there are guidelines for that, it can vary from application to application depending upon the requirements.