Query performance, scalar UDFs, and predicate pushdown

 

Recently I had to troubleshoot a query that performed much slower than expected. The solution seemed interesting enough to warrant a write-up, and the root cause reinforced a well-known best practice: avoid scalar user-defined functions in set-based queries.

The original query looked like this:

SELECT  'Literal1',
        u.Col2
FROM    dbo.Table1 AS u
INNER JOIN dbo.Table2 AS l
ON      u.Col2 = l.Col2
WHERE   l.Col3 = 'Literal2'
        AND
        NOT EXISTS  (
                    SELECT  1
                    FROM    dbo.Table2 AS l2
                    WHERE   u.Col2 = l2.Col2
                            AND
                            l2.Col3 = 'Literal1'
                    )
        AND
        dbo.UDF1(u.Col1) IS NOT NULL;

Table1 had about 3 million rows, and Table2 had about 7 million rows. In its original form, the query was running for more than one hour, while the requirement was that it would complete within minutes.

The scalar user-defined function (UDF) in the WHERE clause immediately stood out as a warning sign. Using scalar UDFs in set-based queries is generally not recommended for two reasons:

  1. The function is invoked for each row in the set, and each invocation has a fixed and not insignificant cost, which is in addition to the cost of the statements in the function body.

  2. The optimizer cannot accurately estimate the cost of the function (which could vary greatly), therefore it may come up with less than optimal query plans.

For the query above, removing the UDF from the WHERE clause brought down the execution time to 12 seconds. Looking at the query plan, it was clear why it took so long to execute the original statement – a full clustered index scan of Table1 (all 3 million rows of it) was followed by a filter operator, where the UDF was applied to the Col1 column of Table1. In other words, rather than first reduce the set via joins with Table2 and then apply the UDF to that much smaller set, the optimizer chose to first filter millions of rows with the UDF. This is a fairly typical example of a sub-optimal plan caused by an inaccurate estimate (reason #2 above), which was in turn caused by the usage of an “opaque” scalar UDF.

The obvious workaround for this problem would be to remove the UDF-based filter from the WHERE clause, insert the resulting intermediate result set into a temporary table, and then select from that temporary table, filtering the much smaller intermediate result set with the UDF. However, I wanted to avoid a procedural approach, which would be more complex and less robust than a single set-based statement.
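
For reference, that procedural workaround might look something like the following sketch (using the same table and column names as above; shown here only for comparison):

-- Step 1: reduce the set with the joins only.
SELECT  u.Col1,
        u.Col2
INTO    #Intermediate
FROM    dbo.Table1 AS u
INNER JOIN dbo.Table2 AS l
ON      u.Col2 = l.Col2
WHERE   l.Col3 = 'Literal2'
        AND NOT EXISTS  (
                        SELECT  1
                        FROM    dbo.Table2 AS l2
                        WHERE   u.Col2 = l2.Col2
                                AND l2.Col3 = 'Literal1'
                        );

-- Step 2: apply the UDF only to the much smaller intermediate set.
SELECT  'Literal1',
        Col2
FROM    #Intermediate
WHERE   dbo.UDF1(Col1) IS NOT NULL;

DROP TABLE #Intermediate;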

The first attempt to come up with a single statement that avoids the original problem looked like this:

SELECT  'Literal1',
        up.Col2
FROM    (
        SELECT  u.Col1,
                u.Col2
        FROM    dbo.Table1 AS u
        INNER JOIN dbo.Table2 AS l
        ON      u.Col2 = l.Col2
        WHERE   l.Col3 = 'Literal2'
                AND
                NOT EXISTS  (
                            SELECT  1
                            FROM    dbo.Table2 AS l2
                            WHERE   u.Col2 = l2.Col2
                                    AND
                                    l2.Col3 = 'Literal1'
                            )
        ) AS up
WHERE   dbo.UDF1(up.Col1) IS NOT NULL;

Here we have a subquery in the FROM clause, and apply the filter with the UDF “on top” of the subquery, in the WHERE clause of the outer query. The intent is to evaluate the subquery first, before applying the filter. Unfortunately, this does not work – this query is logically equivalent to the original query, and SQL Server is at liberty to evaluate the predicates in the order that it deems optimal, which in this case still means filtering millions of rows in Table1 with the UDF-based filter, before joining with Table2.

Here is the query that did work (the changes from the previous attempt are the CROSS JOIN and the UDF argument in the outer WHERE clause):

SELECT  'Literal1',
        up.Col2
FROM    (
        SELECT  u.Col1,
                u.Col2
        FROM    dbo.Table1 AS u
        INNER JOIN dbo.Table2 AS l
        ON      u.Col2 = l.Col2
        WHERE   l.Col3 = 'Literal2'
                AND
                NOT EXISTS  (
                            SELECT  1
                            FROM    dbo.Table2 AS l2
                            WHERE   u.Col2 = l2.Col2
                                    AND
                                    l2.Col3 = 'Literal1'
                            )
        ) AS up
CROSS JOIN  (
            SELECT  '' AS EmptyString
            ) AS e
WHERE   dbo.UDF1(up.Col1 + e.EmptyString) IS NOT NULL;

The reason this succeeds in forcing the optimizer to evaluate the UDF filter after reducing the row set via joins is that we pass an expression, rather than a base column, to the function. The passed expression in this case is logically equivalent to the base column, but because it contains a reference to a column from a seemingly pointless outer query, the optimizer cannot push down the UDF-based filter to the inner subquery. In the plan generated for this query, the UDF executes only against a few hundred rows produced by the inner subquery, and the statement completes in 30 seconds – well within the required time. Note that it wouldn’t be sufficient to use a literal empty string as the second part of the expression – to avoid predicate pushdown, it has to be a column from an outer query.

In conclusion, this can be a useful query tuning technique for the specific case when the optimizer pushes down a filtering predicate (not necessarily UDF-based), and the resulting plan is less optimal than the one where the filter is evaluated later in the plan.

© 2019 Microsoft. All rights reserved.

Reading database transaction log with fn_dump_dblog()

While the format of the SQL Server transaction log is not publicly documented, there are a number of ways to view the contents of the log. This is sometimes necessary for troubleshooting and forensic purposes. One way is the DBCC LOG command. Another is the fn_dblog() table-valued function. Both are undocumented; however, you can easily find unofficial documentation on various web sites and blogs.
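
For example, a minimal fn_dblog() call against the current database could look like this (the columns selected here are just a small subset of what the function returns):

-- NULL, NULL means no start/end LSN filter.
SELECT  [Current LSN],
        Operation,
        Context,
        [Transaction ID]
FROM    fn_dblog(NULL, NULL);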

If you use IntelliSense in SSMS 2008, you may notice another function, called fn_dump_dblog().

This function provides the same rowset as fn_dblog(), but has some interesting functionality that makes it useful in some troubleshooting and recovery scenarios. Specifically, it can read not only the transaction log of the current database, but also transaction log backups on either disk or tape.

The first two parameters, @start and @end, can be used to filter the output by a range of LSN values (you need to make sure that the LSN values start with ‘0x’). This is similar to fn_dblog(). If you use default values for all other parameters, the fn_dump_dblog() function behaves just like fn_dblog(), returning a rowset over the log of the current database.

The third parameter, @devtype, is where the differences between fn_dump_dblog() and fn_dblog() start. This parameter determines the type of backup device. You can specify either ‘DISK’ or ‘TAPE’. ‘DISK’ is the default.

The fourth parameter, @seqnum, is an integer that can be used to specify a particular backup within a backup device, when the same device has been used for multiple backups. Most backup devices contain only one backup, so usually this will be 1.

The fifth parameter is the fully qualified path to the backup file on disk, when the backup device type is ‘DISK’. IntelliSense displays this parameter simply as @fname1. Note that the SQL Server service account needs read permission on the backup file.

The sixth parameter, displayed as @fname2, can be used to provide the name of a backup file in the default backup directory for the SQL Server instance.

All other parameters appear to be unused (please post a comment if you find otherwise). Update 2012-05-18: Paul Randal just blogged about the fn_dump_dblog() function – the rest of the parameters are used to specify multiple media families in a media set used for a log backup, i.e. a log backup striped over multiple backup files.
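
Putting the parameters together, a call against a single-file log backup on disk might look like the following sketch (the backup path is hypothetical, and all media-family parameters are left at DEFAULT, which with the commonly cited 68-parameter signature amounts to 63 DEFAULT placeholders):

SELECT  [Current LSN],
        Operation,
        [Transaction ID]
FROM    fn_dump_dblog(
        NULL, NULL,                  -- @start, @end: no LSN range filter
        N'DISK', 1,                  -- @devtype, @seqnum
        N'C:\Backups\MyDb_log.trn',  -- backup file (hypothetical path)
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT);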

The fn_dump_dblog() function exists in SQL Server 2005/2008. Do note that the function is undocumented – use it at your own risk.

Update 2012-02-08: I just found out that the function can also work against database backups, not just log backups. A database backup contains a portion of the log that will be rolled forward on restore to make the restored database transactionally consistent, and that portion of the log can be viewed with the fn_dump_dblog() function. This is potentially useful to discover the LSN and the timestamp of the checkpoint that occurs during the backup – look for LOP_END_CKPT in the Operation column.

© 2019 Microsoft. All rights reserved.

Transaction count during DML statements

 

Recently I was troubleshooting a blocking problem where a number of sessions were blocked, waiting to acquire a page latch. While blocking was occurring, there were several rows in the output from sysprocesses that looked like this (only relevant columns are shown):

spid   status     blocked open_tran waitresource  cmd     lastwaittype
------ ---------- ------- --------- ------------- ------- -------------
1001   suspended  1294    2         8:6:792624    INSERT  PAGELATCH_UP

In this case, session 1001, executing an INSERT statement, is waiting to acquire a latch on page 792624, which happens to be a PFS page (792624 / 8088 = 98, a whole number of PFS intervals). While this may provide a clue as to the cause of blocking, this is not the main topic of this post.
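
As a quick sanity check, the same arithmetic can be expressed in T-SQL (a page is a PFS page if it is page 1 of a data file or its page number is an exact multiple of 8088):

-- waitresource format is db_id:file_id:page_id; page 792624 from the output above.
DECLARE @page_id int = 792624;

SELECT CASE WHEN @page_id = 1 OR @page_id % 8088 = 0
            THEN 'PFS page'
            ELSE 'not a PFS page'
       END AS PageType;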

Note that the value in the open_tran column is 2. The open_tran column is described in documentation as the “number of open transactions for the process.” The intuitive conclusion from this is that session 1001 has two explicit transactions open, one nested in the other. However, this system uses stored procedures exclusively, and a review of all stored procedures that insert rows did not find any code that used explicit nested transactions.

After some research, I found that explicit nested transactions are not the only reason why the transaction count can be greater than 1 during execution of a DML statement. Consider the following code fragment:

CREATE TABLE T1
(
    Col1 int
);
GO

INSERT INTO T1
SELECT @@TRANCOUNT;

SELECT Col1 FROM T1;

UPDATE T1 SET
    Col1 = @@TRANCOUNT
WHERE Col1 = 2;

SELECT Col1 FROM T1;

DELETE FROM T1
WHERE Col1 = @@TRANCOUNT;

SELECT Col1 FROM T1;

Here are the results, with comments added after the output from each statement:

(1 row(s) affected)
^^ INSERT statement ^^

Col1
-----------
2

(1 row(s) affected)
^^ first SELECT statement ^^

(1 row(s) affected)
^^ UPDATE statement ^^

Col1
-----------
2

(1 row(s) affected)
^^ second SELECT statement ^^

(1 row(s) affected)
^^ DELETE statement ^^

Col1
-----------

(0 row(s) affected)
^^ third SELECT statement ^^

This shows that during execution of a DML statement that is not within any explicit transaction, there are actually two open transactions reported. The results are the same if instead of @@TRANCOUNT we use the open_tran column from sysprocesses, or the open_transaction_count column from the sys.dm_exec_requests DMV. Effectively, in addition to the one transaction always associated with any DML statement, there is another nested transaction opened internally by SQL Server, lasting for the duration of the statement’s execution. This behavior occurs on all recent versions of SQL Server, starting with SQL Server 2000 (I did not test on older versions).

To be clear, the second transaction is open only while a DML statement is executing. The @@TRANCOUNT function (as well as sysprocesses and sys.dm_exec_requests) behaves as expected if used in a non-DML statement in procedural T-SQL code, which is the typical use case.
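
For example, outside of a DML statement the count behaves exactly as expected:

SELECT @@TRANCOUNT;  -- returns 0: no explicit transaction, no DML statement executing

BEGIN TRANSACTION;
SELECT @@TRANCOUNT;  -- returns 1
COMMIT TRANSACTION;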

So as a practical matter, if you see a reported open transaction count greater than expected, consider the context in which that number was obtained before concluding that it must be due to explicit nested transactions being used.

© 2019 Microsoft. All rights reserved.

Disjoint subtyping in SQL

 

Disjoint subtyping is a scenario that is often encountered in data modeling. In one frequently used modeling approach, an entity of a certain type is represented by a database table, and each subtype of this entity is represented by another table. Subtyping is disjoint if an instance of a type corresponds to at most one instance of a subtype. For example, we may have a table named Animal, and three other tables named Extinct, Living, and Mythical (perhaps some would argue that these are not really disjoint, but let’s ignore this for now). In this example, these three tables represent entities that are disjoint subtypes of the Animal type.

When implementing disjoint subtyping in SQL, it is necessary to enforce the rule that for each row in the type table, there is one related row in at most one subtype table. It is possible to implement this rule declaratively by creative use of foreign key and check constraints, as described by David Portas in his blog.

The other day it occurred to me that one could simplify this implementation a little bit. Rather than add single-value subtype columns to each of the subtype tables, each such column with a check constraint and a default constraint, one could use SQL Server computed columns. This way, there is no need to declare the check and the default constraints, and the column could be ignored for all intents and purposes, while still enforcing the disjointness data integrity rule.

With this modification, David’s schema becomes as follows (the change is the computed ProductType column in the Books table):

CREATE TABLE Products
 (SKU INT NOT NULL PRIMARY KEY,
  ProductType CHAR(1) NOT NULL
  CHECK (ProductType IN ('B','C','D' /* Book, CD or DVD */)),
  Title VARCHAR(50) NOT NULL,
  UNIQUE (SKU,ProductType));

CREATE TABLE Books
 (SKU INT NOT NULL PRIMARY KEY,
  ProductType AS ISNULL(CAST('B' AS CHAR(1)), '') PERSISTED,
  Pages SMALLINT NOT NULL,
  FOREIGN KEY (SKU,ProductType) REFERENCES Products (SKU,ProductType));

(I’m omitting the CDs and DVDs tables for brevity.)

Note that the computed column still needs to be persisted – SQL Server will refuse to create the foreign key otherwise – so this approach can only be used with SQL Server 2005 and later. I also had to explicitly cast the column to match the data type of the referenced column. The ISNULL() function makes the column not nullable in the table’s metadata. Since the column actually cannot have NULLs, this might avoid some confusion.
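
To illustrate how the rule is enforced with the schema above, here is a quick sketch of an insert that should fail, because a Books row can only reference a Products row with ProductType 'B':

INSERT INTO Products (SKU, ProductType, Title)
VALUES (1, 'C', 'Some CD');

-- Fails with a foreign key violation: the Books row references
-- (SKU, ProductType) = (1, 'B'), but Products contains (1, 'C').
INSERT INTO Books (SKU, Pages)
VALUES (1, 200);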

Update 2009-10-19: Here is a Connect suggestion to add that support in T-SQL.

© 2019 Microsoft. All rights reserved.

Surrogate keys in distributed databases

 

In this post, the term “distributed database” refers to a set of SQL Server databases, each managed by a SQL server running on a separate computer. All databases have identical schemas, and data that originates in one database is replicated to all other databases, or nodes. A common example would be a system with a central database server and a number of remote machines used by field personnel, each with a local database replica. Data replication in such system can be implemented using a number of techniques, for example, merge replication, peer-to-peer replication, or Sync Services for ADO.NET.

A common problem arising during the design of such distributed databases is surrogate key generation. Keys must be unique across the entire distributed database, rather than unique just within a particular database node. A common approach used in practice is to use GUID columns as keys. While straightforward and simple from a developer’s perspective, this approach has a big disadvantage – the random nature of GUIDs quickly leads to extensive index fragmentation in the database. Additionally, GUID keys (16 bytes) are four times the size of integer keys (4 bytes), leading to a corresponding increase in index size.

An alternative approach is to use compound two-column keys: one column identifies the database node, while the other column identifies a row within a table on that node. The combination of two columns creates a key that is unique across the distributed database. This works reasonably well, however using compound keys may be somewhat awkward in practice: for example, a table with multiple foreign keys that reference such compound keys will have twice the number of foreign key columns. Storage issues aside, the likelihood of developer confusion and errors would be higher if this approach is used.
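
For illustration, tables under the compound-key approach might look like this sketch (table and column names are hypothetical):

CREATE TABLE dbo.Orders
(
    DBNodeID int NOT NULL,   -- identifies the originating database node
    OrderID int NOT NULL,    -- identifies the row within that node
    OrderDate datetime NOT NULL,
    CONSTRAINT pkOrders PRIMARY KEY (DBNodeID, OrderID)
);

-- Every foreign key that references Orders needs two columns:
CREATE TABLE dbo.OrderLines
(
    DBNodeID int NOT NULL,
    OrderLineID int NOT NULL,
    OrderDBNodeID int NOT NULL,
    OrderID int NOT NULL,
    CONSTRAINT pkOrderLines PRIMARY KEY (DBNodeID, OrderLineID),
    CONSTRAINT fkOrderLinesOrders FOREIGN KEY (OrderDBNodeID, OrderID)
        REFERENCES dbo.Orders (DBNodeID, OrderID)
);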

In this post, I will present a method of generating single column keys that avoids these problems. The method is based on combining multiple values of smaller numeric data types into a single value of a larger numeric data type.

We have two values to be combined: the database node identifier (DBNodeID), and the row identifier for a particular table on a particular database node (RowID). Let’s assume that both are integer values, which would not be unreasonable in practice. Each integer uses four bytes of storage, so to combine the two values without loss of information, we need eight bytes. We will use bigint as the data type for the combined value, which does require eight bytes of storage. To combine two integer values into one bigint value we will use a technique called bit shifting.

Here’s an example. Let’s say we need to pack DBNodeID 2 and RowID 14 into a single bigint value. In bit representation, these two values appear as follows:

DBNodeID (2): 00000000 00000000 00000000 00000010
RowID   (14): 00000000 00000000 00000000 00001110

Using bit shifting, we shift the bits of the first integer to the left, into the two high words of the bigint, and use the bits of the second integer for the two low words of the bigint. Here’s the result:

DistributedRowID: 00000000 00000000 00000000 00000010 00000000 00000000 00000000 00001110

In decimal, this corresponds to 8589934606 – a single bigint value that can be used as a key value for a row. This method will generate values that are derived from both DBNodeID and RowID values, and are guaranteed to be unique across the distributed database. In a sense, this method is similar to the two-column compound key approach mentioned earlier, with the advantage that only one key column is needed.

So how can we implement this bit shifting operation in SQL Server? Left-shifting a value by N bits is equivalent to multiplying that value by 2^N. This means that in order to pack the DBNodeID integer into the two high words of a bigint, we need to multiply it by 2^32 (there are 32 bits to represent an integer). Once the DBNodeID integer is packed in the two high words of a bigint, we add the second integer (RowID) to the result to obtain the key value.  Here’s a T-SQL example (assuming SQL Server 2008 from here on):

DECLARE @DBNodeID int = 2;
DECLARE @RowID int = 14;

SELECT @DBNodeID * POWER(CAST(2 AS bigint), 32) + @RowID AS DistributedRowID;
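
If needed, the original values can be extracted back out of the combined key with integer division and modulo (a sketch that assumes non-negative identifiers):

DECLARE @DistributedRowID bigint = 8589934606;

SELECT @DistributedRowID / POWER(CAST(2 AS bigint), 32) AS DBNodeID,  -- 2
       @DistributedRowID % POWER(CAST(2 AS bigint), 32) AS RowID;     -- 14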

There are multiple ways to implement this approach in a particular database design. One is to have a table, replicated to each database node, to be used as a key generator for all tables that have distributed surrogate keys. Switching to T-SQL again:

CREATE TABLE dbo.KeyGenerator
(
    DBNodeID int NOT NULL,
    TableName sysname NOT NULL,
    RowID int NOT NULL,
    DistributedRowID AS ISNULL((DBNodeID * POWER(CAST(2 AS bigint), 32) + RowID), 0),
    CONSTRAINT pkKeyGenerator PRIMARY KEY (DBNodeID, TableName)
);

We can populate this table with data for a hypothetical three-node two-table distributed database:

INSERT INTO dbo.KeyGenerator (DBNodeID, TableName, RowID)
VALUES
(1, 'Table1', 1),
(2, 'Table1', 1),
(3, 'Table1', 1),
(1, 'Table2', 1),
(2, 'Table2', 1),
(3, 'Table2', 1);

SELECT * FROM dbo.KeyGenerator produces the following (note the computed key values in the last column):

DBNodeID    TableName  RowID       DistributedRowID
----------- ---------- ----------- --------------------
1           Table1     1           4294967297
1           Table2     1           4294967297
2           Table1     1           8589934593
2           Table2     1           8589934593
3           Table1     1           12884901889
3           Table2     1           12884901889

If an application needs to insert a row into Table2 on database node 1, it can run the following UPDATE query to obtain the key value for the new row, and increment the corresponding RowID value in a single statement:

DECLARE @NewDistributedKey bigint;

UPDATE dbo.KeyGenerator SET
    RowID += 1,
    @NewDistributedKey = DistributedRowID
WHERE DBNodeID = 1 AND TableName = 'Table2';

SELECT @NewDistributedKey;

The selected value is 4294967297. Executing SELECT * FROM dbo.KeyGenerator one more time produces this result:

DBNodeID    TableName  RowID       DistributedRowID
----------- ---------- ----------- --------------------
1           Table1     1           4294967297
1           Table2     2           4294967298
2           Table1     1           8589934593
2           Table2     1           8589934593
3           Table1     1           12884901889
3           Table2     1           12884901889

Note that the RowID and DistributedRowID values for the second row have been incremented by 1, so the next time the UPDATE query is executed, it will obtain 4294967298 as the next key value for the Table2 table.

In summary, using the approach presented in this post, you can implement a distributed database system that uses single-column numeric surrogate keys, instead of widely used but problematic GUIDs, or more cumbersome compound keys.

© 2019 Microsoft. All rights reserved.