Advanced Performance Tuning – 1 :: Importance of data-types

Why did we develop DBMS systems? Not just for storing data – but also to be able to search for and retrieve stored data, and we define what our search criteria should be.

This was probably true a couple of decades back, but we have come a long way now. With the huge advancement of technology and the computing powers being doubled probably a few hundred times, we are now not satisfied with just being able to retrieve stored data; we want to retrieve data faster, we want our operations to complete in the least amount of time possible.

All of us are aware of the fact that queries must be tuned to achieve optimal performance, some of the common ways of tuning database systems include creating appropriate indexes, updating statistics, rebuilding indexes to reduce fragmentation. However, sometimes we tend to forget that each database system is just another piece of code running on a bunch of dumb hardware; unless we design our databases properly, unless we write efficient code, we will never be able to achieve the speeds we desire. In this and subsequent posts, I will try to explain the most common mistakes that I have experienced, and how these can affect system performance.

In this post, I will try to explain the importance of appropriate data-types. To start off, let’s create a new table and populate it with data:

 CREATE TABLE [dbo].[Employee](
       [EmpID] [int] IDENTITY(1,1) NOT NULL,
       [EmpFName] [varchar](50) NOT NULL,
       [EmpMName] [char](2) NULL,
       [EmpLName] [varchar](50) NULL,
       [EmpSal] [numeric](18, 2) NOT NULL,
       [EmpAddress] [varchar](250) NULL,
       [EmpCountry] [char](2) NOT NULL,
CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED 
([EmpID] ASC))
GO

We now, insert 1,000,000 rows of data into the table. We have the following Stored Procedure:

 CREATE PROCEDURE sp_GetAvgSal
    @Country NCHAR(2)
AS
    BEGIN
        SELECT AVG(EmpSal) FROM Employee WHERE EmpCountry = @Country
    END

We will execute this Stored Procedure and see how much time it takes:

 SET STATISTICS IO ON
SET STATISTICS TIME ON
SET STATISTICS PROFILE ON
GO
EXEC dbo.sp_GetAvgSal @Country = 'IN'
GO

It takes ages for the query to return back the results. Finally, when it does give us back the results, here is what we get:

 SQL Server Execution Times:
   CPU time = 10093 ms,  elapsed time = 282344 ms.

We can clearly see that there are lots of Reads, and we know why – there is no Index on the EmpCountry column. Hence, the Clustered Index is being Scanned for getting us back the results. Great! Since we know the source of the problem, let’s create the index to speed things up…

 CREATE NONCLUSTERED INDEX [IX_Employee_Country] ON [dbo].[Employee]
   ([EmpCountry] ASC) INCLUDE ([EmpSal])

We execute the same Stored Procedure with the same parameters again. We get better results now; but not quite to our expectations.

 SQL Server Execution Times:
   CPU time = 202 ms,  elapsed time = 216 ms.

What’s going on? We have a proper Index, still we are not able to achieve the performance we deserve. Let’s have a look at the execution plan. A section of the plan is:

 StmtText                                                                                
----------------------------------------------------------------------------------------
SELECT AVG([EmpSal]) FROM [Employee] WHERE [EmpCountry]=@1                              
  |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006
       |--Stream Aggregate(DEFINE:([Expr1005]=Count(*), [Expr1006]=SUM([Test].[dbo].[Emp
            |--Index Scan(OBJECT:([Test].[dbo].[Employee].[IX_Employee_Country])

An Index Scan? Why is the Index [IX_Employee_Country] being scanned? Shouldn’t we see an Index Seek? The answer lies in the same execution plan. Let’s have a look at that branch:

 Index Scan(OBJECT:([Test].[dbo].[Employee].[IX_Employee_Country]),
   WHERE:(CONVERT_IMPLICIT(nchar(2),[Test].[dbo].[Employee].[EmpCountry],0)=[@1])) 

The EmpCountry column in the Employee table is CHAR(2) which cannot be directly compared with NCHAR(2); hence, SQL Server is forced to CONVERT the EmpCountry column values to NCHAR(2) before it can perform the comparison. Again, SQL Server does not know what the column values will look like after converting the Non-Unicode character values to Unicode. Hence, it converts each column value to Unicode and then compares the result with the value passed. Hence, the Index Scan.

How can we eliminate the Index Scan? Easy, isn’t it? Rewrite the Stored Procedure so that the @Country parameter data-type matches the data-type of the underlying table. We did this, and do you know what the result was?

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.StmtText                                                                                
----------------------------------------------------------------------------------------
SELECT AVG([EmpSal]) FROM [Employee] WHERE [EmpCountry]=@1                              
  |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1004]=(0) THEN NULL ELSE [Expr1005
       |--Stream Aggregate(DEFINE:([Expr1004]=Count(*), [Expr1005]=SUM([Test].[dbo].[Emp
            |--Index Seek(OBJECT:([Test].[dbo].[Employee].[IX_Employee_Country])

So what did we learn? While doing comparison, you must make sure that the data-types used in the underlying tables match the data-types of the data we are comparing to. This will make sure we eliminate the IMPLICIT CONVERSIONS and thus, we can utilize the Indexes in a much better way.

Disclaimer: All information provided here is my personal opinion and is neither verified nor approved by Microsoft before it is published. All information, and code samples, if any, is provided "AS IS" with no warranties and confers no rights.