Character data type conversion when using SQL Server JDBC drivers

Author: Kun Cheng
Reviewers: Wanda He, Kevin Cox, Michael Thomassy, Peter Scharlock

In a recent customer engagement, I ran into an interesting situation that I'd like to share, especially with SQL Server users who access the database through the Microsoft JDBC driver. It started with a typical SQL Server performance tuning exercise to identify the most expensive queries running in SQL Server. The top query looked quite simple:

SELECT contact FROM tbl_Customers WHERE ID = @P0


The query took about 3 ms to run, which was not bad. However, it was executed millions of times and played a major role in limiting the overall throughput. Let's take a look at the query plan, which is not as straightforward as you might expect (note that only a portion of the execution plan is shown here due to its size).

Nested Loops(Inner Join, OUTER REFERENCES:([Expr1004],
  |--Compute Scalar(DEFINE:(([Expr1004],[Expr1005],[Expr1003])=GetRangeThroughConvert([@P0)
  | |--Constant Scan
  |--Clustered Index Seek(OBJECT:([Prod].[dbo].[tbl_Customers]..[ID].. SEEK:( [ID] > [Expr1008] AND [ID] < [Expr1009]), WHERE:(CONVERT_IMPLICIT(nvarchar(255), [ID],0)=[@P0])

We have the proper indexes in place, statistics are updated with fullscan, and this is a singleton lookup. So why does SQL Server compile a plan with a nested loop join instead of a direct index seek to get the value? If you look closer at the plan above, there is an implicit conversion as part of the clustered index seek:

CONVERT_IMPLICIT(nvarchar(255), [ID],0)

That explains the loop join. In SQL Server, nvarchar has higher data type precedence than varchar, so the varchar column is converted to match the nvarchar parameter. SQL Server can't simply convert the nvarchar parameter to varchar and then seek, because that conversion could lose information when an nvarchar character has no matching character in the varchar code page; instead it uses GetRangeThroughConvert to compute a seek range and then applies the CONVERT_IMPLICIT predicate, which is exactly what the plan shows. But something is wrong here. ID is defined as varchar in the table, and I know the application is passing the parameter @P0 as varchar as well (see the code snippet below). How did it become nvarchar at SQL Server?

pStmt.setObject(2, Id, Types.VARCHAR); // Java app code

It turns out that the JDBC driver sends character data, including varchar, as nvarchar by default. The reason is to minimize client-side conversion from Java's native string type, which is Unicode.
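To make this concrete, here is a minimal sketch of the application side (the server name, database, credentials, and parameter value are placeholders I've made up). With no extra connection properties, the string parameter bound below still reaches SQL Server typed as nvarchar, even though Types.VARCHAR is specified:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Types;

public class DefaultUnicodeDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; sendStringParametersAsUnicode
        // is not set here, so it defaults to true.
        String url = "jdbc:sqlserver://myserver:1433;databaseName=Prod;"
                   + "user=appUser;password=appPassword";

        try (Connection con = DriverManager.getConnection(url);
             PreparedStatement pStmt = con.prepareStatement(
                     "SELECT contact FROM tbl_Customers WHERE ID = ?")) {

            // Types.VARCHAR is requested, but with the default driver
            // settings the parameter is still sent as nvarchar, which
            // causes the CONVERT_IMPLICIT seen in the plan above.
            pStmt.setObject(1, "CUST0001", Types.VARCHAR);

            try (ResultSet rs = pStmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("contact"));
                }
            }
        }
    }
}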

So how do you stop the JDBC driver from behaving this way? There is a connection string property named sendStringParametersAsUnicode, which is true by default. Setting it to false resolved the problem: @P0 was sent as is (varchar), and the loop join disappeared from the query plan.
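As a sketch (the server and database names are placeholders), the property can be set directly in the connection URL, or through the corresponding setter on the Microsoft driver's SQLServerDataSource class:

import com.microsoft.sqlserver.jdbc.SQLServerDataSource;

public class UnicodeOffConnection {
    // Option 1: set the property directly in the connection URL
    // (server and database names are placeholders).
    static final String URL =
            "jdbc:sqlserver://myserver:1433;databaseName=Prod;"
          + "sendStringParametersAsUnicode=false";

    // Option 2: the equivalent setter on the driver's data source class.
    static SQLServerDataSource dataSource() {
        SQLServerDataSource ds = new SQLServerDataSource();
        ds.setServerName("myserver");
        ds.setDatabaseName("Prod");
        ds.setSendStringParametersAsUnicode(false);
        return ds;
    }
}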

One might ask: what if I want to pass both varchar and nvarchar parameters at the same time? Even with the connection property set to false, you can explicitly specify the nvarchar type like this:

pStmt.setObject(2, Id, Types.NVARCHAR); // Java app code
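For example, assuming the connection has sendStringParametersAsUnicode=false and the table also had a hypothetical nvarchar column (Notes is a name I'm inventing purely for illustration), a single statement could mix the two types per parameter:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

public class MixedTypesDemo {
    // Illustrative lookup mixing the varchar ID column with a
    // hypothetical nvarchar Notes column on the same statement.
    static PreparedStatement prepareLookup(Connection con, String id, String notes)
            throws SQLException {
        PreparedStatement pStmt = con.prepareStatement(
                "SELECT contact FROM tbl_Customers WHERE ID = ? AND Notes = ?");

        // Sent as varchar (the connection-level default once the
        // property is false), so the seek on ID is preserved.
        pStmt.setObject(1, id, Types.VARCHAR);

        // Explicitly sent as nvarchar for the Unicode column.
        pStmt.setObject(2, notes, Types.NVARCHAR);
        return pStmt;
    }
}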

After the connection string property change, the run time of the query above dropped from 3 ms to 1.5 ms, and overall workload throughput improved by more than 20%. What a huge difference such a subtle change makes!