Performance Testing the Microsoft OLE DB Provider for DB2 (Part 3)

In Part 1 of this series, I covered testing SQL queries against the Microsoft OLE DB Provider for DB2 through a linked server. In Part 2, I showed how you can test performance with SSIS. In many cases, however, customers are using C# or VB.NET, writing web services or stand-alone applications that query DB2 for data. So, let's begin.

First, I wanted to again test different settings of the Rowset Cache Size connection string option. But since these queries were going to be smaller, I also wondered whether using Connection Pooling would help. The Microsoft OLE DB Provider for DB2 that comes with Host Integration Server (and the SQL Server Feature Pack) implements its own connection pooling internally.

When Connection Pooling is enabled, the data provider does not immediately issue a close to the DB2 database when it receives a close from the consumer (the application). It waits before closing the connection (controlled by the "Connection Timeout" property, which defaults to 30 seconds). When a new Open() is then issued for the SAME connection string, reconnecting is faster, because some frames (security, for instance) do not have to be resent to the host system. This is most beneficial when you are making many small queries to the host, one after another. It may be less beneficial when queries take many seconds or minutes to complete.
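The pattern that benefits looks something like this minimal sketch (the connection string here is a placeholder, not the one used in my tests):

using System.Data.OleDb;

class PoolingSketch
{
    static void Main()
    {
        // Hypothetical connection string: any DB2OLEDB string works, but it must be
        // IDENTICAL on every Open() for the provider's internal pool to be reused.
        const string cnStr = "Provider=DB2OLEDB;...;Connection Pooling=True;";
        for (int i = 0; i < 2; i++)
        {
            using (OleDbConnection cn = new OleDbConnection(cnStr))
            {
                cn.Open(); // on the second pass, part of the host handshake is skipped
                // ... run a small query here ...
            } // Dispose() issues the logical close; the physical close is deferred
        }
    }
}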

So, to begin with, I decided to limit my query to 5,000 rows. I also wanted to make the DB2 server do a bit of work, so I decided to use the following query (without the FETCH FIRST clause it would return over 3.5 million rows):

const string DB2QueryString = "select * from schema.table where amount BETWEEN 5.00 AND 10.00 FETCH FIRST 5000 ROWS ONLY";

I created my initial connection string with the Data Access Tool, copied it into my C# application, and after stripping out the "Connection Pooling" and "Rowset Cache Size" options, I ended up with this:

const string cnStringDB2OLEDB = @"Provider=DB2OLEDB;User ID=USERNAME;Password=password;Initial Catalog=DB2400;Network Transport Library=TCP;Host CCSID=37;PC Code Page=1252;Network Address=DB2400;Network Port=446;Package Collection=SCHEMA;Default Schema=SCHEMA;Process Binary as Character=False;Units of Work=RUW;Default Qualifier=SCHEMA;DBMS Platform=DB2/AS400;Persist Security Info=True;Defer Prepare=False;Derive Parameters=False;";

At this point I was almost ready, but I needed to decide how many times I wanted the query to run, and how many different Rowset Cache Size settings to use. I decided the query should run 10 times for each connection string, incrementing the rowset by 30 each time, starting at 0 and finishing at 300 (11 rowset settings x 2 pooling settings x 10 runs, or 220 tests in all).

// number of times to run each query
const int iQueryIteration = 10;
// rowset cache size increments from 0 to 300 using this value
const int iCacheInc = 30;

From various other tests, I know that simulating work on the incoming data makes a difference, so I set up a variable, adjustable as necessary, that causes the data to actually be read into a variable later. For my purposes, I left it at true.

// set bReadValues to true to simulate working with the data coming in
bool bReadValues = true;

Now, at this point, everything is set up to begin. The Main procedure controls building the connection string, and then calls another procedure to run the query.

The first thing the Main procedure does is call a procedure to create the results table in SQL Server. Then it builds the connection string as follows:

// multiple tests at this point
// test with pooling off, then on
// each test to be done with different settings of rowset
// to gauge performance
int iParamTest = 1;
while (iParamTest < 3)
{
    string poolString = "";
    switch (iParamTest)
    {
        case 1:
            poolString = "Connection Pooling=False;";
            break;
        case 2:
            poolString = "Connection Pooling=True;";
            break;
    }
    iRowSet = 0;
    while (iRowSet < 301)
    {
        myconstr = "";
        rowString = "Rowset Cache Size=" + iRowSet.ToString() + ";";
        // build final connection string here
        myconstr = cnStringDB2OLEDB + poolString + rowString;

After the connection string is built, it runs the query 10 times, as follows:

        for (int iquery = 1; iquery < iQueryIteration + 1; iquery++)
        {
            testno++;
            // start timer watch
            watch.Start();
            rows = RunDbConnectionTest(myconstr, DB2QueryString);
            watch.Stop();
            Console.WriteLine("Test # {0}, ParamString: {1}, Rowset: {2}, Elapsed Time {3}", testno, poolString, iRowSet, (int)watch.Elapsed.TotalMilliseconds);
            UpdateSQLPerfTable(testno, poolString, iRowSet, watch.Elapsed.TotalMilliseconds);
            watch.Reset();
            if (rows == -1) // error happened, exit
            {
                Environment.Exit(rows);
            }
        }
        iRowSet = iRowSet + iCacheInc;

The UpdateSQLPerfTable() call logs the time for each query into SQL Server, and then the stopwatch is reset. If an error happens (RunDbConnectionTest() returns -1), the program exits. The loops then repeat until all tests have run.

The RunDbConnectionTest() procedure builds the OleDbConnection object, creates a command object and a data reader, and executes the command. The data is then read in, by assigning each column to a string value. After all columns and rows have been read, the objects are closed and disposed of accordingly. Any errors are caught in try/catch blocks, which return -1 to the main procedure.
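Condensed into using blocks, the read path amounts to this sketch (equivalent to the full listing at the end of the article, minus the error handling and the ExecuteNonQuery branch):

using System.Data.OleDb;

static class ReaderSketch
{
    // cnString: full DB2OLEDB connection string; cmString: the SELECT to run
    static void ReadAll(string cnString, string cmString)
    {
        using (OleDbConnection cn = new OleDbConnection(cnString))
        using (OleDbCommand cm = new OleDbCommand(cmString, cn))
        {
            cn.Open();
            using (OleDbDataReader dr = cm.ExecuteReader())
            {
                while (dr.Read())
                {
                    for (int i = 0; i < dr.FieldCount; i++)
                    {
                        // touch every column so the data is actually marshaled to the client
                        string msg = dr.GetValue(i).ToString();
                    }
                }
            }
        }
    }
}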

Once the data is in SQL Server, you can query the table to see the differences at each setting.
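For reference, a minimal static version of the pivot would look something like this (a sketch against the CSPerfTrials table created by the code below; the dynamic, cursor-driven version I actually used appears at the end of the article):

SELECT rowset,
       [Connection Pooling=False;],
       [Connection Pooling=True;]
FROM (SELECT rowset, ParamString, durationMs / 1000 AS durationSec
      FROM dbo.CSPerfTrials) AS rf
PIVOT (AVG(durationSec) FOR ParamString
       IN ([Connection Pooling=False;], [Connection Pooling=True;])) AS p
ORDER BY rowset

Running the pivot gave the following results (times in seconds):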

rowset   Connection Pooling=False;   Connection Pooling=True;
0        2.05654372                  1.28185158
30       1.41009238                  0.87079308
60       1.44409547                  0.87172373
90       1.46562513                  1.10562903
120      1.3914555                   1.05005861
150      1.38745682                  0.88520811
180      1.44412404                  0.84566784
210      1.41111907                  0.87019778
240      1.41200047                  0.83981306
270      1.42906141                  0.82953586
300      1.44249608                  1.1492816

So, just looking at the averages, you can see there is a difference when using connection pooling. But looking at the raw numbers gives a better idea. My first query was to look at rowset 0.
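That query is not shown in the article; presumably it is the rowset-270 query shown later with the filter changed, along these lines:

SELECT TestNumber, ParamString AS Pooling, rowset, CAST(durationMs / 1000 AS decimal(6, 2)) AS Seconds
FROM CSPerfTrials
WHERE (rowset = 0)
ORDER BY TestNumber, Pooling

It returned: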

TestNumber   Pooling                     rowset   Seconds
1            Connection Pooling=False;   0        2.65
2            Connection Pooling=False;   0        2.44
3            Connection Pooling=False;   0        1.78
4            Connection Pooling=False;   0        2.50
5            Connection Pooling=False;   0        1.94
6            Connection Pooling=False;   0        1.91
7            Connection Pooling=False;   0        1.83
8            Connection Pooling=False;   0        1.83
9            Connection Pooling=False;   0        1.84
10           Connection Pooling=False;   0        1.84
111          Connection Pooling=True;    0        1.89
112          Connection Pooling=True;    0        1.31
113          Connection Pooling=True;    0        1.21
114          Connection Pooling=True;    0        1.20
115          Connection Pooling=True;    0        1.19
116          Connection Pooling=True;    0        1.20
117          Connection Pooling=True;    0        1.20
118          Connection Pooling=True;    0        1.20
119          Connection Pooling=True;    0        1.20
120          Connection Pooling=True;    0        1.21

Now, looking at the above, you can see that the times without connection pooling were consistently higher than those with connection pooling.

Next, I wanted to see if setting the Rowset Cache Size made any difference with and without connection pooling. Using a query to get the minimum duration across all the results, I determined that the fastest run in my test was test #208, with a rowset of 270.
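That minimum can be found with a simple query along these lines (a sketch; the exact statement is not shown here):

SELECT TOP 1 TestNumber, ParamString, rowset, durationMs
FROM dbo.CSPerfTrials
ORDER BY durationMs

I then issued a query to pull out the test results where rowset = 270: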

SELECT TestNumber, ParamString AS Pooling, rowset, CAST(durationMs / 1000 AS decimal(6, 2)) AS Seconds
FROM CSPerfTrials
WHERE (rowset = 270)
ORDER BY TestNumber, Pooling

This query gives me the following results:

TestNumber   Pooling                     rowset   Seconds
91           Connection Pooling=False;   270      1.39
92           Connection Pooling=False;   270      1.41
93           Connection Pooling=False;   270      1.41
94           Connection Pooling=False;   270      1.40
95           Connection Pooling=False;   270      1.40
96           Connection Pooling=False;   270      1.41
97           Connection Pooling=False;   270      1.46
98           Connection Pooling=False;   270      1.52
99           Connection Pooling=False;   270      1.41
100          Connection Pooling=False;   270      1.48
201          Connection Pooling=True;    270      1.30
202          Connection Pooling=True;    270      0.87
203          Connection Pooling=True;    270      0.78
204          Connection Pooling=True;    270      0.77
205          Connection Pooling=True;    270      0.76
206          Connection Pooling=True;    270      0.77
207          Connection Pooling=True;    270      0.76
208          Connection Pooling=True;    270      0.75
209          Connection Pooling=True;    270      0.76
210          Connection Pooling=True;    270      0.78

In the above, with connection pooling turned off, the times for each query were between 1.39 and 1.52 seconds. With connection pooling, the initial query (TestNumber 201) took 1.30 seconds, and subsequent queries were significantly faster, in most cases below 0.8 seconds. The savings (about 0.4 to 0.5 seconds per query) adds up as more and more queries are run; with this particular query, you can run two queries with connection pooling (after the initial connection) almost as fast as one query without it.

I ran another pivot query, which summed all the times over 30 iterations of the query, and came back with this chart (in seconds), which shows the time savings a bit better:

rowset   Connection Pooling=False;   Connection Pooling=True;
0        55.7799701                  36.5997885
30       42.7464447                  23.7765142
60       42.1101308                  24.9684609
90       42.5139897                  23.7936823
120      41.9289347                  24.7713371
150      42.5943495                  24.1104076
180      42.2610654                  23.8675952
210      42.2377875                  24.3038118
240      43.31585                    24.105081
270      42.7937661                  25.1228602
300      42.2332467                  23.6615868

And again, with 100 iterations of the query at each setting, using this SQL pivot query:

USE [DB2TestDatabases]
go
declare @comstr nvarchar(4000) -- command string
declare @success int -- variable for executing SQL commands
declare @params nvarchar(255)
declare @tempstr nvarchar(1500)

declare param_cursor CURSOR FOR
    select distinct ParamString
    from [dbo].CSPerfTrials

set @tempstr = ''
OPEN param_cursor
FETCH NEXT FROM param_cursor into @params
WHILE @@FETCH_STATUS = 0
BEGIN
    set @tempstr = @tempstr + '[' + @params + ']'
    FETCH NEXT FROM param_cursor into @params
    if @@FETCH_STATUS = 0
    begin
        set @tempstr = @tempstr + ', '
    end
END
CLOSE param_cursor
DEALLOCATE param_cursor

set @comstr = 'select rowset, ' + @tempstr + ' from (select rowset, ParamString, durationms/1000 as durationms '
set @comstr = @comstr + 'from dbo.CSPerfTrials) AS rf PIVOT ( sum(durationms) FOR ParamString IN ('
set @comstr = @comstr + @tempstr + ')) AS p'

exec @success=sp_executesql @comstr

rowset   Connection Pooling=True;   Connection Pooling=False;
0        121.6649042                190.0007958
30       77.8703857                 142.5785544
60       78.2401951                 141.6710446
90       78.0573694                 141.4086889
120      77.8508053                 145.7663853
150      78.565167                  144.2463663
180      78.3692842                 142.6525498
210      78.1296505                 141.0669099
240      77.4381023                 146.7418084
270      78.3527013                 142.036522
300      78.8497313                 141.7899574

Executing 100 queries with connection pooling takes almost half the time it takes without it (roughly 78 seconds versus roughly 142), when used in conjunction with a non-zero Rowset Cache Size, for this particular query.

For my final test, I wanted to see what impact this would have on a large table, where the results were not limited. I picked a filter (select count(*) from schema.table where amount BETWEEN 100.00 AND 110.00) that takes 51 seconds, from SQL Server, just to return a count of 1,296 rows. That made it a good test to pull in those 1,296 rows and read the data, 100 times for each connection string (100 x 11 x 2), for 2,200 reads against the table. I thought this would more accurately demonstrate pulling a small amount of data from across a large table.
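The article does not show the harness change, but presumably it amounts to replacing the query constant with the unrestricted version of that filter, along these lines:

// hypothetical: the unrestricted query for the final, large-table test
const string DB2QueryString = "select * from schema.table where amount BETWEEN 100.00 AND 110.00";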

After the tests were run, I ran a query to sum up the times for each setting, with these results (times are in seconds):

rowset   Connection Pooling=True;   Connection Pooling=False;
0        3721.2146623               4052.1730781
30       3724.0935948               4026.1043348
60       3689.1961373               3973.2904402
90       3718.429541                3970.0594041
120      3689.3583401               4035.6269915
150      3720.9639454               3976.7106773
180      3685.1553973               3944.8446177
210      3712.2296273               4068.358878
240      3703.0019921               4007.4356928
270      3675.8706398               4084.5275822
300      3785.033983                4010.1079362

So, I was thinking: wow, some savings, 200 to 300 seconds total. But before you get excited, remember this was across 100 queries, so in reality the savings is only 2 to 3 seconds per query. Changing the query to show the average confirms this:

rowset   Connection Pooling=True;   Connection Pooling=False;
0        37.212146623               40.521730781
30       37.240935948               40.261043348
60       36.891961373               39.732904402
90       37.18429541                39.700594041
120      36.893583401               40.356269915
150      37.209639454               39.767106773
180      36.851553973               39.448446177
210      37.122296273               40.68358878
240      37.030019921               40.074356928
270      36.758706398               40.845275822
300      37.85033983                40.101079362

So the savings, on average, is only 2 to 3 seconds per query when accessing a large table, at least in my particular case.

As the above charts show, using the Rowset Cache Size by itself helps, but combining it with pooling on small queries achieves a significant time savings. Since the numbers are fairly close once any non-zero rowset is used, you should see a significant improvement with a setting of at least 30 for the Rowset Cache Size. When querying a large table, pooling can still save a few seconds, but you may not see as much of a performance impact as you would with 'smaller' queries, where data can be retrieved much faster.

One thing to note here: all the times above were generated on a system with little other load, against a DB2 server with minimal resources in use. Your results will vary, depending on hardware, network speed and utilization, load on DB2, and load on the servers running the tests. But this test and the previous two should give an indication of which settings may be most beneficial in your environment.

The test hardware used in all three tests was identical: Windows Server 2003 SP2 and SQL Server 2005 (with all service packs and fixes applied to both), with 2 GB of memory, on a 100 Mbps network. The queries were run against DB2/400 V5R4, which was a couple of network hops away from the test servers.

I hope this series of articles will help in your performance testing. If you have any questions, please do not hesitate to ask.

The code for this particular test is below.

using System;
using System.Collections;
using System.Text;
using System.Data;
using System.Data.OleDb;
using System.Data.Common;
using System.Data.SqlClient;
using System.Diagnostics;

namespace DB2PerfTest
{
    class Program
    {
        // set bReadValues to true to simulate working with the data coming in
        static bool bReadValues = true;
        // connection string is missing rowset cache size and connection pooling options
        const string cnStringDB2OLEDB = @"Provider=DB2OLEDB;User ID=USERNAME;Password=password;Initial Catalog=DB2400;Network Transport Library=TCP;Host CCSID=37;PC Code Page=1252;Network Address=DB2400;Network Port=446;Package Collection=SCHEMA;Default Schema=SCHEMA;Process Binary as Character=False;Units of Work=RUW;Default Qualifier=SCHEMA;DBMS Platform=DB2/AS400;Persist Security Info=True;Defer Prepare=False;Derive Parameters=False;";
        // number of times to run each query
        const int iQueryIteration = 10;
        // rowset cache size increments from 0 to 300 using this value
        const int iCacheInc = 30;
        // below query returns 3,581,029 rows without the fetch first statement
        // limiting to 5000 as it is not realistic that many C# applications will be handling
        // millions of rows at one time
        const string DB2QueryString = "select * from schema.table where amount BETWEEN 5.00 AND 10.00 FETCH FIRST 5000 ROWS ONLY";

        static void Main(string[] args)
        {
            int rows = 0;
            string rowString = "";
            string myconstr = "";
            int iRowSet = 0;
            CreateSQLPerfTable();
            Stopwatch watch = new Stopwatch();
            int testno = 0;

            // multiple tests at this point
            // test with pooling off, then on
            // each test to be done with different settings of rowset
            // to gauge performance
            int iParamTest = 1;
            while (iParamTest < 3)
            {
                string poolString = "";
                switch (iParamTest)
                {
                    case 1:
                        poolString = "Connection Pooling=False;";
                        break;
                    case 2:
                        poolString = "Connection Pooling=True;";
                        break;
                }
                iRowSet = 0;
                while (iRowSet < 301)
                {
                    myconstr = "";
                    rowString = "Rowset Cache Size=" + iRowSet.ToString() + ";";
                    // build final connection string here
                    myconstr = cnStringDB2OLEDB + poolString + rowString;
                    for (int iquery = 1; iquery < iQueryIteration + 1; iquery++)
                    {
                        testno++;
                        // start timer watch
                        watch.Start();
                        rows = RunDbConnectionTest(myconstr, DB2QueryString);
                        watch.Stop();
                        Console.WriteLine("Test # {0}, ParamString: {1}, Rowset: {2}, Elapsed Time {3}", testno, poolString, iRowSet, (int)watch.Elapsed.TotalMilliseconds);
                        UpdateSQLPerfTable(testno, poolString, iRowSet, watch.Elapsed.TotalMilliseconds);
                        watch.Reset();
                        if (rows == -1) // error happened, exit
                        {
                            Environment.Exit(rows);
                        }
                    }
                    iRowSet = iRowSet + iCacheInc;
                }
                iParamTest++;
            } // iparamtest loop
        } // end main

        static int RunDbConnectionTest(string cnString, string cmString)
        {
            int result = 0;
            try // outer try
            {
                OleDbConnection cn = new OleDbConnection(cnString);
                cn.Open();
                try
                { // inner try
                    OleDbCommand cm = cn.CreateCommand();
                    cm.Connection = cn;
                    cm.CommandText = cmString;
                    if (bReadValues)
                    {
                        OleDbDataReader dr = cm.ExecuteReader();
                        try
                        {
                            while (dr.Read())
                            {
                                int cnt = dr.FieldCount;
                                for (int i = 0; i < cnt; i++)
                                {
                                    // simulate actually doing something with the data so it is 'read'
                                    string msg = dr.GetValue(i).ToString();
                                }
                            }
                        }
                        finally
                        {
                            dr.Close();
                            cm.Dispose();
                        }
                    } // bReadValues
                    else // not reading data, just execute
                    {
                        try
                        {
                            result = cm.ExecuteNonQuery();
                        }
                        catch (Exception ex)
                        {
                            System.Console.WriteLine("");
                            System.Console.WriteLine(ex.Message);
                            result = -1;
                        }
                        finally
                        {
                            cm.Dispose();
                        }
                    } // !bReadValues
                } // inner try
                finally
                {
                    cn.Close();
                    cn.Dispose();
                }
            } // outer try
            catch (Exception e)
            {
                System.Console.WriteLine("");
                System.Console.WriteLine(e.Message);
                result = -1;
            }
            return result;
        }

        static void CreateSQLPerfTable()
        {
            SqlConnection mySqlConnection = new SqlConnection("Persist Security Info=True;Integrated Security=SSPI;database=DB2TestDatabases;server=(local);pooling=false");
            SqlCommand mySqlCommand = mySqlConnection.CreateCommand();
            mySqlConnection.Open();
            // drop perf table
            Console.WriteLine("Dropping CSPerfTrials table");
            mySqlCommand.CommandText =
                "IF EXISTS (SELECT * FROM sys.objects " +
                "WHERE object_id = OBJECT_ID(N'[dbo].[CSPerfTrials]') " +
                "AND type in (N'U')) " +
                "DROP TABLE [dbo].[CSPerfTrials]";
            int result = mySqlCommand.ExecuteNonQuery();
            Console.WriteLine("mySqlCommand.ExecuteNonQuery() = " + result);

            // create new table
            mySqlCommand.CommandText =
                "CREATE TABLE dbo.CSPerfTrials" +
                "( TestNumber int NULL" +
                ", ParamString Nvarchar(2000) NULL" +
                ", rowset int NULL" +
                ", durationMs float NULL" +
                ")";
            Console.WriteLine("Creating dbo.CSPerfTrials table");
            result = mySqlCommand.ExecuteNonQuery();
            Console.WriteLine("mySqlCommand.ExecuteNonQuery() = " + result);
            mySqlConnection.Close();
        }

        static void UpdateSQLPerfTable(int testno, string ParamString, int rowset, double durationMS)
        {
            // added pooling=false to the sql connection string to resolve an error condition
            // may not be needed in all cases but on 'fast' reads this appears to be needed
            SqlConnection mySqlConnection = new SqlConnection("Persist Security Info=True;Integrated Security=SSPI;database=DB2TestDatabases;server=(local);pooling=false");
            SqlCommand mySqlCommand = mySqlConnection.CreateCommand();
            mySqlConnection.Open();
            Console.WriteLine("Connection state is " + mySqlConnection.State.ToString());
            Console.WriteLine("inserting data into SQL table dbo.CSPerfTrials");
            mySqlCommand.CommandText = @"insert into dbo.CSPerfTrials " +
                "(TestNumber" +
                ",ParamString" +
                ",rowset" +
                ",durationMS" +
                ") values (" +
                testno.ToString() + ",'" +
                ParamString + "'," +
                rowset.ToString() + "," +
                durationMS.ToString() + ")";
            int result = mySqlCommand.ExecuteNonQuery();
            Console.WriteLine("mySqlCommand.ExecuteNonQuery() = " + result);
            mySqlConnection.Close();
            Console.WriteLine("Connection state is " + mySqlConnection.State.ToString());
        }
    }
}

The SQL query I ran to produce the pivot charts is below. Modifying the "avg(durationms" to "sum(durationms" will give you the total times for each test run:

-- CSPerfTrials Pivot report
-- ***********************
-- Builds a Pivot report
USE [DB2TestDatabases]
go
declare @comstr nvarchar(4000) -- command string
declare @success int -- variable for executing SQL commands
declare @params nvarchar(255)
declare @tempstr nvarchar(1500)

declare param_cursor CURSOR FOR
    select distinct ParamString
    from [dbo].CSPerfTrials

set @tempstr = ''
OPEN param_cursor
FETCH NEXT FROM param_cursor into @params
WHILE @@FETCH_STATUS = 0
BEGIN
    set @tempstr = @tempstr + '[' + @params + ']'
    FETCH NEXT FROM param_cursor into @params
    if @@FETCH_STATUS = 0
    begin
        set @tempstr = @tempstr + ', '
    end
END
CLOSE param_cursor
DEALLOCATE param_cursor

set @comstr = 'select rowset, ' + @tempstr + ' from (select rowset, ParamString, durationms/1000 as durationms '
set @comstr = @comstr + 'from dbo.CSPerfTrials) AS rf PIVOT ( avg(durationms) FOR ParamString IN ('
set @comstr = @comstr + @tempstr + ')) AS p'

exec @success=sp_executesql @comstr