Performance Testing the Microsoft OLE DB Provider for DB2 (Part 3)

In Part 1 of this series, I covered testing SQL queries against the Microsoft OLE DB Provider for DB2 through a linked server. In Part 2, I showed how you can test performance with SSIS. In many cases, however, customers are using C# or VB.NET, writing web services or stand-alone applications that query DB2 for data. So, let's begin.

First, I wanted to again test different settings of the Rowset Cache Size connection string option. But since these queries were going to be smaller, I also wondered whether using Connection Pooling would help. The Microsoft OLE DB Provider for DB2 that comes with Host Integration Server (and the SQL Server Feature Pack) implements its own connection pooling internally.

When Connection Pooling is enabled, the data provider does not immediately issue a close to the DB2 database when it receives a close from the consumer (the application). It waits before closing the connection (controlled by the "Connection Timeout" property, which defaults to 30 seconds). When a new Open() is then issued for the SAME connection string, reconnecting is faster, because some frames (security, for instance) do not have to be resent to the host system. This is most beneficial when you are making many small queries to the host, one after another. It may be less beneficial when queries take many seconds or minutes to complete.
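The pattern that benefits looks something like this minimal sketch (the connection string here is a placeholder, not the one used in my tests):

using System.Data.OleDb;

class PoolingSketch
{
    static void Main()
    {
        // Hypothetical connection string: any DB2OLEDB string works, but it must be
        // IDENTICAL on every Open() for the provider's internal pool to be reused.
        const string cnStr = "Provider=DB2OLEDB;...;Connection Pooling=True;";
        for (int i = 0; i < 2; i++)
        {
            using (OleDbConnection cn = new OleDbConnection(cnStr))
            {
                cn.Open(); // on the second pass, part of the host handshake is skipped
                // ... run a small query here ...
            } // Dispose() issues the logical close; the physical close is deferred
        }
    }
}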

So, to begin with, I decided to limit my query to 5,000 rows. I also wanted to make the DB2 server do a bit of work, so I decided to use the following query (without the FETCH FIRST clause it would return over 3.5 million rows):

const string DB2QueryString = "select * from schema.table where amount BETWEEN 5.00 AND 10.00 FETCH FIRST 5000 ROWS ONLY";

I created my initial connection string with the Data Access Tool, copied it into my C# application, and after stripping out the "Connection Pooling" and "Rowset Cache Size" options, I ended up with this:

const string cnStringDB2OLEDB = @"Provider=DB2OLEDB;User ID=USERNAME;Password=password;Initial Catalog=DB2400;Network Transport Library=TCP;Host CCSID=37;PC Code Page=1252;Network Address=DB2400;Network Port=446;Package Collection=SCHEMA;Default Schema=SCHEMA;Process Binary as Character=False;Units of Work=RUW;Default Qualifier=SCHEMA;DBMS Platform=DB2/AS400;Persist Security Info=True;Defer Prepare=False;Derive Parameters=False;";

At this point I was almost ready, but I needed to decide how many times I wanted the query to run, and how many different Rowset Cache Size settings to use. I decided the query should run 10 times for each connection string, incrementing the rowset by 30 each time, starting at 0 and finishing at 300 (11 rowset settings x 2 pooling settings x 10 runs, or 220 tests in all).

// number of times to run each query
const int iQueryIteration = 10;
// rowset cache size increments from 0 to 300 using this value
const int iCacheInc = 30;

From various other tests, I know that simulating work on the incoming data makes a difference, so I set up a variable, adjustable as necessary, that causes the data to actually be read into a variable later. For my purposes, I left it at true.

// set bReadValues to true to simulate working with the data coming in
bool bReadValues = true;

Now, at this point, everything is set up to begin. The Main procedure controls building the connection string, and then calls another procedure to run the query.

The first thing the Main procedure does is call a procedure to create the results table in SQL Server. Then it builds the connection string as follows:

// multiple tests at this point
// test with pooling off, then on
// each test to be done with different settings of rowset
// to gauge performance
int iParamTest = 1;
while (iParamTest < 3)
{
    string poolString = "";
    switch (iParamTest)
    {
        case 1:
            poolString = "Connection Pooling=False;";
            break;
        case 2:
            poolString = "Connection Pooling=True;";
            break;
    }
    iRowSet = 0;
    while (iRowSet < 301)
    {
        myconstr = "";
        rowString = "Rowset Cache Size=" + iRowSet.ToString() + ";";
        // build final connection string here
        myconstr = cnStringDB2OLEDB + poolString + rowString;

After the connection string is built, it runs the query 10 times, as follows:

        for (int iquery = 1; iquery < iQueryIteration + 1; iquery++)
        {
            testno++;
            // start timer watch
            watch.Start();
            rows = RunDbConnectionTest(myconstr, DB2QueryString);
            watch.Stop();
            Console.WriteLine("Test # {0}, ParamString: {1}, Rowset: {2}, Elapsed Time {3}", testno, poolString, iRowSet, (int)watch.Elapsed.TotalMilliseconds);
            UpdateSQLPerfTable(testno, poolString, iRowSet, watch.Elapsed.TotalMilliseconds);
            watch.Reset();
            if (rows == -1) // error happened, exit
            {
                Environment.Exit(rows);
            }
        }
        iRowSet = iRowSet + iCacheInc;

The UpdateSQLPerfTable() call logs the time for each query into SQL Server, and then the stopwatch is reset. If an error happens (RunDbConnectionTest() returns -1), the program exits. The loops then repeat until all tests have run.

The RunDbConnectionTest() procedure builds the OleDbConnection object, creates a command object and a data reader, and executes the command. The data is then read in, by assigning each column to a string value. After all columns and rows have been read, the objects are closed and disposed of accordingly. Any errors are caught in try/catch blocks, which return -1 to the main procedure.
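Condensed into using blocks, the read path amounts to this sketch (equivalent to the full listing at the end of the article, minus the error handling and the ExecuteNonQuery branch):

using System.Data.OleDb;

static class ReaderSketch
{
    // cnString: full DB2OLEDB connection string; cmString: the SELECT to run
    static void ReadAll(string cnString, string cmString)
    {
        using (OleDbConnection cn = new OleDbConnection(cnString))
        using (OleDbCommand cm = new OleDbCommand(cmString, cn))
        {
            cn.Open();
            using (OleDbDataReader dr = cm.ExecuteReader())
            {
                while (dr.Read())
                {
                    for (int i = 0; i < dr.FieldCount; i++)
                    {
                        // touch every column so the data is actually marshaled to the client
                        string msg = dr.GetValue(i).ToString();
                    }
                }
            }
        }
    }
}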

Once the data is in SQL Server, you can query the table to see the differences at each setting.
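For reference, a minimal static version of the pivot would look something like this (a sketch against the CSPerfTrials table created by the code below; the dynamic, cursor-driven version I actually used appears at the end of the article):

SELECT rowset,
       [Connection Pooling=False;],
       [Connection Pooling=True;]
FROM (SELECT rowset, ParamString, durationMs / 1000 AS durationSec
      FROM dbo.CSPerfTrials) AS rf
PIVOT (AVG(durationSec) FOR ParamString
       IN ([Connection Pooling=False;], [Connection Pooling=True;])) AS p
ORDER BY rowset

Running the pivot gave the following results (times in seconds):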

rowset   Connection Pooling=False;   Connection Pooling=True;
0        2.05654372                  1.28185158
30       1.41009238                  0.87079308
60       1.44409547                  0.87172373
90       1.46562513                  1.10562903
120      1.3914555                   1.05005861
150      1.38745682                  0.88520811
180      1.44412404                  0.84566784
210      1.41111907                  0.87019778
240      1.41200047                  0.83981306
270      1.42906141                  0.82953586
300      1.44249608                  1.1492816

So, just looking at the averages, you can see there is a difference when using connection pooling. But looking at the raw numbers gives a better idea. My first query was to look at rowset 0.
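That query is not shown in the article; presumably it is the rowset-270 query shown later with the filter changed, along these lines:

SELECT TestNumber, ParamString AS Pooling, rowset, CAST(durationMs / 1000 AS decimal(6, 2)) AS Seconds
FROM CSPerfTrials
WHERE (rowset = 0)
ORDER BY TestNumber, Pooling

It returned: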

TestNumber   Pooling                     rowset   Seconds
1            Connection Pooling=False;   0        2.65
2            Connection Pooling=False;   0        2.44
3            Connection Pooling=False;   0        1.78
4            Connection Pooling=False;   0        2.50
5            Connection Pooling=False;   0        1.94
6            Connection Pooling=False;   0        1.91
7            Connection Pooling=False;   0        1.83
8            Connection Pooling=False;   0        1.83
9            Connection Pooling=False;   0        1.84
10           Connection Pooling=False;   0        1.84
111          Connection Pooling=True;    0        1.89
112          Connection Pooling=True;    0        1.31
113          Connection Pooling=True;    0        1.21
114          Connection Pooling=True;    0        1.20
115          Connection Pooling=True;    0        1.19
116          Connection Pooling=True;    0        1.20
117          Connection Pooling=True;    0        1.20
118          Connection Pooling=True;    0        1.20
119          Connection Pooling=True;    0        1.20
120          Connection Pooling=True;    0        1.21

Now, looking at the above, you can see that the times without connection pooling were consistently higher than those with connection pooling.

Next, I wanted to see if setting the Rowset Cache Size made any difference with and without connection pooling. Using a query to get the minimum duration across all the results, I determined that the fastest run in my test was test #208, with a rowset of 270.
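That minimum can be found with a simple query along these lines (a sketch; the exact statement is not shown here):

SELECT TOP 1 TestNumber, ParamString, rowset, durationMs
FROM dbo.CSPerfTrials
ORDER BY durationMs

I then issued a query to pull out the test results where rowset = 270: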

SELECT TestNumber, ParamString AS Pooling, rowset, CAST(durationMs / 1000 AS decimal(6, 2)) AS Seconds
FROM CSPerfTrials
WHERE (rowset = 270)
ORDER BY TestNumber, Pooling

This query gives me the following results:

TestNumber   Pooling                     rowset   Seconds
91           Connection Pooling=False;   270      1.39
92           Connection Pooling=False;   270      1.41
93           Connection Pooling=False;   270      1.41
94           Connection Pooling=False;   270      1.40
95           Connection Pooling=False;   270      1.40
96           Connection Pooling=False;   270      1.41
97           Connection Pooling=False;   270      1.46
98           Connection Pooling=False;   270      1.52
99           Connection Pooling=False;   270      1.41
100          Connection Pooling=False;   270      1.48
201          Connection Pooling=True;    270      1.30
202          Connection Pooling=True;    270      0.87
203          Connection Pooling=True;    270      0.78
204          Connection Pooling=True;    270      0.77
205          Connection Pooling=True;    270      0.76
206          Connection Pooling=True;    270      0.77
207          Connection Pooling=True;    270      0.76
208          Connection Pooling=True;    270      0.75
209          Connection Pooling=True;    270      0.76
210          Connection Pooling=True;    270      0.78

In the above, with connection pooling turned off, the times for each query were between 1.39 and 1.52 seconds. With connection pooling, the initial query (TestNumber 201) took 1.30 seconds, and subsequent queries were significantly faster, in most cases below 0.8 seconds. The savings (about 0.4 to 0.5 seconds per query) adds up as more and more queries are run; with this particular query, you can run two queries with connection pooling (after the initial connection) almost as fast as one query without it.

I ran another pivot query, which summed all the times over 30 iterations of the query, and came back with this chart (in seconds), which shows the time savings a bit better:

rowset   Connection Pooling=False;   Connection Pooling=True;
0        55.7799701                  36.5997885
30       42.7464447                  23.7765142
60       42.1101308                  24.9684609
90       42.5139897                  23.7936823
120      41.9289347                  24.7713371
150      42.5943495                  24.1104076
180      42.2610654                  23.8675952
210      42.2377875                  24.3038118
240      43.31585                    24.105081
270      42.7937661                  25.1228602
300      42.2332467                  23.6615868

And again, with 100 iterations of the query at each setting, using this SQL pivot query:

USE [DB2TestDatabases]
go
declare @comstr nvarchar(4000) -- command string
declare @success int -- variable for executing SQL commands
declare @params nvarchar(255)
declare @tempstr nvarchar(1500)

declare param_cursor CURSOR FOR
    select distinct ParamString
    from [dbo].CSPerfTrials

set @tempstr = ''
OPEN param_cursor
FETCH NEXT FROM param_cursor into @params
WHILE @@FETCH_STATUS = 0
BEGIN
    set @tempstr = @tempstr + '[' + @params + ']'
    FETCH NEXT FROM param_cursor into @params
    if @@FETCH_STATUS = 0
    begin
        set @tempstr = @tempstr + ', '
    end
END
CLOSE param_cursor
DEALLOCATE param_cursor

set @comstr = 'select rowset, ' + @tempstr + ' from (select rowset, ParamString, durationms/1000 as durationms '
set @comstr = @comstr + 'from dbo.CSPerfTrials) AS rf PIVOT ( sum(durationms) FOR ParamString IN ('
set @comstr = @comstr + @tempstr + ')) AS p'

exec @success=sp_executesql @comstr

rowset   Connection Pooling=True;   Connection Pooling=False;
0        121.6649042                190.0007958
30       77.8703857                 142.5785544
60       78.2401951                 141.6710446
90       78.0573694                 141.4086889
120      77.8508053                 145.7663853
150      78.565167                  144.2463663
180      78.3692842                 142.6525498
210      78.1296505                 141.0669099
240      77.4381023                 146.7418084
270      78.3527013                 142.036522
300      78.8497313                 141.7899574

Executing 100 queries with connection pooling takes almost half the time it takes without it (roughly 78 seconds versus roughly 142), when used in conjunction with a non-zero Rowset Cache Size, for this particular query.

For my final test, I wanted to see what impact this would have on a large table, where the results were not limited. I picked a filter (select count(*) from schema.table where amount BETWEEN 100.00 AND 110.00) that takes 51 seconds, from SQL Server, just to return a count of 1,296 rows. That made it a good test to pull in those 1,296 rows and read the data, 100 times for each connection string (100 x 11 x 2), for 2,200 reads against the table. I thought this would more accurately demonstrate pulling a small amount of data from across a large table.
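The article does not show the harness change, but presumably it amounts to replacing the query constant with the unrestricted version of that filter, along these lines:

// hypothetical: the unrestricted query for the final, large-table test
const string DB2QueryString = "select * from schema.table where amount BETWEEN 100.00 AND 110.00";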

After the tests were run, I ran a query to sum up the times for each setting, with these results (times are in seconds):

rowset   Connection Pooling=True;   Connection Pooling=False;
0        3721.2146623               4052.1730781
30       3724.0935948               4026.1043348
60       3689.1961373               3973.2904402
90       3718.429541                3970.0594041
120      3689.3583401               4035.6269915
150      3720.9639454               3976.7106773
180      3685.1553973               3944.8446177
210      3712.2296273               4068.358878
240      3703.0019921               4007.4356928
270      3675.8706398               4084.5275822
300      3785.033983                4010.1079362

So, I was thinking: wow, some savings, 200 to 300 seconds total. But before you get excited, remember this was across 100 queries, so in reality the savings is only 2 to 3 seconds per query. Changing the query to show the average confirms this:

rowset   Connection Pooling=True;   Connection Pooling=False;
0        37.212146623               40.521730781
30       37.240935948               40.261043348
60       36.891961373               39.732904402
90       37.18429541                39.700594041
120      36.893583401               40.356269915
150      37.209639454               39.767106773
180      36.851553973               39.448446177
210      37.122296273               40.68358878
240      37.030019921               40.074356928
270      36.758706398               40.845275822
300      37.85033983                40.101079362

So the savings, on average, is only 2 to 3 seconds per query when accessing a large table, at least in my particular case.

As the above charts show, using the Rowset Cache Size by itself helps, but combining it with pooling on small queries achieves a significant time savings. Since the numbers are fairly close once any non-zero rowset is used, you should see a significant improvement with a setting of at least 30 for the Rowset Cache Size. When querying a large table, pooling can still save a few seconds, but you may not see as much of a performance impact as you would with 'smaller' queries, where data can be retrieved much faster.

One thing to note here: all the times above were generated on a system with little other load, against a DB2 server with minimal resources in use. Your results will vary, depending on hardware, network speed and utilization, load on DB2, and load on the servers running the tests. But this test and the previous two should give an indication of which settings may be most beneficial in your environment.

The test hardware used in all three tests was identical: Windows Server 2003 SP2 and SQL Server 2005 (with all service packs and fixes applied to both), with 2 GB of memory, on a 100 Mbps network. The queries were run against DB2/400 V5R4, which was a couple of network hops away from the test servers.

I hope this series of articles will help in your performance testing. If you have any questions, please do not hesitate to ask.

The code for this particular test is below.

using System;
using System.Collections;
using System.Text;
using System.Data;
using System.Data.OleDb;
using System.Data.Common;
using System.Data.SqlClient;
using System.Diagnostics;

namespace DB2PerfTest
{
    class Program
    {
        // set bReadValues to true to simulate working with the data coming in
        static bool bReadValues = true;
        // connection string is missing rowset cache size and connection pooling options
        const string cnStringDB2OLEDB = @"Provider=DB2OLEDB;User ID=USERNAME;Password=password;Initial Catalog=DB2400;Network Transport Library=TCP;Host CCSID=37;PC Code Page=1252;Network Address=DB2400;Network Port=446;Package Collection=SCHEMA;Default Schema=SCHEMA;Process Binary as Character=False;Units of Work=RUW;Default Qualifier=SCHEMA;DBMS Platform=DB2/AS400;Persist Security Info=True;Defer Prepare=False;Derive Parameters=False;";
        // number of times to run each query
        const int iQueryIteration = 10;
        // rowset cache size increments from 0 to 300 using this value
        const int iCacheInc = 30;
        // below query returns 3,581,029 rows without the fetch first statement
        // limiting to 5000 as it is not realistic that many C# applications will be handling
        // millions of rows at one time
        const string DB2QueryString = "select * from schema.table where amount BETWEEN 5.00 AND 10.00 FETCH FIRST 5000 ROWS ONLY";

        static void Main(string[] args)
        {
            int rows = 0;
            string rowString = "";
            string myconstr = "";
            int iRowSet = 0;
            CreateSQLPerfTable();
            Stopwatch watch = new Stopwatch();
            int testno = 0;

            // multiple tests at this point
            // test with pooling off, then on
            // each test to be done with different settings of rowset
            // to gauge performance
            int iParamTest = 1;
            while (iParamTest < 3)
            {
                string poolString = "";
                switch (iParamTest)
                {
                    case 1:
                        poolString = "Connection Pooling=False;";
                        break;
                    case 2:
                        poolString = "Connection Pooling=True;";
                        break;
                }
                iRowSet = 0;
                while (iRowSet < 301)
                {
                    myconstr = "";
                    rowString = "Rowset Cache Size=" + iRowSet.ToString() + ";";
                    // build final connection string here
                    myconstr = cnStringDB2OLEDB + poolString + rowString;
                    for (int iquery = 1; iquery < iQueryIteration + 1; iquery++)
                    {
                        testno++;
                        // start timer watch
                        watch.Start();
                        rows = RunDbConnectionTest(myconstr, DB2QueryString);
                        watch.Stop();
                        Console.WriteLine("Test # {0}, ParamString: {1}, Rowset: {2}, Elapsed Time {3}", testno, poolString, iRowSet, (int)watch.Elapsed.TotalMilliseconds);
                        UpdateSQLPerfTable(testno, poolString, iRowSet, watch.Elapsed.TotalMilliseconds);
                        watch.Reset();
                        if (rows == -1) // error happened, exit
                        {
                            Environment.Exit(rows);
                        }
                    }
                    iRowSet = iRowSet + iCacheInc;
                }
                iParamTest++;
            } // iparamtest loop
        } // end main

        static int RunDbConnectionTest(string cnString, string cmString)
        {
            int result = 0;
            try // outer try
            {
                OleDbConnection cn = new OleDbConnection(cnString);
                cn.Open();
                try
                { // inner try
                    OleDbCommand cm = cn.CreateCommand();
                    cm.Connection = cn;
                    cm.CommandText = cmString;
                    if (bReadValues)
                    {
                        OleDbDataReader dr = cm.ExecuteReader();
                        try
                        {
                            while (dr.Read())
                            {
                                int cnt = dr.FieldCount;
                                for (int i = 0; i < cnt; i++)
                                {
                                    // simulate actually doing something with the data so it is 'read'
                                    string msg = dr.GetValue(i).ToString();
                                }
                            }
                        }
                        finally
                        {
                            dr.Close();
                            cm.Dispose();
                        }
                    } // bReadValues
                    else // not reading data, just execute
                    {
                        try
                        {
                            result = cm.ExecuteNonQuery();
                        }
                        catch (Exception ex)
                        {
                            System.Console.WriteLine("");
                            System.Console.WriteLine(ex.Message);
                            result = -1;
                        }
                        finally
                        {
                            cm.Dispose();
                        }
                    } // !bReadValues
                } // inner try
                finally
                {
                    cn.Close();
                    cn.Dispose();
                }
            } // outer try
            catch (Exception e)
            {
                System.Console.WriteLine("");
                System.Console.WriteLine(e.Message);
                result = -1;
            }
            return result;
        }

        static void CreateSQLPerfTable()
        {
            SqlConnection mySqlConnection = new SqlConnection("Persist Security Info=True;Integrated Security=SSPI;database=DB2TestDatabases;server=(local);pooling=false");
            SqlCommand mySqlCommand = mySqlConnection.CreateCommand();
            mySqlConnection.Open();
            // drop perf table
            Console.WriteLine("Dropping CSPerfTrials table");
            mySqlCommand.CommandText =
                "IF EXISTS (SELECT * FROM sys.objects " +
                "WHERE object_id = OBJECT_ID(N'[dbo].[CSPerfTrials]') " +
                "AND type in (N'U')) " +
                "DROP TABLE [dbo].[CSPerfTrials]";
            int result = mySqlCommand.ExecuteNonQuery();
            Console.WriteLine("mySqlCommand.ExecuteNonQuery() = " + result);

            // create new table
            mySqlCommand.CommandText =
                "CREATE TABLE dbo.CSPerfTrials" +
                "( TestNumber int NULL" +
                ", ParamString Nvarchar(2000) NULL" +
                ", rowset int NULL" +
                ", durationMs float NULL" +
                ")";
            Console.WriteLine("Creating dbo.CSPerfTrials table");
            result = mySqlCommand.ExecuteNonQuery();
            Console.WriteLine("mySqlCommand.ExecuteNonQuery() = " + result);
            mySqlConnection.Close();
        }

        static void UpdateSQLPerfTable(int testno, string ParamString, int rowset, double durationMS)
        {
            // added pooling=false to the sql connection string to resolve an error condition
            // may not be needed in all cases but on 'fast' reads this appears to be needed
            SqlConnection mySqlConnection = new SqlConnection("Persist Security Info=True;Integrated Security=SSPI;database=DB2TestDatabases;server=(local);pooling=false");
            SqlCommand mySqlCommand = mySqlConnection.CreateCommand();
            mySqlConnection.Open();
            Console.WriteLine("Connection state is " + mySqlConnection.State.ToString());
            Console.WriteLine("inserting data into SQL table dbo.CSPerfTrials");
            mySqlCommand.CommandText = @"insert into dbo.CSPerfTrials " +
                "(TestNumber" +
                ",ParamString" +
                ",rowset" +
                ",durationMS" +
                ") values (" +
                testno.ToString() + ",'" +
                ParamString + "'," +
                rowset.ToString() + "," +
                durationMS.ToString() + ")";
            int result = mySqlCommand.ExecuteNonQuery();
            Console.WriteLine("mySqlCommand.ExecuteNonQuery() = " + result);
            mySqlConnection.Close();
            Console.WriteLine("Connection state is " + mySqlConnection.State.ToString());
        }
    }
}

The SQL query I ran to produce the pivot charts is below. Modifying the "avg(durationms" to "sum(durationms" will give you the total times for each test run:

-- CSPerfTrials Pivot report
-- ***********************
-- Builds a Pivot report
USE [DB2TestDatabases]
go
declare @comstr nvarchar(4000) -- command string
declare @success int -- variable for executing SQL commands
declare @params nvarchar(255)
declare @tempstr nvarchar(1500)

declare param_cursor CURSOR FOR
    select distinct ParamString
    from [dbo].CSPerfTrials

set @tempstr = ''
OPEN param_cursor
FETCH NEXT FROM param_cursor into @params
WHILE @@FETCH_STATUS = 0
BEGIN
    set @tempstr = @tempstr + '[' + @params + ']'
    FETCH NEXT FROM param_cursor into @params
    if @@FETCH_STATUS = 0
    begin
        set @tempstr = @tempstr + ', '
    end
END
CLOSE param_cursor
DEALLOCATE param_cursor

set @comstr = 'select rowset, ' + @tempstr + ' from (select rowset, ParamString, durationms/1000 as durationms '
set @comstr = @comstr + 'from dbo.CSPerfTrials) AS rf PIVOT ( avg(durationms) FOR ParamString IN ('
set @comstr = @comstr + @tempstr + ')) AS p'

exec @success=sp_executesql @comstr