Hopeless performance. The ADO.NET 2.0 is slower than ADO 1.1

1.0 Warm up.

Firstly my mentor shared the following question with me:

Why the image gets reversed in mirror horizontally, but vertically.

I asked many people about this question, but seldom gave me the accurate answer in 3 mins. Some interesting answers here:

1. People’s eye is horizontal not vertical. (Also cited in some “interview bible”)

2. If we change the mirror by 90 degree, then the result changes

3. We live in the Northern Hemisphere

From the tech part, the knowledge involved here is only mirror reflection, far simple than Windows memory management. This case just shows, besides the knowledge itself, problem solving needs the way of clear thinking.

1.1 Hopeless performance. The ADO.NET 2.0 is slower than ADO 1.1!

Problem Description:

Based on the following article, the customer decides to upgrade to CLR2 for performance:

DataSet and DataTable in ADO.NET 2.0

https://msdn.microsoft.com/msdnmag/issues/05/11/DataPoints/default.aspx

But based the customer’s test, the performance get worse with CLR2.

Look at the code, quite simple:

         OracleConnection conn = new OracleConnection();

         conn.ConnectionString = "...";

         conn.Open();

         OracleCommand cmd = new OracleCommand();

         cmd.Connection = conn;

         OracleDataAdapter dap = new SOracleDataAdapter("select * from mytesttable", conn);

         DataTable dt = new DataTable();

         DateTime start = System.DateTime.Now;

         dap.Fill(dt);

         TimeSpan span = DateTime.Now - start;

         conn.Close();

         Console.WriteLine(span.ToString());

         Console.WriteLine("The Columns.Count is" +dt.Columns.Count.ToString());

         Console.WriteLine("The Rows.Count is"+dt.Rows.Count.ToString());

The data used for test is quite simple either. 250000 rows, 4 fields. By checking the result of span.ToString, ADO.NET 2 is 1 time slower than ADO1.1. dap.Fill method takes 6 seconds in CLR2, comparing with 3 seconds in CLR1.

Hopeless

How to handle it?

When I finished the test locally, my feeing is hopeless only. Look at what I could do:

1. The backend datatable is quite simple. 4 fields are int, no index, no primiary key, no foreigen key. In other words, the problem does not relate to data schema, purely client side problem.

2. The code is very simple. There is nothing we could further fine tune.

3. The test machines gets both CLR1 and CLR2 installed. The test and comparing are done in the same machine. The hardware and software environment is the same. The only difference is the CLR version, which is the customer’s concern.

Hopeless, no way to try. I have to confess that “No idea. ADO.NET 2 dev did a bad thing! Sorry”. And also say “BTW, the MSDN paper is based on CLR2 beta2, not accurate.”

Change the Role

The good thing is that I did not give up my trying. After careful research, I found:

1. The sample code in MSDN does do a better than CLR 1.1. However, the schema of the data is a bit complex, while the customers is simple

2. The performance is worse in above sample. However, in other cases, the ADO.NET 2.0 does a better job to gain 10 times better result in the same customer’s project.

Meanwhile, when discussing with my colleague, they do not care about the percentage of how worse it is. They concern about why the customer wants to operate on 250000 rows of data at the same time. After careful discussion, got the following points:

1. The performance is 3 seconds worse with 250000 rows, 8 microseconds per row. From the tech part, db reading operation needs to send request to DB client engine firstly. The DB client engine pushes the request to DB server, and then fill the result back to client application. In the whole chain, any little change is likely to be enlarged, based on butterfly effect. In other words, if we look at percentage, the performance is indeed 100% worse. However, if we look at the delay of seconds, and compare with the data amount, we get further understanding on this problem.

2. The customer’s code reads 250000 rows at the same time. What kind of situation that the dev wants to write code like this? Such kind of code is likely to run in performance sensitive situation? Let’s cacuate, 250000 rows* 4 fileds*20 bytes each=19MB data. If each request involves 3 seconds delay, plus 20MB memory cost, and heavy network operations, the design is bad. In normal condition, such kind of code should be used to initialize some global data, which may be put in cache. And this kind of code should only run once per application’s life. In other words, the performance hit in real situation is really visible?

3. We did not evaluate all the data operations. The customer only tests the performance of fetching data. We do not consider how to use the data later. If ADO.NET 2 sacrifice the performance on loading, to benefit the performance in other places like iteration or deleting, we gain better overall performance.

Based on above analysis, I decide to ask the customer what the real situation it is. Meanwhile, I used the following method to seek the root cause:

Troubleshooting:

First I use reflector to check the implementation on DataAdapter.Fill. (I am not CPR and I cannot see the source code)

Reflector for .NET

https://www.aisto.com/roeder/dotnet/

I found the DataAdapter.Fill method can be divided in the following two parts:

1. Use DataReader to read data

2. Construct DataTable and fill the data in.

It means the DataAdapter.Fill just combine DataReader.Read and DataTable.Insert. So I can create my own app to use the two functions to replace DataAdapter.Fill for test:

static void TestReader()

         {

             OracleConnection conn = new OracleConnection();

             conn.ConnectionString = "…";

             conn.Open();

             OracleCommand cmd = new OracleCommand();

             cmd.Connection = conn;

             cmd.CommandText = " select * from mytesttable ";

             OracleDataReader reader = cmd.ExecuteReader();

             object[] objs=new object[4];

             DateTime start = System.DateTime.Now;

             while (reader.Read())

             {

                  reader.GetValues(objs);

             }

             System.TimeSpan span = System.DateTime.Now - start;

             reader.Close();

             conn.Close();

             Console.WriteLine(span.ToString());

         }

         static void TestDT()

         {

             DataTable dt = new DataTable();

             dt.Columns.Add("col1");

             dt.Columns.Add("col2");

             dt.Columns.Add("col3");

             dt.Columns.Add("col4");

             DateTime start = System.DateTime.Now;

             for (int i = 1; i <= 250000; i++)

             {

                  dt.Rows.Add(new object[] { "abc123", "abc123", "abc123", "abc123" });

             }

             System.TimeSpan span = System.DateTime.Now - start;

             Console.WriteLine(span.ToString());

         }

Based on my test, the performance hit is mainly caused by DataReader.Read. It is a bit hard to use Reflector to go further. Here I change to use the profiler in VS2005. (From Tools -> Performance Tools -> Performance Wizard menu to activate. About how the detailed case study on this tool, we will talk on chapter 4)

This tool shows up how long it costs for each function, and the percentage in overall performance cost. To make the issue more obvious, I added more 100000 rows to test. Followed by the DataReader.Read function, I find the function is divided by two parts, managed and unmanaged, each occupies 35% and 65% cost. With this information, comparing with the CLR1.1 version, I find the issue is mainly related to the managed part. The most obvious is that ADO.NET 2 involves SafeHandle.DangerousAddRef/DangerousReleas call, which is new in CLR2. Each pair occupies 7% cost. In data reading of each row, we need 3 pairs of the SafeHhandle call. With careful comparing, this is the root cause.

Database reading is done by unmanaged data engine eventually, thus the managed code needs to pass buffer to unmanaged code. The SafeHandle is used to manage this kind of buffer. In CLR1.1, due to the lack of SafeHandle, heaving loading application is in the danger of corruption, and there is no very good solution. (You can use reflector to check the File operation related classes). In CLR2, we can use SafeHandle to simplify the code and help the stability. However, the cost is the 3 seconds for 250000 rows. (The dev confirmed that we can even save the 3 seconds in further CLR).

Conclusion and Lesson:

The customer eventually understood and agreed with this design:

1. The MSDN is correct. With the sample in MSDN, the performance is better.

2. The 3 seconds in the customer’s environment is not visible.

3. It is a good deal for the better stability with 3 seconds sacrifice.

At last, we did not get the 3 seconds back, but we learned more:

1. The way to think is more important than others. Do not get lost in the problems and behaviors. Think about what really cares.

2. Put the problem in real situation, and think it in different roles. Think about the background.

3. Use correct tool like reflector and profiler

4. Make the problem more obviously for debugging. Leaking 1GB memory is much more easily to debug than leaking 1k memory.

5. Tradeoff is necessary sometime.

Further discussion:

About Safehandle:

SafeHandle: A Reliability Case Study [Brian Grunkemeyer]

https://blogs.msdn.com/bclteam/archive/2005/03/16/396900.aspx

About Balance, Tradeoff and Win-Win, RFC 1925

RFC 1925 (RFC1925)

https://www.faqs.org/rfcs/rfc1925.html

For tradeoff, another real case is about /3GB. For /3GB, look at:

Large memory support is available in Windows Server 2003 and in Windows 2000

https://support.microsoft.com/kb/283037/en-us

The case is, the customer gets a large ASP.NET site. When the loading is high, Invalid Operation exception occurs frequently. Based on the debugging of network engineer and db engineer, the problem is related to /3GB. /3GB increases the usermode space, but the benefit is based on the reduce of kernel mode pool. Since heavy network IO needs lots of kernel pool, the problem occurs.

But we cannot ask the customer to remove the /3GB. This switch is added based on another case half year ago. There are so many ASP.NET pages and 2GB user mode space is not enough. So to solve the problem eventually, the best solution is 64 bits. Anyway, with good luck, we found a good solution in 32 bits with the following article to balance.

How to use the /userva switch with the /3GB switch to tune the User-mode space to a value between 2 GB and 3 GB

https://support.microsoft.com/kb/316739/en-us

Resource for Profiler:

CLR Profiler for the .NET Framework 2.0

https://www.microsoft.com/downloads/details.aspx?FamilyID=a362781c-3870-43be-8926-862b40aa0cd0&DisplayLang=en

No Code Can Hide from the Profiling API in the .NET Framework 2.0

https://msdn.microsoft.com/msdnmag/issues/05/01/CLRProfiler/default.aspx

CLR Debugging vs. CLR Profiling

https://blogs.msdn.com/jmstall/archive/2004/10/22/246151.aspx

next I will discuss how a single ShellExecute call opens a TXT file, with another GIF open unexpectedly.