OutOfMemory exceptions when using the SAP ADO Provider ... and how to work around them.

I ran the following two queries using the SAP ADO Provider

SELECT * FROM Table1 (this table contains 8 million rows in our SAP server)

SELECT * FROM Table2 (this table contains 40 thousand rows in our SAP server).

The first SELECT query ran fine, but the second one threw an OutOfMemory exception. What's going on?
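(For context, the queries were executed from .NET roughly as in the sketch below. The SAPConnection class and the Microsoft.Data.SAPClient namespace are my assumption of the provider's ADO.NET surface, and the connection string is a placeholder; adjust both to your installation.)

using System;
using Microsoft.Data.SAPClient; // assumed namespace of the SAP ADO provider

class Repro
{
    static void Main()
    {
        // Placeholder connection string - substitute your server and credentials.
        string connString = "TYPE=A; ASHOST=sapserver; SYSNR=00; CLIENT=800; USER=me; PASSWD=secret;";
        using (var conn = new SAPConnection(connString))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "SELECT * FROM Table2";
                using (var reader = cmd.ExecuteReader())
                {
                    long rows = 0;
                    while (reader.Read())
                        rows++; // process each row as it streams in
                    Console.WriteLine("Read {0} rows.", rows);
                }
            }
        }
    }
}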

Here's what happens: when you do a SELECT * or SELECT TOP n * from a table (the explanation holds even when you select just a few columns), the ADO provider doesn't retrieve all the rows in a single shot. Due to the way the RFC SDK works, doing so would mean loading all n rows into memory at once, which would take up a lot of memory. Instead, the provider retrieves the data from the SAP table in chunks, i.e., it loads a few rows at a time. By default, it retrieves 10,000 rows at a time. Therefore, for Table1 above, the provider retrieves 10,000 rows per call to SAP and makes 800 calls in total. For Table2, the provider makes just 4 calls to SAP.
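Those call counts are just the row count divided by the batch size, rounded up; a quick sanity check:

using System;

class CallCount
{
    // One RFC call per batch, with a final partial batch if the row count
    // isn't an exact multiple of the batch size (ceiling division).
    static long CallsToSap(long totalRows, long batchSize)
    {
        return (totalRows + batchSize - 1) / batchSize;
    }

    static void Main()
    {
        Console.WriteLine(CallsToSap(8000000, 10000)); // Table1: 800
        Console.WriteLine(CallsToSap(40000, 10000));   // Table2: 4
    }
}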

However, each batch retrieved from SAP does need to be stored entirely in memory, so you'd expect a maximum of 10,000 rows of the table in memory at any time. Well, not exactly: due to the way the provider is written, there can end up being twice that, i.e., 20,000 rows in memory.
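The provider's internals aren't documented, but one plausible way the factor of two arises is prefetching: if the next batch is fetched while the caller is still consuming the current one, two full batches are alive at the same time. A toy illustration of that pattern (this is not the provider's actual code):

using System;
using System.Collections.Generic;

class PrefetchingReader
{
    // Illustrative only: yields rows from the current batch after the next
    // batch has already been fetched, so two batches coexist in memory.
    static IEnumerable<object[]> ReadAll(Func<List<object[]>> fetchNextBatch)
    {
        List<object[]> next = fetchNextBatch();
        while (next != null && next.Count > 0)
        {
            List<object[]> current = next;
            next = fetchNextBatch(); // prefetch: 'current' and 'next' both held
            foreach (object[] row in current)
                yield return row;
        }
    }

    static void Main()
    {
        int batchesLeft = 3;
        // Simulated fetch: returns three two-row batches, then an empty one.
        foreach (var row in ReadAll(() =>
        {
            var batch = new List<object[]>();
            if (batchesLeft-- > 0)
                for (int i = 0; i < 2; i++) batch.Add(new object[] { i });
            return batch;
        }))
        {
            Console.WriteLine(row[0]);
        }
    }
}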

Now, assume Table1 had just one column: an RFC_CHAR of length 10. With 20,000 rows in memory, that's 20,000 * 10 = 200,000 characters. And since we use the Unicode version of the RFC SDK, each RFC_CHAR takes two bytes. Hence, approximately 400,000 bytes of memory are consumed by the SAP ADO Provider. (Note: I am focusing solely on the memory required to store the table data; the actual total memory usage of your process will be higher due to other overhead, e.g., your client app, the .NET runtime, and so on.)

In my test, each row in Table2 contained an LRAW column of size 20,000. Therefore, 20,000 rows in memory means 20,000 * 20,000 = 400,000,000 bytes, and after the same Unicode doubling, approximately 800,000,000 bytes (800 MB) of memory for the table data alone.
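The same arithmetic as a small helper (the factor of 2 comes from the two-bytes-per-character Unicode point above):

using System;

class MemoryEstimate
{
    // Approximate bytes held for the table data: rows kept in memory
    // (twice the batch size) * field length * 2 bytes under the Unicode RFC SDK.
    static long BatchBytes(long rowsInMemory, long fieldLength)
    {
        return rowsInMemory * fieldLength * 2;
    }

    static void Main()
    {
        // Table1: one RFC_CHAR(10) column, default batch size 10,000.
        Console.WriteLine(BatchBytes(20000, 10));    // 400,000 bytes (~0.4 MB)

        // Table2: one LRAW column of length 20,000, default batch size 10,000.
        Console.WriteLine(BatchBytes(20000, 20000)); // 800,000,000 bytes (~800 MB)
    }
}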

Now that you know what the problem is, how do you solve it? How do you prevent OutOfMemory exceptions when using the SAP ADO Provider against tables with very large rows?

Luckily, we have a solution :)

SELECT * FROM Table2 OPTION 'batchsize 1000'

When the above command is executed, the ADO provider extracts the data in batches of 1,000 rows each. Therefore, 2,000 rows in memory (twice the batch size, as before) * 20,000 bytes per row * 2 due to Unicode = approximately 80 MB of memory consumed for the table data.
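In client code, the option simply rides along in the command text; the same hedged sketch as before (SAPConnection and Microsoft.Data.SAPClient are assumed names, the connection string a placeholder):

using System;
using Microsoft.Data.SAPClient; // assumed namespace, as in the earlier sketch

class BatchSizeDemo
{
    static void Main()
    {
        string connString = "TYPE=A; ASHOST=sapserver; SYSNR=00; CLIENT=800; USER=me; PASSWD=secret;";
        using (var conn = new SAPConnection(connString))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                // Fetch 1,000 rows per call to SAP instead of the default 10,000.
                cmd.CommandText = "SELECT * FROM Table2 OPTION 'batchsize 1000'";
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Process each row; at most ~2,000 rows (~80 MB here)
                        // are buffered by the provider at any one time.
                    }
                }
            }
        }
    }
}

As a rule of thumb, pick the batch size by dividing the memory you're willing to spend on table data by twice the size of one row.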