Does ADO.NET update batching really do something?


I’ve gotten this same question several times in the last few weeks, so I thought it would be a good idea to put together some details on the topic. Update batching is a new ADO.NET 2.0 feature that, when enabled, introduces an optimization to the process of executing a DataAdapter.Update call (which basically takes the changes from a DataTable and pushes them into a database using the INSERT/UPDATE/DELETE commands).


Without batching, Update walks through each change in the DataTable and for each one of them it executes the corresponding command (obtained through the InsertCommand, UpdateCommand and DeleteCommand properties of the adapter). Each execution is separated from the next one, so your application keeps going back and forth over the network between the client and the server.
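Conceptually (and glossing over many adapter internals), the non-batched update pass looks something like this sketch. The loop body, variable names and structure here are illustrative only, not the actual framework code:

```csharp
// Sketch of a non-batched DataAdapter.Update pass: one ExecuteNonQuery
// call -- and therefore one network round trip -- per changed row.
foreach (DataRow row in table.Rows)
{
    SqlCommand cmd;
    switch (row.RowState)
    {
        case DataRowState.Added:    cmd = adapter.InsertCommand; break;
        case DataRowState.Modified: cmd = adapter.UpdateCommand; break;
        case DataRowState.Deleted:  cmd = adapter.DeleteCommand; break;
        default: continue; // unchanged rows are skipped
    }
    // The adapter binds the row's values to the command's parameters
    // (through each parameter's SourceColumn mapping), then executes.
    cmd.ExecuteNonQuery(); // separate round trip to the server
}
```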


With batching, we can take a bunch of these operations and send them in a single round trip. This optimizes the operation in a number of ways, the most notable one being fewer network round trips, which saves quite a bit of overhead in most (although not all) cases.


Batching implementation strategy #1
One way of implementing batching is to concatenate a bunch of SQL statements together. For example, if you have to do an insert and an update, you can simply put a batch together like this (in T-SQL):


INSERT INTO Person (id, name) VALUES (1, 'John');
UPDATE Person SET name = 'Juan' WHERE id=2


That will execute both statements in a single shot. However, things get tricky if you want to do it in a generic way. For starters, we have to use parameters, because the data-adapter needs to supply the INSERT/UPDATE/DELETE values for each particular DataRow. So this becomes:


INSERT INTO Person (id, name) VALUES (@id, @name);
UPDATE Person SET name = @name WHERE id=@id


Now this doesn’t really work, because we’re using the same parameter names but we actually want different values (the INSERT line wants the values from one DataRow, and the UPDATE line from another). So we have to rename the parameters so they are unique. Renaming parameters means that before the adapter executes any command, it has to parse the SQL statement (at least enough to isolate the parameter markers) and replace the original parameter names with new ones so they don’t clash. That’s bad: parsing SQL statements is error-prone, consumes CPU and causes extra allocations.
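For instance, after renaming, the two-statement batch above would have to become something like:

INSERT INTO Person (id, name) VALUES (@id1, @name1);
UPDATE Person SET name = @name2 WHERE id=@id2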


There are also two more subtle issues with this approach:
1. It turns out that some databases, such as SQL Server, have an upper limit on the number of parameters you can pass in a single parameterized statement invocation. In the case of SQL Server this is around 2100. That means your batch cannot exceed that many parameters in total (not per statement); for example, if your UPDATE command takes 5 parameters, a single batch tops out at roughly 420 modified rows. Since each INSERT/UPDATE/DELETE command can have a different number of parameters, it’s hard to predict how many rows you’ll be able to process per batch.


2. The second side effect has to do with plan caching. Database servers usually cache query plans for efficiency, and batching can cause trouble here. For example, let’s say you have the two commands we used above for INSERT and UPDATE set up in a data-adapter. In one call, you have 1 updated row and 2 inserted rows, so your batch becomes (simplified version):


UPDATE Person SET name = @name1 WHERE id=@id1;
INSERT INTO Person (id, name) VALUES (@id2, @name2);
INSERT INTO Person (id, name) VALUES (@id3, @name3);


Now, if right after that you get another request with the exact same statements, but with a table that has 1 insert followed by 1 update, the batch will be:

INSERT INTO Person (id, name) VALUES (@id1, @name1);
UPDATE Person SET name = @name2 WHERE id=@id2;


That is, every permutation of changes in a DataTable produces a different batch. That bloats the server’s procedure cache, introducing inefficiencies. This is the approach we used for batching in both SqlClient and OracleClient during Whidbey beta 1. In beta 2, OracleClient still uses this technique (mapped to its PL/SQL equivalent), but SqlClient doesn’t.


Batching implementation strategy #2
The other way of doing batching is “simpler”, but happens at a lower level. I’m going to describe the SQL Server implementation here, but other database servers might have a similar feature. At the client-server protocol level (TDS), each command is executed against the server using a “request”. For parameterized and batched statements, this request is an “RPC request” (nothing to do with DCE RPC, just the same name). It turns out that in SQL Server we can put one RPC request on the network, and right before putting the “we’re done” mark on the wire, we can actually put a “here comes another RPC” mark. This way, we can keep sending RPCs over the wire all together. Once we’ve sent all of them, we switch to listening for responses, and we get the response for each RPC in the order we sent them. The protocol has the appropriate infrastructure to distinguish where the response for a given RPC ends and the next one starts.


This means that we get rid of the following problems:
– We can send multiple requests for statement execution in a single network round-trip, which was the original goal
– No need to parse statements and rename parameters, because each statement goes in a different request
– No procedure cache bloat, because regardless of the order of execution, each row has an RPC of its own
– No limit to the number of rows that can be updated with a single batch, because we don’t depend on lots of parameters for a single RPC
 
There are some minor drawbacks specific to RPC-level batching, such as paying a little extra per-RPC execution cost in the server, but the benefits above generally outweigh them.


Sure, but is it really batching?
With all this background in place, I can now tackle the initial question: does batching really work, and how can I see it working?
If you have a SqlDataAdapter and set the UpdateBatchSize property to 0 (meaning all changes go in a single round trip) or to a value greater than 1 (indicating the number of rows per batch), batching is enabled. The first thing you’ll want to do is use SQL Profiler to see how the statements are being batched. However, you’ll find that you still see one RPC execution for each DataRow that needs an INSERT/UPDATE/DELETE.


For example, let’s take this trivial code snippet that uses batching:


DataTable t = new DataTable();
t.Columns.Add("a", typeof(int));
t.Rows.Add(1);
t.Rows.Add(2);
// don't hardcode connection strings in your apps, this is just a quick sample :)
// pre-create the table with "CREATE TABLE t (a INT)"
using (SqlConnection conn = new SqlConnection("server=tcp:localhost; integrated security=true; database=test")) {
    conn.Open();
    SqlDataAdapter da = new SqlDataAdapter(null, conn);
    da.InsertCommand = new SqlCommand("INSERT INTO t VALUES (@a)", conn);
    da.InsertCommand.Parameters.Add("@a", SqlDbType.Int, 0, "a");
    da.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
    da.UpdateBatchSize = 10;
    da.Update(t);
}
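As a side note, a batched Update can still be made all-or-nothing by wrapping it in an explicit transaction. Here is a hedged sketch of how that could look with the same connection, adapter and table as above (the exact setup is illustrative, not prescriptive):

```csharp
// Sketch: making a batched Update all-or-nothing. Each command on the
// adapter must be explicitly enlisted in the transaction.
using (SqlTransaction tx = conn.BeginTransaction())
{
    da.InsertCommand.Transaction = tx; // enlist the command(s) in use
    try
    {
        da.Update(t);  // batches execute inside the transaction
        tx.Commit();   // persist every row, or...
    }
    catch
    {
        tx.Rollback(); // ...undo them all on any failure
        throw;
    }
}
```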

 
The filtered SQL profiler output will look like this:

[SQL Profiler trace screenshot]

What’s happening there is that you effectively have a separate RPC for each statement, so the profiler is accurate; batching actually happens at a lower level in SQL Server. If you really want to see batching working, you’ll have to use a lower-level tool such as netmon or some other network monitor. Using netmon you’d be able to see that all the RPCs are sent together, and then all the responses are sent together by the server. For example, for the trivial program above, which does 2 inserts in a single batch, here is the netmon trace of the packet that contains the execution:
00000000  00 06 5B BD 30 E5 00 30 6E 4C 31 63 08 00 45 00  ..[.0..0nL1c..E.
00000010  01 20 B9 EC 40 00 80 32 11 13 9D 37 79 43 9D 37  ....@..2...7yC.7
00000020  7A FA E9 BA 57 A2 00 00 01 0D 12 1F 05 99 83 57  z...W..........W
00000030  47 89 AF 31 33 AF 50 18 42 C8 36 94 00 00 03 01  G..13.P.B.6.....
00000040  00 DB 00 00 01 00 FF FF 0A 00 00 00 00 00 E7 32  ...............2
00000050  00 09 04 D0 00 34 32 00 49 00 4E 00 53 00 45 00  .....42.I.N.S.E.
00000060  52 00 54 00 20 00 49 00 4E 00 54 00 4F 00 20 00  R.T...I.N.T.O...
00000070  74 00 20 00 56 00 41 00 4C 00 55 00 45 00 53 00  t...V.A.L.U.E.S.
00000080  20 00 28 00 40 00 61 00 29 00 00 00 E7 0C 00 09  ..(.@.a.).......
00000090  04 D0 00 34 0C 00 40 00 61 00 20 00 69 00 6E 00  ...4..@.a...i.n.
000000A0  74 00 02 40 00 61 00 00 26 04 04 01 00 00 00 80  t..@.a..&.......
000000B0  FF FF 0A 00 00 00 00 00 E7 32 00 09 04 D0 00 34  .........2.....4
000000C0  32 00 49 00 4E 00 53 00 45 00 52 00 54 00 20 00  2.I.N.S.E.R.T...
000000D0  49 00 4E 00 54 00 4F 00 20 00 74 00 20 00 56 00  I.N.T.O...t...V.
000000E0  41 00 4C 00 55 00 45 00 53 00 20 00 28 00 40 00  A.L.U.E.S...(.@.
000000F0  61 00 29 00 00 00 E7 0C 00 09 04 D0 00 34 0C 00  a.)..........4..
00000100  40 00 61 00 20 00 69 00 6E 00 74 00 02 40 00 61  @.a...i.n.t..@.a
00000110  00 00 26 04 04 02 00 00 00 01 02 03 04 05 06 07  ..&.............
00000120  07 06 90 CE 9B 4E 42 21 15 E4 56 DD 44 E4        .....NB!..V.D.
 
 
As you can see, both inserts are in the same network packet. No need to understand the whole thing – it’s enough to notice that there are two INSERT statements in the payload.

I hope this helps clarify doubts around the topic. If you have further questions feel free to post them here and we’ll take a look.


Pablo Castro
Program Manager – ADO.NET Team
Microsoft Corp


Disclaimer: This posting is provided “AS IS” with no warranties, and confers no rights.

Comments (23)

  1. Ricky Datta says:

    Do you plan to implement batching for OracleClient just like SqlClient by the release timeframe?

  2. MSDNArchive says:

    Ricky, yes, we will support batching in OracleClient in RTM. Our implementation will be based on strategy #1 mentioned in the blog. As pointed out by Pablo in the article above: "…OracleClient still uses this technique (mapped to its PL/SQL equivalent)".

    Sorry for the confusion, if any.

    Thanks!

    Sushil Chordia [MS]

  3. Paul@Nortel says:

    I like the reduced round trips, but the fact that the data is sent over the wire in string form seems inefficient. Is there a way in ADO.NET 2 to send a group of inserts (1000+) over in binary form? In binary form an int stays an int (4 bytes). I know in SQL Server 2000 I have to use bulk insert, and that too requires everything to be string.

  4. venu says:

    Nice one Pablo

  5. Pablo Castro [MS] says:

    Hi Paul,

    Actually, only string data is sent to the server in string form. Other data is sent in its original format; that is, if you give us an int in a SqlParameter value where the parameter type is set to SqlDbType.Int, then we put an int on the wire (4 bytes). Same thing for all other types.

    -pablo

  6. Steve Strong says:

    Looks good – any reason why we can’t access the AddBatchCommand and ExecuteBatchRPCCommand? It would be very useful to have this level of functionality available without having to go through the DataAdapter.

  7. John Papa says:

    Pablo Castro, Microsoft’s Program Manager of the ADO.NET Team, has a good post last week that helps clarify…

  8. Vyas Bharghava says:

    Thanks Pablo, for the very informative post.

    Yes, we’ve reduced network round trips. But how do I enable an all-or-nothing batch transaction across these multiple RPCs? Is it my imagined problem? All I need to do is to call BeginTransaction() on the connection? Or is there something more to it?

  9. Chandra Nathani says:

    Hi Pablo,

    I haven’t seen a very dramatic improvement in performance. Here’s what I observed.

    UpdateBatchSize = 100

    For 3000 rows

    with 4 columns being updated takes 3 Mins

    Without setting the UpdateBatchSize it takes 3 minutes 40 seconds.

    It’s just 40 seconds faster. I was expecting it to be really fast.

  10. Chris Ongsuco says:

    Nice to see your solution but how about if I have 3 insert sql to different tables and I want to batch it in 1 round trip to the DB?

    1. Insert into Table1 with 3 parameters

    2. Insert into Table2 with 4 parameters

    3. Insert into Table3 with 10 parameters

    Is this possible with the sample you gave? Or do I still need to concat these sqls with ";"?

  11. MSDNArchive says:

    Chris, we don’t have batching at the command level in .NET Framework 2.0. Batching is only supported at the adapter level, meaning the update, insert and delete commands on the adapter are the only ones that get batched. Hope this helps.

  12. Chris Ongsuco says:

    That helped a lot. Thanks!

  13. Ali says:

    Quick question:

    Assuming the application is a web based and assuming a thin client architecture.

    Are the batching RPC calls originating from the web-based client or from the application host server (the web server)?

    Thanks

  14. Andy L. says:

    Thanks.

    This clears up the confusion caused after following your webcast which demoed the new features of ADO.NET 2.0 – standalone DataTables, binary serialization, db agnostic access, batching etc.

    In my own tests, I saw 3 updates in Sql profiler after setting UpdateBatchSize to 100 unlike your example in the webcast which showed only 1 so I thought that either batching was not working or some other parameter needed to be set to turn batching on.

    This really cleared it up. Thanks again.

  15. Rem says:

    Hi,

     Is there any way to get the Row on which an error occurred after/during processing of a batch.

     ie. I am doing a batch insert of 2000 records and record 30 has an error (eg. data is too long for field).  In the RowUpdated event it tells me of the error but the DataRow is always the last one in the batch?  I have also tried looking at the datatable AFTER the update has been done and ALL the rows have their ‘error flag’ set?

     Can I find out which row/s really had the error?

    thanks.

  16. The following links to .NET resources have been collated over time with the assistance of colleagues. …

  17. albert-einstein says:

    dear sir, I want to change the record-by-record update done by the GridView control and make it batched – I mean, a group of records together in one transaction.

    can I do this?

  18. aa says:

    I had a quick look at the doco but couldn’t find a way to accomplish this with my own custom collections. Is this possible?

    eg something like the following:

    SqlCommand cmd = new SqlCommand();

    cmd.BeginBatch();

    foreach (Item i in Coll c)

    {

     cmd.Command = "insert into t1 … ";

    }

    cmd.ExecuteBatch();

  19. An excellent post by Pablo Castro of  the ADO.NET team explaining the mechanics of how batch updates…
