SYSK 60: How Does SqlBulkCopy Deal with Duplicate Records?


The other day, I was asked the following question “Does the SqlBulkCopy update records if they already exist or must it be used only for INSERTS only?”  So, I thought others might benefit from knowing the answer…


SqlBulkCopy is nothing more than a more efficient INSERT…  That means that if there is an identical row in the destination table, it will be repeated.  That is, if it’s possible given the table constrains.


Q:  What happens if you have an identity primary key that’s the same as a primary key of a record being inserted via SqlBulkCopy? 
A:  The primary key will be auto-incremented, and you’ll see a duplicate row with only difference being in the primary key value.


Q:  What happens if you have a non-identity primary key that’s the same as a primary key of a record being inserted via SqlBulkCopy? 
A:  The SqlBulkCopy method will throw an exception “Violation of PRIMARY KEY constraint ‘Your_PK_Constrain_Name_Here’. Cannot insert duplicate key in object ‘dbo.Test’.   The statement has been terminated.”


Now, since I believe in teaching to fish rather than giving a fish, here is how you could’ve figured this out:




1. Create a simple table (e.g. table Test with pk_id (identity), SomeValue, LastUpdatedTimestamp columns)
2. Add a row (e.g. Insert Into Test(SomeValue, LastUpdatedTimestamp) Values (123, getdate())
3. Create a simple .NET windows app with the following code under button click:
       SqlConnection cn = new SqlConnection(“integrated security=SSPI;data source=(local);initial catalog=Test”);
       SqlDataAdapter da =  new SqlDataAdapter(“select top 1 * from Test”, cn);
       DataSet ds = new DataSet();
       da.Fill(ds, “Test”);


       System.Data.SqlClient.SqlBulkCopy bc = new SqlBulkCopy(cn);            
       bc.DestinationTableName= “Test”; // copy to self
       cn.Open();            
       bc.WriteToServer(ds.Tables[0]);
       cn.Dispose();
4. Run it
5. In Sql Management Studio, run the following query:  
       select * from Test
   You should now see two rows (pk_id 1 and 2)
6. Now change pk_id to not be an identity field
7. Run the C# program again – you should get an exception


Now you know!


Comments (6)

  1. Eduardo Scoz says:

    thanks for the explanation.

  2. neoscoob says:

    The real problem is, if your table doesnt allow duplicates, you are apparently screwed.  Its a shame there is no switch or param setting you can issue the SqlBulkCopy class (if there is, please let me know!!)

    Example:

    Say I have a table "Cars" with fields "CarId" (identity) and "CarName" (varchar) and the CarName field has a unique constraint.

    Now, I have a DataTable that contains a bunch of CarNames to insert.  If there are duplicates on CarName, the entire insert fails.  This is nothing to do with the PK or identity field.  But the problem I have, I would much rather have an option to silently not insert that dup row, but continue to insert the rest.

    Any work arounds for this would be much appreciated.

  3. irenak says:

    You might want to set the BatchSize property to something that’s a good compromise between getting the speed of bulk load and not having it be one transaction.  Each batch is a transaction, i.e. all or nothing.  For batches that fail, you have two options:

       1. Set batch size to 1, which then becomes a simple Insert

       2. Repeat the bulk copy into a so called working table, which has same columns as the destination table minus the unique constraint.  Then you filter out the duplicates and move the data from the working table to the destination.

    I have not profiled either of these solutions, so do your tests to see what solutions makes more sense and to determine the batch size…

  4. michael says:

    if sqlbulkcopy throws an exception will it continue processing if that exception is caught? also how would I get the batch that failed to resubmit it?

  5. christo says:

    but how will i copy the pk if i want to copy that using SqlBulkCopy…..???

  6. Ivan Paniagua Monroy says:

    Use a Trigger with INSTEAD OF INSERT and use inside MERGE statement.

    But don't forget add the parameter SqlBulkCopyOptions.FireTriggers in the SqlBulkCopy.

    This is my two cents.