Venturing into BULK INSERT

Recently, whilst working with bulk insert queries, I came across an unexpected result set that led me to do some more research. The following facts, tips, tricks, and workarounds might therefore help you:

 

For example, suppose you execute the following code:

 

NOTE: Please bear in mind that the log file used to import the data in this example (errorlog.1) has 2972 lines.

 

create table #ErrLog (LogText nvarchar(max) NULL)

 

bulk insert #ErrLog from 'PATH TO YOUR SQL LOG FOLDER\LOG\ERRORLOG.1' with (firstrow = 6)

 

select count(*) from #ErrLog

 


 

Execution Results:

------------------------------------------------------------

Bulk load: DataFileType was incorrectly specified as char. DataFileType will be assumed to be widechar because the data file has a Unicode signature.

Bulk load: DataFileType was incorrectly specified as char. DataFileType will be assumed to be widechar because the data file has a Unicode signature.

 

------------------------------------------------------------

Single Transaction Row Count: 2967

-----------------------------------------------------------

And

 

bulk insert #ErrLog from 'PATH TO YOUR SQL LOG FOLDER\LOG\ERRORLOG.1' with (firstrow = 6, batchsize=1000)

 

select count(*) from #ErrLog

 

Execution Results:

 

Bulk load: DataFileType was incorrectly specified as char. DataFileType will be assumed to be widechar because the data file has a Unicode signature.

Bulk load: DataFileType was incorrectly specified as char. DataFileType will be assumed to be widechar because the data file has a Unicode signature.

 

-----------------------------------------------------------

Multi Transaction Row Count: 996

----------------------------------------------------------

 

At this point, you might start wondering:

 

1. Why are the two rowcounts different?

 

2. Why are we getting the “Bulk load:” message?

 

After further research and some trial and error, the following inferences can be drawn:

 

For the first query, the row count is correct. The total number of rows in the log file is 2967 + 5 = 2972: the first 5 rows (including blank rows) are left out, and the rest are imported through to the end of the file.

 

For the second query, by using the BATCHSIZE option you might expect SQL Server to import the next 1000 rows every time you run the query. The result displayed the first time is correct; however, if you run the query again, it will not import anything further from the log file. The Books Online documentation for the BATCHSIZE option says:

 

"Specifies the number of rows in a batch. Each batch is copied to the server as one transaction. If this fails, SQL Server commits or rolls back the transaction for every batch. By default, all data in the specified data file is one batch."

 

This might also incline you to think that executing the second query again would import the next 1000 rows from the log file. However, that is not the case; it is not meant to do that. Therefore, if you want to import the next 1000 rows using the BATCHSIZE option, you should change the query to:

 

bulk insert #errlog from 'PATH TO YOUR SQL LOG FOLDER\LOG\ERRORLOG.1' with (firstrow = 997,batchsize=2000)

 

This will import the next 1003 rows.
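Putting the two runs together, the whole resumed import might look like this (a sketch only; substitute your own SQL log folder path, and note that the row boundaries here are specific to this example's errorlog.1):

```sql
-- Sketch: resuming a batched bulk insert in two steps.
create table #ErrLog (LogText nvarchar(max) NULL)

-- First run: with batchsize, only the first committed batch survives
-- (996 rows in this example).
bulk insert #ErrLog
from 'PATH TO YOUR SQL LOG FOLDER\LOG\ERRORLOG.1'
with (firstrow = 6, batchsize = 1000)

-- Second run: resume from where the first committed batch stopped.
bulk insert #ErrLog
from 'PATH TO YOUR SQL LOG FOLDER\LOG\ERRORLOG.1'
with (firstrow = 997, batchsize = 2000)

-- Verify the combined row count.
select count(*) from #ErrLog
```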

 

Tips/Tricks/Options:

 

To achieve this in a better way, there are other options available with the BULK INSERT command that you may use:

 

1. rows_per_batch:

Usage:

 

bulk insert #errlog from 'PATH TO YOUR SQL LOG FOLDER\LOG\ERRORLOG.1' with (firstrow=6,rows_per_batch=200)

 

This will insert all the rows from the log, in batches of 200 rows at a time. It will go through to the last entry of the error log, but do so 200 rows per batch. This is usually helpful if you want to reduce the load on SQL Server whilst doing heavy bulk inserts.

 

2. Lastrow:

 

Usage:

 

bulk insert #errlog from 'PATH TO YOUR SQL LOG FOLDER\LOG\ERRORLOG.1' with (firstrow=6,lastrow=200)

 

This will leave out the first five rows, start from the 6th, and go up to the 200th row. However, since it is difficult to open every file manually and figure out the last row, this option should only be used when you are sure of the last row to be imported from the log.

 

3. Conjunction of firstrow, lastrow, and rows_per_batch:

 

Usage:

 

bulk insert #errlogwithbatch from 'D:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG.1' with (firstrow=6,rows_per_batch=1000,lastrow=2000)

 

 

This will leave out the first 5 rows, start from the 6th, go up to the last row (row 2000), and import in two batches. The first batch will import 1000 rows; the second will import only 995, as that is all that remains below the lastrow boundary.
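The arithmetic can be checked with a quick count after the import (a sketch, assuming the same #errlogwithbatch table as above):

```sql
-- firstrow = 6 through lastrow = 2000 gives 2000 - 6 + 1 = 1995 rows,
-- split by rows_per_batch = 1000 into batches of 1000 and 995.
select count(*) as ImportedRows from #errlogwithbatch
```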

 

Finally, to fix the message:

 

Bulk load: DataFileType was incorrectly specified as char. DataFileType will be assumed to be widechar because the data file has a Unicode signature.

Bulk load: DataFileType was incorrectly specified as char. DataFileType will be assumed to be widechar because the data file has a Unicode signature.

 

You must use the DATAFILETYPE = 'widechar' option, which tells BULK INSERT to import Unicode character data. This is specific to this example: the errorlog.1 file used here contains Unicode data, hence the message. For more information on the widechar option, please go through the following:

 

https://msdn.microsoft.com/en-us/library/ms188289(v=sql.105).aspx

 

 

Usage:

 

bulk insert #ErrLog from 'C:\ERRORLOG.1' with (firstrow = 6, DataFileType='widechar')

 

Hope this helps.

Author: Amit Arora, Support Engineer, Microsoft India GTSC

Reviewed by: Sumit Sarabhai, Support Escalation Engineer, Microsoft India GTSC

Pradipta Das, Technical Lead, Microsoft India GTSC