Before SQL2005, you could use Bulk Insert or BCP to import data into SQL Server. While both of these mechanisms have been popular and used widely, they don’t provide any processing of input data before bulk importing into the target table. If you need to preprocess the data, you can use DTS (aka SSIS) or, now in SQL2005, you may be able to use OPENROWSET with BULK rowset provides to accompolish the same, but for simpler tasks using SELECT statement functionality.
Recently, I was asked how to Bulk Import data after removing whitespaces into a varchar column in the target table. The customer desribed the problem as “We’re loading data from a text file with fixed column widths into a table in SQL Server 2005.All table fields are varchars of the same length as the source fields. The problem is that the BULK INSERT process doesn’t truncate whitespace, and since the source fields can contain variable-length values or nothing at all we wind up with fields ending with or containing nothing but spaces. Our best plan at the moment is to RTRIM() every column and insert the data into another table, but since we’ll be loading millions of rows every day we were looking for any more efficient methods”
You can easily accompolish this task using OPENROWSET. Here is one example
create table t_cust (c1 varchar(20), c2 varchar(20))
You can use the following command to remove the whitespaces and importing the data into the target table simultaneously. This Bulk Import follows the bulk optimizations rules (i.e. bulk logging) similar to the ones outlined for Bulk Insert in BOL.
INSERT into t_cust with (TABLOCK)
SELECT RTRIM(c1), RTRIM(C2)
from OPENROWSET (BULK <data-file>,<format-file> as t1
You will note that OPENROWSET reads the data from the datafile and then you can remove the trailing blanks by calling the function RTRIM for each value.
An example format file is
<?xml version=”1.0″ ?>
<BCPFORMAT xmlns=”http://schemas.microsoft.com/sqlserver/2004/bulkload/format“ xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance“>
<FIELD ID=”1“ xsi:type=”CharTerm“ TERMINATOR=”,“ MAX_LENGTH=”20“ COLLATION=”SQL_Latin1_General_CP1_CI_AS” />
<FIELD ID=”2“ xsi:type=”CharTerm“ TERMINATOR=”rn“ MAX_LENGTH=”20“ COLLATION=”SQL_Latin1_General_CP1_CI_AS” />
<COLUMN SOURCE=”1“ NAME=”c1“ xsi:type=”SQLVARYCHAR” />
<COLUMN SOURCE=”2“ NAME=”c2“ xsi:type=”SQLVARYCHAR” />