My main customer uses SSIS for some SQL Server and Oracle integration requirements. One challenge they’ve had is timeouts on the Oracle side. We’ve tried extending every timeout value within SSIS, but the problem appears to be that for very large data transfers into Oracle, Oracle simply gets tired of waiting and closes the connection. One of the developers working on this project found a way around this by transferring data to Oracle in smaller chunks. He asked me about a way to automate this in SSIS so that the records are loaded in smaller chunks. I pointed him to the FOR LOOP construct and recommended creating a second table with an identity column, which would guarantee a key value associated with each record range.
That’s about all I did; the rest of this is his work. My thanks to Robert Skinner, HP, for sharing the finished package with me. I’ve modified the package to remove the application-specific data and fields and reduced it to a simple table with a couple of columns, so we can focus on the approach and not get bogged down in the schema. For our example, we will use a flat file for import and a SQL Server database for output; the destination connectors can be changed as needed to export to another OLE DB destination such as Oracle without changing the actual components. Let’s walk through the design and implementation.
There are four main steps to the process:
- Ensure that a staging table exists and is prepared to store the imported rows that will need to be exported to Oracle. The table needs to be empty prior to importing. The staging table has the following attributes:
- All of the columns from the imported table
- An identity column used as the primary key.
- Load the imported data into the staging table.
- Prepare the destination (in this scenario, this requires truncating the data in the destination).
- Cycle through the staging table, selecting the range of keys associated with the “chunking” number and copying it to the destination table. For example, if only 50,000 rows are being loaded per insertion, each selection would be the next 50,000 rows: 1 to 50,000, then 50,001 to 100,000, and so on.
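The range arithmetic in the last step can be sketched as a simple calculation over the identity keys. This is an illustration of the idea, not code from the package itself; the 50,000-row chunk size is just the example figure from the steps above.

```python
def chunk_ranges(total_rows, chunk_size):
    """Yield (first, last) identity-key ranges covering 1..total_rows."""
    first = 1
    while first <= total_rows:
        last = min(first + chunk_size - 1, total_rows)
        yield (first, last)
        first = last + 1

# With 120,000 rows and 50,000-row chunks, three ranges are produced:
# (1, 50000), (50001, 100000), (100001, 120000)
```

Note that the final range is simply smaller when the total is not an even multiple of the chunk size.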
Here is a screen snapshot of the package.
Before loading the import data, we want a clean table to load into, with the identity range reset. The easiest way to do this is with a TRUNCATE TABLE statement if the table already exists; otherwise, simply create it. Robert actually uses the same logic as is generated by scripting with the SQL Server tasks: check for the table, truncate it if it exists, and create it if it doesn’t. For our scenario, we just truncate the two tables in the first two steps using an OLE DB Source with a SQL Command:
- Truncate Table Quote_Staging
- Truncate Table Quote
Next, we load from the flat file into the staging table. This results in every original row being assigned a sequential identifier; since we truncated the table, we are guaranteed the numbering starts from 1 with no gaps. Note the use of the Row Count component: it captures the total number of rows from the flat file that get loaded into the staging table.
Here are the first few rows after the load step.
Let’s look more closely at how the FOR LOOP works with the script task.
The For Loop Properties controls the initialization, evaluation, and assignment. For this scenario, we have the following variables:
- RowCountLast: The identity of the last row loaded.
- RowCountIncrement: The number of rows to load in each chunk.
- RowCountTotal: The total number of rows in the staging table.
- SqlCommand: Contains the SQL command to execute on each iteration of the loop to “chunk” the data rows. Initially this should select everything from the staging table, which is “Select * from Quote_Staging” for our example.
The script builds the SQL command variable that selects the range identified for each iteration of the for loop. For example, on iteration 1, using an increment of 500, the SQL command would be set to query all of the rows between 1 and 500. On the next iteration, the last row would be set to 500 and the SQL command altered to load rows 501 to 1000. This continues until the number of rows loaded reaches the total row count. Each iteration of the for loop establishes a new connection and loads only the desired number of rows. Although this is normally not optimal for bulk loading, it works around the Oracle issue. It is also useful when there are a huge number of rows and the process needs to be monitored at a finer granularity than the whole table.
Here is the script. Note that this script is highly reusable; the only items that need to be changed for another package are the name of the table and the identity column. To make it even more reusable, those could be moved into variables.
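For readers who can’t see the script listing, here is a rough Python sketch of the equivalent logic. This is an illustration only, assuming the staging table Quote_Staging and a hypothetical identity column name (QuoteID); the real package implements this in an SSIS Script Task that sets the SqlCommand variable on each pass of the FOR LOOP.

```python
def build_chunk_command(row_count_last, row_count_increment,
                        table="Quote_Staging", key_column="QuoteID"):
    """Build the SELECT for one iteration of the FOR LOOP.

    Mirrors the role of the Script Task: given the identity key of the
    last row loaded (RowCountLast) and the chunk size (RowCountIncrement),
    produce the SqlCommand that selects the next range of keys.
    """
    return (f"SELECT * FROM {table} "
            f"WHERE {key_column} > {row_count_last} "
            f"AND {key_column} <= {row_count_last + row_count_increment}")
```

On iteration 1 (RowCountLast = 0, increment = 500), this yields a query for rows 1 through 500; after RowCountLast is advanced to 500, the next iteration selects rows 501 through 1000, and so on until RowCountTotal is reached.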
The data flow task in the for loop simply uses the SQL Command variable for the source and the final table for the destination.
Attached is a zip file containing a sample database schema, flat file with import data, and SSIS package.
My thanks again to Robert Skinner for providing me with a sample SSIS package for this approach, including the script.