Using SSIS to get data out of Oracle: A big surprise!


Since late last year, Microsoft has made the Attunity connectors to Oracle databases available to Enterprise Edition customers. We all recognized right away that these connectors were faster than the pre-existing options from either Microsoft or Oracle when moving data into or out of an Oracle database. It wasn’t immediately obvious what speeds we could expect from the connectors, so I did some experimenting to see. This posting summarizes findings from those experiments, done earlier this year; rather than report all the details, I will zero in on the key lessons and one big surprise that came out of the work.

Before getting into my findings, let me share a little anecdote I heard from a colleague: There is at least one SSIS customer that now uses SSIS and the Attunity connectors to move data from an Oracle database to an Oracle database, because SSIS with Attunity outperforms any of the Oracle options! While I can’t corroborate that, the information to follow is from my own measurements.

To do this work, I used two machines each with 24 cores (64-bit, 2.4 GHz), one for SSIS and one for Oracle. The machines were practically an embarrassment of riches for this simple benchmark. The SSIS machine had flat files to read (when loading data into Oracle) or write (when extracting data from Oracle). The SSIS packages were super simple, just a flat file source and an Oracle destination, or an Oracle source and a flat file destination. The data was 75 million rows, 133 bytes per row, of the LINEITEM table from the TPC-H benchmark, as generated by the DBGEN utility.

Some basic findings (remember, your mileage may vary):

  • Putting data into an Oracle database using the Attunity connectors clocked 30,000 rows per second, on the order of 20 times faster than using the OLE DB connectors to Oracle.
  • Extracting data from Oracle moved 36,000 rows per second, about 20% faster than using the OLE DB connectors.
  • The above measurements were taken using “mixed” data types: numbers were put in NUMBER fields, dates were put in DATE fields, etc. A funny thing happened, though, when all the data was put in string fields (VARCHAR2 was used for everything): now we could hit 42,000 rows per second loading data into Oracle, and 76,000 rows per second extracting from Oracle!
  • The Fast Load option is supposed to get higher performance through the use of the DirectPath API. In my experiments, I didn’t see a consistent advantage of Fast Load over non-Fast Load. What Fast Load did seem to do was shift more of the CPU time from the Oracle process to the SSIS process. This could mean that if you have multiple concurrent SSIS packages sending data to Oracle, using Fast Load might let Oracle receive the data faster. Given my experience with Fast Load, I can only recommend that you check its performance in your own situation. Note: I’ve been told that Fast Load will be fixed in a maintenance release later this calendar year. So while I’m not promising anything, it’s likely that this will change.
  • The default batch size for the Oracle destination connector is 100 rows. Setting the batch size to 10,000 rows gave a boost of 10% to 50%, depending on other elements of the configuration. (When using Fast Load, you specify the buffer size instead of the row count. So estimate the buffer size needed to hold the number of rows you want, and use that number.)
  • When using the Oracle source, setting batch size to 10,000 rows gave a boost of around 10%, depending on other elements of the configuration.
  • I wanted to know how important it was for SSIS to be on a separate machine from the Oracle database. There was a good network connecting the source and destination servers, and also plenty of CPUs and memory on the servers. What I saw was a negligible difference between the case where SSIS and Oracle were on the same server and the case where SSIS and Oracle were on separate systems. My recommendation: Look at what resource is the most loaded in your environment, and configure to lighten the load on that resource.
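The buffer-size advice above for Fast Load is just arithmetic, but it is easy to get wrong. A minimal sketch of the estimate, assuming the ~133-byte LINEITEM rows from this benchmark (the function name, the 10% slack factor, and the row width are my illustrative assumptions, not values from the connector's documentation):

```python
# Sketch: size a Fast Load buffer to hold a target number of rows.
# Fast Load takes a buffer size in bytes rather than a row count, so we
# multiply the desired batch size by the average row width, with a little
# slack for per-row overhead. All numbers here are assumptions.

def fastload_buffer_bytes(target_rows: int, avg_row_bytes: int,
                          slack: float = 0.10) -> int:
    """Return a buffer size (bytes) that should hold target_rows rows."""
    return int(target_rows * avg_row_bytes * (1 + slack))

# A 10,000-row batch of ~133-byte rows needs roughly 1.4 MB of buffer.
print(fastload_buffer_bytes(10_000, 133))
```

The same estimate works in reverse: if memory is tight, divide the buffer size you can afford by the row width to see what batch size you are effectively getting.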

The idea that performance with string data would be so different from performance with natural data types was a big surprise. The difference was especially pronounced when extracting data from Oracle. Now let’s face it, we would prefer to see data extracted from Oracle and placed in SQL Server databases! Given the big speed disparity and the fact that most real-world data needs to be in natural data types, I wondered if the same thing would happen if data was cast to string types in the query that SSIS issues against Oracle. So instead of having SSIS simply read the table, I gave it this query to run:

select
    TO_CHAR(L_SHIPDATE),
    TO_CHAR(L_ORDERKEY),
    TO_CHAR(L_DISCOUNT),
    TO_CHAR(L_EXTENDEDPRICE),
    TO_CHAR(L_SUPPKEY),
    TO_CHAR(L_QUANTITY),
    L_RETURNFLAG,
    TO_CHAR(L_PARTKEY),
    L_LINESTATUS,
    TO_CHAR(L_TAX),
    TO_CHAR(L_COMMITDATE),
    TO_CHAR(L_RECEIPTDATE),
    L_SHIPMODE,
    TO_CHAR(L_LINENUMBER),
    L_SHIPINSTRUCT,
    L_COMMENT
from ATTUSER.LINEITEM

Then, before inserting the data into SQL Server using the SQL Server destination, I put a Data Conversion transformation into the data flow to get all the columns back into the correct types.
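To make the conversion step concrete, here is a minimal sketch in Python of what that transformation does logically: every column arrives as a string (because of the TO_CHAR trick) and must be parsed back into its natural type. The date format below assumes the common Oracle default of DD-MON-YY for TO_CHAR on dates; the actual format depends on your session's NLS_DATE_FORMAT, and the function and column subset here are illustrative, not part of the SSIS package.

```python
from datetime import datetime
from decimal import Decimal

# Sketch: parse TO_CHAR'd LINEITEM columns back into native types, the
# way the Data Conversion step does before the SQL Server destination.
# Assumes dates were rendered as DD-MON-YY (a common NLS_DATE_FORMAT
# default); adjust the format string to what your session emits.

def convert_row(row: dict) -> dict:
    return {
        "L_ORDERKEY": int(row["L_ORDERKEY"]),
        "L_QUANTITY": Decimal(row["L_QUANTITY"]),
        "L_SHIPDATE": datetime.strptime(row["L_SHIPDATE"], "%d-%b-%y").date(),
        "L_RETURNFLAG": row["L_RETURNFLAG"],  # already a string column
    }

row = convert_row({"L_ORDERKEY": "42", "L_QUANTITY": "17.00",
                   "L_SHIPDATE": "04-JUL-98", "L_RETURNFLAG": "N"})
print(row)
```

The point of the experiment is that paying this parsing cost on the SSIS side, twice, can still beat letting the Oracle source hand over native types directly.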

[Screenshot: the SSIS data flow described above]

At this point you must be thinking, “Surely converting the data twice can’t be the fastest way!” Well, here are the results: The first run below read the mixed data types using the Attunity Oracle source with default settings, converted to SQL Server types, then wrote to the SQL Server destination. The second run was like the first, with the addition of setting the batch size larger. The third run was like the first, but on reading from Oracle all the columns were converted to text as discussed above. The last test was like the third, with the addition of setting the batch size larger. Using the string conversion and larger batches, the run was over two times faster than the obvious out-of-the-box configuration.

[Chart: run times for the four configurations described above]

Overall, the Attunity connectors for Oracle really were fast, as expected. In doing this work, a few lessons turned up that will hopefully help you get optimal performance.

– Len Wyatt


Comments (17)

  1. bill-kline@cox.net says:

    Can we use Attunity to replace our linked server?

  2. sqlperf says:

    The Attunity connectors are for use with SSIS.  If you are using linked servers to Oracle and then using that as an SSIS source or destination, it might very well make sense to replace that with the Attunity connector to connect your SSIS data flow directly to your Oracle database.

  3. jholcomb says:

    Is the Oracle Attunity connector that's available to SQL Server Enterprise customers the CDC (change data capture) version (i.e., the one that hooks into the logs)?

  4. sqlperf says:

    The connectors you can download from Microsoft at http://www.microsoft.com/downloads/details.aspx?FamilyID=d9cb21fe-32e9-4d34-a381-6f9231d84f1e&DisplayLang=en do not contain the CDC functionality.

    Attunity Oracle-CDC for SSIS is available from Attunity.  Information is available at  http://www.attunity.com/oracle_cdc_for_ssis.  

  5. sjk35 says:

    One drawback I saw with this is that you cannot use an expression for the SQL command on the object.

    I have a package in which I dynamically build a SQL statement to pull in only rows with certain criteria.  I suppose I could pull in the entire table and use some of the filtering data flow objects to take only the records I need, but that would somewhat defeat the purpose of limiting the selected rows.

    Anyone have the same experience with this tool?

    Hopefully it can be built into a newer version, because I would love to use it.

    Steve

  6. MSDNArchive says:

    Thanks for the input, we will look into this request for the update.

    Ramakrishnan

    PM SSIS Team

  7. moosport says:

    A year later, Attunity still does not support a variable SQL command. Transferring 10 M records from Oracle to SQL Server using Attunity takes about 5 minutes; other OLE DB providers take over an hour.

  8. gmelhaff@comcast.net says:

    This is true and also note that you cannot use variables in dynamic sql like you can in ado.net. However you CAN use an expression in your oledb source query – just build the sql in a variable (say using vbscript in previous command task) and then set the property SqlCommand for your Oracle source in the expressions editor to that variable. Works like a charm albeit a bit of work to do something you'd think would be simpler.

  9. Gary says:

    Set variable to the sql and then use in expression for the oledb source

  10. nwebsolution says:

    thanks for providing this info ..it helped me

    Regards

    nwebsolution

    W:www.nwebsoluiton.com

  11. Steve Fu says:

    Does the Attunity support Windows Server 2008 R2 (SP1) environment?

  12. Bjorn Hoj says:

    All works fine within BIDS with the Attunity driver. Now I want to schedule my package, running it from the file system using my SQL Agent service and checking the 32-bit execution option. But it seems the Attunity DLL is not recognized:

    The connection type "MSORA" specified for connection manager "Oracle Connector" is not recognized as a valid connection manager type. This error is returned when an attempt is made to create a connection manager for an unknown connection type. Check the spelling in the connection type name.

    Please help?

  13. mmasson says:

    Hi Bjorn – make sure you've installed the 32bit provider on the machine that is running SQL Agent. If you're sure you've installed the provider, restart the SSIS Service and try again.

  14. Eswar says:

    How can we verify, on the SQL Server database, whether all the data has arrived or not?

  15. I can't believe I need to set the connection manager's SQL expression to use a variable with these data flow components. What the heck!

    I'm floored that this functionality isn't standard!

  16. GaryM says:

    There is an advantage to using a variable via expression to derive the SQL statement: you can see the actual SQL issued, via the value change in the variable, if you use the BiXpress audit framework. That's an incredible debugging capability to have when needed. I like the use of parameter substitution for source SQL and use it because it's easy, but since BiXpress doesn't yet monitor package parameter values, I lose the simple exposure that I get with the combination of variable/expression-derived SQL and BiXpress.

  17. Dan says:

    I have run some similar performance tests and can confirm that the Attunity Oracle Destination is orders of magnitude faster than the Oracle Provider for OLEDB. I also found the Direct Path option offered no performance improvement.

    However, as these can only be used with SQL Server Enterprise Edition I was looking into other options and found that using an ODBC connection to Oracle gave exactly the same performance as the Attunity Oracle Destination for inserting data  (if not using Direct Path). I guess that's not surprising as it's using ODBC under the covers apparently. The only thing you lose is the compile time errors that the Attunity component provides, which could be quite a big deal I guess.