SQL Server 2008 (Katmai) introduces VSTA as our new scripting engine, replacing the VSA engine used in SQL Server 2005 (Yukon). The new engine has a number of advantages, but I think the feature users are most excited about is that it lets you use C# as a scripting language (as well as VB.NET). It also makes it easier to reference any .NET assembly, which enables a number of common scenarios (such as easier web service integration).
Data Flow Task Improvements
Michael Entin provided a great write-up for the enhancements to our data flow engine. It essentially boils down to smarter thread allocation and processing of our execution trees. Long chains of synchronous transforms are optimized, and most packages will see a performance increase on multi-processor machines.
One CPU usage graph I really like is the one shown at PASS this year:
This shows Yukon package execution on a quad-CPU machine. You can see CPU #4 is used steadily, while two of the others spike occasionally. Usage also appears to drop on #2 when it rises on #3, and vice versa.
Running the same package using the Katmai engine looks like this:
This graph shows much better CPU usage distribution. In this case, the package received an 80% performance increase, but your mileage may vary.
Enhanced ADO.NET Support
Katmai adds ADO.NET source and destination adapters (the source replaces the DataReader source). They function much like the OLE DB adapters, supporting a custom UI, a drop-down list of tables/views, and a query builder. Their properties can also be set with expressions. And because ADO.NET can go through the ODBC data provider, this means we now support ODBC as a destination as well.
The SQL Server Import/Export Wizard (which generates SSIS packages) has been improved as well.
- ADO.NET support (which gives you ODBC connectivity)
- Data type conversion – mappings are read from customizable XML files, and Data Conversion transforms are inserted into the data flow when required
- Better scalability – large numbers of tables are split across multiple data flows (up to five sources per data flow)
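To illustrate that last point, here is a sketch of the grouping rule in Python. The function name and chunking logic are my own illustration of the behavior, not the wizard's actual implementation:

```python
# Illustrative sketch of the wizard's scaling rule: a large list of
# tables is split into multiple Data Flow Tasks, with at most five
# source adapters per data flow. (Hypothetical names, not a real API.)

MAX_SOURCES_PER_DATA_FLOW = 5

def split_into_data_flows(tables, max_sources=MAX_SOURCES_PER_DATA_FLOW):
    """Group table names into chunks, one chunk per Data Flow Task."""
    return [tables[i:i + max_sources]
            for i in range(0, len(tables), max_sources)]

tables = [f"Table{n}" for n in range(1, 13)]  # 12 tables to transfer
flows = split_into_data_flows(tables)
print(len(flows))  # 3 data flows: 5 + 5 + 2 sources
```

So a 12-table transfer would be generated as three data flows rather than one giant one.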
Here is an example of the data conversion page:
As I mentioned in an earlier post, Jamie Thompson provided a very good write-up of the new Lookup component. The main feature is that you can now create a lookup cache in a separate data flow using the new Lookup Cache Transform and Cache Connection Manager, and persist it to disk. This cache can be created from any data source supported by SSIS, which removes the need to stage your lookup table in a database.
This persisted cache is especially useful in scenarios where reference data is updated periodically – say, once a week. You can run a package to generate your cache when the update comes in, and re-use it in the rest of your packages until it needs to be updated again.
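Conceptually, the build-once / reuse-many-times pattern looks like this. This is a minimal Python analogy using `pickle`, assuming a simple key/value cache; it is not SSIS's actual cache file format or API:

```python
import os
import pickle
import tempfile

# Conceptual analogy only: build a lookup cache once from any
# (key, value) source, persist it to disk, and reload it in later
# runs until the reference data is refreshed.

def build_cache(rows, path):
    """rows: iterable of (key, value) pairs from any data source."""
    cache = dict(rows)
    with open(path, "wb") as f:
        pickle.dump(cache, f)
    return cache

def load_cache(path):
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.gettempdir(), "customer_lookup.pkl")
build_cache([(1, "Alice"), (2, "Bob")], path)  # the weekly "refresh" package
cache = load_cache(path)                       # reused by all other packages
print(cache[2])  # Bob
```

One package pays the cost of building the cache when the reference data changes; every other package just loads the persisted file.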
Another scenario is managing large lookup tables, where a small number of keys are used a large percentage of the time. For example, 5% of your customer IDs show up in 80% of your rows. Instead of caching the entire lookup table, you can use a cached lookup for the top 5%, and cascade another lookup in partial / no-cache mode for the infrequent rows.
As you can see from the screenshot, Lookup also has a new “No Match” output – rows with no match are no longer sent directly to the error output. We also maintain a “Miss Cache” internally, so keys that fail to match aren’t re-queried on every row.
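The cascading pattern and the miss cache can be sketched together. This is an illustrative Python model of the idea, not the actual SSIS implementation: a pre-built cache serves the hot keys, misses cascade to a second lookup against the source, and a miss cache remembers keys known to be absent:

```python
# Two-tier lookup sketch: tier 1 is a pre-built cache of the hot keys;
# tier 2 queries the source. A "miss cache" prevents re-querying keys
# that are known not to exist. (Conceptual model, not SSIS internals.)

def make_cascading_lookup(hot_cache, query_source):
    miss_cache = set()
    stats = {"queries": 0}

    def lookup(key):
        if key in hot_cache:        # tier 1: served from the cache
            return hot_cache[key]
        if key in miss_cache:       # known miss: skip the query
            return None
        stats["queries"] += 1
        value = query_source(key)   # tier 2: hit the source
        if value is None:
            miss_cache.add(key)     # remember the miss
        return value

    return lookup, stats

reference = {i: f"Customer{i}" for i in range(100)}
hot = {i: reference[i] for i in range(5)}  # the top 5% of keys
lookup, stats = make_cascading_lookup(hot, reference.get)

lookup(3)    # served from the hot cache, no query
lookup(42)   # cascades to the source
lookup(999)  # miss: queried once...
lookup(999)  # ...then answered from the miss cache
print(stats["queries"])  # 2
```

The payoff is that most rows never touch the database at all, and repeated misses cost one query instead of one per row.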
More to follow…