Using the Cache Connection Manager (CCM) is a new option for the Lookup transform in SSIS 2008. The CCM provides an alternative to doing lookups against a database table using an OLEDB connection.
This post will suggest some best practices for using the Cache Connection Manager, and illustrate a couple of common scenarios that the CCM was meant to handle.
For more information on the Cache Connection Manager and how to make use of it with the Lookup Transform, please see the books online entries linked at the end of this post.
- Reuse the cache to reduce database load
- Share the cache between lookups to reduce memory usage
- Using the CCM is not always faster than OLEDB - the cost of disk access can out weight the benefits of pre-creating the cache
- The cache is essentially clear text - do not store sensitive data inside of the cache
- In terms of Cache Modes and the best practices that surround them, using a cache connection manager is equivalent to using a Full Cache mode
Reducing database and memory usage
If your reference database is remote, or under heavy load, consider using the Cache Connection Manager instead of an OLEDB connection.
Once a cache is used (or created) in an SSIS package, it will be kept in memory until the package has finished executing. The cache can be reused across multiple data flows, and shared between multiple lookups in the same data flow. It can also be persisted to disk, and reused across package executions.
Create the cache from any SSIS data source
To use the CCM, you need to create a lookup cache in a separate data flow using the Cache Transformation. Because the cache is created in a regular data flow, this means that you can now use any data source that SSIS can connect to as a source for your lookup reference (flat file, excel, SAP, etc).
With SSIS 2005, a common approach when using non-OLEDB accessible lookup sources was to stage the data first. If this data is only being used for your lookups, consider creating a persisted cache instead.
Cache the most common values
Sometimes you might have a large reference table, but the majority of your incoming data only uses a small portion of it. For example, you have a very large custom list, and the top 5% of your customers generate 90% of your transactions. In a scenario like this, you could pre-cache the information of your most active customers. Your data flow could use a cascading lookup pattern where you have one lookup which uses the cache, with its No Match output falling through to a second lookup running in a partial cache mode to hit the database to handle the remaining 10% of rows.