On a couple of my recent projects, I was responsible for developing the SQL Server Integration Services (SSIS) packages to load data into a data warehouse. SSIS is no different than any other development activity in that there are any number of ways to do the same thing. One of the techniques that worked really well has been leveraging views instead of directly querying the data in the staging database. Let’s take a step back for the sake of context. Whenever possible, I “land” the source data in a staging database in a format that is very close to the source format. My primary objective in doing this is to get the data off the source systems while placing the least amount of strain on the source system. Now, back to the topic at hand.
I always create my views based on the destination table schema, then add any fields that come from the source, are used in SSIS, but do not get sent to the destination (i.e. – fields used for lookups). I place these views on top of the staging tables (instead of directly querying the staging table). This approach can provide a number of benefits. Creating a view that mimics the destination (field names, data types, etc.) enables SSIS to automatically do a lot of the field mapping for us. A lot of data cleanup (such as removing null values) can be handled in the view, which means that the SSIS package can be a little leaner. If you have multiple sources that need to be combined so that a common set of processing can occur, using views can save a lot of work as new sources are added. For example, one of my projects involved combining contracting data from multiple child organizations so that the parent organization could have a complete view of the contracting activity. We needed an approach that would allow us to add new sources (child organization contracting systems) without causing us to rework our existing packages or reinvent the wheel by creating multiple packages that performed the same logic. The solution was to use a view (schema of output was the same for every source) on each source. Adding a new source into the existing SSIS packages was as simple as adding a new source and handling the source updates (marking complete/error). Since the view guaranteed that the schema of the new source matched our existing schemas, SSIS didn’t complain about type mismatches, size differences, etc. It was basically a plug-and-play.