In the previous blog article, I described how to use SSIS DQS Domain Value Import. An error may occur during the execution of the project due to problems with the data. There are several available configuration options to deal with the error, related to:
- Flow Execution Status
- Knowledge Base publication
Flow Execution Status
In the SSIS DQS Domain Value Import configuration editor, there is an option for "Specify how to handle the incorrect values":
The following are the options:
- Fail Component: By default, the SSIS component execution fails when there is an import failure.
- Ignore Failure: Any failure in importing data into DQS Knowledge Base is ignored and the component continues to attempt importing all data.
- Redirect rows to error output: Same as Ignore Failure (execution complete successfully despite error in the data import) except that all error information are outputted (see Logging section below)
There are two main options for recording an error:
- Write error to the log
- Write error to the output file
Write error to the log
When using Fail Component or Ignore Failure, you can select option to Write every error as a warning to the Log. Let's use the Knowledge Base from the last article:
Consider to import the following data:
CREATE TABLE DQSCOLORS5 (
INSERT INTO DQSCOLORS5
('white' ,0 ,NULL)
,('gray' ,NULL ,NULL)
,('orange' ,3 ,NULL)
,('clear' ,NULL ,'tbd')
,('dark blue' ,NULL ,'deep blue')
,('light green' ,NULL ,'green')
Set the option to Write every error as a warning to the log:
Execute the component, when it fails, click on the Progress window:
You should see the warning log entries as follows:
The option for Write every error as a warning to the log is unselected by default to minimize performance overhead. However, this is a useful option during troubleshooting.
Write error to the output file
Alternatively, you can output the error information and write to a file.
- Change the configuration for Specify how to handle rows with incorrect values to Redirect rows to error output
- Drag and drop the Flat File Destination on the data flow canvas and connect the output line from the SSIS DQS Domain Value Import.
- On the Configure Error Output window, Set the value to the selected cells to Redirect Flow and click OK
- Double click the Flat File Destination to configure (note: prior to this step, you may want to create an empty text file on your machine)
- Click New to configure the connection and select the file format on the pop-up dialog box.
- Click Browse to locate to the empty flat file on your machine.
- Click Mappings to review the mapping and click ok to finish
- Your dataflow should look like:
Execute the project and open the resulting error file.
Let's review the error log. The first three errors are self-explanatory. Below is the additional explanation for the failure in the last three:
|Input||Reason for error|
|'clear', NULL, 'tbd'||Both 'clear' and 'tbd' already exists in the domain. Setting synonym when both values already exists in the domain is not supported. You will need to do this through DQS client|
|'dark blue', NULL, 'deep blue'||You can't set synonym to a value that is not a leading value. 'dark blue' is already set to be corrected to 'blue'. Chaining leading value from 'deep blue' to 'dark blue' to 'blue' is not supported. You must set synonym ('deep blue') directly to ultimate leading value ('blue')|
|'light green', NULL, 'green'||This is a tricky one. The reason for the error is because the 'light green' and 'green' are in the wrong order. 'green' should be set as leading value (the first column) while the new synonym should be set in the last column. Reversing the order may partially corrupt the domain as the 'light green' will be imported (see the screenshot below) while the synonym is not set.|
Knowledge Base Publication
Ok, I admit that this option is not necessary related to error handling, but more on how you manage the integrity of the DQS Knowledge Base. When you updating the Knowledge Base, you effectively creating a new version of the Knowledge Base and it is "checked out" to you during editing. While you are working on updating the Knowledge Base, any DQS projects continue to use the (previously) published version of the Knowledge Base.
When you automate updating DQS Domain Value through SSIS, there is a possibility that the Knowledge Base is being "checked out" for editing by a Knowledge Base owner through DQS Client. SSIS DQS Domain Value Import fails when the target Knowledge Base is not published to prevent conflict. Hence, you should check to make sure the Knowledge Base is published before running SSIS. However, what happen when there is a failure in updating the Knowledge Base during SSIS execution?
The following are the available options:
- Publish When There Is No Error: By default, the SSIS DQS Domain Value Import is set to this option. If there is any error, then the Knowledge Base will remain in unpublished state to allow the Knowledge Base owner to investigate and all other DQS projects continue using the "last known good state" of the Knowledge Base (before the import failure).
- Always Publish: Depending on your scenario, you may tolerate the error and change to this option. For example, if your automation is a simple case of importing valid values (without synonym or different types), then you may consider to change the option to "Always Publish". Typically when importing correct value, you encounter failure due to duplicate (value already exists in the domain). However, DQS prevents duplicate values being inserted -- if you try to import duplicate values, only 1 will be inserted. The fact that there is duplicate value in the import files -- and the duplicate values 'failed' to be imported -- do not affect integrity of the Knowledge Base.
- Never Publish: You may use this option if you want to be conservative and prevent the Knowledge Base to be used for cleansing until the Knowledge Base owner reviews the import job (regardless there is any error or not)
The default option works in most cases, but the other options provide nice flexibility when designing your process flow.
In this article, we discussed the error handling option for SSIS DQS Domain Value Import. It is a good complement to the DQS Cleansing component and enables DQS to be integrated with other system. Let us know what you think about the component.