SQL Server Data Quality Services (DQS) is a knowledge-driven data quality solution. You can define knowledge manually; acquire it from data samples of your organization, or knowledge provided by third party services. We call this Reference Data Services (RDS). DQS makes it easy to cleanse and enrich your data using leading reference data service providers from Windows Azure Marketplace, such as Melissa Data, Digital Trowel, Loqate and CDYNE Corp. You can attach a DQS domain or DQS composite domain to a reference data service provider and use it to cleanse and enrich your data.
This post will focus on getting you up and running with the DQS RDS feature, and demonstrate how you can cleanse US Address data using Melissa Data’s AddressCheck service and DQS.
Subscribing to the Reference Data Providers Service –
After DQS installation and in order to start using the RDS features, the first step is to subscribe to the RDS service provider. To subscribe, go to the DataMarket site here and select the reference data provider you would like to subscribe to. For this specific example we will use Melissa Data’s AddressCheck Service:
Once subscribed go to your account data and copy your account key (you will need this key for the DQS configuration)
Configuring DQS to use the DataMarket and the RDS provider –
Now that you have an account key and you are subscribed to a reference data provider go to the DQS client and do the following:
- In the DQS client home screen, under Administration, click Configuration. The configuration General Settings tab appears.
- Click the Reference Data tab
- Enter/Paste your DataMarket AccountID in DQS:
- Click to validate the account ID.
Attaching a domain/composite domain to a RDS
After you configured the reference data services settings in DQS, you need to attach and map the RDS to a specific domain in your knowledge base. Let’s create a knowledge base and attach a composite domain to the Melissa Data AddressCheck service:
- In the DQS client home screen, under Knowledge Base Management, click New knowledge base.
- In the New knowledge base screen, type a name for the new knowledge base, click the Domain Management activity, and click Create.
- In the Domain Management screen, click the Create a domain icon ( ) to create a domain. Create the following four domains:
- Address Line
- Click the Create a composite domain icon ( ) to create a composite domain. In the Create a composite domain dialog box, type Address Validation in the Composite Domain Name box, and include all the domains created in step 3 in the composite domain. Click Ok.
- In the Domain pane on the left side, click Address Validation, and then click the Reference Data tab on the right side.
- Click the Attach Online Reference Data Service Providers icon ( ).
- In the Online Reference Data Service Providers dialog box:
- Select the Melissa Data – Address Check box.
- Map the reference data service schema (data columns) with the appropriate domains (Address Line, City, State, and Zip) in the composite domain. RDS schema fields which contain (M) are mandatory schema fields that must be mapped to a domain to, others schema fields are optional.
- Click OK to save the changes, and close the Online Reference Data Providers Catalog dialog box.
- You will return to the reference data tab –
- Click Finish to publish the knowledge base.
Cleansing and enriching your data
Now that we have attached our composite domain to a RDS service we can use the knowledge base we created in a cleansing project to cleanse and enrich data.
- In the DQS client home screen, under Data Quality Projects, click New data quality project.
- In the New Data Quality Project screen:
- Type a name for the new data quality project.
- In the Use knowledge base list, select the knowledge base that you created earlier.
- Select the Cleansing activity, and click Create.
- Map your data source to the domains –
- Click Next, and Start to run the cleansing process.
Now your data is being sent to Melissa Data’s AddressCheck service for cleansing via DataMarket. Once completed click Next.
- Now you can manage and view the cleansing results. You can see that in my cleansing project Melissa’s Data Address check service completed missing postal codes, fixed address from North Fourth Street to the USPS format of N4th St and more.
Have fun with the Reference Data Service feature, we look forward to hearing your comments, suggestions, and bug reports.
The DQS Team