Fuzzy Grouping is a component which, when used as part of a data flow in an SSIS (SQL Server Integration Services) package, enables you to identify groups of records in a table where each record in the group potentially corresponds to the same real-world entity. The grouping is resilient to commonly observed errors in real data, because records in each group may not be identical to each other but are very similar to each other.
As you can imagine, the functionality provided by the complex and highly efficient algorithms implemented in this transformation can help you save a huge amount of time in data cleaning tasks. You just should know that in order for the SSIS runtime to let you run these powerful Fuzzy Grouping transformation, it requires SQL Server to be running an Enterprise Edition.
Trying to run it in any non-Enterprise edition will cause the SSIS runtime to throw error 8228 (The task "%1!s!" cannot run on installed %2!s! of Integration Services. It requires %3!s! or higher.) There is an exception to this rule though: if you run any package from the IDE (BIDS) in debugging mode, it will run no matter what edition of SQL is installed in the machine from where the transformation is being invoked. It has been deliberately decided to make it work that way so that developers don’t need to run an Enterprise edition in their development environments.
This post is part of the a series I decided to call “Understanding the value of the Enterprise Edition, one feature at a time” which I started in September 2011 and will grow on a weekly basis, having a new post incorporated to the family every Friday.