Choosing MDM Hub styles


A couple of weeks ago, someone asked me how to choose which MDM hub style would work best for an application. I thought I had covered this in one of my white papers, but I couldn't find a good reference to give him, so I thought I would write something up here. To review what I've covered elsewhere, there are three basic types of MDM hubs:

· Registry – the hub doesn’t contain the actual master data. It contains links to where the master data exists in the source systems. In most cases, the link takes the form of the primary key and system name of the source system.

· Repository - the master data is actually moved from the source systems to the MDM hub, and the source systems are rewritten to get their master data from the MDM hub instead of from their local database. Mapping to the source systems isn't required because the master data is no longer stored in the source systems. This style is often called Transactional.

· Hybrid - as the name implies, hybrid is a combination of the other two styles. The hub contains references to the master data entities in the source systems but also contains the shared portion of the master data. This means it can supply links to source records when required and also serve as the master data source for new applications.
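To make the difference between the styles concrete, here is a minimal sketch of what a hub entry might hold in each case. The class and field names are hypothetical and only illustrate the description above: a Registry entry holds links (source system name plus primary key), a Hybrid entry holds the same links plus the shared attributes, and a Repository hub needs no links at all.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRef:
    """Pointer to one copy of a master record in a source system (hypothetical)."""
    system: str        # name of the source system, e.g. "ERP" or "CRM"
    primary_key: str   # the record's primary key in that system


@dataclass
class RegistryEntry:
    """Registry style: the hub stores only links to the source records."""
    master_id: int
    links: list[SourceRef] = field(default_factory=list)


@dataclass
class HybridEntry:
    """Hybrid style: links to the source copies plus the shared master data."""
    master_id: int
    links: list[SourceRef] = field(default_factory=list)
    shared: dict[str, str] = field(default_factory=dict)

# Repository style needs no links: the hub row *is* the master record,
# and applications read it directly instead of their local tables.
```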

So which style should you use?

Repository

The repository style seems like the best option. There are no synchronization or latency issues with updates getting propagated to multiple copies of the master data. There are no update conflicts caused by updates to more than one copy of the master data. In general, a single copy of the master data is significantly easier to manage and will usually be of higher quality than multiple copies with all their potential synchronization and mapping issues. On the other hand, if you look at what is required to get a repository style hub up and running, you may see why this style isn't very common:

1. Decide on a common data model for all applications – this will be a difficult task both politically and technically.

2. Transform and load all the current databases into the hub, removing duplicates in the process.

3. Change all your applications to use the new master data tables and database. This can be a huge effort. If your current applications use a variety of databases, you will need to deal with multi-database distributed transactions. If you use purchased applications, you may not have the source code needed to change the application to use the new data source, and even if you do, you are likely to run into support issues.

4. Figure out how to handle history – you are changing your databases to use a new key for all your master data, so you have to deal with many years of history that was created using different keys for the master data. In many cases you will need to create the same kind of key mapping that the other two MDM styles require to be able to access history records.
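The key mapping in step 4 can be sketched as a lookup from (source system, old key) to the new hub key, built while loading and de-duplicating the source databases in step 2. The table contents and the `history_for` helper are hypothetical; the point is that old history rows, still keyed by their original source keys, can only be tied to a hub record through this mapping.

```python
# Hypothetical mapping built during load/de-duplication:
# (source system, legacy key) -> new hub key.
legacy_to_hub: dict[tuple[str, str], int] = {
    ("ERP", "C-1001"): 42,
    ("CRM", "77"): 42,        # duplicate of ERP C-1001, merged into hub key 42
    ("SUPPORT", "S-9"): 43,
}


def history_for(hub_key: int, history_rows: list[dict]) -> list[dict]:
    """Return the history rows (stored under legacy keys) that belong to hub_key."""
    return [
        row for row in history_rows
        if legacy_to_hub.get((row["system"], row["key"])) == hub_key
    ]
```

Note that this is exactly the kind of cross-reference a Registry or Hybrid hub maintains, which is why a Repository migration does not fully escape mapping.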

In many cases, this process is too difficult or too expensive to provide a significant return on investment. Even when it is justified, the transition can take many years, so the Repository style of MDM hub may not be suitable for many projects.


Registry

The Registry style hub is attractive because it's generally fairly quick to implement and avoids some of the political issues around a common data model. Because only pointers to records are stored, there is no need to agree on a common data model. There is also less need for a data quality program because the data is left in the source systems.

To be clear, it's probably not possible to create a pure registry style MDM hub. One of the main things this hub is used for is mapping duplicate records in the source systems to a single record in the hub. To do this, each record must be matched on a set of attributes to determine whether it is a duplicate of a record already in the hub. For example, customers would probably be matched on name and address, and products might be matched on descriptions and dimensions. If you want to avoid searching every database in every source system when a new record is added to the hub, you will need to keep the matching attributes for each master record in the hub so you can tell whether an incoming record is a duplicate of one of the hub records. This matching won't work reliably unless the attributes stored in the hub are accurate and high quality, so you will probably have to do a significant amount of data quality work to ensure the address is correct and in a common format, and maybe even enrich the attributes with data from an external source like Dun & Bradstreet. Once you have all this established, you're a significant way down the road toward creating a hybrid hub, so you can consider a registry hub to be a hybrid hub that's not done yet.
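Here is a minimal sketch of matching an incoming record against matching attributes kept in the hub, so that no source system has to be searched. The normalization rules, abbreviation table, and 0.9 similarity threshold are illustrative assumptions, and `difflib.SequenceMatcher` stands in for whatever matching engine a real hub would use; a serious data quality program would rely on proper address standardization.

```python
import re
from difflib import SequenceMatcher

# Illustrative abbreviation table (assumption, not a real standard).
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and abbreviate common address words."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def find_match(incoming: dict, hub_attributes: dict[int, dict],
               threshold: float = 0.9):
    """Return the master_id whose stored name/address best matches, or None."""
    key = normalize(incoming["name"]) + "|" + normalize(incoming["address"])
    best_id, best_score = None, 0.0
    for master_id, attrs in hub_attributes.items():
        candidate = normalize(attrs["name"]) + "|" + normalize(attrs["address"])
        score = SequenceMatcher(None, key, candidate).ratio()
        if score > best_score:
            best_id, best_score = master_id, score
    return best_id if best_score >= threshold else None
```

The sketch also shows why data quality matters: if the stored address were "1 Mian Street" or in a different format, the score would drop and the duplicate could be missed.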

The biggest disadvantage of the registry style MDM hub is that while it helps you find all the duplicate and inconsistent copies of your master data, it doesn’t give you much help in cleaning them up. If Roger Wolter has 3 records in the ERP database, 6 records in the CRM database and 2 records in the customer support database, and among the 11 copies there are 4 phone numbers, a registry hub will tell you where all the records are but won’t help you get them to agree on a phone number.
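A small sketch of that limitation: the registry can gather every copy of a customer and show that they disagree, but nothing in it decides which phone number is right. The `SOURCES` dictionary is a hypothetical stand-in for the real source databases, and the data is made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-ins for the source databases: (system, key) -> record.
SOURCES = {
    ("ERP", "C-1001"): {"phone": "555-0100"},
    ("CRM", "77"): {"phone": "555-0100"},
    ("CRM", "78"): {"phone": "(555) 0199"},
    ("SUPPORT", "S-9"): {"phone": "555-0123"},
}

# The registry only knows where the copies of master record 42 live.
REGISTRY = {42: [("ERP", "C-1001"), ("CRM", "77"), ("CRM", "78"), ("SUPPORT", "S-9")]}


def phone_conflicts(master_id: int) -> dict[str, list]:
    """Group the source locations by the phone number each copy holds."""
    by_phone = defaultdict(list)
    for ref in REGISTRY[master_id]:
        by_phone[SOURCES[ref]["phone"]].append(ref)
    return dict(by_phone)
```

Here the four copies carry three different phone numbers. Choosing the correct one and writing it back to every source is work the registry leaves entirely to you.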

Hybrid

The Hybrid style of MDM hub has some of the attributes of both the Registry and Repository styles. Like the Registry style, the Hybrid style maintains links to the copies of a master data record in the source systems, so you won't have to replace the master data access parts of all your applications. Like the Repository style, the Hybrid style maintains the shared part of the master data in the MDM hub so that you can improve its quality and enrich its content in a single place. Thus the advantage of the Hybrid approach is that it provides a single, authoritative source for shared master data without the necessity of changing all your applications to use it.

The most significant disadvantage of the Hybrid style is that keeping the MDM hub copy of the data synchronized with all the source systems can be a complex process. If you allow all the source systems to change master data, you will have a continuous data integration problem caused by incompatible changes coming from different systems. You can reduce this problem by requiring changes to the master data to be made only to the copy in the MDM hub, but this may be difficult to implement and enforce. Also, keep in mind that MDM synchronization is more complex than data replication: because the data models of the source systems may all be different, the data may have to be transformed both when it is loaded into the hub and when it is sent from the MDM hub back to the source systems.
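The two-way transformation can be sketched as one inbound and one outbound function per source system. The CRM and ERP schemas below (a single `FullName` column versus upper-case `FIRST_NAME`/`LAST_NAME` columns) are hypothetical examples of source models that differ from the hub model; plain replication would simply copy rows and could not handle either.

```python
# Hypothetical per-source transforms between source schemas and the hub schema.

def crm_to_hub(row: dict) -> dict:
    """CRM keeps one FullName column; the hub splits given and family names."""
    given, _, family = row["FullName"].partition(" ")
    return {"given_name": given, "family_name": family, "phone": row["Tel"]}


def hub_to_crm(master: dict) -> dict:
    """Reverse transform when the hub sends a change back to CRM."""
    full_name = f'{master["given_name"]} {master["family_name"]}'.strip()
    return {"FullName": full_name, "Tel": master["phone"]}


def erp_to_hub(row: dict) -> dict:
    """ERP stores names in upper case in separate columns."""
    return {"given_name": row["FIRST_NAME"].title(),
            "family_name": row["LAST_NAME"].title(),
            "phone": row["PHONE_NO"]}


def hub_to_erp(master: dict) -> dict:
    return {"FIRST_NAME": master["given_name"].upper(),
            "LAST_NAME": master["family_name"].upper(),
            "PHONE_NO": master["phone"]}


OUTBOUND = {"CRM": hub_to_crm, "ERP": hub_to_erp}


def publish(master: dict) -> dict[str, dict]:
    """Produce the row each source system needs after a change in the hub."""
    return {system: transform(master) for system, transform in OUTBOUND.items()}
```

Even this toy version is lossy: splitting `FullName` on the first space mishandles multi-part names, which hints at why these transforms take real design effort.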

Conclusion

So what's the best choice for you? As in everything, it depends. Moving from Registry to Hybrid to Repository style increases cost and complexity but also increases usefulness and data quality, so you have to pick the solution that provides the data quality you need in a timeframe and a budget that you can afford. My recommendation is usually the Hybrid approach. The Registry approach is relatively simple and quick to implement, but few users will be satisfied with the data quality it provides over the long run. The Repository style provides the best data quality, but it is generally too hard and too expensive for most companies. Hybrid implementations can evolve over time. You might start with a minimum number of attributes for each entity stored in the MDM hub, so it is pretty close to being a Registry style hub. Then, as your needs change and your data management and stewardship capabilities improve, you gradually add attributes until the MDM hub is a complete source of master data. At this point, new applications can start using the MDM hub directly for their master data, so the hub evolves gradually toward the Repository style. While not many organizations will be able to move completely to the Repository style, it may eventually become the predominant approach for applications as the old apps are replaced.