MDM and EAG and CDI Oh My! (Part 2)


I covered customer master data in the first part because it is generally both the most critical and the most difficult problem to solve, but any data that is going to be aggregated by a service may have issues with duplicates and key mapping.

As a simple example, take part numbers in a manufacturing system. One of my earliest jobs in the computer business involved stocking parts used for repairing computer hardware. Our part numbers were eight digits long, which seems very straightforward at first, but there was a wealth of information about the part encoded in the number. By looking at the number I could tell what kind of part it was and, in many cases, where it was made. Also, only seven of the eight digits were significant. The last digit was a checksum that ensured the number was valid.
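I don’t remember which checksum scheme those part numbers used. As a minimal sketch of how a check digit catches a mistyped number, here is a validator that assumes the Luhn (mod 10) algorithm, the scheme used on credit card numbers. The function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute a Luhn (mod 10) check digit for a string of decimal digits."""
    total = 0
    # Walk the payload right to left. The check digit will be appended on the
    # right, so the rightmost payload digit is the first one to be doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return (10 - total % 10) % 10


def is_valid_part_number(part_number: str) -> bool:
    """True if the number has eight digits and the last one checks the first seven."""
    if len(part_number) != 8 or not part_number.isdigit():
        return False
    return luhn_check_digit(part_number[:7]) == int(part_number[7])
```

With this scheme `12345674` validates, while `12345675` does not. Luhn detects any single mistyped digit, which is exactly the kind of error a parts clerk makes.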

As often happens, our company merged with another company. This company also had eight-digit part numbers (if I remember correctly), but the same part would obviously have different numbers in the two systems, and the same part number in the two systems would almost never refer to the same part.

While this was well before the advent of SOA and master data management, what on the surface looked like a non-issue (both systems used eight-digit part numbers) was in fact a huge problem that took years to resolve. Let’s look at some of the possible resolutions. One obvious solution would be to adopt one of the companies’ numbering systems and convert the other company’s numbers to it. While this seems rational, in reality it would have been a disaster. Part numbers are used in hundreds of different places: manufacturing systems, inventory systems, catalogues, repair manuals, and in my case a couple hundred envelopes in my car trunk. Those numbers obviously can’t all be changed instantaneously, so there would have been a long transition period when nobody would know for sure whether a part number was an “old” number or a “new” number. This would lead to chaos. When I ordered a part to fix a machine, I would have had a 50/50 chance at best of getting the part I needed. There would also be huge political repercussions from this kind of decision. One of the companies would have to be the “winner” of the part number war and the other would obviously be the “loser”. Since in most mergers the people at the lower levels of the org chart are nervous and upset about the impact of the merger, being the loser in the part number war would be a significant blow to already fragile egos.

The better solution is to force everybody to adopt a completely different part numbering system, maybe nine-digit numbers or a combination of letters and numbers, so it’s instantly obvious whether the old or the new numbering system is being used. This solves the political issue by making everyone a loser. Not a great thing for the ego, but at least somewhat fair. The transition from the old system to the new system is accomplished by having everyone carry around a cross-reference list for a couple of years until all the systems are changed. This is a major burden and probably a huge productivity loss, but it’s preferable to the huge waste caused by having the same part stocked under two different numbers because the two inventory systems can’t agree on a number.
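Here is a minimal sketch of that cross-reference list in code. It assumes a hypothetical new format of one letter followed by seven digits (one of the letters-and-numbers options), and all the part numbers are made up. Note that the lookup has to be keyed by company as well as by old number, because the same eight digits can identify different parts in the two legacy systems.

```python
import re
from typing import Optional

# Hypothetical new format: a letter plus seven digits, so a new number can
# never be mistaken for an old eight-digit number.
NEW_FORMAT = re.compile(r"[A-Z]\d{7}")
OLD_FORMAT = re.compile(r"\d{8}")

# Keyed by (legacy company, old number): the same old number can name
# different parts in the two systems. All values here are made up.
CROSS_REFERENCE = {
    ("A", "12345674"): "P0000001",
    ("B", "12345674"): "P0000002",
    ("B", "55501230"): "P0000001",  # same physical part as A's 12345674
}


def resolve_part_number(number: str, company: Optional[str] = None) -> str:
    """Return the new-format number for either an old or a new part number."""
    if NEW_FORMAT.fullmatch(number):
        return number  # already converted, nothing to look up
    if OLD_FORMAT.fullmatch(number):
        if company is None:
            raise ValueError(f"old number {number} is ambiguous without a company")
        try:
            return CROSS_REFERENCE[(company, number)]
        except KeyError:
            raise KeyError(f"no cross-reference entry for {company}:{number}") from None
    raise ValueError(f"unrecognized part number format: {number}")
```

Because the two formats can’t be confused, a new number passes straight through, and an old number only resolves when you also know which company it came from.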

While this is obviously a very painful process, it’s at least fairly straightforward to implement. Parts are physical things, and it’s generally not hard to decide whether two parts are in fact the same. Imagine the pain when two companies merge and then try to merge accounting systems that have different charts of accounts. Not only do the two companies have different account numbers for plane tickets, but in one company plane tickets and rental cars are in the same account while in the other they are in different accounts. These discrepancies have to be reconciled before the merged entity can even know for sure how much money it is making or losing. I still remember a CEO at an early data warehouse conference complaining that his company’s seven divisions each sent him a monthly financial statement showing a profit, yet the company as a whole was losing 20 million a year. Figures don’t lie, but liars figure. When the merged company finally agrees on a new chart of accounts, it may be almost impossible to compute performance trends, because the historical numbers may not be convertible to the same structure as the current numbers.
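To see why the history may not convert, here is a small sketch with hypothetical account codes. Company A books plane tickets and rental cars to separate accounts. Company B books both to one travel account. The merged chart keeps them separate. Mapping A’s history is a simple lookup, but B’s combined balances can’t be split from summary totals alone, so the sketch reports those rows rather than guessing.

```python
from collections import defaultdict

# Hypothetical mapping from (company, old account) to accounts in the merged
# chart. B's single travel account corresponds to two new accounts.
ACCOUNT_MAP = {
    ("A", "6100"): ["AIRFARE"],
    ("A", "6110"): ["CAR_RENTAL"],
    ("B", "7000"): ["AIRFARE", "CAR_RENTAL"],
}


def restate(history):
    """Restate (company, old_account, amount) rows in the merged chart.

    Returns (totals per new account, rows that cannot be restated). An old
    account that maps to more than one new account cannot be split from a
    summary balance, so those rows are returned unresolved.
    """
    totals = defaultdict(int)
    unresolved = []
    for company, account, amount in history:
        targets = ACCOUNT_MAP[(company, account)]
        if len(targets) == 1:
            totals[targets[0]] += amount
        else:
            unresolved.append((company, account, amount))
    return dict(totals), unresolved
```

Only going back to B’s individual transactions, if they still exist, would recover the split. That is why trend reports across the merger date are so hard to produce.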

As I said in the first part of this post, there are basically three approaches to Master Data Management or Entity Aggregation:

1) Convert all the aggregated systems to use the same identifiers. This is the “everybody uses the same part number” approach. It is often the best approach and will generally produce the best results in the end, but it may be difficult or impossible to implement. It generally means breaking every system that maintains or uses this data and can take years to accomplish. While the transition is underway, there can be huge productivity losses from the chaos caused by changing everything. Even if you are willing to make the change and live with the pain, you may not be able to make it happen because some of the systems can’t be changed: many third-party systems don’t have the flexibility required.

2) Create a “master” copy of the data and map all the aggregated systems to the master copy. As I said, there are many products that will help you create this master copy. The issues with this approach include reaching agreement on what should be included in the master data (a superset of all the attributes of all the systems involved is usually too unwieldy to be practical) and maintaining the master copy once it is created. The master list will rapidly become useless if the participating systems are allowed to maintain their own data without also changing the master data, and changes to the master data must be propagated to every copy of that data. Again, making this alternative work will require some changes to all the participating systems. The changes are significantly smaller and less disruptive than in the first approach, but they are required. If all the systems are being wrapped with a service façade as part of a service-oriented architecture project, this might be a good time to make the changes, because, as I said earlier, if the data identifier issues aren’t resolved up front, the SOA design will probably fail.

3) The last alternative is to leave the data where it is and maintain a map of the key values required to aggregate the data. This is usually the easiest approach to implement but the least efficient at runtime. To retrieve a consolidated view of a customer, for example, a customer service would have to query all the systems that contain copies of that customer’s data and aggregate the results on the fly. This approach also doesn’t produce a single version of the truth. If three different systems have different addresses for the same customer, which address will you display when someone asks for the customer’s address? In general, this approach solves the technical problem without addressing the data quality problem that is usually the main reason for wanting a common view in the first place.
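Here is a minimal sketch of the third approach. It assumes three hypothetical systems (billing, shipping, CRM), each with its own customer key format, and uses in-memory dictionaries in place of the service calls a real deployment would make. The key map is the only shared artifact, and the consolidated view is assembled at query time. The sketch can detect conflicting values, but it has no basis for choosing between them.

```python
from collections import defaultdict

# Stand-ins for three source systems, each with its own customer key format.
# In practice each lookup would be a call to that system's service.
BILLING = {"C-1001": {"name": "Acme Corp", "address": "1 Main St"}}
SHIPPING = {"88231": {"name": "ACME Corporation", "address": "1 Main Street"}}
CRM = {"acme-01": {"name": "Acme", "address": "500 Industrial Pkwy"}}
SYSTEMS = {"billing": BILLING, "shipping": SHIPPING, "crm": CRM}

# The key map records which local key identifies the same customer in each
# system. The customer data itself stays where it is.
KEY_MAP = {
    "CUST-1": {"billing": "C-1001", "shipping": "88231", "crm": "acme-01"},
}


def consolidated_view(customer_id):
    """Query every mapped system and aggregate the results on the fly.

    Returns {attribute: {system: value}} so every source value is kept.
    """
    view = defaultdict(dict)
    for system, local_key in KEY_MAP[customer_id].items():
        record = SYSTEMS[system].get(local_key)
        if record is None:
            continue  # this system has no copy of the customer
        for attribute, value in record.items():
            view[attribute][system] = value
    return dict(view)


def conflicts(view):
    """Return only the attributes whose values disagree across systems."""
    return {attr: vals for attr, vals in view.items() if len(set(vals.values())) > 1}
```

Every call touches every system, which is where the runtime cost comes from. `conflicts` only reports the disagreement. Deciding which address is right is the data quality problem this approach leaves unsolved.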

I’m sure there are other approaches, either variants of these or significantly different ways of solving the problem, but these three should be enough to frame the discussion. A key thing to remember is that the technical issues of Entity Aggregation are significantly easier to resolve than the political and organizational issues. Choosing a mapping technology is relatively straightforward, but deciding what the aggregated entity should look like may be an intractable problem because it involves egos, power, and prestige. If there are twenty different customer number formats used in your organization, chances are one of the formats will “win” and stay the same, and the other 19 will have to be converted or mapped to the winning format. If you work in an organization where that decision can be made purely on technical merit without any political wrangling, you are a lucky person indeed, and most of us would like to know if you have any openings. Going into this process thinking that it’s a purely technical issue, without understanding the people and organizational issues, is a recipe for disaster.

If you have gone through this process successfully, I would like to talk to you and gather some best practices that we all can learn from. Let me know if you have something to offer.