Canonical Model, Canonical Schema, and Event Driven SOA

One thing I've been thinking and talking about for the past few weeks is the relationship between four different concepts, a relationship that I didn't fully grasp at first but have become more convinced of as time wears on.  Those terms are:

  • Enterprise Canonical Data Model
  • Canonical Message Schema
  • Event Driven Architecture
  • Business Event Ontology

I understood a general relationship between them, but as time has passed and I've been placing my mind directly in the space of delivering service oriented business applications, the meanings have crystallized and their relationship has become more important.  First, some definitions from my viewpoint.

  • Enterprise Canonical Data Model - The data we all agree on.  This is not ALL the data.  This is the data that we all need to agree on in order to do our business.  This is the entire model, as though the enterprise had one and only one relational database.  Of course, it is impossible for the enterprise to function with a single database, so in some respects creating this model is an academic exercise.  Its usefulness doesn't become apparent until you add in the following concepts, so read on.
     
  • Canonical Message Schema - When we pass a message from one application to another, over a Service Oriented Architecture or in EDI or in a batch file, we pass a set of data between applications.  Both the sender and the receiver need a shared understanding of each field's (a) data type, (b) range of values, and (c) semantic meaning.  The first two we can handle with the service tools we have.  The third is far and away the hardest, and it is where most of the cost of point-to-point integration comes from: creating a consistent agreement between two applications about what the data MEANS and how it will be used.
     
  • Event Driven Architecture - a style of application and system architecture characterized by the development of a set of relatively independent actors who communicate events amongst themselves in order to achieve a coordinated goal.  This can be done at the application level, the distributed system level, the enterprise level, and the inter-enterprise level (B2B and EDI).  I've used this at many levels.  It's probably my favorite model.  At the application level, I once helped code a component in an EDA application that ran in firmware on a high-speed modem.  At the system level, I helped design a system of messages and components that controls the creation of enterprise agreements.  At the inter-enterprise level, I worked for numerous agencies, in my consulting days, to set up EDI transactions to share business messages between different business partners.
     
  • Business Event Ontology - A reasonably complete list of business events, usually in a hierarchy, that represents the points in the overall business process where two "things" need to communicate or share data.  I'm not referring to a single event, but rather to the entire list.  Note that a business event is not the same as a process step.  An event may trigger a process step, but the event itself is a "notification of something that has occurred," not the name of the process we follow next.

I guess what escaped me, until recently, was how closely related these concepts really are.

The way I'm approaching this starts from the business goal:  use data to drive decisions.  Therefore, we need good data.  In order to have good data, we need to either integrate our applications or bring the data together at the end.   Either way, if the data is used consistently along the way, we will have a good data set to report from at the end. 

To create that consistency, we need the Enterprise Canonical Data Model.  Creating this bird is not easy.  It requires a lot of work and executive buy-in.  Note that the process of creating this model can generate a lot of heated discussions, mostly about variations in business process.  Usually the only way to mitigate these discussions is to create a data model that contains either none of the variations between processes, or contains them all.  Neither direction is "more correct" than the other.

However, in order to integrate the applications, either along the way or at the end of the data-generation processes, we need a particularly constrained definition of Canonical Schema: the Enterprise Canonical Message Schema is a subset of the Enterprise Canonical Data Model that represents the data we will pass between systems, the data that many people feel would be useful to share.  Note that we added a constraint to the definition above.  Not only are we sharing the data, but we are sharing data drawn from the Enterprise CDM.

By constraining our message schema to the elements in the Enterprise Canonical Data Model, we radically reduce the cost of producing good data "at the end," because we do not generate bad data along the way.  The key word is "subset."  If you create a canonical schema without a canonical data model, you are building a house on sand.  The CDM provides the foundation for the schema; creating the schema first is likely to cause problems later.
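To make the "subset" rule concrete, here is a minimal Python sketch, assuming the Canonical Data Model can be reduced to a mapping from entity names to agreed attribute names. The entities and fields (`Prospect`, `prospect_id`, `assigned_rep_id`, and so on) and the helpers `undefined_fields` and `is_canonical` are hypothetical illustrations, not part of any real model or tool. The check simply reports every field in a proposed message schema that the CDM does not define.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical fragment of an Enterprise Canonical Data Model:
# entity name -> set of attribute names the enterprise has agreed on.
CANONICAL_DATA_MODEL: Dict[str, Set[str]] = {
    "Prospect": {"prospect_id", "name", "email", "source", "assigned_rep_id"},
    "Customer": {"customer_id", "name", "email", "segment"},
}


def undefined_fields(message_schema: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (entity, field) pairs in a message schema that the CDM does not define."""
    missing: List[Tuple[str, str]] = []
    for entity, fields in message_schema.items():
        known = CANONICAL_DATA_MODEL.get(entity, set())  # unknown entity: every field is undefined
        for field in sorted(fields):
            if field not in known:
                missing.append((entity, field))
    return missing


def is_canonical(message_schema: Dict[str, Set[str]]) -> bool:
    """A message schema is canonical only if it is a strict subset of the CDM."""
    return not undefined_fields(message_schema)
```

A schema such as `{"Prospect": {"prospect_id", "assigned_rep_id"}}` passes, while one carrying a locally invented field such as `lead_score` is flagged as `("Prospect", "lead_score")`, which is the signal to resync with the data model before the message goes into service.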

Therefore, for my friends still debating whether we should do SOA "code first" or "schema first," I will say this: if you want to actually share the service, you have no choice but to create it "schema first," and even then, only AFTER a sufficiently well understood part of the canonical data model has been described.

And for my friends creating schemas that are not a subset of the overall model, time to resync with the overall model.  Let's get a single model that we all agree on as a necessary foundation for data integration.

The next relationship is between the Canonical Message Schema and the Event Driven Architecture approach.  If you build your application so that you are sending messages, and you want to create autonomy between the components (goodness), you need to send data that has a well understood interpretation and as little "business rule baggage" as you can get away with.  What better place than the Canonical Data Model to get that understanding?  Now, this is no longer an academic exercise.  Creating the enterprise level data model provides a common understanding, so that these messages can have clear and consistent meaning.  That is imperative to Event Driven Architecture, where you are trying to keep the logic of one component from bleeding over into another.
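The sketch below illustrates that autonomy in Python, under some loudly stated assumptions: an in-process publish/subscribe bus stands in for whatever messaging infrastructure you actually use, and the event names, payload fields, and the `EventBus`, `SalesDashboard`, and `AuditLog` classes are all hypothetical. The publisher only announces that something occurred, using CDM-style field names; each subscriber applies its own logic, so no business rules travel with the message.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class CanonicalMessage:
    """A notification that something occurred; payload keys are assumed to come from the CDM."""
    event: str
    payload: Dict[str, str]


Handler = Callable[[CanonicalMessage], None]


class EventBus:
    """Hypothetical in-process pub/sub bus: publishers never know who is listening."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._subscribers[event].append(handler)

    def publish(self, message: CanonicalMessage) -> None:
        # Deliver to every handler registered for this event name, in subscription order.
        for handler in self._subscribers[message.event]:
            handler(message)


class SalesDashboard:
    """Hypothetical consumer that keeps only the current rep for each prospect."""

    def __init__(self, bus: EventBus) -> None:
        self.current_rep: Dict[str, str] = {}
        bus.subscribe("prospect reassigned", self.on_reassigned)

    def on_reassigned(self, message: CanonicalMessage) -> None:
        self.current_rep[message.payload["prospect_id"]] = message.payload["assigned_rep_id"]


class AuditLog:
    """Hypothetical consumer that records every prospect event it hears about."""

    def __init__(self, bus: EventBus) -> None:
        self.entries: List[str] = []
        for event in ("prospect reassigned", "prospect archived"):
            bus.subscribe(event, self.record)

    def record(self, message: CanonicalMessage) -> None:
        self.entries.append(f"{message.event}: {message.payload['prospect_id']}")


if __name__ == "__main__":
    bus = EventBus()
    dashboard = SalesDashboard(bus)
    log = AuditLog(bus)
    bus.publish(CanonicalMessage("prospect reassigned", {"prospect_id": "P-1", "assigned_rep_id": "R-7"}))
    bus.publish(CanonicalMessage("prospect archived", {"prospect_id": "P-1"}))
    print(dashboard.current_rep)  # {'P-1': 'R-7'}
    print(log.entries)  # ['prospect reassigned: P-1', 'prospect archived: P-1']
```

Adding another consumer means adding another subscriber; the publisher does not change, which is the kind of component independence EDA is after.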

The business event ontology defines the list of events that require you to send data when they occur.  Creating an ontology requires that you understand the process well enough to generalize the process steps into commonly held, shareable events.  For those events to be shareable, the data sent when an event occurs should take the form of an Enterprise Canonical Message Schema.

Therefore, to summarize the relationship:

   Business Events occur in a business, causing an application to send a Canonical Message to another application.  The Canonical Message Schema is a subset of the Canonical Data Model.  Event Driven Architecture is most efficient when components exchange messages that conform to the Canonical Message Schema.  This provides you with more consistent data, which is better for creating a business intelligence data warehouse at the end.

Some agility notes:

The list of business events in a prospect ontology may include things like "receive prospect base information", "receive prospect extended information", "prospect questionnaire response received", "prospect (re)assigned", "prospect archived", "prospect matched to existing customer", "prospect assigned to marketing program," etc.  It is not a list of process steps, just the events that occur as inputs or outputs.
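Because a business event ontology is usually a hierarchy, it can be captured as a simple tree. In this Python sketch the leaf events come from the prospect list above, while the grouping categories ("prospect information received", "prospect status changed") and the helpers `leaf_events` and `ancestors` are hypothetical additions for illustration only. A consumer interested in a whole category can expand it to its concrete events, and any single event can be traced back up to its categories.

```python
from typing import Dict, List

# Hypothetical hierarchy: parent category -> child events or sub-categories.
# Leaf names are taken from the prospect event list; the categories are illustrative.
ONTOLOGY: Dict[str, List[str]] = {
    "prospect event": [
        "prospect information received",
        "prospect status changed",
        "prospect matched to existing customer",
    ],
    "prospect information received": [
        "receive prospect base information",
        "receive prospect extended information",
        "prospect questionnaire response received",
    ],
    "prospect status changed": [
        "prospect (re)assigned",
        "prospect archived",
        "prospect assigned to marketing program",
    ],
}

# Reverse index so any event can be traced up to its categories.
PARENT: Dict[str, str] = {
    child: parent for parent, children in ONTOLOGY.items() for child in children
}


def leaf_events(node: str) -> List[str]:
    """Expand a category into the concrete events beneath it, depth first."""
    children = ONTOLOGY.get(node, [])
    if not children:
        return [node]  # a node with no children is itself a concrete event
    leaves: List[str] = []
    for child in children:
        leaves.extend(leaf_events(child))
    return leaves


def ancestors(event: str) -> List[str]:
    """Return the categories above an event, nearest first."""
    path: List[str] = []
    while event in PARENT:
        event = PARENT[event]
        path.append(event)
    return path
```

Here `ancestors("prospect archived")` returns `["prospect status changed", "prospect event"]`, and `leaf_events("prospect event")` yields all seven concrete events. Every leaf names something that happened, not a step to perform next.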

Clearly, this list can be created in iterations, but if it is, you need to make sure that you capture all of the events that surround a particular high-level process rather than focusing on technology.  In other words, the business processes of "qualify prospect" or "validate order" may have many business events associated with them, and those events may need to touch many applications and people.  If you decide to focus on "qualify prospect" first, then understand all of the events surrounding "qualify prospect" before moving on to "validate order"; even if both processes hit your Customer Relationship Management system, focus on the process, not the system.