EDA: Avoiding "coupling on the name"

A friend of mine, Harry Pierson, is a great thought-provoker.  I'm a big fan of thought-provokers.  Pat Helland is another, as are David Chappell and Martin Fowler.  Harry has been asking me to make sure we build a layer of indirection into our message addressing system (which I agree with, but haven't really been thinking about).

So this morning, as I was just waking up (literally), I thought about my recent post on Event Driven Architecture and whether to use events or documents.  Both models share a key behavior: send out a message without any idea of who will pick it up. 

So what if no one picks the message up?  Is that an error? 

Let's say I have a system to handle a call center for a financial services company or a telco.  When a customer calls and asks to be enrolled in "Heavily Advertised Program ABC," three or four systems may need to interact to make that happen.

In an async EDA world, we want to decouple the behavior of one system from another in time.  The orchestration can be spread over time and location, but the connected process still has to follow an orchestration that does not live on the local machine.

Harry asked me if the sending app would have to know the identity of the receiving app, because it would be an error if the receiving app didn't get the message.  Personally, I think this is coupling on the name: the sending app knows the name of the receiver.

Harry asked me to consider using a 'logical name' for the receiver.  The sender contacts a logical endpoint, the addressing infrastructure turns that into a physical endpoint, and we still have decoupling.
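Harry's idea fits in a few lines.  This is a minimal sketch under my own assumptions, not any product's API: `AddressRegistry`, `send`, the `logical:` naming scheme, and the example URLs are all hypothetical.  The point it shows is that the sender only ever names the logical endpoint, so the receiver can move without the sender changing.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class AddressRegistry:
    """Hypothetical addressing infrastructure: logical name -> physical endpoint."""
    bindings: Dict[str, str] = field(default_factory=dict)

    def bind(self, logical_name: str, physical_endpoint: str) -> None:
        # Rebinding moves traffic to a new address without touching any sender.
        self.bindings[logical_name] = physical_endpoint

    def resolve(self, logical_name: str) -> str:
        if logical_name not in self.bindings:
            raise LookupError(f"no endpoint bound to {logical_name!r}")
        return self.bindings[logical_name]


def send(registry: AddressRegistry, logical_name: str, document: dict) -> Tuple[str, dict]:
    # The sender supplies only the logical name; the infrastructure picks the address.
    # A real transport would deliver here; this sketch just returns where it would go.
    return registry.resolve(logical_name), document


# Usage: the sender's call is identical before and after the receiver moves.
registry = AddressRegistry()
registry.bind("logical:orderhandler", "https://orders-east.example.com/inbox")
first, _ = send(registry, "logical:orderhandler", {"program": "ABC"})
registry.bind("logical:orderhandler", "https://orders-west.example.com/inbox")
second, _ = send(registry, "logical:orderhandler", {"program": "ABC"})
```

Note what this does not solve: the sender still has to know the one logical name it is talking to, and nothing here describes a multi-step workflow.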

Honestly, I like it, but I think it is insufficient.  What if we need to contact 20 downstream systems in a complex workflow, but I don't want a single "orchestration coordinator" to be a bottleneck (or a single point of failure)?  I don't want to hand the orchestration off from my app to a central orchestration hub.

How about this: when I send a document, we begin a handshake between my app and a local agent. 

My app: I have a document.  Please tell me what you are going to do with it.

Local agent: Thank you for contacting your local routing agency.  Let me see your document.  (I look in my cached instructions for handling that document.)  Here is a routing slip for you.  Now you know everyone who will see your document (using logical names).  Error-handling criteria are included in the slip, so you know what the infrastructure considers an error.  Is this list OK?

My app: (I examine the routing slip.  It meets my criteria for handling.)  I approve the transmission of my document, but I have marked up the error-handling criteria... I want it to be an error if the 'logical order handler' doesn't pick up the document within 10 seconds.

Local agent: I'll get it started.  I'll assign your logical name to the message, and the endpoint for your callback service, so if someone wants more information, they know who to contact.  (Local agent sends both the message and the routing slip.  Agents on other systems follow the instructions in the routing slip, which may include calls to a dozen other systems along the way).
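Here is one way the handshake above could look in code.  It is a sketch under my own assumptions: `Step`, `RoutingSlip`, `LocalAgent`, `propose`, `dispatch`, and `require_pickup` are hypothetical names, and the cached instructions are just a dictionary keyed by document type.  The agent proposes a slip, the app overrides one error rule for this message only, and the agent (not the app) stamps the sender's logical name and callback endpoint.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Step:
    logical_name: str                         # recipients are known only by logical name
    pickup_timeout_s: Optional[float] = None  # None: a late pickup is not treated as an error


@dataclass(frozen=True)
class RoutingSlip:
    steps: Tuple[Step, ...]
    sender: str = ""    # filled in by the agent, not by the app
    callback: str = ""  # where recipients ask for more information


class LocalAgent:
    """Hypothetical routing agent running on the sender's own machine."""

    def __init__(self, instructions: Dict[str, RoutingSlip], own_name: str, callback: str):
        self.instructions = instructions  # cached routing instructions, keyed by document type
        self.own_name = own_name
        self.callback = callback

    def propose(self, document: dict) -> RoutingSlip:
        # "Please tell me what you are going to do with it."
        return self.instructions[document["type"]]

    def dispatch(self, document: dict, slip: RoutingSlip) -> dict:
        # The agent stamps the sender's logical name and callback endpoint.
        stamped = replace(slip, sender=self.own_name, callback=self.callback)
        return {"document": document, "slip": stamped}


def require_pickup(slip: RoutingSlip, logical_name: str, timeout_s: float) -> RoutingSlip:
    """The app's per-message override of the error-handling criteria."""
    steps = tuple(
        replace(s, pickup_timeout_s=timeout_s) if s.logical_name == logical_name else s
        for s in slip.steps
    )
    return replace(slip, steps=steps)


# Usage: the app reviews the proposed slip, tightens one rule, then approves.
agent = LocalAgent(
    {"enrollment": RoutingSlip((Step("logical:orderhandler"), Step("logical:billing")))},
    own_name="logical:callcenter",
    callback="logical:callcenter/callback",
)
doc = {"type": "enrollment", "program": "ABC"}
proposed = agent.propose(doc)
message = agent.dispatch(doc, require_pickup(proposed, "logical:orderhandler", 10.0))
```

Because the slips are immutable, the override applies to this one message and never leaks back into the agent's cached instructions.  An app that simply trusts the infrastructure would call `agent.dispatch(doc, agent.propose(doc))` and skip the review.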

One thing that is not clear from this use case: the "approval" step is not required.  The sender (my app) could have chosen to simply trust the infrastructure in the first place.  That's a valid option.

A few benefits of this handshake:

  • The calling system doesn't know the names of all of the document's collaborators. It knows the logical name of one collaborator that it cares about (logical: orderhandler). 
     
  • The calling system gets local override over error handling criteria on a per-message basis.
     
  • The calling system doesn't know its own name.  That comes from the infrastructure.
     
  • When the recipient system wants to call back for more information, everything it needs is in the document, as inserted by the infrastructure.  The recipient doesn't have to know the name of the 'sender of information' either... it comes in the message.
     
  • Workflow coordination happens at the agent, not in a separate infrastructure.  Talking from point A to point B still involves one (and only one) message, not two as would be required in a typical hub-and-spoke model. 
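The last point, that coordination lives in the agents rather than in a hub, can be sketched as agents that each read the slip and forward directly to the next hop.  This is an in-process simulation under my own assumptions: the `Agent` class, `start`, and the `wire_log` list are hypothetical, and each recursive `receive` call stands in for what would really be a network send from one machine's agent to the next.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Agent:
    """Hypothetical per-system routing agent; the slip alone tells it where to send next."""
    logical_name: str
    seen: List[dict] = field(default_factory=list)

    def receive(self, document: dict, remaining: Tuple[str, ...],
                network: Dict[str, "Agent"], wire_log: List[Tuple[str, str]]) -> None:
        self.seen.append(document)  # this system's local work for its step
        if remaining:
            next_name, rest = remaining[0], remaining[1:]
            # One message travels directly from this agent to the next; no hub in between.
            wire_log.append((self.logical_name, next_name))
            network[next_name].receive(document, rest, network, wire_log)


def start(sender: str, document: dict, slip: Tuple[str, ...],
          network: Dict[str, Agent]) -> List[Tuple[str, str]]:
    """Send the first hop; every later hop is decided by the agent holding the message."""
    wire_log: List[Tuple[str, str]] = []
    first, rest = slip[0], slip[1:]
    wire_log.append((sender, first))
    network[first].receive(document, rest, network, wire_log)
    return wire_log


# Usage: three recipients cost three messages, each point-to-point.
names = ("logical:orderhandler", "logical:billing", "logical:crm")
network = {n: Agent(n) for n in names}
log = start("logical:callcenter", {"program": "ABC"}, names, network)
```

Each entry in `log` is one message.  Routing the same three recipients through a hub would put two messages on the wire per delivery (sender to hub, hub to recipient).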

Of course, these ideas are not new.  This is right out of the ESB playbook, and why shouldn't it be? 

The point is that the caller doesn't know much about the orchestration... preferably they know nothing at all, but they have the RIGHT to know if they want to, and there is no 'central authority' that slows you down or decides things for you.  There is an agent, working on your behalf, on your own system.  Under the agent's covers: Windows Workflow Foundation (WF).

Note: to my friends who love BizTalk, we still need many of the capabilities of BizTalk... like transformation and the adapters to SAP and such.  The routing infrastructure can call BizTalk when it is important to do so.  When it is not important, it won't.

A management console is still needed for someone to manage routing information.  The local agents download updates when they get an event indicating that routing has changed.  The point is that the management point is not a bottleneck.  It is just an endpoint that also interacts on the same infrastructure.
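The update mechanism described above amounts to a local cache invalidated by events.  A minimal sketch, assuming a hypothetical `fetch` callable for downloading one document type's routing and a hypothetical `{"doc_type": ...}` event shape:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Slip = Tuple[str, ...]  # logical names, in routing order


@dataclass
class RoutingCache:
    """Hypothetical local-agent cache of routing instructions, keyed by document type."""
    fetch: Callable[[str], Slip]  # assumption: downloads routing for one document type
    entries: Dict[str, Slip] = field(default_factory=dict)

    def slip_for(self, doc_type: str) -> Slip:
        # Normal path: answer locally; the management point is contacted only on a miss.
        if doc_type not in self.entries:
            self.entries[doc_type] = self.fetch(doc_type)
        return self.entries[doc_type]

    def on_routing_changed(self, event: dict) -> None:
        # The console announces a change on the same infrastructure as everything else;
        # the agent re-downloads only the affected document type.
        doc_type = event["doc_type"]
        self.entries[doc_type] = self.fetch(doc_type)


# Usage: a dict stands in for the management console's routing store.
console = {"enrollment": ("logical:orderhandler", "logical:billing")}
calls: List[str] = []


def fetch(doc_type: str) -> Slip:
    calls.append(doc_type)
    return console[doc_type]


cache = RoutingCache(fetch)
before = cache.slip_for("enrollment")
cache.slip_for("enrollment")  # served from the cache, no second download
console["enrollment"] = ("logical:orderhandler", "logical:billing", "logical:crm")
cache.on_routing_changed({"doc_type": "enrollment"})
after = cache.slip_for("enrollment")
```

Routing decisions for individual messages never wait on the console; it is consulted only on a cache miss or a change event.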