Domain Events vs. Integration Events in Domain-Driven Design and microservices architectures


This blog post compares several approaches to the Domain Events and Integration Events patterns already published by the community. I might evolve this post depending on feedback and on some implementations we’ll be doing in the short/medium term. So, feel free to discuss it in the comments at the end of this post. I’m open to variations. 🙂

When discussing “Domain Events” there’s a certain lack of clarity, because sometimes you can be talking about domain events that happened but within the same transaction scope of an operation that is still working with in-memory objects (so, in fact, the event still didn’t happen from a persistence point of view), or, on the other hand, you could be talking about events that were already committed to your persistence storage and must be published to propagate state and maintain consistency between multiple microservices, bounded contexts or external systems. In the latter case, the event really has happened in the past, from an application and persistence point of view.

The latter type of event can be used as an “integration event” because its related data was persisted for sure. Therefore, you could use any implementation of an EventBus.Publish() and publish that event asynchronously to any subscriber, whether the subscriber is in-memory or is remote and out-of-process and you are communicating through a real messaging system like a Service Bus or any queue-based system that supports a Pub/Sub model.

Let’s start with the first model, though: in-memory/transactional domain events.

Domain Events

That difference is really important, because in the first model (pure domain events) you must do everything in-process (in-memory) and within the same transaction or operation. You raise the event from a business method in a domain entity class (an AggregateRoot or a child domain entity) and handle the event from multiple handlers (anyone subscribed), but all of it happens within the same transaction scope. That is critical because if, when publishing a domain event, that event were published asynchronously in a “fire and forget” way to somewhere else, possibly including remote systems, that event propagation would occur before committing your original transaction. Therefore, if your original transaction or operation fails, you will have inconsistent data between your original local domain and the subscribed systems (potentially out-of-process, like another microservice).

However, if those events are handled only within the same transaction scope and in-memory, then, the approach is good and provides a “fully encapsulated Domain Model”.

In short, in-memory domain events and their side effects (the event handlers’ operations) need to occur within the same logical transaction.

This model was clearly exposed by Udi Dahan, in several blog posts:

http://udidahan.com/2009/06/14/domain-events-salvation/

http://udidahan.com/2008/08/25/domain-events-take-2/

http://udidahan.com/2008/02/29/how-to-create-fully-encapsulated-domain-models/

Also in this good blog post from Jimmy Bogard:

https://lostechies.com/jimmybogard/2010/04/08/strengthening-your-domain-domain-events/

And if I remember correctly, Vaughn Vernon’s implementation of Domain Events is pretty similar.

You can have differences in the DomainEvents class implementation. It could be a static class, like in Udi’s approach, or you could use an IoC container (like Autofac, etc.) to resolve the DomainEvents class and have it wired up for in-proc communication. But the important point here is that you raise the events within the domain entities before persisting state; therefore, all the event handlers must be executed within the same transaction, or you could end up with inconsistent state.
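As a minimal sketch (not Udi’s exact code), a static DomainEvents class could look like the following; handler registration is simplified to a plain list of delegates, and a real implementation could resolve handlers from an IoC container instead:

```csharp
using System;
using System.Collections.Generic;

// Minimal sketch of a static DomainEvents dispatcher in the spirit of
// Udi Dahan's approach. Handlers run immediately, in-process, within
// the caller's transaction scope.
public static class DomainEvents
{
    [ThreadStatic] // one subscriber list per thread/request
    private static List<Delegate> _actions;

    // Register a callback for a given event type (e.g. from container
    // wiring at startup, or from a unit test).
    public static void Register<T>(Action<T> callback)
    {
        if (_actions == null)
            _actions = new List<Delegate>();
        _actions.Add(callback);
    }

    // Raise the event: every matching handler executes right here,
    // before anything has been persisted.
    public static void Raise<T>(T args)
    {
        if (_actions == null) return;
        foreach (var action in _actions)
            if (action is Action<T> typedAction)
                typedAction(args);
    }

    // Useful for test isolation.
    public static void ClearCallbacks() => _actions = null;
}
```

Because Raise() executes handlers synchronously, everything a handler does stays inside the ambient transaction, which is exactly the guarantee this first model depends on.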

For instance, from Udi’s example using a Domain entity:

```csharp
public class Customer
{
    public void DoSomething()
    {
        // Operations to change Customer's state to preferred (in-memory)
        // ...

        // Raise the Domain event
        DomainEvents.Raise(new CustomerBecamePreferred() { Customer = this });
    }
}
```

Or in Jimmy Bogard’s initial example (before his later, evolved pattern), using this code snippet in another domain entity:

```csharp
public Payment RecordPayment(decimal paymentAmount, IBalanceCalculator balanceCalculator)
{
    var payment = new Payment(paymentAmount, this);

    _payments.Add(payment);

    Balance = balanceCalculator.Calculate(this);

    if (Balance == 0)
        DomainEvents.Raise(new FeePaidOff(this));

    return payment;
}
```

In both cases, you raise the event within the domain entity which, at that point, is just an in-memory object that is still not persisted. Therefore, whatever happens in the event handlers subscribed to that event must happen within the same group of operations or transaction. This is why that event should not be an integration event based on asynchronous messaging (like queues or a Service Bus) unless somehow you can make it transactional: the event handlers could be propagating events that were still not persisted in a database, and if your original transaction fails, you would have inconsistency between the remote systems (call them microservices, bounded contexts, remote sub-systems or even different applications) and your original domain model.

When handling the event, any event handler subscribed to it could run additional domain operations by using other AggregateRoot objects, but, again, you still need to be within the same transaction scope.

```csharp
public class FeePaidOffHandler : IHandler<FeePaidOff>
{
    private readonly ICustomerRepository _customerRepository;

    public FeePaidOffHandler(ICustomerRepository customerRepository)
    {
        _customerRepository = customerRepository;
    }

    public void Handle(FeePaidOff args)
    {
        var fee = args.Fee;
        var customer = _customerRepository.GetCustomerChargedForFee(fee);
        customer.UpdateAtRiskStatus();
    }
}
```

Therefore, domain events, as a pattern, are not always applicable. But for in-memory, event-based communication across disconnected aggregates that are part of the same domain model and the same transaction, domain events are great for ensuring consistency across a single domain model within the same microservice or bounded context.

Two-phase Domain events (a la Jimmy Bogard)

Steve Smith also uses this method in some examples posted at GitHub and his blog.

Note: I called this “two-phase” domain events because it clearly has two decoupled phases. But it has nothing to do with “two-phase commit” transactions based on the DTC, etc.

So, Jimmy Bogard wrote an interesting blog post (A better Domain Events pattern) where he describes an alternative method for working with domain events. The event side-effect operations still happen within the same transaction scope, or group of in-memory operations, but in an improved way that decouples raising the event from dispatching it.

He starts by highlighting the issues of the original approach: the DomainEvents class dispatches to handlers immediately. That makes testing your domain model difficult, because side effects (the operations executed by event handlers) run immediately at the point of raising the domain event.

But what if, instead of directly raising the domain events from the domain entities (remember that, in that case, the events are handled immediately), you recorded the domain events as they happen and, right before committing the transaction, dispatched all of them at that point?

You can see that, from within the entity class, you are no longer raising/publishing domain events but only adding events to a collection within the same scope.

```csharp
public Payment RecordPayment(decimal paymentAmount, IBalanceCalculator balanceCalculator)
{
    var payment = new Payment(paymentAmount, this);

    _payments.Add(payment);

    Balance = balanceCalculator.Calculate(this);

    if (Balance == 0)
        Events.Add(new FeePaidOff(this));

    return payment;
}
```

The Events collection would be a child collection of events per domain entity that needs events (usually it is the AggregateRoots that need them, most of all).
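A possible shape for that contract is sketched below; the names (IEntity, IDomainEvent, Entity) are illustrative, chosen to be consistent with the SaveChanges() hook Jimmy shows:

```csharp
using System.Collections.Generic;

// Marker interface for domain events (illustrative).
public interface IDomainEvent { }

// Entities that record (rather than immediately raise) their events.
public interface IEntity
{
    ICollection<IDomainEvent> Events { get; }
}

public abstract class Entity : IEntity
{
    // Events recorded during the current unit of work; they are
    // dispatched and cleared right before the transaction commits.
    public ICollection<IDomainEvent> Events { get; } = new List<IDomainEvent>();
}
```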

Finally, you need to actually fire off these domain events. That is something you can hook into the infrastructure layer of whatever ORM you’re using. With Entity Framework, you can hook into the SaveChanges method, as follows:

```csharp
public override int SaveChanges()
{
    var domainEventEntities = ChangeTracker.Entries<IEntity>()
        .Select(po => po.Entity)
        .Where(po => po.Events.Any())
        .ToArray();

    foreach (var entity in domainEventEntities)
    {
        var events = entity.Events.ToArray();
        entity.Events.Clear();
        foreach (var domainEvent in events)
        {
            _dispatcher.Dispatch(domainEvent);
        }
    }

    return base.SaveChanges();
}
```

In any case, you can see that you’ll be dispatching the events right before committing the original transactional operations with the EF DbContext SaveChanges(). That is why those events and their event handlers still need to run in-memory and as part of the same transaction. Otherwise, if you were sending remote and asynchronous events (like integration events) and SaveChanges() failed, you would have inconsistent data between your multiple models, microservices or external systems.

You could use this approach for messaging (out-of-proc communication) only if you are publishing/dispatching messages through some transactional artifact, like a transactional queue, so that if your original SaveChanges() fails you can take the event message back from the transactional queue before it is processed by other microservices or external systems.

In any case, the beauty of this approach compared to regular domain events is that you decouple raising a domain event, with Events.Add(), from dispatching it to a handler.

Integration Events

Dealing with integration events is usually a different matter than using domain events within the same transaction. You might need to send asynchronous events to communicate and propagate changes from one original domain model (the original microservice or original bounded-context, for instance) to multiple subscribed microservices or even external subscribed applications.

Because of that asynchronous communication, you should not publish integration events until you are 100% sure that your original operation, the event, really happened in the past. That means it was persisted in a database or any durable system. If you don’t do that, you can end up with inconsistent information across multiple microservices, bounded-contexts or external applications.

This kind of integration event is pretty similar to the events used in a full CQRS system with a “reads database” that is separate from the “writes database”, where you need to propagate changes/events from the transactional or writes database to the reads database.

A very similar scenario is communication between out-of-process microservices, each owning its own model; it is really the same pattern.

Therefore, when you publish an integration event, you usually do it from the infrastructure layer or the application layer, right after saving the data to the persistent database. For instance, Greg Young does it that way in his simplified CQRS + Event Sourcing example:

https://github.com/gregoryyoung/m-r/blob/master/SimpleCQRS/EventStore.cs

```csharp
public class EventStore : IEventStore
{
    // Additional parts of the EventStore...
    // ...

    public void SaveEvents(Guid aggregateId, IEnumerable<Event> events, int expectedVersion)
    {
        // Validations and obtaining objects, etc.
        // (in the original sample, i starts at expectedVersion)
        var i = expectedVersion;

        // Iterate through the current aggregate's events,
        // increasing the version with each processed event
        foreach (var @event in events)
        {
            i++;
            @event.Version = i;

            // Push event to the event descriptors list for the current aggregate
            eventDescriptors.Add(new EventDescriptor(aggregateId, @event, i));

            // Publish current event to the EventBus for further processing
            // by subscribers
            _publisher.Publish(@event);
        }
    }
}
```

Independently of this CQRS example also using Event Sourcing as the persistence mechanism, you can see how publishing the integration events with EventBus.Publish() is done as the last infrastructure operation, after making sure your data is persisted in the original system.

Even the Pub/Sub communication artifact should probably be named differently. When propagating state changes across multiple bounded contexts or microservices, it feels more like an EventBus, as in EventBus.Publish(IntegrationEvent).

However, when using in-proc domain events within the same microservice or bounded context (the first scenario in this post), it feels better to name the communication artifacts as DomainEvents.Add() and then simply Dispatcher.Dispatch(DomainEvent).

Sure, you could argue that the last method proposed by Jimmy Bogard using the EF SaveChanges() looks pretty similar to this SaveEvents(). The difference is that, when using the overridden SaveChanges(), the last operation is base.SaveChanges(), so you shouldn’t publish integration events (async messaging) before making sure the persistence was properly done (unless that Publish() action is also transactional).

Also, regarding domain events: if you call base.SaveChanges() before dispatching the events, then they won't participate in the transaction, and that defeats the whole purpose of domain events. In that case, you'd have to worry about what happens within the same domain when the events' transactions fail. This is why integration events, or committed events, are a separate story.

Converting Domain Events to Integration Events

Of course, you might want to propagate some domain events to other microservices or bounded contexts. The point is that you’d convert the domain event to an integration event (or aggregate multiple domain events into a single integration event) and publish it to the outside world after making sure the original transaction is committed, after “it really happened” in the past in your original system, which is the very definition of an event.
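As a sketch of that conversion at the application layer, the command handler below persists first and only then maps the domain-level fact into an integration event; all the names here (PayFeeCommandHandler, IEventBus, FeePaidOffIntegrationEvent, etc.) are hypothetical, not from any specific library:

```csharp
// Illustrative application-layer sketch: persist first, then publish.
public class PayFeeCommandHandler
{
    private readonly AppDbContext _db;     // hypothetical EF DbContext
    private readonly IEventBus _eventBus;  // hypothetical bus abstraction

    public PayFeeCommandHandler(AppDbContext db, IEventBus eventBus)
    {
        _db = db;
        _eventBus = eventBus;
    }

    public void Handle(PayFeeCommand command)
    {
        var fee = _db.Fees.Find(command.FeeId);
        fee.RecordPayment(command.Amount, new BalanceCalculator());

        // 1. Commit the transaction. With the two-phase approach, the
        //    in-proc domain event handlers already ran inside SaveChanges().
        _db.SaveChanges();

        // 2. Only now, after "it really happened", publish to other
        //    microservices. Mapping one or more domain events into a
        //    single, serializable integration event happens here.
        _eventBus.Publish(new FeePaidOffIntegrationEvent(fee.CustomerId, command.Amount));
    }
}
```

Note that the integration event carries plain data (ids, amounts) rather than entity references, since it has to cross process boundaries.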

How to maintain consistency between microservices when failures happen before publishing the integration events

There are multiple ways to make sure that saving your aggregate and publishing the event both happen even in the case of failures, because, make no mistake about it, in a distributed system failures will happen, so you need to embrace them.

Some possible approaches depending on your application are the following:

  1. Using a transactional database table as a message queue, as the base for an event-creator component that creates and publishes the event.
  2. Using transaction log mining.
  3. Using the event sourcing pattern.

Probably, event sourcing is one of the best approaches; however, you might not be able to implement full event sourcing in many application scenarios.

Transaction log mining might be too coupled to the underlying infrastructure, and in some scenarios you don't want that: it is complicated under the covers and very much tied to the specific infrastructure, in this case probably the SQL infrastructure.

I believe a good way to solve this issue is with eventual consistency based on states, like a “ready to publish the event” state that you set (and commit) in the original transaction, and then trying to publish the event to the Event Bus (based on queues or whatever). Then, if that succeeds (the Service Bus transaction worked, like putting the event in a queue and committing), you start another transaction in the source/original service and move the state from “ready to publish the event” to “event already published”.

If the “publish event” action fails, data won’t be inconsistent within the original microservice, because it is still marked as “ready to publish the event”. As for the rest of the services, they will be eventually consistent: you can always have background processes checking the state of the transactions, and if any “ready to publish the event” entry is found, they will try to re-publish that event to the Event Bus (queue, etc.).

You can see that this approach can be based on a simplified event-sourcing system, as you need a list of integration events with their current state (ready to publish vs. published), but you only need to implement a simplified version of event sourcing for this. You don't need to store "everything" as events in the transactional database; you just need to store the integration events per microservice.
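The state-based approach can be sketched as a simplified outbox; every name below (IntegrationEventLogEntry, AppDbContext, IEventBus, etc.) is illustrative, and serialization is shown with Newtonsoft.Json only as an assumption:

```csharp
using System;
using System.Linq;
using Newtonsoft.Json; // assumed available for serializing payloads

public enum EventState { ReadyToPublish, Published }

// One row per pending integration event, stored in the SAME database
// (and same transaction) as the business data.
public class IntegrationEventLogEntry
{
    public Guid EventId { get; set; }
    public string Content { get; set; }   // serialized event payload
    public EventState State { get; set; }
}

public class IntegrationEventPublisherService
{
    private readonly AppDbContext _db;    // hypothetical EF DbContext
    private readonly IEventBus _eventBus; // hypothetical bus abstraction

    public IntegrationEventPublisherService(AppDbContext db, IEventBus eventBus)
    {
        _db = db;
        _eventBus = eventBus;
    }

    // Called inside the original transaction: the entry is committed
    // together with the business data by the same SaveChanges().
    public void EnlistEvent(IntegrationEvent @event)
    {
        _db.EventLog.Add(new IntegrationEventLogEntry
        {
            EventId = @event.Id,
            Content = JsonConvert.SerializeObject(@event),
            State = EventState.ReadyToPublish
        });
    }

    // Called after the original transaction committed, and also by a
    // background process that retries pending entries.
    public void PublishPending()
    {
        foreach (var entry in _db.EventLog.Where(e => e.State == EventState.ReadyToPublish))
        {
            _eventBus.Publish(entry.Content);    // if this fails, the entry stays pending
            entry.State = EventState.Published;  // second, separate transaction
            _db.SaveChanges();
        }
    }
}
```

Because PublishPending() may run concurrently with the background retry process, subscribers must treat the event as possibly duplicated, which leads directly to the idempotency point below.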

An important aspect is that failures at any point in the communication should cause the message to be retried (retries with exponential backoff, etc.). Also, the background process searching for “ready to publish the event” states might try to publish an event that is being published at the same time, like in a race condition, so you also need to make sure the operation is either idempotent or carries enough information to let you detect the duplicate, discard it, and send back the same response you already sent.

Summary

Therefore, the overall approach I like is the following:

Domain Events: you add or raise domain events from the domain entity classes (with or without Jimmy’s “two-phase” approach, which is better) only for intra-domain communication, when the domain events’ scope is purely within the domain itself, like within a single microservice’s domain.

Integration Events: you publish integration events using an EventBus.Publish() from outside the domain entity classes (from the infrastructure layer or even the application layer, like the command handlers) when you need to propagate state changes across multiple microservices or out-of-process systems.

Conclusion

Basically, by differentiating between domain events and integration events you solve the issue of dealing with transactions, since domain events are always scoped within a transaction, while integration events (published with an EventBus.Publish()) are only published to the outside world if the transaction was committed successfully. By doing this you can be sure that other domain models, microservices and external systems do not react to something that has in fact been rolled back and no longer exists, and you ensure consistency across out-of-proc systems, like microservices.

Using Domain Events for the internal domain events and for integration between out-of-proc systems

As another possible approach, when you raise a domain event, it would also be queued to a CommittedEvent queue once the transaction has successfully finished. Then you would have two types of handlers: DomainEventHandler and CommittedEventHandler.

You could extend either of those classes. If you need to handle the event immediately (an internal domain operation), you would use a "normal" DomainEventHandler. If you need committed data (e.g., your handler relies on queries over stored data), you use a CommittedDomainEventHandler, which would be pretty similar to handling an integration event.

At the end of the day, it is pretty similar to having domain events and integration events, but using the same definition of an event going through different channels depending on its state (committed vs. not-committed event).
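A sketch of what those two handler flavors could look like (hypothetical names; IHandler mirrors the interface used in the earlier snippets):

```csharp
public interface IHandler<T>
{
    void Handle(T @event);
}

// Runs immediately, in-memory, inside the original transaction scope,
// like a classic domain event handler.
public abstract class DomainEventHandler<T> : IHandler<T>
{
    public abstract void Handle(T @event);
}

// Runs only after the transaction has committed, so it can safely query
// persisted data or publish to other systems, much like handling an
// integration event.
public abstract class CommittedDomainEventHandler<T> : IHandler<T>
{
    public abstract void Handle(T @event);
}
```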

Comments (9)

  1. Interesting and related discussion:
    Distinguishing between internal and external events
    https://groups.google.com/forum/m/#!topic/dddcqrs/4t_2i5xnuBU

  2. Christian Johansen says:

    Isn't publishing a domain event before the changes to the aggregate raising it have been persisted going to violate the 'only modify one aggregate per transaction' rule?

    1. @Christian - Domain events are usually raised within the same transaction. You might want to do multiple operations or none.
      Also, the main rule about Aggregates is that they are a consistency boundary, but you could also have several consistency boundaries within the same transaction, like subsets.
      Having the same transaction span several Aggregates is not a bad thing. Usually, one transaction will modify a single Aggregate, but it doesn't always have to be like that.
      From my point of view, an Aggregate must be defined based on the minimum transaction boundary that must happen in order to maintain consistency, not the maximum transaction boundary that you could sometimes have across multiple aggregates.

      On the other hand, Integration Events

  3. Niko Will says:

    What happens if the current service crashes after the event was persisted to disk but before the event was published to the messaging system to keep other services in sync?

    1. One way to avoid that possibility is to have a transactional queue or any other transactional EventBus implementation and to include the EventBus.Publish() call, based on that transactional implementation, in the same transaction that persists the original data to "disk".
      For instance, that could be done using Azure Service Fabric queues and stateful services for the data, as both SF queues and SF stateful services are transactional.
      If your queue or EventBus implementation is not able to join the transaction originated by the "original data", you'll need to explore additional approaches, like:

      1. Using a transactional database table as a message queue that will be the base for an event-creator component that would create the event and publish it.
      2. Using transaction log mining: http://www.scoop.it/t/sql-server-transaction-log-mining
      3. Using the event sourcing pattern: https://msdn.microsoft.com/en-us/library/dn589792.aspx

      1. Niko Will says:

        Just for the record, transactions are not the only way to avoid inconsistencies: https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson

        1. I agree. I should have said "one way" instead of the only way.
          I updated my comment plus adding the other viable options:
          1. Using a transactional database table as a message queue that will be the base for an event-creator component that would create the event and publish it.
          2. Using transaction log mining
          3. Using the event sourcing pattern

          The link you provided from Infoq is very complementary to my post. Thanks! 🙂

          1. Regarding this point, I believe that making it transactional couples your solution too tightly to the infrastructure you might be using, which might or might not support transactions. Creating other "in-the-middle" solutions, like a temporary transactional table or log mining, also makes it over-complicated and, again, coupled to the infrastructure.
            I believe a good way to solve this issue is with eventual consistency based on states, like a “ready to publish the event” state that you set (and commit) in the original transaction, and then trying to publish the event to the Event Bus (based on queues or whatever). If that succeeds (the Service Bus transaction worked, like putting the event in a queue and committing), you start another transaction in the source or original service and move the state from “ready to publish the event” to “event already published”.
            If the "event publish" fails, data won't be inconsistent within the original microservice, because it is still marked as “ready to publish the event”. As for the rest of the services, they will be eventually consistent: you can always have background processes checking the state of the transactions, and if any “ready to publish the event” entry is found, they will try to re-publish that event to the Event Bus (queue, etc.).
            An important aspect is that failures at any point in the communication should cause the message to be retried (retries with exponential backoff, etc.). Also, the background process might try to publish an event that is being published at the same time, like in a race condition, so you also need to make sure the operation is either idempotent or carries enough information to let you detect the duplicate, discard it, and send back the same response you already sent.

  4. Interesting article and a lot of food for thought. Thanks for taking the time to post.
