Poison Message Handling

I've got a few posts on queued and durable messaging coming up over the next few weeks, and those posts will need some vocabulary that hasn't come up yet in our discussion of web services. Today's article covers general background around the concept of "poison" messages.

Web services without durability or reliability make no guarantee about preserving messages. When failure occurs during message processing, the web service may send back a fault describing that failure, but the original message that caused the fault is destroyed. You can layer some reliability on top of such a messaging system by making buffered copies of messages and using acknowledgments to indicate that processing is complete and the buffered copy can be destroyed.
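
Here's a minimal sketch of that buffering-and-acknowledgment idea in Python. The `AckBuffer` class, its method names, and the `transport` callback are all hypothetical names made up for illustration; no particular web service stack works exactly this way.

```python
import itertools


class AckBuffer:
    """Keeps a copy of each sent message until the receiver acknowledges it."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # message id -> buffered copy of the body

    def send(self, body, transport):
        """Buffer a copy of the message, then hand it to the transport."""
        msg_id = next(self._ids)
        self._pending[msg_id] = body
        transport(msg_id, body)
        return msg_id

    def acknowledge(self, msg_id):
        """Processing is complete, so the buffered copy can be destroyed."""
        self._pending.pop(msg_id, None)

    def unacknowledged(self):
        """Messages that may need to be sent again."""
        return dict(self._pending)
```

Anything still returned by `unacknowledged()` is a candidate for resending. Note that the buffer here is an ordinary dictionary held in memory.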

Buffering in memory doesn't really provide any durability because memory is a transient store: if the process holding the buffer goes down, the buffered copies go with it. There's still no actual guarantee here that messages will be delivered.

Now, suppose that the individual messages have a lot of value. The value could be an economic value, but the type of value isn't important for this description. We want to be rigorous now about making delivery guarantees to preserve that value. One way to implement the guarantee is to have a permanent, durable store and some atomic way of linking successful message processing together with deleting the message from the store. Let's call those pieces a queue and a transaction.
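
To make the queue-plus-transaction pairing concrete, here's a rough sketch that uses Python's built-in `sqlite3` module as a stand-in for the durable store. The table layout and the function names (`open_queue`, `enqueue`, `process_next`) are assumptions for illustration, not how any real queuing product is built. The default `":memory:"` path is only there so the example runs anywhere; you'd need a file path to get actual durability.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (and create if needed) a table that acts as the queue."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue "
        "(id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def enqueue(conn, body):
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO queue (body) VALUES (?)", (body,))


def process_next(conn, handler):
    """Process the oldest message and delete it in one transaction.

    Returns False if the queue is empty and True if a message was
    processed and removed. If the handler raises, the delete is rolled
    back, the message stays in the queue, and the exception propagates.
    """
    row = conn.execute(
        "SELECT id, body FROM queue ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    msg_id, body = row
    with conn:
        conn.execute("DELETE FROM queue WHERE id = ?", (msg_id,))
        handler(body)  # a failure here undoes the DELETE above
    return True
```

One caveat with this sketch: the handler's own work is only atomic with the delete if it writes through the same connection. Side effects outside the database aren't rolled back.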

There is a new problem with the durable service that the non-durable service did not face. In the error-handling case, we have unsuccessful message processing and therefore we do not delete the message from the store. The message will be picked out of the store again in the future to retry processing. If this was a transient processing error, then that behavior is exactly what we want. If this was a permanent processing error, perhaps because the message was malformed, we are going to be locked in a futile cycle of retrieving the message and unsuccessfully processing it. A lot of processing time is wasted making no progress.

A poison message is one of these permanently unprocessable messages. We need to take the poison message out of the queue and apply some strategy to it. A typical solution is to move the poison message to some other queue, where it will not be tying up the processing time of our main loop. Next time, we'll look at some of the options for poison message strategies used by MSMQ.
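
As a generic illustration of the move-it-aside strategy, here's a small Python sketch. It makes several assumptions of its own: in-memory deques stand in for the durable queues, a message counts as poison after a fixed number of failed attempts, and the names `Message`, `drain`, and `MAX_ATTEMPTS` are invented for the example. None of this is meant to describe how MSMQ behaves.

```python
from collections import deque
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # hypothetical threshold; real systems make this configurable


@dataclass
class Message:
    body: str
    attempts: int = 0  # number of failed processing attempts so far


def drain(main_queue, poison_queue, handler, max_attempts=MAX_ATTEMPTS):
    """Process messages until the main queue is empty.

    A message that fails goes to the back of the main queue for a later
    retry, which is what we want for transient errors. Once it has failed
    max_attempts times, we treat it as poison and move it aside instead.
    """
    while main_queue:
        message = main_queue.popleft()
        try:
            handler(message)
        except Exception:
            message.attempts += 1
            if message.attempts >= max_attempts:
                poison_queue.append(message)  # stop retrying this one
            else:
                main_queue.append(message)  # try again later
```

With a handler that parses each body as an integer, a queue holding "1", "oops", and "2" ends up with the two numbers processed and "oops" sitting in the poison queue after three attempts. A retry count can't actually tell a permanent error from a long-lived transient one, so the threshold is a judgment call.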

Next time: MSMQ and Poison Messages