Routing Service Features - Error Handling Part 2

12/30/2009

Take a deep breath; this post is a bit of a doozy :)

In the last post I discussed the Error Handling capabilities of the Routing Service and showed how error handling can be added to many common scenarios to give clients a more robust experience. Those scenarios are useful, but they don’t show where the error handling capabilities really shine: when the Routing Service is dealing with more complex messaging patterns and features such as multicast, sessions, and transactions. In this post, we’ll examine some of these patterns and how the Routing Service handles errors in them. Let’s say you have the Routing Service configured as part of the following scenario:

In this example, you have an inbound queue into which “work messages” are delivered (by clients, other services, etc.). You’re using the Routing Service to make sure that a copy of each message is delivered to exactly one of your Service Queues (so the work gets done) and to exactly one of your Logging Queues (so the work is recorded). The routing configuration would look something like this (presuming all the queue endpoints have already been defined):

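[Configuration listing not recoverable; only the closing tags of a <routing> section survive, showing <filters>, a <filterTable> inside <filterTables>, and two <backupList> elements inside <backupLists>.]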
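The original configuration listing did not survive, so here is a minimal Python model of what it expresses rather than the WCF XML itself. It assumes each filter-table entry uses a match-all filter, so every work message matches both entries and is multicast. Apart from primaryServiceQueue and primaryLoggingQueue, which the post names, the backup endpoint names and the `FilterTableEntry`/`route` helpers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class FilterTableEntry:
    """One filter-table entry: a filter, a primary endpoint, and its backup list."""
    matches: Callable[[Message], bool]   # stands in for a WCF MessageFilter
    endpoint: str                        # primary destination
    backup_list: List[str] = field(default_factory=list)

    def chain(self) -> List[str]:
        # Endpoints in the order they are tried: primary first, then backups.
        return [self.endpoint, *self.backup_list]


def match_all(message: Message) -> bool:
    # Assumption: every work message should be both serviced and logged.
    return True


# Hypothetical backup names; only the two primaries come from the post.
FILTER_TABLE = [
    FilterTableEntry(match_all, "primaryServiceQueue",
                     ["backupServiceQueue1", "backupServiceQueue2"]),
    FilterTableEntry(match_all, "primaryLoggingQueue",
                     ["backupLoggingQueue1"]),
]


def route(message: Message, table: List[FilterTableEntry]) -> List[List[str]]:
    """Return one endpoint chain per matching entry; two or more chains means multicast."""
    return [entry.chain() for entry in table if entry.matches(message)]


if __name__ == "__main__":
    for chain in route({"body": "work item"}, FILTER_TABLE):
        print(" -> ".join(chain))
```

Running it prints one chain per destination: `primaryServiceQueue -> backupServiceQueue1 -> backupServiceQueue2` and `primaryLoggingQueue -> backupLoggingQueue1`.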

What you don’t want is for work to be done that isn’t recorded anywhere (that’s how customers don’t get billed) or for work to be recorded that wasn’t actually done (that’s how customers get billed incorrectly and end up upset). Thus, with this configuration, the Routing Service should move the message out of the inbound queue only if it successfully inserts the message into exactly one of the Service Queues AND exactly one of the Logging Queues. To do this, the Routing Service uses both the Receive Context feature of .NET 4 and transactions.

Receive Context is used to obtain a temporary lock (peek-lock) on the message inside the inbound queue so that other readers of the queue (potentially other Routing Services) don’t attempt to operate on the message while this Routing Service is working on it. Next, the Routing Service reads the message and feeds it into the Message Filter Table defined here, which matches the message to both filter entries (the primaryServiceQueue with its backup list, and the corresponding logging endpoints). Because there are multiple matches, the message is multicast to both destinations (in this case the primaryServiceQueue and the primaryLoggingQueue). Because the Routing Service is trying to meet messaging guarantees, it first creates a transaction. It attempts to do all the necessary work inside this transaction, and commits the transaction only if everything succeeds.
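To make the ordering concrete, here is a hedged Python sketch of the happy path: peek-lock the message, open a transaction, multicast one send per matched destination, and only then commit and complete the receive. `InboundQueue`, `Transaction`, and `process_one` are hypothetical stand-ins for Receive Context, .NET transactions, and the Routing Service internals. An unreachable destination is modeled as a queue name missing from the `queues` dict, and backups are deliberately left out.

```python
from typing import Dict, List, Optional, Set, Tuple


class InboundQueue:
    """Inbound queue with peek-lock semantics (a rough stand-in for Receive Context)."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = list(messages)
        self.locked: Set[str] = set()

    def peek_lock(self) -> Optional[str]:
        # Lock the first unlocked message without removing it, so other readers skip it.
        for msg in self.messages:
            if msg not in self.locked:
                self.locked.add(msg)
                return msg
        return None

    def complete(self, msg: str) -> None:
        # Processing succeeded: the message leaves the queue for good.
        self.locked.discard(msg)
        self.messages.remove(msg)

    def abandon(self, msg: str) -> None:
        # Processing failed: release the lock so the message becomes visible again.
        self.locked.discard(msg)


class Transaction:
    """Buffers sends and applies them only on commit."""

    def __init__(self, queues: Dict[str, List[str]]) -> None:
        self.queues = queues
        self.pending: List[Tuple[str, str]] = []

    def send(self, endpoint: str, msg: str) -> None:
        if endpoint not in self.queues:          # missing or unreachable destination
            raise ConnectionError(endpoint)
        self.pending.append((endpoint, msg))

    def commit(self) -> None:
        for endpoint, msg in self.pending:
            self.queues[endpoint].append(msg)
        self.pending.clear()

    def abort(self) -> None:
        self.pending.clear()                     # nothing reaches any destination


def process_one(inbound: InboundQueue, queues: Dict[str, List[str]],
                destinations: List[str]) -> bool:
    msg = inbound.peek_lock()
    if msg is None:
        return False
    tx = Transaction(queues)
    try:
        for endpoint in destinations:            # multicast: one send per matched entry
            tx.send(endpoint, msg)
    except ConnectionError:
        tx.abort()
        inbound.abandon(msg)
        return False
    tx.commit()
    inbound.complete(msg)                        # removed only after every send succeeded
    return True
```

With `queues = {"primaryServiceQueue": [], "primaryLoggingQueue": []}` and one message in the inbound queue, `process_one` returns `True`, both queues hold the message, and the inbound queue is empty.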

When nothing goes wrong, placing a message in the inbound queue looks like this:

[Figure: ErrorHandling2.1]

If we turn on tracing and look at the events coming out of the Routing Service, we get a good idea of what’s going on:

So that’s the happy path, but what happens when the various queues are unavailable or unreachable, or when the Routing Service encounters errors while inserting the message into the destination queues (say they no longer exist for some reason, or the destination host is unreachable)? Clearly, if the primary service or logging queue is down, the Routing Service will use the backups instead. But what’s going on inside the Routing Service when this happens?

We see that when the initial send to the primary Service Endpoint fails, the Routing Service knows that the current transaction is no longer any good, so it spins up a new transaction for sending the message to the backup service queue along with the logging queue. When the sends in this new transaction succeed, it is committed. (Not pictured: the prior transaction being discarded.) This pattern of discarding the current transaction, moving to a backup endpoint, and trying again in a new transaction repeats until the message is successfully sent to all of the indicated destinations and the transaction is able to commit.
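The discard-and-retry loop can be sketched in Python as follows. It is a simplified model, not the Routing Service’s actual code: `Transaction` buffers sends until commit, each chain is a primary endpoint followed by its backups, an unreachable endpoint is a name missing from `queues`, and all helper names are hypothetical. A failure rolls back every send in the attempt, including ones that had already succeeded, which is what keeps a logging queue from receiving duplicates.

```python
from typing import Dict, List, Optional, Tuple


class Transaction:
    """Buffers sends; nothing reaches a queue unless commit() is called."""

    def __init__(self, queues: Dict[str, List[str]]) -> None:
        self.queues = queues
        self.pending: List[Tuple[str, str]] = []

    def send(self, endpoint: str, msg: str) -> None:
        if endpoint not in self.queues:           # models an unreachable destination
            raise ConnectionError(endpoint)
        self.pending.append((endpoint, msg))

    def commit(self) -> None:
        for endpoint, msg in self.pending:
            self.queues[endpoint].append(msg)
        self.pending.clear()

    def abort(self) -> None:
        self.pending.clear()


def deliver_with_failover(msg: str, chains: List[List[str]],
                          queues: Dict[str, List[str]], trace: List[str]) -> bool:
    """Send msg to the current endpoint of every chain in one transaction.

    On failure, discard the transaction, advance only the failing chain to its
    next backup, and retry in a fresh transaction. Returns False once any chain
    runs out of endpoints.
    """
    position = [0] * len(chains)
    attempt = 0
    while True:
        attempt += 1
        tx = Transaction(queues)
        failed: Optional[int] = None
        for i, chain in enumerate(chains):
            try:
                tx.send(chain[position[i]], msg)
            except ConnectionError:
                failed = i
                break
        if failed is None:
            tx.commit()
            trace.append(f"tx{attempt} committed")
            return True
        tx.abort()                                # roll back this attempt's sends
        trace.append(f"tx{attempt} aborted at {chains[failed][position[failed]]}")
        position[failed] += 1
        if position[failed] == len(chains[failed]):
            return False                          # no backups left for this chain
```

With the logging chain listed first and primaryServiceQueue unreachable, the first transaction buffers the logging send, then aborts; the second sends to the logging queue and the backup service queue and commits, leaving exactly one copy in each.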

What happens when the Routing Service can’t deliver the message with the transactional guarantees it was configured to provide (say, because it runs out of backup endpoints)? The ultimate behavior is configurable through MSMQ settings, but a common action is to move the message out of the inbound queue and into a DLQ. To make this happen, the Routing Service, after failing to find a working combination of the specified destinations within a transaction, calls Abandon on the Receive Context, telling the inbound queue that it was unable to process the message.
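Here is a minimal Python sketch of that last resort, under loudly stated assumptions: `InboundQueue`, `route_or_abandon`, and the `move_to_dlq_on_abandon` flag are hypothetical, the flag stands in for whatever MSMQ error-handling setting is configured, peek-locking is not modeled, and a local `pending` list plays the role of the transaction. The point it shows is that Abandon is only called after every combination of destinations has been tried and nothing was committed anywhere.

```python
from typing import Dict, List, Optional


class InboundQueue:
    """Inbound queue whose Abandon behavior is a policy (assumed, not the real MSMQ API)."""

    def __init__(self, messages: List[str], move_to_dlq_on_abandon: bool = True) -> None:
        self.messages = list(messages)
        self.dead_letter: List[str] = []
        self.move_to_dlq = move_to_dlq_on_abandon

    def complete(self, msg: str) -> None:
        self.messages.remove(msg)

    def abandon(self, msg: str) -> None:
        if self.move_to_dlq:
            # Common choice: take the message out of the inbound queue and park it in a DLQ.
            self.messages.remove(msg)
            self.dead_letter.append(msg)
        # Otherwise the message simply stays in the inbound queue for a later retry.


def route_or_abandon(inbound: InboundQueue, msg: str, chains: List[List[str]],
                     queues: Dict[str, List[str]]) -> bool:
    position = [0] * len(chains)
    while True:
        pending: List[str] = []                   # this attempt's "transaction"
        failed: Optional[int] = None
        for i, chain in enumerate(chains):
            endpoint = chain[position[i]]
            if endpoint not in queues:            # unreachable destination
                failed = i
                break
            pending.append(endpoint)
        if failed is None:
            for endpoint in pending:              # commit every send at once
                queues[endpoint].append(msg)
            inbound.complete(msg)
            return True
        position[failed] += 1                     # discard pending, fail over
        if position[failed] == len(chains[failed]):
            inbound.abandon(msg)                  # no working combination left
            return False
```

If both service endpoints are down, the logging queue never receives the message and the message ends up in `dead_letter`; with `move_to_dlq_on_abandon=False` it stays in the inbound queue instead.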

[Figure: ErrorHandling2.3]

And here’s the trace:

This kind of advanced processing works even when sessions (or sets) of messages are being processed. In this case, as the session of messages passes through the Routing Service, the Routing Service temporarily caches each message so that it can be replayed to the last destination it was successfully routed to. Only when all messages have been successfully sent to an endpoint and received there can the transaction be committed and the messages marked as successfully sent.
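A simplified Python sketch of the session case follows. It assumes a single destination chain and a hypothetical `FlakyEndpoint` that starts failing after a fixed number of sends within one transaction. It also assumes that, after a failure, the cached messages are replayed to the next backup endpoint. Every message is cached as it passes through; on a failed send the transaction is discarded and the whole cache is replayed in a fresh transaction, and nothing is committed until the entire session has reached a single endpoint. This illustrates the idea rather than reproducing the Routing Service’s implementation.

```python
from typing import List


class FlakyEndpoint:
    """Hypothetical destination that fails after `fail_after` sends in one transaction."""

    def __init__(self, name: str, fail_after: float = float("inf")) -> None:
        self.name = name
        self.fail_after = fail_after
        self.delivered: List[str] = []        # committed messages only

    def accepts(self, sent_in_tx: int) -> bool:
        return sent_in_tx < self.fail_after


def route_session(session: List[str], chain: List[FlakyEndpoint]) -> bool:
    cache: List[str] = []                     # every message seen so far in the session
    index = 0                                 # current endpoint in the chain
    pending: List[str] = []                   # sends made in the current transaction
    for msg in session:
        cache.append(msg)
        while True:
            endpoint = chain[index]
            # After a failover, pending is empty, so the whole cache is replayed.
            to_send = cache[len(pending):]
            try:
                for m in to_send:
                    if not endpoint.accepts(len(pending)):
                        raise ConnectionError(endpoint.name)
                    pending.append(m)
                break
            except ConnectionError:
                pending = []                  # discard the transaction
                index += 1
                if index == len(chain):
                    return False              # out of endpoints: nothing committed
    chain[index].delivered.extend(pending)    # commit the whole session at once
    return True
```

For a three-message session where the primary fails on the third send, the backup receives all three messages in order and the primary ends up with none committed.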

Whew! So what does this all mean? It means that we’ve baked a lot of complex state tracking into the Routing Service, so that even when your message patterns, filter rules, and transactional/sessionful requirements change, we’ve got you covered on the robustness front. The Routing Service takes great pains to deliver messages and handle this logic so that you don’t have to write it into your clients.

