Using multiple routes in Service Broker

07/14/2008

One of the main Service Broker components is routing. Whenever you want your messages to leave the database they originate in, you need to provide routes. Setting up routes can become complicated, so if you're taking your first steps with Service Broker, I suggest staying within a single database. Once you have an idea of how Service Broker conversations work, it's time to move one of the communicating services to a different database, or even a different server. For that you'll need routes. For the syntax of route creation, see the T-SQL reference at https://msdn.microsoft.com/en-us/library/ms186742.aspx. A route is basically a mapping between a logical conversation endpoint and the address of the machine that hosts the service. The logical endpoint may be specified in two ways:

By giving just the service name
By giving the service name and, additionally, a broker instance identifier, which ties the route to a specific instance of the service (in broker terms, a broker instance ID is just a database identifier, so in other words it specifies the database the service is deployed in).

Once such mappings (in the form of routes) are defined, each message that needs to be sent to the specified service (and, optionally, to the specified database) is sent to the address provided.
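As a minimal sketch of the two forms, assuming the placeholder names used in this post: the route names and the all-zero GUID are hypothetical. In practice the BROKER_INSTANCE value is the service_broker_guid of the database hosting the target service, which you can look up on the target server.

```sql
-- On the target server (e.g. ServerB): read the broker instance ID of the database hosting the service.
SELECT name, service_broker_guid, is_broker_enabled
FROM sys.databases
WHERE name = N'DatabaseB';

-- On the initiator side, the two forms are alternatives; you would normally create only one of them.

-- 1) Route identified by service name only (hypothetical route name).
CREATE ROUTE [RouteByServiceName]
    WITH SERVICE_NAME = 'TargetService',
         ADDRESS = 'tcp://ServerB:4022';

-- 2) Route identified by service name plus broker instance ID.
--    The GUID is a placeholder: paste the service_broker_guid returned above.
CREATE ROUTE [RouteByServiceAndBroker]
    WITH SERVICE_NAME = 'TargetService',
         BROKER_INSTANCE = '00000000-0000-0000-0000-000000000000',
         ADDRESS = 'tcp://ServerB:4022';
```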

In this post I would like to cover a somewhat more advanced topic: using multiple routes for the same service name. There are two main reasons you might want to do so:

Load balancing
High availability

I'll go over these two scenarios, describing what happens in each case and providing examples.

To understand the rest of this post, you need to be aware that a service with a given name may exist in multiple databases. The pair <service name; broker instance ID> is required to uniquely identify a service deployment in the database with this broker instance ID (a broker instance ID is simply a GUID; I'll just call it "broker ID" from now on). You can specify which instance of the target service you want to talk to by providing the target broker ID in the BEGIN DIALOG statement. But you may also specify just the target service name, in which case the broker ID remains "open", i.e. messages are sent to whichever instance (with whatever broker ID) of the target service is known. Once an acknowledgement comes back from the target service, the target broker ID it carries is recorded in the sys.conversation_endpoints table, and all further messages sent by the initiator are directed specifically to that broker ID. If a broker receives a message carrying a broker ID other than its own, it simply drops it, concluding that the message was probably meant for another instance of a service with the same name.
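A sketch of both ways of beginning a dialog, run in the initiator's database. It assumes, purely for illustration, that both services accept the [DEFAULT] contract, and it uses ENCRYPTION = OFF so that no dialog security certificates are needed; the GUID is a placeholder for the target database's service_broker_guid.

```sql
DECLARE @openDialog UNIQUEIDENTIFIER, @pinnedDialog UNIQUEIDENTIFIER;

-- Broker ID left "open": messages go to whichever TargetService instance the routes lead to.
BEGIN DIALOG CONVERSATION @openDialog
    FROM SERVICE [InitiatorService]
    TO SERVICE 'TargetService'
    ON CONTRACT [DEFAULT]
    WITH ENCRYPTION = OFF;

-- Broker ID given explicitly: only the TargetService instance with this broker ID is addressed.
BEGIN DIALOG CONVERSATION @pinnedDialog
    FROM SERVICE [InitiatorService]
    TO SERVICE 'TargetService', '00000000-0000-0000-0000-000000000000'
    ON CONTRACT [DEFAULT]
    WITH ENCRYPTION = OFF;
```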

Load balancing

If no broker ID is specified in the BEGIN DIALOG statement and all matching routes specify a broker ID, Service Broker picks one of the available broker IDs from the routing table and directs messages to the chosen broker ID. To avoid distributing messages from a single dialog among different service instances, load balancing doesn't simply pick a random route every time one is needed. Instead, it employs a mechanism called deterministic routing: each time a message needs to be sent on a dialog started without a broker ID, and all your routes to the target service are load-balancing routes (i.e. they contain broker IDs), Service Broker computes a hash of the dialog ID and, based on that hash, picks one of the possible broker IDs. The dialog ID is available from the moment the dialog is created and stays the same for the dialog's whole lifetime, so as long as the routing table doesn't change while conversations are active, every message of a given dialog is sent to the same target service instance: the same dialog ID hash always picks the same broker ID. Note that all this applies only to messages sent before the first ack comes back from the target. Once it does, the target's broker ID is locked on the initiator side and the load balancing mechanism is no longer used when sending.
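The dialog ID that deterministic routing hashes should correspond to the conversation_id column of sys.conversation_endpoints (my reading of the catalog view, not something stated above). A rough way to watch it, run in DatabaseA and assuming, for illustration only, that both services accept the [DEFAULT] contract:

```sql
DECLARE @h UNIQUEIDENTIFIER;

BEGIN DIALOG CONVERSATION @h
    FROM SERVICE [InitiatorService]
    TO SERVICE 'TargetService'   -- no broker ID, so deterministic routing applies
    ON CONTRACT [DEFAULT]
    WITH ENCRYPTION = OFF;       -- avoids needing dialog security certificates

SEND ON CONVERSATION @h (N'<hello/>');

-- conversation_id is assigned at BEGIN DIALOG time and does not change afterwards.
-- Per the description above, far_broker_instance should stay empty until the first ack arrives.
SELECT conversation_id, far_service, far_broker_instance, state_desc
FROM sys.conversation_endpoints
WHERE conversation_handle = @h;
```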

Load balancing example

It's now time for an example. Let's assume that we start our dialogs from InitiatorService on a machine named ServerA. The service is deployed in DatabaseA. For the sake of simplicity, let broker IDs in the example be equal to database names (in reality they would be GUIDs that have nothing to do with database names). The target of our dialogs is TargetService, which is deployed in two databases: DatabaseB located on ServerB, and DatabaseC located on ServerC.

For the load balancing mechanism to start working, you need to set up the following routes:

DatabaseA on ServerA:
CREATE ROUTE [LoadBalancingRoute1] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseB', ADDRESS = 'tcp://ServerB:4022';
CREATE ROUTE [LoadBalancingRoute2] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseC', ADDRESS = 'tcp://ServerC:4022';
DatabaseB on ServerB:
CREATE ROUTE [ReturnRoute] WITH SERVICE_NAME = 'InitiatorService', BROKER_INSTANCE = 'DatabaseA', ADDRESS = 'tcp://ServerA:4022';
DatabaseC on ServerC:
CREATE ROUTE [ReturnRoute] WITH SERVICE_NAME = 'InitiatorService', BROKER_INSTANCE = 'DatabaseA', ADDRESS = 'tcp://ServerA:4022';

Unless you have dropped the default 'AutoCreatedLocal' routes in the msdb databases of all servers, this will suffice. If you followed the recommended practice and dropped the default routes, you'll also need the following routes to close the routing loop:

msdb of ServerA:
CREATE ROUTE [LocalRoute] WITH SERVICE_NAME = 'InitiatorService', BROKER_INSTANCE = 'DatabaseA', ADDRESS = 'LOCAL';
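msdb of ServerB:
CREATE ROUTE [LocalRoute] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseB', ADDRESS = 'LOCAL';
msdb of ServerC:
CREATE ROUTE [LocalRoute] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseC', ADDRESS = 'LOCAL';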
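To check whether the default route is still present before deciding which of the local routes you need, a query along these lines against msdb should do. The DROP is left commented out so that nothing changes by accident.

```sql
USE msdb;

-- The default route has no service name and no broker ID, and delivers locally,
-- so it matches any incoming message that no more specific route claims.
SELECT name, remote_service_name, broker_instance, address
FROM sys.routes
WHERE name = N'AutoCreatedLocal';

-- Following the recommended practice: drop it and rely on explicit LocalRoute entries instead.
-- DROP ROUTE [AutoCreatedLocal];
```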

The BROKER_INSTANCE parts of the LoadBalancingRoute1 and LoadBalancingRoute2 routes are REQUIRED. Without them the load balancing mechanism won't work (I'll explain later what happens in such a case). Note that I specified the broker ID for the return routes as well, even though there is only one instance of InitiatorService. It may seem redundant, but it is always a good idea to specify the broker ID in a route if it is known at the time the route is created. This may save you headaches in the future, when something changes in the way services are deployed.

Now, with these routes in place, each time you start a dialog in DatabaseA, one instance of the target service is chosen at random, with an even distribution among the TargetService instances provided, and the dialog is bound to that instance. (To be precise, the choice itself is not random: the randomness comes from the random values of dialog IDs.)
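One way to observe the distribution is to start a batch of dialogs in DatabaseA and, once the targets have acknowledged them, group the initiator endpoints by far_broker_instance. This is a test sketch only: it assumes both services accept the [DEFAULT] contract, uses ENCRYPTION = OFF to skip dialog security, and leaves the dialogs open, so end them afterwards. With real deployments, far_broker_instance shows GUIDs rather than the database names used in this example.

```sql
-- Run in DatabaseA: start 100 dialogs without a target broker ID and send one message on each.
DECLARE @i INT, @h UNIQUEIDENTIFIER;
SET @i = 0;

WHILE @i < 100
BEGIN
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE [InitiatorService]
        TO SERVICE 'TargetService'
        ON CONTRACT [DEFAULT]
        WITH ENCRYPTION = OFF;

    SEND ON CONVERSATION @h (N'<ping/>');
    SET @i = @i + 1;
END;

-- Run a little later, after acks have come back: dialogs per target broker ID.
-- Rows with NULL far_broker_instance are dialogs that have not been acknowledged yet.
SELECT far_broker_instance, COUNT(*) AS dialogs
FROM sys.conversation_endpoints
WHERE is_initiator = 1
  AND far_service = N'TargetService'
GROUP BY far_broker_instance;
```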

You've got your load balancing, but there are two gotchas you need to remember:

Of course, you need to start your dialogs without specifying the broker ID of the target service in the BEGIN DIALOG statement. If you do specify it, you prevent the load balancing mechanism from choosing the instance for you.
You cannot have any routes in DatabaseA that point to the target service without specifying a broker ID. Such routes have higher priority for dialogs started without a broker ID, so your load balancing routes won't be considered at all. For more information on route matching priority, take a look at https://msdn.microsoft.com/en-us/library/ms166052.aspx.
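A quick check for the second gotcha, run in DatabaseA: any row returned is a route to TargetService without a broker ID, which would shadow the load balancing routes.

```sql
-- Routes naming TargetService but no broker ID take precedence for dialogs begun without one.
SELECT name, remote_service_name, broker_instance, address
FROM sys.routes
WHERE remote_service_name = N'TargetService'
  AND broker_instance IS NULL;
```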

Unfair load balancing

Deterministic routing, explained above, is the reason why it is impossible to provide "uneven" load balancing based on the machines' processing power. One might think that if ServerC is twice as powerful as ServerB, it would be a nice idea to create routes as follows, doubling the fast machine's chances of being picked and effectively sending 2/3 of the traffic to the more powerful server:

CREATE ROUTE [LoadBalancingRoute1] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseB', ADDRESS = 'tcp://ServerB:4022';
CREATE ROUTE [LoadBalancingRoute2] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseC', ADDRESS = 'tcp://ServerC:4022';
CREATE ROUTE [LoadBalancingRoute3] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseC', ADDRESS = 'tcp://ServerC:4022';

Unfortunately, this won't work: deterministic routing uses the dialog ID hash to choose a particular broker ID, not a particular route, so the number of routes with the same broker ID doesn't change the probability of that ID being picked.
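Counting routes versus distinct broker IDs in DatabaseA makes the point visible. For the three routes above (and no other TargetService routes), this sketch should report three routes but only two candidate instances, and it is the latter that the dialog ID hash chooses among.

```sql
-- COUNT(DISTINCT ...) ignores NULLs, so routes without a broker ID are not counted as candidates.
SELECT remote_service_name,
       COUNT(*)                        AS route_count,
       COUNT(DISTINCT broker_instance) AS candidate_instances
FROM sys.routes
WHERE remote_service_name = N'TargetService'
GROUP BY remote_service_name;
```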

High availability

As explained above, providing multiple routes for the same service instance doesn't influence the load balancing behavior in any way. But it is still an important scenario. Why would anyone want to create multiple routes to the same service instance? The answer is availability.

Imagine that you only have a single instance of the target service (in DatabaseB on ServerB), but the traffic between ServerA and ServerB needs to go through one of two available forwarders (e.g. network boundary nodes). You don't care which forwarder your traffic goes through, but if one of them goes down, you would like the traffic to start flowing through the other one. Here's how Service Broker helps you achieve this. When multiple routes match the target of a dialog (and it is not the load balancing scenario described above), Service Broker doesn't arbitrarily pick one of them. Instead, it passes all matching routes to the underlying transport layer, which tries to deliver the message using all the routing information it got. This may simply mean sending the message on all the routes simultaneously, but the choice may also be based on previous attempts to connect to the given target, connection latency, etc. You shouldn't assume anything about this behavior: it is unspecified and may change without any notice in the documentation. Treat it as a black box that knows what it is doing.

Wait a minute! Shouldn't all "matching routes" then also be passed to the transport in the load balancing scenario described before? Not quite. The catch is that load balancing chooses the target broker ID for a dialog first, so only routes with the chosen broker ID are considered "matching routes". If there are two or more routes with the chosen broker ID, they will indeed all be passed to the transport, even in a load balancing scenario.

High availability example

Let's go through an example of how to set up a high availability scenario. As mentioned before, now there is only one instance of TargetService, deployed in DatabaseB on ServerB. Let the two mentioned forwarder nodes be named GatewayA and GatewayB. ServerA and ServerB cannot directly communicate with each other (e.g. due to cross-domain trust relationship issues). The routes that need to be in place are as follows:

DatabaseA on ServerA:
CREATE ROUTE [HighAvailabilityRoute1] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseB', ADDRESS = 'tcp://GatewayA:4022';
CREATE ROUTE [HighAvailabilityRoute2] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseB', ADDRESS = 'tcp://GatewayB:4022';
DatabaseB on ServerB:
CREATE ROUTE [ReturnRoute1] WITH SERVICE_NAME = 'InitiatorService', BROKER_INSTANCE = 'DatabaseA', ADDRESS = 'tcp://GatewayA:4022';
CREATE ROUTE [ReturnRoute2] WITH SERVICE_NAME = 'InitiatorService', BROKER_INSTANCE = 'DatabaseA', ADDRESS = 'tcp://GatewayB:4022';
msdb of both GatewayA and GatewayB:
CREATE ROUTE [ForwardingRoute] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseB', ADDRESS = 'tcp://ServerB:4022';
CREATE ROUTE [ForwardingReturnRoute] WITH SERVICE_NAME = 'InitiatorService', BROKER_INSTANCE = 'DatabaseA', ADDRESS = 'tcp://ServerA:4022';
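The routes alone are not enough on the gateways: an instance only forwards Service Broker messages when message forwarding is enabled on its Service Broker endpoint. A sketch to run on GatewayA and GatewayB, where [BrokerEndpoint] is a placeholder for whatever your endpoint is actually called:

```sql
-- Is forwarding already enabled on this instance's Service Broker endpoint?
SELECT name, state_desc, is_message_forwarding_enabled, message_forwarding_size
FROM sys.service_broker_endpoints;

-- Enable it if needed ([BrokerEndpoint] is a hypothetical endpoint name).
ALTER ENDPOINT [BrokerEndpoint]
    FOR SERVICE_BROKER (MESSAGE_FORWARDING = ENABLED);
```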

If your gateway nodes serve as forwarders for multiple services, you may opt for defining more generic TRANSPORT routes on them and naming your services accordingly, so that you don't have to worry about providing connectivity for each specific service pair, thus decreasing the administrative burden.
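A minimal sketch of that approach: a TRANSPORT route takes the next-hop address from the target service name itself, so services have to be named with a tcp://host:port prefix. The service and queue names below are hypothetical and differ from the ones used elsewhere in this post. ServerA still needs its explicit routes to the gateways, since it cannot reach ServerB directly.

```sql
-- msdb of each gateway: one generic route instead of a ForwardingRoute per service.
-- A TRANSPORT route must not specify SERVICE_NAME or BROKER_INSTANCE.
CREATE ROUTE [TransportRoute] WITH ADDRESS = 'TRANSPORT';

-- Services are then named after the address they can be reached at, for example:
-- in DatabaseB on ServerB (hypothetical queue name):
-- CREATE SERVICE [tcp://ServerB:4022/TargetService] ON QUEUE [dbo].[TargetQueue] ([DEFAULT]);
-- in DatabaseA on ServerA, so that replies can be forwarded the same way:
-- CREATE SERVICE [tcp://ServerA:4022/InitiatorService] ON QUEUE [dbo].[InitiatorQueue];
```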

For the example to work, you will also need the following msdb routes (that you may have already):

msdb of ServerA:
CREATE ROUTE [LocalRoute] WITH SERVICE_NAME = 'InitiatorService', BROKER_INSTANCE = 'DatabaseA', ADDRESS = 'LOCAL';
msdb of ServerB:
CREATE ROUTE [LocalRoute] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseB', ADDRESS = 'LOCAL';

If, for some reason, you don't know the target broker ID when setting up the routes on ServerA, that's OK: you may omit the BROKER_INSTANCE = 'DatabaseB' part and both routes will still be passed to the transport. However, when creating multiple routes for the same service without specifying broker IDs in them, it is very important to make sure they all point to the same instance of the service, merely providing alternative ways of reaching it. Using multiple routes to different service instances without specifying a broker ID for each of them can easily lead to problems such as multiple target endpoint creation (described in detail below), so you should never do it. I really can't think of a scenario where it would be desired.
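For the case where the broker ID is not known yet, a sketch of the DatabaseA routes without BROKER_INSTANCE, plus the ALTER ROUTE you could run once the GUID is known (the GUID shown is a placeholder):

```sql
-- Both routes lead to the SAME TargetService instance (DatabaseB); they only offer alternative paths.
CREATE ROUTE [HighAvailabilityRoute1] WITH SERVICE_NAME = 'TargetService', ADDRESS = 'tcp://GatewayA:4022';
CREATE ROUTE [HighAvailabilityRoute2] WITH SERVICE_NAME = 'TargetService', ADDRESS = 'tcp://GatewayB:4022';

-- Later, once DatabaseB's service_broker_guid is known, pin both routes to it.
ALTER ROUTE [HighAvailabilityRoute1] WITH BROKER_INSTANCE = '00000000-0000-0000-0000-000000000000';
ALTER ROUTE [HighAvailabilityRoute2] WITH BROKER_INSTANCE = '00000000-0000-0000-0000-000000000000';
```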

Multiple target endpoint creation

Let's see how such a multiple target endpoint situation could happen. Imagine that in the initiator database you have routes to two different instances of TargetService (on ServerB and ServerC), but you don't specify a broker ID for them, just the service name and address.

1. Once you begin your dialog and send the first message, both routes are passed to the transport layer; let's say it sends the message on both of them.
2. The message hits ServerB first and a dialog endpoint is created there (an entry shows up in DatabaseB.sys.conversation_endpoints).
3. After some time, the broker in DatabaseB sends an acknowledgement back. The acknowledgement contains the broker ID of the database hosting TargetService on ServerB (i.e. DatabaseB).
4. A little later (the connection to ServerC might have taken longer to establish), the message transmitted in step 1 hits ServerC and a dialog endpoint is created there as well. ServerB and ServerC don't know anything about each other.
5. ServerC sends an acknowledgement containing its own broker ID.

If the business logic that processes the first message of a dialog triggers some external action that should be carried out only once per dialog, you've already run into problems, because it has been executed twice. But let's see what may happen next.

The ack from ServerB arrives at ServerA and locks the target broker ID for the conversation to DatabaseB.
The ack from ServerC arrives as well, but its broker ID doesn't match the one already fixed for the dialog, so an error is sent back to ServerC.
Now InitiatorService tries to send a second message on the dialog. Again both routes are passed to the transport layer, but the transport layer might keep choosing the one to ServerC (perhaps it considers ServerB inaccessible or slow). The message now carries the broker ID of DatabaseB (the first ack locked it), so ServerC keeps dropping it and the conversation cannot continue.

Well, the transport logic will probably try ServerB eventually, but that's certainly not the behavior anyone is looking for. Note that our routes don't associate target server addresses with broker IDs, so there is no way for Service Broker to act smart in this case.

So why doesn't this problem occur in a load balancing scenario? Because of how deterministic routing works. As long as the routing table doesn't change while conversations are active, each message of a given dialog is sent to the same TargetService instance, so the risk of creating multiple target endpoints is avoided. What if you cannot avoid changing routes while new conversations are being started, and you really need to stay out of the multiple target endpoints scenario? Then you'll have to implement some kind of three-way handshake: the target defers executing any business logic until it gets a second message from the initiator, which the initiator sends only after receiving the target's reply. Receiving that second message means that it's your broker ID that has been saved in the initiator's sys.conversation_endpoints table.
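A rough sketch of the target side of such a handshake. Everything here is hypothetical: the message type, contract, table, queue and procedure names, and the assumption that the initiator sends Confirm only after it has received Challenge. The message types and contract would have to exist in both databases, TargetService would have to be created on this contract, and the procedure would have to be set as the queue's activation procedure; none of that wiring is shown.

```sql
CREATE MESSAGE TYPE [//Example/Request]   VALIDATION = NONE;
CREATE MESSAGE TYPE [//Example/Challenge] VALIDATION = NONE;
CREATE MESSAGE TYPE [//Example/Confirm]   VALIDATION = NONE;

CREATE CONTRACT [//Example/HandshakeContract] (
    [//Example/Request]   SENT BY INITIATOR,
    [//Example/Challenge] SENT BY TARGET,
    [//Example/Confirm]   SENT BY INITIATOR
);

-- Requests parked until the initiator confirms that this endpoint is the one it is talking to.
CREATE TABLE dbo.PendingRequests (
    conversation_handle UNIQUEIDENTIFIER PRIMARY KEY,
    request_body        VARBINARY(MAX)
);
GO

CREATE PROCEDURE dbo.TargetHandshakeActivation
AS
BEGIN
    DECLARE @h UNIQUEIDENTIFIER, @type SYSNAME, @body VARBINARY(MAX);

    WHILE 1 = 1
    BEGIN
        BEGIN TRANSACTION;

        WAITFOR (
            RECEIVE TOP (1)
                @h    = conversation_handle,
                @type = message_type_name,
                @body = message_body
            FROM dbo.TargetQueue
        ), TIMEOUT 1000;

        IF @@ROWCOUNT = 0
        BEGIN
            ROLLBACK TRANSACTION;
            BREAK;
        END;

        IF @type = N'//Example/Request'
        BEGIN
            -- First leg: do NOT run business logic yet; park the request and reply.
            INSERT INTO dbo.PendingRequests (conversation_handle, request_body) VALUES (@h, @body);
            SEND ON CONVERSATION @h MESSAGE TYPE [//Example/Challenge];
        END
        ELSE IF @type = N'//Example/Confirm'
        BEGIN
            -- Third leg: Confirm carries the broker ID locked by the initiator, so it only
            -- reaches this endpoint if this is the instance the initiator settled on.
            -- Business logic using the parked request_body would run here (omitted).
            DELETE FROM dbo.PendingRequests WHERE conversation_handle = @h;
            END CONVERSATION @h;
        END
        ELSE IF @type IN (N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog',
                          N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
        BEGIN
            -- A duplicate endpoint whose ack lost the race is expected to get an Error; discard its request.
            DELETE FROM dbo.PendingRequests WHERE conversation_handle = @h;
            END CONVERSATION @h;
        END;

        COMMIT TRANSACTION;
    END;
END;
GO
```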

As you can see, it's always a good idea to provide broker IDs in the routes you create. The only exception is a situation where the initiator server doesn't know the target server's location, the number of instances, or the broker IDs of the target service. In such a case there is usually a dedicated node in the topology that takes care of routing, load balancing etc., and it is justified to have a route without a broker ID in the initiating database that delegates all messages to that intermediate node for processing. Setting the broker ID on the initiator side might, for example, prevent that node from doing load balancing on its own or from choosing an appropriate target service instance based on some business logic.
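For that topology, the initiator's route might look like the following sketch; RouterNode is a hypothetical host name for the dedicated routing node. Dialogs would then be started without a target broker ID, leaving the choice of instance to that node.

```sql
-- DatabaseA on ServerA: hand everything addressed to TargetService to the routing node.
-- No BROKER_INSTANCE, so the routing node stays free to pick the target instance.
CREATE ROUTE [DelegateToRouter]
    WITH SERVICE_NAME = 'TargetService',
         ADDRESS = 'tcp://RouterNode:4022';
```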

Putting it all together

Finally, let me just quickly mention that it is also possible to combine the two multiple route features: load balancing and high availability. Hopefully that's obvious at this point, but let me just provide a short example of how the routes in the initiator database would need to be created. In this example we'll have two TargetService instances, just as in the load balancing example, but access to each one will be available via two dedicated forwarders: GatewayA, GatewayB for accessing ServerB, and GatewayC, GatewayD for accessing ServerC. The routes in DatabaseA will have to be created as follows:

CREATE ROUTE [LoadBal1Fwd1] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseB', ADDRESS = 'tcp://GatewayA:4022';
CREATE ROUTE [LoadBal1Fwd2] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseB', ADDRESS = 'tcp://GatewayB:4022';
CREATE ROUTE [LoadBal2Fwd1] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseC', ADDRESS = 'tcp://GatewayC:4022';
CREATE ROUTE [LoadBal2Fwd2] WITH SERVICE_NAME = 'TargetService', BROKER_INSTANCE = 'DatabaseC', ADDRESS = 'tcp://GatewayD:4022';

Now, when we begin a dialog and send a message to TargetService, one of the two service instances is chosen by the load balancing process, based on the dialog ID hash, as described before. But since two routes match the chosen broker ID, both are passed to the transport layer, which applies its own heuristics to decide which forwarder to use for sending the message. The multiple target endpoint creation problem doesn't exist in this case: both routes lead to the same broker ID, and the dialog ID is established between initiator and target and not changed by any forwarding machine, so whichever gateway a message flows through, it carries the same dialog ID. Therefore, even if the same message reaches the target server through both gateways, the target recognizes that both copies belong to the same dialog, receives whichever arrives first, and drops the other as a duplicate.