Neglect

Article
09/08/2004

Okay, I have been neglect and am trying to remedy this. I have a couple of papers I am working on which will get posted here very shortly. Hopefully one will be here by end of the week. I am also busy at work on BTS stuff trying to make your life easier (I hope). Finally I was away visiting some customers and getting some good real world experience. I have actually visitted lots of customers but it is always good to stay knowledgable because it is too easy to get it the glass ball world over here and forget about everyone who really uses this stuff and the problems they face. Never want to have that happen.

So first off, I just read a write up from Charles Young (who I have now added to my links) on how the subscription routing mechanisms work. Very impressive stuff. He even got some of the sequence of stored procedures being called down. I think it is maybe a bit more technical than a lot of you need, but hey, if you are reading my blog, you must be into this kind of torture, so I would check it out. Good stuff.

Now let's pick a topic to chat about. I have seen some of the other blogs out there and, well, I am sorry mine is not formatted so nicely. Basically you get me at 11:30 pm feeling like I need to shed some insight and just trying to brain dump. So I guess you will bear with me. Hows about debugging routing failures. Not exactly rocket science to the bts experts, but probably a usefull topic.

What is a routing failure: Routing failures occur when messages are published into the messagebox, but no service is found to have expressed interest in the message (ie the properties of the message do not match any subscriptions).

So how do I debug this? -- When a routing failure occurs, the messageagent (see my last post) generates something called a routing failure report. A routing failure report is nothing more than a dummy instance, which holds a reference to a dummy message, which has no parts, but which has the context of the message which failed routing at the time it failed routing. We capture the data like this becomes some adapters do not suspend the message and even when the message is suspended, the context of the suspended message is often different from the context at the time of the routing failure. So what can you do with this. Well really there are only a couple of times routing failures should occur in your system. The first is as you are developing and testing your solution (please test your solution). In these cases you should have a reasonable idea where the message should be going. You can look at the context of the message which failed to route and check which properties were promoted and what their values were. Then, you can use the subscription viewer in the <instsalldir>\sdk\utilites directory to see what the subscripton actually looked like for the sendport or orchestration you thought the mesage should have gone to. Often it is simply that you forgot to promote a property or just got the value in your subscription a bit off. Or you forgot to start the orchestration or sendport.

The second case where this can occur is when you try stressing a system with orcehstrations which use corelation and you pass in non-unique correlation sets. Don't do this. Try to imagine what is happening with these messages. Now a response which was supposed to go to one orchestration gets broadcast to 20 orchestrations. And then the responses for those find the orchestrations completed and so fail routing. Actually, what would really happen is that half of those 19 would actually get there before the orchestration completed and you might get 9 zombied orchestrations (see earlier blog) and 10 routing failures. Lesson to be leasned, test with real data and in the real world, correlation sets must be unique.

The third case you might get this in some type of stress is also tightly related to zombies (see earlier blog). If you think about it, a zombie is a race condition where the message got routed to the orchestration right before it completed and so you are "completed with discarded messages". Well what would happen if the raec were a bit different. In that case instead of rouitng just before the orcehstration completed, the orcehstration would complete just before the routing happened. Then you would get routing failures. In these cases, this is what you designed the system to do, so I guess you decide what to do with the failed message at this point. Read my blog on zombies to get a better idea of when zombies can happen.

Sorry this was brief, but it is midnight and I'm tired. Like I said, if things work out correctly, I should have a really good blog coming shortly. Just have to get the signoff from Woodgate and a couple of other people. :)

Thanks

Neglect

Additional resources