Is there a pub/sub system underneath BizTalk?

Okay, according to the stats, you guys stopped reading when I posted the convoy topic, but, well, I'll keep throwing stuff at you and hope you enjoy it.

This is the first installment of the BizTalk pub/sub infrastructure talk. I have met lots of people who are confused about this (some of them are on my team). So the short answer is yes, BizTalk processing is built on top of a SQL-based pub/sub infrastructure which you know as the MessageBox. The longer answer involves explaining how you interact with it, since BizTalk does not tout itself as a pub/sub product or really expose a lot of views into its pub/sub nature (except for the subscription viewer, which is a good demonstration of why I am not a UI developer :) ). I will attempt to give you a quick insight into pub/sub in BizTalk. Maybe one day I will take all this stuff I am writing, fancy it up and make it into chapters in a book or something, but given that I write code not prose, it’s doubtful.

There are really two components which together make the BizTalk pub/sub infrastructure. The database portion is known as the MessageBox. The other portion is what the engines internally use to interact with the MessageBox and is called the MessageAgent. This piece abstracts away all of the guts of the messagebox from the engines (things like multi-messagebox are understood by the agent, but the engines do not need to worry about it). This is probably a bit more than you needed. So in a pub/sub system, well, there are really three things you need to describe: publishers, subscribers, and events (or messages) which flow through the system.

Publishers: Who are the publishers in your BizTalk system? There are really only 3 publishers in a BizTalk system. Receive ports are publishers. They pick up data from somewhere based on the adapter and the URI, pass it through a pipeline and maybe a map, and then eventually give it to the MessageAgent and say “Publish this for me, please.” (Our engines are very polite and always say please.) To clear up confusion, data is persisted to the MessageBox after the pipeline and after the map, not before. (There is one exception to this involving MSMQt and large messages, but you really don’t need to care about that.) Send shapes in a schedule are publishers. They also give the message to the MessageAgent and say “Please publish this for me. Thank you.” There are also a couple of other random points, like the response portion of a solicit-response sendport. It does a publish. Also, when an orchestration execs another orchestration (not calls; a call is inline and synchronous), it publishes a message too. It is a bit confusing, I know, because you are saying to yourself, “In my orchestration I bind my send action to a specific sendport. What do you mean it publishes?” I will get to that shortly.
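To make the ordering on the receive side concrete, here is a toy sketch in Python. It is not BizTalk code and none of these names exist in the product: `ToyMessageBox`, `ToyReceivePort`, and the string-based "pipeline" and "map" are all made up for illustration. The only thing it tries to show is that the publish happens after the pipeline and after the map.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyMessageBox:
    """Hypothetical stand-in for the MessageBox: it only records what was persisted."""
    persisted: List[str] = field(default_factory=list)

    def publish(self, message: str) -> None:
        self.persisted.append(message)


@dataclass
class ToyReceivePort:
    """Hypothetical receive port: pipeline, then map, then publish."""
    pipeline: Callable[[str], str]
    inbound_map: Callable[[str], str]
    box: ToyMessageBox

    def on_data(self, raw: str) -> None:
        # The pipeline runs first, then the map, and only then the publish,
        # so the toy MessageBox never sees the raw wire data.
        message = self.inbound_map(self.pipeline(raw))
        self.box.publish(message)


box = ToyMessageBox()
port = ToyReceivePort(pipeline=str.strip, inbound_map=str.upper, box=box)
port.on_data("  order-1  ")
print(box.persisted)  # ['ORDER-1']
```

What lands in the toy MessageBox is the processed message, not the raw bytes the adapter picked up.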

Subscribers: Who are my subscribers? Orchestrations are subscribers. Any receive action in an orchestration maps to a subscription. The orchestration subscriptions are made up of the filter expression on the activate receives and the correlation sets on the subsequent receives (you can see them in your subscription viewer in the sdk\utilities directory). There are two types of subscriptions which I like to talk about: activation subscriptions and instance subscriptions (sometimes called correlation subscriptions). Activation subscriptions start a new instance of a service. These are the subscriptions in your orchestration which you mark as Activate = true. Instance subscriptions, or correlation subscriptions, are subscriptions which route messages to already running instances. They are created after the orchestration instance starts, once the necessary correlation sets have been initialized. It gets tricky with convoy semantics, but I don’t think I can really explain that in a quick blog like this. Let’s just say that in a parallel activation convoy, the subscriptions have got a little activation and a little instance subscription in them.

Send ports are subscribers. Send port subscriptions are always activation subscriptions. There is one exception to this, and that is ordered delivery sendports. I’ll let you in on a secret. We do ordered delivery in sendports just like you try to build it in your schedules. We use a convoy. So MSMQt sendports are inherently on a convoy, so they are that weird blend. Other subscribers are the response portion of a request/response receive port. We use some internal correlation sets to make sure that the response gets back to the correct NT service for things like HTTP so that we can send the response on the open connection. Another example of a subscriber is when you do delivery notification. We actually create an internal subscription for the notification and use an internal correlation set to get it back to the correct orchestration instance.

Hmmmmmm. What else. Oh yeah. About the confusion over direct binding to a sendport. All sendports subscribe to messages sent directly to them (you can see this in the subscription viewer) based upon a property called their TransportId, which is an internal BTS property. This way we can force send messages from the orchestration to the sendport. That’s the basics of it.
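The difference between activation and instance subscriptions, and the TransportId trick, can be sketched with a toy router in Python. This is an illustration only, not how the MessageBox is implemented: `ToyRouter`, `Subscription`, and the property names `MessageType` and `OrderId` are invented for the example, and `TransportId` simply borrows the name used above.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional, Tuple


@dataclass
class Subscription:
    service: str
    predicate: Dict[str, object]        # property name -> required value (all must match)
    instance_id: Optional[int] = None   # None means an activation subscription


class ToyRouter:
    """Hypothetical router: matches promoted properties against subscriptions."""

    def __init__(self) -> None:
        self.subscriptions: List[Subscription] = []
        self._next_instance = count(1)

    def subscribe(self, sub: Subscription) -> None:
        self.subscriptions.append(sub)

    def publish(self, promoted: Dict[str, object]) -> List[Tuple[str, int, str]]:
        deliveries = []
        for sub in self.subscriptions:
            if all(promoted.get(k) == v for k, v in sub.predicate.items()):
                if sub.instance_id is None:
                    # Activation subscription: start a brand-new instance.
                    deliveries.append((sub.service, next(self._next_instance), "activated"))
                else:
                    # Instance subscription: route to the already running instance.
                    deliveries.append((sub.service, sub.instance_id, "correlated"))
        return deliveries


router = ToyRouter()
router.subscribe(Subscription("OrderOrch", {"MessageType": "Order"}))
router.subscribe(Subscription("SendPortA", {"TransportId": "port-A"}))

first = router.publish({"MessageType": "Order", "OrderId": 42})

# Once the running instance has initialized its correlation set,
# it gets an instance subscription of its own.
router.subscribe(Subscription("OrderOrch", {"OrderId": 42}, instance_id=1))
second = router.publish({"MessageType": "Ack", "OrderId": 42})

# A bound send from the orchestration: stamping TransportId forces the route.
third = router.publish({"MessageType": "Ack", "TransportId": "port-A"})

print(first)   # [('OrderOrch', 1, 'activated')]
print(second)  # [('OrderOrch', 1, 'correlated')]
print(third)   # [('SendPortA', 2, 'activated')]
```

In this toy, even the "bound" send is just another publish. The sendport receives the message only because its subscription matches the TransportId value that was stamped on it.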

Events: Events or messages are just your messages. The MessageBox and MessageAgent do not care at all about what is in your message. We never look at the contents. To us, it is an opaque blob. We care only about the structure of it (how many parts and what their names are) and the properties associated with the message in its context. There are two basic types of properties on the context: written properties and promoted properties. They are both streamed out to the database when the context is persisted. The only difference is that promoted properties are used to route on. If someone subscribes to “foo = 3” and you promote foo with a value of 3, then your message will go to the subscriber. Anyone can promote anything (almost) at any time as long as the property is defined in a property schema. If it is not defined in a property schema, you will get an error when you try to promote it. One thing many people don’t know is that the routing layer supports multi-valued properties (i.e. VT_ARRAY | VT_??). Our native components won’t promote anything like this, but you can do it in a custom pipeline component. You cannot reference these properties from within a schedule because orchestrations do not support multi-valued properties, but if you just want to route on them and you have repeating elements, this could work for you.
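Here is a small Python sketch of the written versus promoted distinction. Everything in it is hypothetical: `ToyContext`, `PROPERTY_SCHEMA`, `promote`, and `matches` are made-up names, not BizTalk APIs. The multi-valued behavior is also an assumption: the sketch treats a list-valued property as matching when any one of its values equals the subscribed value.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical property schema: only these names may be promoted.
PROPERTY_SCHEMA = {"foo", "region"}


@dataclass
class ToyContext:
    """Hypothetical message context holding both kinds of properties."""
    written: Dict[str, object] = field(default_factory=dict)
    promoted: Dict[str, object] = field(default_factory=dict)


def promote(ctx: ToyContext, name: str, value: object) -> None:
    # Promoting a property that is not in the schema is an error.
    if name not in PROPERTY_SCHEMA:
        raise KeyError(f"{name!r} is not defined in the property schema")
    ctx.promoted[name] = value


def matches(ctx: ToyContext, name: str, expected: object) -> bool:
    # Routing only ever consults promoted properties, never written ones.
    if name not in ctx.promoted:
        return False
    value = ctx.promoted[name]
    if isinstance(value, (list, tuple)):
        # Assumed semantics for multi-valued properties: any value may match.
        return expected in value
    return value == expected


ctx = ToyContext()
ctx.written["foo"] = 3
written_only = matches(ctx, "foo", 3)      # written properties are not routed on

promote(ctx, "foo", 3)
after_promote = matches(ctx, "foo", 3)     # now "foo = 3" matches

promote(ctx, "region", ["east", "west"])   # multi-valued, e.g. from repeating elements
multi_valued = matches(ctx, "region", "west")

print(written_only, after_promote, multi_valued)  # False True True
```

Both kinds of property travel with the context, but only the promoted ones take part in the match.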

There is really only one big gotcha in the routing layer that you have to avoid. Do not create a lot of sendports subscribing to the exact same thing without using a Distribution List (SendPort Group). If you create 100 sendports and all of them subscribe to A=5 & B=4 your performance will be worse than if you were to create one sendport group with that filter expression and then add all of the sendports to it. This is very important. If you do it the bad way, you will see some performance degradation and increased CPU utilization on your master messagebox for routing. Just giving you a heads up. It probably won’t happen till you have a whole lot, but it is just not a good practice to get into. Basically if you have more than say 8 sendports subscribing to the exact same thing, use a sendport group.
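One way to picture why the group is cheaper is to count filter evaluations in a toy model. This Python sketch is a guess at the intuition, not a description of what the MessageBox actually does during routing. The helper names and the 100-port setup are invented for the example.

```python
from typing import Dict, List, Tuple

FILTER = {"A": 5, "B": 4}
PORT_NAMES = [f"SendPort{i}" for i in range(100)]  # hypothetical sendports


def predicate_matches(message: Dict[str, int], predicate: Dict[str, int]) -> bool:
    return all(message.get(k) == v for k, v in predicate.items())


def route_individually(message: Dict[str, int]) -> Tuple[List[str], int]:
    # Every sendport carries its own copy of the same filter,
    # so the same predicate is evaluated once per port.
    evaluations = 0
    targets = []
    for name in PORT_NAMES:
        evaluations += 1
        if predicate_matches(message, FILTER):
            targets.append(name)
    return targets, evaluations


def route_via_group(message: Dict[str, int]) -> Tuple[List[str], int]:
    # The group owns the filter: one evaluation, then fan out to the members.
    evaluations = 1
    targets = list(PORT_NAMES) if predicate_matches(message, FILTER) else []
    return targets, evaluations


message = {"A": 5, "B": 4}
solo_targets, solo_evals = route_individually(message)
group_targets, group_evals = route_via_group(message)
print(solo_evals, group_evals)  # 100 1
```

Both approaches deliver to the same 100 ports. The toy only differs in how many times the identical filter gets checked per message.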

Hope this has given you a bit of insight. Apparently there are people out there reading this. Sorry it is not all official and beautiful, but, well, hopefully it is something.

Lee