The race between MSMQ setup and Active Directory replication – who will win?

There’s a new Knowledge Base article out that reinforces the need to allow for replication delays when installing MSMQ with the “Active Directory Integrated” option. The problem has been around for years but has only now been properly mentioned (I would say “properly documented” but the article is pretty thin on content).

After you install MSMQ with the “Active Directory Integrated” option enabled on a Windows Server 2003-based or Windows Server 2008-based computer, the Message Queuing service may fail to be started

The problem is that there is a race between the machine’s MSMQ configuration object in Active Directory and the MSMQ service. The object has to be replicated to whichever domain controller the new machine’s MSMQ service is going to talk to during startup, and it has to get there before the service finishes starting up. If the service chooses the same domain controller that setup talked to originally then MSMQ will start up fine, but there is no guarantee that this will be the case, especially in a domain where the load on DCs can vary their responsiveness.

Although intra-site replication is fast, it is not instant, and this is not helped by the way Windows 2008 setup works. In Windows 2003, setup created the object, so there was a reasonable amount of time for it to replicate before the MSMQ service had restarted and required access to the newly created configuration object. In Windows 2008, though, setup doesn’t create the object. Instead it merely sets a registry value and restarts the service, which in turn creates the object and then immediately tries to read back the QMID value. If that LDAP query goes to a different DC before replication has caught up, the object will not be there and the service will only be able to start in workgroup mode.
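To make “allow for replication delays” a bit more concrete, here is a minimal sketch in Python of the general idea: poll for the object and only carry on once it is visible. This is not how MSMQ setup or the service actually behaves, and nothing here is an MSMQ or Active Directory API. The `lookup` callable is a hypothetical stand-in for whatever query you would run against a specific DC (returning the QMID, say, or `None` if the object hasn’t arrived yet), and the timeout and interval values are arbitrary assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_replication(
    lookup: Callable[[], Optional[T]],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    """Poll `lookup` until it returns a non-None value or the timeout expires.

    `lookup` is a hypothetical, caller-supplied function that asks one
    domain controller for the object and returns None if it is not there.
    `sleep` and `clock` are injectable so the helper can be tested
    without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        result = lookup()
        if result is not None:
            # The object has reached the DC we are asking.
            return result
        if clock() >= deadline:
            # Give up; the caller decides what to do next.
            return None
        sleep(interval_s)
```

A script that installs MSMQ could call something like this against the DC the service is likely to use, and hold off restarting the service until the lookup succeeds or the timeout is reached.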

[[Thanks to Xin Chen for input]]

Comments (3)

  1. ChrisAD2 says:

    Does this problem also show up when a server is creating new MSMQ queues? E.g. if a system is creating new public queues for every new web session, does the object only exist on one domain controller until replication kicks in?

    I am seeing a very similar problem to this today. We are creating new public queues very often, and it seems that when we try to look up the queue on a different DC, we see a delay of about 15 seconds.

  2. ChrisAD2 says:

    I believe my previous comment may be resolved with this:

  3. ChrisAD2 says:

    I can confirm that the previous link is the problem I was having. Sorry for three posts, but I cannot see how I edit my posts?