I am dedicating this post to Jozsef because I’d have missed out on this interesting issue without his escalation.
Yesterday Jozsef sent out an email about an orchestration performance issue. I thought this would be a good test run for PTrace after Dwaine improved the speed of that utility. The issue sounded strange. Customer complained that his orchestration was slow. The unusual part was, after using orchestration debugger, Jozsef isolated the bottleneck to the start of an atomic shape. The delay was actually 16-17 seconds. The delay here was so out of normal scale, I thought at first this must be a debugger error. I should know better to blame on the tool.
After getting a BizTalk trace, I parsed it in PTrace and loaded up the orchestration events. Sure enough, there was a huge gap of time between the completion of an expression shape and the beginning of an atomic scope. That just didn’t make any sense at first. The delay was at the start of the scope, we didn’t even hit any shape in the scope yet. We were not at the end of a scope or just completed a send so persistence didn’t even come into play. Filtering down to the thread level in the trace, I saw that the delay happened right before AtomicTransaction.CreateInstance(). I did a quick look up in the source code to confirm that this event is logged AFTER we have completed object instantiation, not before. As it turned out, customer declared a variable at the scope level so all signs are pointing at the delay being caused by this variable’s instantiation. Even then, hard to believe what in the constructor can cause such delay.
Fortunately customer shared his .odx file with us. It was easy to find the variable and the type associated with it. I was able to reflect the constructor. Only thing that jumped out was a call into the static DatabaseFactory class in the Enterprise Library. I searched through KB articles and on the web thinking that we may be hitting a known issue specific to this method. To my surprise, I got many hits on delays caused by certificate revocation list lookup by .Net CLR against signed assemblies. In this case, the Enterprise Library assembly was signed by Microsoft certificate. Finally we got some direction. If there was a networking issue where CLR couldn’t reach a certificate revocation list server, that’d delay assembly loaded until CLR gives up on the lookup. I’ve had a similar issue recently with MSIEXE trying to do the same with a signed assembly as part of a hotfix package. All we had to do to verify this hypothesis was to temporarily disable certificate revocation check, which you can do within Internet Explorer (Internet Options –> Advanced tab). Customer tested this and confirmed that it resolved the issue. We are no longer seeing the 16-17 second delay.
I am not saying you should disable certificate revocation check permanently. That is obviously a security risk. However, if you ever run into similar issue, it is simple to test this possibility. If it turns out to be the same issue, you can shift your troubleshooting to why you are unable to reach the certificate revocation list server from your BizTalk machine.