BizTalk Performance - Useful technique to baseline your infrastructure

I've recently been doing some performance tuning work for a customer.  The customer wanted to achieve a throughput of 100 messages per second on their solution.  The message flow was quite straightforward so it should have been achievable.  Unfortunately when I turned up on site they were getting less than 5 messages per second throughput!

The hardware used for the BizTalk Servers (of which there were 2) was very new (Dual proc, dual core machines with plenty of RAM :-) ) but the performance was still not there.  It became clear that we were not going to get the improvement through 'incremental' changes.

The difficulty in these scenarios is that there is no standard 'benchmark' that you can compare your system to. This is because your particular combination of BizTalk components (orchestration, messaging, custom components, etc.) and hardware is unlikely to have been tested. In this case it was not clear whether the problem was being caused by the custom code or by the hardware.

Therefore, to benchmark the system I set up a receive location with a pass-through pipeline and a send port with a pass-through pipeline (which subscribed to all messages from the inbound receive location). I then used LoadGen to load the system with 3 KB messages at a rate of 500 per second (at this time we had one BizTalk box started, with a separate SQL Server with SAN-attached storage). On modern hardware the BizTalk engine should easily be capable of achieving 500 messages per second in this scenario. When the test was run in the customer's environment we were getting fewer than 50!!

By this time we had set up a duplicate system using similar hardware in Microsoft's netlabs (which we use to replicate customers' environments for Proofs of Concept and Performance Labs). We were easily getting 500 messages per second in the netlab, so the issue had to be with the customer's environment.

In this scenario the SAN should be your first point of investigation. The SQLIO tool, which is produced and used by Microsoft Product Support Services (PSS), can be used to benchmark your disk configuration. Running this tool indicated that the SAN was not the culprit (the results from SQLIO were broadly the same as those from the environment we had in the netlabs). Further investigation eventually indicated that the speed of the local disks in the BizTalk Server was the culprit! The counter ultimately used to determine this was the physical disk writes-per-second counter (PhysicalDisk\Disk Writes/sec in perfmon). In the netlabs we were getting approximately 2000 writes per second; the customer was only getting 200!
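As a rough illustration of how that counter comparison could be scripted, here is a minimal Python sketch. It assumes the counter has been logged to CSV (for example from perfmon or typeperf) with a timestamp in the first column and the writes-per-second value in the second. The sample values, the variable names and the average_counter function are all hypothetical and made up for illustration; they are not the actual logs from this engagement.

```python
import csv
import io
from statistics import mean


def average_counter(csv_text):
    """Return the mean of the second column of a counter log.

    Assumes (hypothetically) a header row followed by rows of
    timestamp, value. Rows with a missing or blank value are skipped.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    values = []
    for row in reader:
        if len(row) < 2 or not row[1].strip():
            continue  # blank sample, nothing to average
        values.append(float(row[1]))
    return mean(values)


# Made-up samples that mirror the rough figures quoted in the post.
LAB_LOG = '''"Time","Disk Writes/sec"
"10:00:01","1950.0"
"10:00:02","2050.0"
"10:00:03","2000.0"
'''

CUSTOMER_LOG = '''"Time","Disk Writes/sec"
"10:00:01","190.0"
"10:00:02","210.0"
"10:00:03","200.0"
'''

lab = average_counter(LAB_LOG)
customer = average_counter(CUSTOMER_LOG)
ratio = lab / customer
print(f"lab={lab:.0f} customer={customer:.0f} ratio={ratio:.1f}x")
```

With these made-up samples the script reports a tenfold gap between the two environments, which is the kind of difference that points at the disks rather than at the BizTalk solution itself.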

Replacing the disks in the BizTalk Server massively improved performance, and the customer is now well on their way to achieving their numbers.

The ultimate point of this post is that:

a) You should performance test your solution early. Many people go for the 'finger in the air' approach, but this rarely turns out well in my experience.

b) You should benchmark your environment using simple BizTalk content-based routing and pass-through pipelines. If you are not able to achieve a minimum of 250 messages per second in this scenario, perform further analysis to determine what the hardware bottleneck is.
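The baseline check in point b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BaselineRun class, the needs_hardware_analysis function and the message counts are hypothetical, and the only figure taken from this post is the 250 messages per second minimum. How you obtain the message count and duration (for example from your load tool's output) is left open.

```python
from dataclasses import dataclass

# Minimum pass-through throughput suggested in the post.
BASELINE_MIN_MSGS_PER_SEC = 250


@dataclass
class BaselineRun:
    """Result of one pass-through baseline test (hypothetical structure)."""
    messages_sent: int
    duration_seconds: float

    @property
    def throughput(self):
        # Average messages per second over the whole run.
        return self.messages_sent / self.duration_seconds


def needs_hardware_analysis(run, minimum=BASELINE_MIN_MSGS_PER_SEC):
    """True when the baseline falls short of the minimum throughput."""
    return run.throughput < minimum


# Made-up runs: 5 minutes each, illustrative numbers only.
customer_run = BaselineRun(messages_sent=13_500, duration_seconds=300)
lab_run = BaselineRun(messages_sent=150_000, duration_seconds=300)

for name, run in [("customer", customer_run), ("lab", lab_run)]:
    verdict = "investigate hardware" if needs_hardware_analysis(run) else "ok"
    print(f"{name}: {run.throughput:.0f} msgs/sec -> {verdict}")
```

Keeping the threshold as a named constant makes it easy to raise the bar when the baseline is rerun on newer hardware.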