Workflows hosted in a WCF Service: requests queue up and time out

Hosting a Workflow in a WCF Service is not a scenario that we see very often, right?

Well, a couple of days ago I came across one of these unicorns, and guess what? Yes, it was causing some issues :-)

So, I got a request for help with a scenario where the developers had built a Workflow that was being hosted in a WCF Service. Lately the WCF Service had started to stop responding, so incoming requests were being queued and, of course, eventually timed out.

Looking at all the evidence for this issue (logs, Event Viewer, traces, etc.), we could say that the error was rooted in the exceptions below:

  1. System.ServiceModel.CommunicationException: The socket connection was aborted.
    This could be caused by an error processing your message or a receive timeout being exceeded by the remote host,
    or an underlying network resource issue.
    Local socket timeout was '00:10:00'. ---> System.Net.Sockets.SocketException:
    An existing connection was forcibly closed by the remote host
  2. System.TimeoutException: The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:10:00.
    The time allotted to this operation may have been a portion of a longer timeout. --->
    System.Net.Sockets.SocketException: The I/O operation has been aborted because of either a thread exit or an application request

Looking in more detail at the code they had written inside the Workflows being executed, I could clearly see that they were making some async calls. Further investigation showed us that these calls can take quite some time to execute.

Hmm, interesting scenario, right?

On one side we have these Workflows doing some work involving async calls that can take a long time, and on the other side we have a setting on the WCF Service side called "ReceiveTimeout" that is usually left at its default of 10 minutes. Dangerous combination, right?

So, we had to discuss these constraints with the development team and try to clarify how they interact.

First of all, we needed to clarify that the WCF "ReceiveTimeout" is essentially an inactivity timer between the client and the service:
if this timeout is exceeded, the service disposes of the connection to conserve resources.
See Binding.ReceiveTimeout.

In other words, this inactivity timer runs on the service and uses the "ReceiveTimeout" setting of the binding.
It fires if no application messages are received within the timeout period. This specifies, for example, the maximum time a client may take to send at least one message to the server before the server closes the channel used by a session.

This behaviour ensures that clients cannot hold on to server resources for arbitrarily long periods.

In our case, since this is a Workflow hosted in WCF, and workflows can of course sit idle for longer than 10 minutes while long-running activities execute, we ended up using ReceiveTimeout="infinite" in their WCF service configuration to mitigate these timeouts during the workflow async calls.
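
For reference, here is a minimal sketch of what such a setting could look like in the service's configuration file. The binding type (netTcpBinding, guessed from the socket exceptions above) and the binding name are assumptions for illustration, not the customer's actual configuration:

```xml
<!-- Sketch only: binding type and name are assumptions, not the real service config. -->
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- "longRunningWorkflowBinding" is a hypothetical name. -->
      <!-- receiveTimeout="infinite" disables the 10-minute inactivity timer. -->
      <binding name="longRunningWorkflowBinding" receiveTimeout="infinite" />
    </netTcpBinding>
  </bindings>
</system.serviceModel>
```

The service endpoint would then need to point at this binding through its bindingConfiguration attribute. Keep in mind the trade-off described above: with an infinite receive timeout, an idle client can hold on to its connection indefinitely, so a large finite value may be a safer choice when the longest expected wait is known.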

And in their case, voilà, problem solved.

Hope that helps