Solutions, workarounds and suggestions for common AppFabric 1.1 errors

AppFabric 1.1 can throw intermittent DataCacheException errors under load. The three most common are:

Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later.
Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.
Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0016>:SubStatus<ES0001>:The connection was terminated.

To resolve or alleviate these issues, you can try the following solutions/workarounds:

1.
ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later.

  • Install the latest cumulative update for AppFabric 1.1 (CU7 as of this writing)
  • Upgrade .NET Framework to 4.5 on all cache servers
  • Enable backgroundGC on the cache servers (see the CU3 article for the bug it fixes and for instructions on enabling backgroundGC)

All CU links can be found in How to Update Windows Server AppFabric 1.1 with Cumulative Update Packages. Installing CU7 and enabling backgroundGC on the cache servers can substantially reduce the number of ErrorCode<ERRCA0017>:SubStatus<ES0006> errors. CU7 does not need to be installed on the cache clients.
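As a rough illustration of the backgroundGC step, the fragment below shows the kind of appSettings entry the CU3 article describes for DistributedCacheService.exe.config on each cache host. Treat it as a sketch: the key name and file are given here as we understand them from that article, so confirm both against the CU3 article before applying, and restart the Caching Service afterwards.

```xml
<!-- DistributedCacheService.exe.config on each cache host (sketch; verify against the CU3 article) -->
<configuration>
  <!-- Existing configSections and other settings stay as they are -->
  <appSettings>
    <!-- Assumed key name, per the CU3 article: turns on background garbage collection -->
    <add key="backgroundGC" value="true" />
  </appSettings>
</configuration>
```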

2.
If you have local cache enabled, you need to update the client-side caching assemblies to CU6 to fix the local cache bug (see the CU4 article for details on the local cache bug). CU7 does not need to be installed on the cache servers to resolve the local cache bug.

3.
ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.

3.1 
Increase maxConnectionsToServer. The default value is 1. The article Tune MaxConnectionsToServer suggests setting it to the number of cores on the application server. In our experience that can be too high: in most cases the value should not be greater than 4. A large value can result in many open connections to the cache cluster nodes and high CPU usage in Lsass and the event logging service.
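A minimal sketch of where this setting lives in the client web.config or app.config is shown below. The host name CacheServer1 and the value 2 are illustrative assumptions, not recommendations; pick a value by measuring your own load.

```xml
<!-- Client web.config or app.config (sketch; host name and value are placeholders) -->
<dataCacheClient maxConnectionsToServer="2">
  <hosts>
    <host name="CacheServer1" cachePort="22233" />
  </hosts>
</dataCacheClient>
```

Raise the value one step at a time and watch the connection count and Lsass CPU on the cache hosts before going any higher.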

3.2
There is a known WCF 4.0 issue that can affect the performance of your application if the cache client is a WCF application. Either upgrade the .NET Framework to 4.5 on the client or implement the workaround described in the article WCF service may scale up slowly under load.

3.3
For more realistic test results, do not use hot keys in load tests; they can lead to high CPU on one particular cache host. Instead, use randomly generated keys (for example, Guid key = Guid.NewGuid()) for Get/Put operations. This is because AppFabric is a partitioned cache rather than a replicated cache: each key is stored on a single cache host, so repeated requests for the same key all land on that host.

4.
ErrorCode<ERRCA0016>:SubStatus<ES0001>:The connection was terminated.

One possible cause of this exception is a timing issue that arises because the cache client and the cache server have the same 10-minute receiveTimeout setting. We have seen cases where a client sends a request on the connection just as the server is closing its side because of the timeout. If you see this error, we recommend setting the client receiveTimeout lower than the cache server's.

The client-side receiveTimeout setting (in milliseconds) is described in Application Configuration Settings. It must be set in each client's web.config or app.config.
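A sketch of the client-side setting follows, assuming the server keeps its 600000 ms (10-minute) value. The 540000 ms figure and the host name CacheServer1 are illustrative assumptions; any client value comfortably below the server's should avoid the race described above.

```xml
<!-- Client web.config or app.config (sketch; host name and timeout value are placeholders) -->
<dataCacheClient>
  <hosts>
    <host name="CacheServer1" cachePort="22233" />
  </hosts>
  <!-- 540000 ms = 9 minutes, below the server's 600000 ms (10 minutes) -->
  <transportProperties receiveTimeout="540000" />
</dataCacheClient>
```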

The server side receiveTimeout (in milliseconds) can be set in the cluster config file:

<dataCache size="Small">
    <!-- Other settings here -->
    <advancedProperties>
        <transportProperties receiveTimeout="600000" />
    </advancedProperties>
</dataCache>