Hosting Workflow Services in Azure under multiple Service Instances

I ran into an interesting issue with Workflow services hosted in Azure. When I enabled more than one service instance, my Workflow service started failing with the error:

"A value of the wrong type was retrieved from the instance store. A value of type {/INSTANCE_1/Workflow/}myWFCloudService.xamlx was expected, but a value of type {/INSTANCE_2/Workflow/}myWFCloudService.xamlx was encountered instead."

As it turns out, when you scale the cloud service to two or more instances, the site name in IIS is different for each instance. The names look something like "<servicename>_IN_<instance number>". So with a servicename of "myWFCloudService" and 2 instances, the site names will be:

  • myWFCloudService_IN_0
  • myWFCloudService_IN_1

For workflow services, the site name is used to calculate both the WorkflowHostType value and the correlation keys used with workflow instance persistence. Because the two instance nodes above have different site names, these values are calculated differently on each node. If we attempt to load a workflow instance on a different node than the one that handled it last time, we get the above error.

My first thought for a workaround was to configure Azure so that the site names would match. You can do this manually by connecting to each instance with Remote Desktop and renaming the site in IIS Manager. However, the moment one of these sites is moved or reloaded, the site name reverts to the original format. To make the site name change “permanent” in Azure, you have to use a Web Role and rename the site in the RoleEntryPoint.OnStart() method. I tested this approach and it did work, but then I discovered (by way of our Workflow Product team!) an undocumented configuration setting, workflowHostingOptions, whose sole purpose is to let you override the site name dependency. You’ll need to add the following to your Workflow service config file:

<configuration>
    <configSections>
        <sectionGroup name="system.serviceModel.activities" type="System.ServiceModel.Activities.Configuration.ServiceModelActivitiesSectionGroup, System.ServiceModel.Activities, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35">
            <section name="workflowHostingOptions" type="System.ServiceModel.Activities.Configuration.WorkflowHostingOptionsSection, System.ServiceModel.Activities, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
        </sectionGroup>
    </configSections>

    <system.serviceModel.activities>
        <workflowHostingOptions overrideSiteName="true"/>
    </system.serviceModel.activities>

…. 
    <system.serviceModel>
….
    </system.serviceModel>
….

</configuration>

I found that the order in which these are listed in the config file matters. The sectionGroup declaration has to come before the workflowHostingOptions element, and the workflowHostingOptions element has to come before the <system.serviceModel> section.

There is one other detail that needs mentioning. During testing I found that, even with matching site names, the Workflow service sometimes failed with a "The requested resource has moved to one of the following locations" exception. The type of this exception is RedirectionException. It occurs when the instance we are trying to load is locked in the instance store.

This can be remedied by modifying the configuration file of the workflow service to have a "workflowIdle" behavior with values of timeToPersist and timeToUnload both set to "0:0:0". This way, as soon as an instance goes idle it will be unloaded. This is what that behavior looks like in the web.config file: 

<workflowIdle timeToUnload="0:0:0" timeToPersist="0:0:0"/>

If the timeToUnload value is greater than 0:0:0, it is possible for a subsequent request to be received on one of the nodes while the instance is still loaded/locked by the other node. This scenario will result in the RedirectionException.
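
For context, workflowIdle is a service behavior, so in web.config it sits inside a <behavior> element under <system.serviceModel>, <behaviors>, <serviceBehaviors>. The fragment below is only a sketch of one possible placement. It assumes an unnamed (default) behavior and leaves out everything else in the file, so adapt it to however your behaviors are already organized.

```xml
<system.serviceModel>
    <behaviors>
        <serviceBehaviors>
            <!-- Assumption: an unnamed behavior, which acts as the default
                 for the services in this application. If your service uses
                 a named behavior, put workflowIdle inside that one instead. -->
            <behavior>
                <!-- Persist and unload the instance as soon as it goes idle,
                     so another node can load it for the next request. -->
                <workflowIdle timeToUnload="0:0:0" timeToPersist="0:0:0"/>
            </behavior>
        </serviceBehaviors>
    </behaviors>
</system.serviceModel>
```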

Once you have configured workflowHostingOptions and the workflowIdle behavior, your Workflow services should scale out in Azure without hitting either of these errors.