WF4 Workflow Versioning Spike

Today I’m starting on another spike project.  So far there are only questions to investigate, no answers.  I’m sharing my plans with you because I believe doing so might help some of you and I’m hopeful that you might share your insights with me as we seek to solve this problem together.

But aren't you solving versioning in the next release?

Why yes. If you saw my PDC10 session, Windows Workflow Futures, you know that we are working on solutions for the next release. However, you have projects that are happening right now, and we want to provide answers for you today.

What is "Versioning"?

"Versioning is the creation and management of multiple releases of a product, all of which have the same general function but are improved, upgraded or customized." –

When developing solutions with Windows Workflow Foundation, developers have two elements which must be versioned together as the system evolves over time.

  • The Workflow Definition
  • The Workflow Instance State

The Workflow Runtime uses a Workflow Definition to create a Workflow Instance. The Workflow Instance creates and updates the Workflow Instance State at runtime. This state includes information about the activities that are executing and the state of variables in the workflow, as well as other internal data.

When the Workflow Instance is persisted, the Workflow Instance State is stored in an Instance Store. At some later time, the Workflow Runtime will create a new Workflow Instance and load the Workflow Instance State from the Instance Store into the Workflow Instance.
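To make the relationship between these pieces concrete, here is a toy model in Python. It is not the WF4 runtime or its API: `WorkflowDefinition`, `InstanceStore`, `run_until_idle`, and `resume` are hypothetical names, and the persisted state is assumed to record nothing more than a position in a list of steps. Under those assumptions it illustrates why the definition and the instance state have to be versioned together.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WorkflowDefinition:
    """Hypothetical stand-in for a workflow definition: an ordered list of step names."""
    name: str
    steps: list


@dataclass
class InstanceStore:
    """Hypothetical in-memory instance store keyed by instance id."""
    rows: dict = field(default_factory=dict)

    def save(self, instance_id, state):
        # Only the state is serialized; the definition itself is not stored.
        self.rows[instance_id] = json.dumps(state)

    def load(self, instance_id):
        return json.loads(self.rows[instance_id])


def run_until_idle(definition, instance_id, store, idle_after):
    """Pretend to execute the first `idle_after` steps, then persist the state."""
    state = {
        "next_step": idle_after,
        "variables": {"executed": definition.steps[:idle_after]},
    }
    store.save(instance_id, state)


def resume(definition, instance_id, store):
    """Load persisted state into a fresh instance and run the remaining steps."""
    state = store.load(instance_id)
    remaining = definition.steps[state["next_step"]:]
    return state["variables"]["executed"] + remaining


store = InstanceStore()
v1 = WorkflowDefinition("Order", ["Receive", "Approve", "Ship"])
# Assumed change: a step inserted before the persistence point.
v2 = WorkflowDefinition("Order", ["Validate", "Receive", "Approve", "Ship"])

run_until_idle(v1, "order-1", store, idle_after=1)
print(resume(v1, "order-1", store))  # ['Receive', 'Approve', 'Ship']
print(resume(v2, "order-1", store))  # ['Receive', 'Receive', 'Approve', 'Ship']
```

In this model, state persisted under V1 and resumed under V2 runs `Receive` a second time, because the stored position no longer means the same thing. What WF4 actually does in that situation is one of the questions this spike sets out to answer.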

Windows Workflow Foundation in .NET 4 does not provide explicit support for versioning. Yet we know that any serious implementation must provide a solution for versioning. The purpose of this spike project is to investigate the various dimensions of the versioning problem and to propose solutions.


Activity Library Versioning

This deals with how workflows resolve dependencies to assemblies which contain types that they require. This has been previously investigated and the results published in the following blog posts. This area is out of scope for this spike.

Service/Data Contract Versioning

Versioning Workflow Services is very similar to the general-purpose problem of versioning Web Services with regard to Service / Data Contracts. There are well-known techniques and resources dealing with this problem; therefore, this area is out of scope for this project.

Workflow Versioning

While any activity can be considered a workflow, for the purposes of this project a Workflow is defined as the activity which is invoked by the workflow runtime via WorkflowApplication, WorkflowInvoker or WorkflowServiceHost.

These are the questions that need to be investigated:

  • What happens when I change a workflow which has persisted instances?
  • Are there any "safe" non-breaking changes I can make? If so, what are they?
  • How would I know if I made a change that was not safe?
  • Is there a way I can know what the version of a persisted instance is?

What happens when I change a workflow which has persisted instances?

To test this scenario

  • Create a workflow that will be persisted.
  • Run the workflow to create several instances in the instance store
  • Update and deploy a new workflow definition
  • Resume a persisted instance and note what happens

Are there any "safe" non-breaking changes I can make?

Try Changing a Workflow Definition in various ways

  • Alter the activity tree by adding or removing activities before and after the persistence point
  • Add / Remove variables
  • Add / Remove arguments
  • Alter property arguments on an activity
  • Alter expression text on an activity
  • Alter a type used by the workflow

How would I know if I made a change that was not safe?

Make a list of the results when various kinds of changes are made that cause incorrect behavior

Is there a way I can know what the version of a persisted instance is?

Investigate options for adding version information to the persistence store
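One option worth trying is to store a fingerprint of the definition next to the persisted state. The sketch below is a Python model of that idea only: the `definition_version` field, the dict used as a store, and the function names are all hypothetical, and nothing here is part of the WF4 instance store schema.

```python
import hashlib
import json


def definition_fingerprint(definition_text):
    """Return a short, stable hash of the workflow definition text (for example its XAML)."""
    return hashlib.sha256(definition_text.encode("utf-8")).hexdigest()[:16]


def persist(store, instance_id, state, definition_text):
    # Store the fingerprint next to the state so the version can be queried later.
    store[instance_id] = json.dumps(
        {"definition_version": definition_fingerprint(definition_text), "state": state}
    )


def load(store, instance_id, definition_text):
    """Return the state only if it was persisted under the same definition."""
    row = json.loads(store[instance_id])
    expected = definition_fingerprint(definition_text)
    if row["definition_version"] != expected:
        raise ValueError(
            f"instance {instance_id} was persisted under {row['definition_version']}, "
            f"but the deployed definition is {expected}"
        )
    return row["state"]
```

A hash of the raw text changes on any edit, including whitespace, so it answers "is this the same definition?" but not "is this change safe?". An explicit version label maintained by the developer would be the other obvious choice.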


Versioning Scenarios

These scenarios are the target solution scenarios for this spike. The output of the spike will be working scenarios that I can demo, showing techniques that achieve these goals.


Scenario: Side by Side Versions

Given

  • Workflow V1 has been in production and has persisted instances
  • Workflow V2 has been developed and will now be deployed

When

  • Workflow V2 is deployed

Then

  • Messages sent to existing workflow instances for Workflow V1 are received by Workflow V1 and instances will complete under V1
  • Messages which create new workflow instances will create new instances for Workflow V2

So That

  • Workflows complete under the version they started with

Questions

  • How do we route messages to the correct workflow with minimal pain?
  • What requirement does this place on the client?
  • How can we minimize the coupling between the client and the service with regard to versioning?
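As a starting point for the routing question, here is a minimal Python sketch of the behavior this scenario asks for. `SideBySideRouter` and its methods are hypothetical names, not a WF4 or WCF API, and the sketch assumes every message carries an instance id that the router can look up.

```python
class SideBySideRouter:
    """Hypothetical router: new work goes to the latest version, and correlated
    messages go back to the version that created the instance."""

    def __init__(self):
        self.hosts = {}             # version label -> callable(instance_id, message)
        self.latest = None          # version that receives instance-creating messages
        self.instance_version = {}  # instance id -> version it started under

    def deploy(self, version, handler):
        # Older versions stay registered so their instances can still complete.
        self.hosts[version] = handler
        self.latest = version

    def start(self, instance_id, message):
        self.instance_version[instance_id] = self.latest
        return self.hosts[self.latest](instance_id, message)

    def send(self, instance_id, message):
        version = self.instance_version[instance_id]
        return self.hosts[version](instance_id, message)


router = SideBySideRouter()
router.deploy("v1", lambda i, m: f"v1 handled {m} for {i}")
router.start("order-1", "create")
router.deploy("v2", lambda i, m: f"v2 handled {m} for {i}")

print(router.send("order-1", "approve"))  # v1 handled approve for order-1
print(router.start("order-2", "create"))  # v2 handled create for order-2
```

The client only ever talks to the router, so it does not need to know which version an instance belongs to. The cost is that the router must keep, or be able to look up, the version each instance started under.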

Scenario: Bug in the Workflow

Given

  • Workflow V1 has been in production and has persisted instances
  • Workflow V1 has a bug and instances will fail or produce incorrect results if they complete under V1
  • Workflow V2 has been developed with bug fixes and will now be deployed

When

  • Workflow V2 is deployed

Then

  • Workflow instances started under V1 are deleted and resubmitted to Workflow V2
  • Messages which create new workflow instances will create new instances for Workflow V2

So That

  • All workflows complete under V2

Questions

  • How do we detect instances that need to be deleted?
  • How do we resubmit?
  • How do we avoid duplicate work problems when we resubmit work which has already been done?
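One possible shape for the delete-and-resubmit step is sketched below in Python. Everything in it is an assumption made for illustration: that each persisted record carries a version label, the original input, and a `work_id` identifying its external side effects, and that a set of completed work ids is available from somewhere. None of this comes from the WF4 persistence model.

```python
def resubmit_instances(persisted, buggy_version, completed_work, submit):
    """Delete instances persisted under `buggy_version` and resubmit their input.

    persisted: dict of instance id -> {"version": str, "input": dict, "work_id": str}
    completed_work: set of work ids whose external side effects already happened
    submit: callable(input, already_done) that starts a new instance
    """
    resubmitted = []
    for instance_id in list(persisted):
        record = persisted[instance_id]
        if record["version"] != buggy_version:
            continue  # started under another version; leave it alone
        del persisted[instance_id]
        # Tell the new instance whether the external work was already done,
        # so it can skip that work instead of repeating it.
        already_done = record["work_id"] in completed_work
        submit(record["input"], already_done)
        resubmitted.append(instance_id)
    return resubmitted
```

Detection here depends on a version label being stored with each instance, and duplicate work is avoided only if the external system records what it has already done under a stable id.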


Ultimately this project is for you.  My goal is to help you create robust, long-lived solutions, and getting versioning right is a key element of that.  Perhaps you have thoughts about my project.  Are there areas I should investigate that are missing?  Are there solutions that have worked well for you?  Just leave a comment and let me know.

Comments (6)

  1. Dmitry Kusnier says:

    Omg. Ron, I just have no words. Just want to say thank you for all the work you have done. Looking forward to new posts.

  2. Dmitry Kusnier says:

    Warning. Bad Eanglish.

    My opinion on versioning. I think it may be enough in some cases to introduce some kind of workflow key-bookmarks (or regions, or containers). When the workflow reaches such a bookmark, the workflow instance should be aware of where it has stopped. At that moment we have an idle instance with data about the place where that particular workflow instance actually stopped. Then we can decide whether the workflow can be safely updated (for example, with a simple sequential workflow we can safely update the part which has not executed yet). Then we compare the two versions of the XAML, update the parts between key-bookmarks in a Subversion-like manner, and take care of update conflicts (variables deleted or introduced, etc.).

    For this system to work, workflow definitions should be saved with versioning data in the persistence database. (Is it already done?)

    These are quite obvious things; I believe you have already thought about the problem in this way. But anyway, I just want to deliver feedback, because this is the least I can do to thank you for providing a much smoother way to dive into the WF4 world.

  3. Hi Ron,

    I am so glad that somebody at MS is asking this question!

    I am usually tasked with making the versioning recommendations within our organisation, and getting some solid direction from Microsoft (and the community) will make my job a lot easier.

    I personally have found that in developing a versioning practice, the options available tend to lay themselves out along a linear range of diametric oppositions.

    One end represents high granularity (large number of loosely coupled elements), which gives the developer massive power to change the product, but also increases the development cost of maintaining all these 'versioning boundaries'. The other end represents low granularity, fewer components, lower development cost when making a change, but higher customer impact, an increased likelihood that a given change will break compatibility with some element of the product – resulting in a lot of "You just can't use those components together, you have to upgrade X to use Y".

    Finding the best position along this scale is a question of resource available to maintaining infrastructure and releases, available tooling, and complexity of the product itself.

    I don't see much difference between managing the versioning of compiled binaries in an extensible product, and Xaml in a distributed workflow system.

    From a functional perspective, we have come to the same conclusion you have. The two goals we have when rolling out new components are either to repair an issue rendering existing components non-functional, or to supersede an operational component to change functionality.

    The former has strict rules regarding what may be changed, to support 'hot-swap'. The latter, is a complete reversion, and while it must continue to interoperate with the same contractual interfaces that the previous component was expected to, it brings with it all of its tightly coupled dependencies – so the component and those directly supporting it, are considered to belong to a separate lifecycle.

    If we look at hot-fixing the ideal scenario would be, applying a hot-fix does not reboot the AppDomain and fail running instances, but all running instances are persisted, once that is achieved, the AppDomain is recycled and instances are rehydrated if necessary. Any which fail to rehydrate into the hot-fix because of versioning rule conflicts (breaking changes) are displayed in the AppFabric dashboard.

    The hot-fix should apply to all active workflow instances, because, as you have mentioned, there may be a bug that requires the fix to be applied to workflows that have already started.

    Non-hot-swap upgrades should be achieved "side by side" – exactly as you have described.

    We have considered using a WCF routing mechanism to ensure that messages are always routed to the version of the workflow that was responsible for creating the workflow instance. That would require that the serialized workflow store information about its version, and that the workflow expose multiple endpoints:


    http://localhost/workflow1/v1.0/ This endpoint specifically points to v1.0 endpoint

    http://localhost/workflow1/v1.1/ This endpoint specifically points to v1.1 endpoint

    http://localhost/workflow1/ This endpoint will always redirect to the 'newest' endpoint, allowing new workflows to be started as v1.1 (in this case)

    Admittedly, everyone could just post messages to the http://localhost/workflow1/ endpoint, and if it's not a message that should start a new workflow, then it must contain a correlation id, and that id could be used to determine which endpoint version to route to, and those 'endpoint versions' could be private, not accepting direct messages (to prevent people from starting old workflow versions once they have been superseded).

    This would be pretty painless – it would have no impact on how clients interact with AppFabric, and it would allow the service contract to change between versions, because correlated messages will be redirected to the appropriate version of the workflow.

    While the concept of allowing a workflow state to 'rehydrate' into a newer workflow version is a wonderful thought – it's not as simple as inspecting the workflow to determine if a workflow instance can be rehydrated into a newer workflow pattern.

    Some activity that has yet to be executed may be expecting that a prior activity has manipulated some external resource in a specific way – and that 'manipulation' may have changed between versions. So, while the workflow might simply be passing a different argument to some external resource, which won't affect the rehydration of the workflow negatively, the external resource has been 'initialized' in such a way that is only compatible with the prior version of workflow design… there's no way you can know this.

    As a rule of thumb, I suppose if your workflow makes any calls to external resources, and the arguments (or the rules that build the arguments) to that external call change (some mighty tricky static analysis there), then you have to fail the 'in-place upgrade'.

    Adam Langley

  4. Ron Jacobs says:

    @Adam – Wow – great feedback.  You raise some excellent points I will have to consider.

    @Dmitry – thanks – keep the ideas coming!

  5. I love your blog posts, and check the RSS feed daily.  Great work, I've been waiting for something exactly like this (second only to a date on the vNext stuff ;).

    I wonder if there's a "hacky" interim solution similar to what you plan for the full vNext where something vaguely similar to Xamlinjector is used to modify the in-store view of the persistence data based on the deltas between the old and new workflow.

    It wouldn't even have to be fully automated – basically I know the difference between these two workflow definitions, so what changes do I need to make to the persistence data to reconcile them?  For example, I know I added a WriteLine between these two activities, so I'm going to have a program troll through persistence and do an "Insert(something)" between activities X and Y.

    Sounds like something MS would probably be loath to officially support, though, and I understand the reasons why.

  6. Ron Jacobs says:

    @jvrobert I've got some ideas that I'm looking into.  One thing is certain: this investigation has been very revealing so far and will certainly drive some new thinking into our next release as well.
