I spoke at an internal architect event this week and faced an interesting question from my co-presenter:
- Since SharePoint makes use of multiple databases. Does this imply that using Workflow within Office 12 will require two phase commit? If not, how is transactional integrity maintained between what is being updated and the Workflow persistence?
SharePoint does, in fact, use multiple databases – one for configuration, one for content, one for profiles, and one for services. The type of data stored within each database is discreet and orthogonal.
Everything that has to do with the workflow state and its data (from workflow associations to running instances to the manifestations of workflow tasks as SharePoint tasks) is stored in the content database. SharePoint doesn't have multiple databases connected in a way that would need to implement a two phase commit for a given workflow.
This does, however, raise an interesting point. SharePoint's persistence architecture is complicated by the fact that it is a two tier system - one database machine in the backend plus one on front-end Web servers (primarily for scale). Windows Workflow Foundation runs within each and every front-end machine - but the workflow's data will normally reside on back-end machines. How does this work?
- When a workflowevent (such as the completion of workflow activity) comes into a front-end machine, it makes a query to the database to find the corresponding serialized workflow instance data in the database. At that point, the front-end checks to see whether it can lock that row in the database - if it can’t, it asks which front-end has the current lock, and forwards the event to that machine.
- When the front-end obtains the lock, it retrieves the persistence data and loads it into the workflow engine along with its workflow schedule. As soon as the instance finishes processing the event, we check to see if there are any other events waiting in the event queue for this particular instance. Assuming there are none, we serialize it, persist it back to the database, and unlock the row.
This raised another question:
- In SharePoint there is a concept of a site group (e.g. a subset of the URL tree or a sub-directory). A site group is mapped to a content database. Current best practice says that a content database should be around the 50GB mark (although this may change with Office 12 due to the availability of a wastebasket). It appears that a workflow cannot transactionally span site groups since different site groups can be (and usually are) in different databases. This could potentially cause issues with activities that span departments in a company (since a site group is usually mapped at the departmental level).
No issues here. In Office 12 SharePoint a workflow instance is attached to a single item in a single list (such as a document within a document library). There are some ideal scenarios for this: document review, approval, routing, etc. There is no concept of a workflow instance spanning multiple items or multiple site collections.
Rule of Thumb: If a workflow needs to span multiple site collections or servers, you should probably splitup the workflow into segments for each serverand launch them as a set of interrelated flows, correlating and communicating the results via services. (This is roughly the same approach you might use when your workflow needs to interact with LOB apps like SAP.)
Thanks to George Hatoun (SharePoint PM) for assistance on this!
Note: You can learn more about how WF and SharePoint work together in Office 12 by picking up the Windows Workflow Foundation book. Keep in mind that the book only covers Beta 1.2 which you can get here. The samples for the book are available in C# and VB.NET and can be downloaded from Sams Publishing here. There is a more current beta available for Windows Workflow Foundation - Beta 2 can be downloaded as part of the WinFx January CTP. You can also download hands-on-labs, custom activities and other cool stuff at the Windows Workflow Foundation Community Site.