My customer's BizTalk environment uses a 2-node failover cluster to host the SQL instances where the BizTalk databases live. We also clustered Enterprise SSO on this failover cluster. Recently there was a catastrophic issue with the NAS device where the VM storage lived, and one of the nodes of this cluster didn't come back cleanly when the NAS issue was resolved (essentially, the cluster service refused to start). Since this was about the only virtual machine out of 60-some to come up in a bad state, I counted myself lucky.
As I had provisioned this in SCVMM with a service template, it was a simple matter to scale out my database tier by one and replace the bad node, and in short order I had SQL installed and configured on the new node. The final step was to install and reconfigure SSO on the new node, including restoring the master secret so SSO would fail over cleanly. And here's where the fun started.
When I ran the install script to install SSO on the new node, it promptly threw an error: "The following platform Components failed to install and will need to be manually installed before setup can be processed: Enterprise Single sign on server: Unspecified Error"
I'd seen this before, and a quick mental scan reminded me that the clustered DTC resource needs to be on the local node while installing SSO. Problem: the clustered SSO resource is ALSO in the DTC resource group and will not fail over to the new node until SSO is installed and configured there. Chicken, meet egg.
The fix, at the end of the day, was to remove the SSO resource's dependency on DTC and move SSO to another resource group temporarily, then move DTC over to the new node. That allowed me to finish my install. Then I moved DTC BACK to the original node and moved SSO back into the DTC resource group (re-adding the dependency I'd removed) so I could configure the resource. Once all of that was done, I was able to run my script to restore the master secret, and the new node was fully configured for duty.
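
For anyone who wants to keep that sequence straight, here is a rough sketch in Python that just builds the ordered checklist, pairing each step with a PowerShell command from the FailoverClusters module where one applies. It is not the script I actually ran. Every resource, group, and node name passed in is a hypothetical placeholder, the temporary group is assumed to already exist on the original node, and the install and configure steps are left as manual entries because they depend on your own scripts.

```python
def build_sso_workaround_plan(sso_resource, dtc_resource, dtc_group,
                              temp_group, original_node, new_node):
    """Return the workaround as an ordered list of (description, command).

    command is a PowerShell string (FailoverClusters module) or None for a
    manual step. All names are placeholders supplied by the caller.
    """
    return [
        ("Remove the SSO resource's dependency on DTC",
         f'Remove-ClusterResourceDependency -Resource "{sso_resource}" '
         f'-Provider "{dtc_resource}"'),
        ("Park SSO in a temporary resource group",
         f'Move-ClusterResource -Name "{sso_resource}" -Group "{temp_group}"'),
        ("Move the DTC group to the new node",
         f'Move-ClusterGroup -Name "{dtc_group}" -Node "{new_node}"'),
        ("Install SSO on the new node (manual, your own install script)",
         None),
        ("Move the DTC group back to the original node",
         f'Move-ClusterGroup -Name "{dtc_group}" -Node "{original_node}"'),
        ("Move SSO back into the DTC group",
         f'Move-ClusterResource -Name "{sso_resource}" -Group "{dtc_group}"'),
        ("Re-add the SSO dependency on DTC",
         f'Add-ClusterResourceDependency -Resource "{sso_resource}" '
         f'-Provider "{dtc_resource}"'),
        ("Configure SSO on the new node and restore the master secret (manual)",
         None),
    ]


if __name__ == "__main__":
    # Hypothetical names; substitute the ones from your own cluster.
    plan = build_sso_workaround_plan(
        sso_resource="ENTSSO", dtc_resource="MSDTC", dtc_group="DTC Group",
        temp_group="Temp SSO", original_node="SQLNODE1", new_node="SQLNODE3")
    for number, (description, command) in enumerate(plan, start=1):
        print(f"{number}. {description}")
        if command:
            print(f"   {command}")
```

Printing the plan rather than executing it is deliberate: each cluster move is worth checking in Failover Cluster Manager before going on to the next one.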