In MOSS 2007, when you run a workflow instance that creates N tasks, it can sometimes fail partway through, before all the tasks have been created. The error you might see is "Failed on Start (retrying)". There can be several causes for this error. One probable cause is a large number of tasks (say, more than 500). You should also check the ULS log: the failure may be a timeout against SQL Server that causes the transaction to be aborted. In that case, the ULS log might contain entries like these:
Workflow Infrastructure 72fg High Error in persisting workflow: System.Transactions.TransactionAbortedException: The transaction has aborted. ---> System.TimeoutException: Transaction Timeout --- End of inner exception stack trace --- at System.Transactions.TransactionStateAborted.CreateAbortingClone(InternalTransaction tx).
System.Workflow.Runtime.Hosting Error: 0 : DefaultWorkflowCommitWorkBatchService caught exception from commitWorkBatchCallback: System.Transactions.TransactionAbortedException: The transaction has aborted. ---> System.TimeoutException: Transaction Timeout.
One possible workaround is to increase the timeout value in the web.config file of the affected web application. By default, the value is 1 minute; you can increase it to, say, 15 minutes.
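The text above does not name the exact setting, so the fragment below is only a sketch. It assumes the timeout in question is the System.Transactions default transaction timeout, which matches the `System.Transactions.TransactionAbortedException` in the log and the 1-minute default. The 15-minute value is the example figure from above, not a tested recommendation.

```xml
<!-- Sketch only: goes inside the existing <configuration> element of web.config. -->
<!-- Assumption: the timeout being raised is the System.Transactions default. -->
<configuration>
  <system.transactions>
    <!-- Format is hh:mm:ss; the default is 00:01:00 (1 minute). -->
    <defaultSettings timeout="00:15:00" />
  </system.transactions>
</configuration>
```

Note that System.Transactions also enforces a machine-wide ceiling, the `maxTimeout` attribute of `machineSettings` in machine.config, which defaults to 10 minutes. A 15-minute value in web.config may therefore be capped unless that ceiling is raised as well. Back up the configuration files before editing them, and apply the change on every web front end in the farm.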