Suppose you have a startup task in a Windows Azure role that performs the following actions:
- The startup task type is set to “Simple”. (Background and foreground task types do not block the role from starting.)
- Download ZIP installer from blob storage.
- Unzip it
- Install in silent mode.
- Run some script to start the application
Suppose these steps take about X amount of time overall. You may then have the following questions:
- Is there a time limit for the startup task to finish?
- Should the time taken by the startup task follow some guideline?
- What happens to the role if the startup task takes forever or gets stuck?
- What is the best way to handle such a situation?
Here are some details to consider for a Windows Azure application that follows the above scenario:
While a role instance is getting ready, it is not in the load balancer. Once it is ready, it is added to the load balancer, which indicates that it can accept traffic from outside. The role is added to the load balancer when it returns from OnStart, so while your startup task is still running the role has not been added yet. During that time the role status should be “Busy” (any other state should be considered a problem). So if your role takes X minutes to initialize in its startup task, the “Busy” status simply indicates that the role cannot accept traffic yet.
There is no time limit for your role to start, but if it takes more than 15 minutes, your service may experience an outage while the role is getting ready.
So what is the 15-minute limit?
There is a 15-minute timeout on role start, which applies only when your service is set for automatic upgrade. During a rolling update, the fabric controller waits only 15 minutes for a role instance to reach the Ready state before proceeding to the next upgrade domain.
If the fabric controller moves on to the next upgrade domain, your role keeps running and waiting in OnStart. If the fabric controller then works through all the upgrade domains before the first upgrade domain finishes starting, your service is unavailable until an instance becomes ready.
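The timing described above can be sketched with a small model. This is a simplified illustration, not the fabric controller's actual algorithm. It assumes that every upgrade domain needs the same time to start, that the controller begins the next domain as soon as the current one is Ready or the 15-minute wait expires, and that a domain serves no traffic from the moment its update begins until its startup task finishes. The function name `full_outage_minutes` and its parameters are hypothetical.

```python
ROLE_START_TIMEOUT_MIN = 15  # wait per upgrade domain, as described above


def full_outage_minutes(upgrade_domains, startup_minutes,
                        timeout_minutes=ROLE_START_TIMEOUT_MIN):
    """Minutes during which no upgrade domain is Ready (simplified model)."""
    if upgrade_domains < 1 or startup_minutes < 0:
        raise ValueError("need at least one domain and a non-negative startup time")
    # The controller moves on when the domain is Ready or the wait expires,
    # whichever comes first.
    step = min(startup_minutes, timeout_minutes)
    # Time at which the last upgrade domain is taken down.
    last_domain_down_at = (upgrade_domains - 1) * step
    # The first domain is Ready again at startup_minutes; any gap between
    # the two moments is time with every domain still starting.
    return max(0, startup_minutes - last_domain_down_at)


if __name__ == "__main__":
    for domains, startup in [(2, 10), (2, 20), (2, 40), (3, 40)]:
        print(domains, "domains,", startup, "min startup:",
              full_outage_minutes(domains, startup), "min with no Ready instance")
```

Under these assumptions, two upgrade domains with a 20-minute startup task leave a 5-minute window in which no instance is Ready, while a 10-minute startup task leaves none.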
What about Automatic versus Manual upgrade:
As you may know, if your service is set to perform a Manual upgrade, the fabric controller upgrades one domain at a time and waits until it is ready before moving to the next upgrade domain. With Manual upgrade, therefore, the 15-minute wait does not apply, because the fabric controller waits for your startup task to finish before moving on. You might consider using Manual upgrade mode; however, the same concern still applies during regular host OS updates.
How to handle such scenarios:
If you expect your startup task to take more than 15 minutes and you don’t want a service outage, you can increase the number of upgrade domains for your service, and correspondingly the number of instances, so that at least one instance is always up and ready to take traffic while the others are initializing.
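As a rough guide to how many upgrade domains that might mean, here is a minimal sketch. It is an assumption-laden estimate, not an official sizing rule: it assumes all domains take the same time to start, that during an automatic rolling update a new domain is taken down every 15 minutes when startup exceeds the wait, and that each upgrade domain holds at least one instance. The function name `min_upgrade_domains` is hypothetical.

```python
import math


def min_upgrade_domains(startup_minutes, timeout_minutes=15):
    """Smallest upgrade domain count that keeps one domain Ready (simplified model)."""
    if startup_minutes <= 0 or timeout_minutes <= 0:
        raise ValueError("times must be positive")
    # At most this many domains can be starting at the same moment:
    # each one needs startup_minutes, and the next one begins no sooner
    # than timeout_minutes (or Ready, if that comes first) after it.
    starting_at_once = math.ceil(startup_minutes / timeout_minutes)
    # One more domain than that always remains Ready.
    return starting_at_once + 1


if __name__ == "__main__":
    for startup in (10, 20, 40):
        print(startup, "min startup: at least",
              min_upgrade_domains(startup), "upgrade domains")
```

In this model a 40-minute startup task calls for at least four upgrade domains, since up to three of them can be initializing at once.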
- Windows Azure Fabric Controller Details:
- Upgrade & Fault Domains: