Update March 7, 2013
Added to the Q&A section --- Q: How long will the upgrade take? How long will my VM be down?
Update October 17, 2014
Added information about Guest Agent updates. Thanks to my colleague Anurag Sharma for this idea.
------------
Roughly once per month, Microsoft releases a new Guest OS version for Windows Azure PaaS VMs. The exact schedule varies, and the historical trend can be seen at https://msdn.microsoft.com/en-us/library/windowsazure/ee924680.aspx. During this rollout the Windows Azure Fabric Controller makes two passes through all of the datacenters, one to update the Host OS and one to update the Guest OS. There is also a periodic update of the Azure guest agent that runs inside your VM.
Mark Russinovich has a great blog post which describes the Host OS upgrade process - https://blogs.technet.com/b/markrussinovich/archive/2012/08/22/3515679.aspx.
Note that this article is focused on PaaS scenarios, but the Host OS update process applies to IaaS Persistent VMs as well. For more information about IaaS VM restarts see https://blogs.msdn.com/b/windows_azure_technical_support_wats_team/archive/2013/11/27/windows-azure-iaas-host-os-update-demystified.aspx.
See https://blogs.msdn.com/b/kwill/archive/2011/05/05/windows-azure-role-architecture.aspx for more information about the processes which are running and the location of log files which can be used to troubleshoot.
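using System.Net;
using System.Net.Sockets;
using Microsoft.WindowsAzure.ServiceRuntime;
public class WebRole : RoleEntryPoint {
    public override bool OnStart () {
        // For information on handling configuration changes
        // see the MSDN topic at https://go.microsoft.com/fwlink/?LinkId=166357.
        // Find this instance's IPv4 address.
        IPHostEntry ipEntry = Dns.GetHostEntry (Dns.GetHostName ());
        string ip = null;
        foreach (IPAddress ipaddress in ipEntry.AddressList) {
            if (ipaddress.AddressFamily == AddressFamily.InterNetwork) {
                ip = ipaddress.ToString ();
            }
        }
        // Send one request to the site on this instance. Plain http is used because
        // an https request to a bare IP address fails certificate name validation.
        if (ip != null) {
            try {
                HttpWebRequest req = (HttpWebRequest) WebRequest.Create ("http://" + ip);
                // Dispose the response so the connection is released.
                using (WebResponse resp = req.GetResponse ()) { }
            } catch (WebException) {
                // A failed request should not keep the role from starting.
            }
        }
        return base.OnStart ();
    }
}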
Notification
At this time the Windows Azure platform does not offer proactive notifications when an OS upgrade is happening. The Windows Azure development team is working on this functionality so that service administrators can better plan for upgrades and possible service impact. Your role instances will receive a RoleEnvironment.Stopping event prior to being shut down, and you can use that event to gracefully terminate any work that the role instance is doing or to notify an administrator that an instance is shutting down.
In the meantime you can subscribe to the Windows Azure OS Updates RSS feed at https://sxp.microsoft.com/feeds/3.0/msdntn/WindowsAzureOSUpdates. This feed should be updated the same day that the OS updates start rolling out to the datacenter. This typically does not give advance notice, but it does help identify when the updates are happening. As noted in the Host OS and Guest OS descriptions above, the update process can take several days to complete, so there may be one or more days between when the RSS feed is updated and when your hosted service begins updating.
The Guest OS list at https://msdn.microsoft.com/en-us/library/windowsazure/ee924680.aspx and the OS version selection dropdown in the management portal are typically updated after the Guest OS rollout has completed, so you should not use the latest entry in these lists as an indication of when the OS updates are in progress.
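If you want to watch the feed programmatically, the following is a minimal sketch rather than an official client. It assumes the feed is plain RSS 2.0 with a title and an RFC 1123 pubDate on every item, and it leaves downloading the feed (for example with HttpClient) to the caller. FeedEntry and OsUpdateFeed are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical types: one RSS <item> reduced to the two fields we care about.
public sealed class FeedEntry
{
    public string Title { get; }
    public DateTimeOffset Published { get; }

    public FeedEntry(string title, DateTimeOffset published)
    {
        Title = title;
        Published = published;
    }
}

public static class OsUpdateFeed
{
    // Returns the newest <item> in an RSS 2.0 document, or null if there are none.
    // Assumes every item has a <title> and an RFC 1123 <pubDate>.
    public static FeedEntry Latest(string rssXml)
    {
        return XDocument.Parse(rssXml)
            .Descendants("item")
            .Select(item => new FeedEntry(
                (string)item.Element("title"),
                DateTimeOffset.Parse((string)item.Element("pubDate"), CultureInfo.InvariantCulture)))
            .OrderByDescending(entry => entry.Published)
            .FirstOrDefault();
    }
}
```

Comparing the Published value of the newest entry with the last value you recorded tells you that a rollout has started; as noted above, your own instances may still not be touched for a day or more.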
Detection
At this time there is no direct way to detect a Host OS upgrade, but you can see the evidence of the reboot within the logs on the VM:
A: You cannot opt out of the Host OS updates because Microsoft must maintain updated and patched host OSes within the datacenter. You can opt out of the Guest OS update by specifying a version of the Guest OS, but note that your service will no longer receive security patches and may be vulnerable. See /en-us/azure/cloud-services/cloud-services-how-to-configure-portal#manage-guest-os-version.
A: There is no way to control when an individual instance or service will be upgraded for the Host OS. The upgrade is started on all Azure datacenters across the world at approximately the same time, and the fabric works continuously on upgrading each datacenter. This process takes several days due to the complexity of making sure upgrade domain rules are followed for all cloud services, and there is no way to control or determine when a specific instance will be impacted. To control the Guest OS update you can specify a fixed Guest OS version and then update it whenever you are ready.
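To make the upgrade domain rules a little more concrete, here is a deliberately simplified, hypothetical model of a walk: instances in the same upgrade domain restart together, and the walk moves to the next domain once the slowest instance reports Ready or a wait cap expires. The 15-minute cap is the Host OS figure quoted in the comments on this post; the type and method names are made up for illustration, and the real fabric logic is more involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical instance description for the model (not an Azure SDK type).
public sealed class SimulatedInstance
{
    public string Name { get; }
    public int UpgradeDomain { get; }
    public TimeSpan TimeToReady { get; }   // from restart until the instance reports Ready

    public SimulatedInstance(string name, int upgradeDomain, TimeSpan timeToReady)
    {
        Name = name;
        UpgradeDomain = upgradeDomain;
        TimeToReady = timeToReady;
    }
}

public static class UpgradeWalk
{
    // Per-domain wait cap; 15 minutes is the Host OS figure from the comments.
    public static readonly TimeSpan ReadyWaitCap = TimeSpan.FromMinutes(15);

    // Returns each upgrade domain in walk order with the time the walk spends on it.
    public static List<(int Domain, TimeSpan Spent)> Simulate(IEnumerable<SimulatedInstance> instances)
    {
        return instances
            .GroupBy(i => i.UpgradeDomain)
            .OrderBy(g => g.Key)
            .Select(g =>
            {
                // A domain's instances restart together, so the slowest one sets the
                // pace; the walk moves on at the cap even if it is not Ready yet.
                TimeSpan slowest = g.Max(i => i.TimeToReady);
                return (Domain: g.Key, Spent: slowest < ReadyWaitCap ? slowest : ReadyWaitCap);
            })
            .ToList();
    }
}
```

In this model a role that is slow to reach Ready stretches the time spent on every upgrade domain, which is one reason the overall update for a service can take a while even though each individual restart is short.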
A: Connecting to an Azure PaaS VM via RDP and making changes or installing software is unsupported. At any point in time the VM may be completely rebuilt, and any changes you make will be lost. This can happen if the hardware fails and we have to start up a new VM on new hardware. It will also happen during the Guest OS update, when the Windows partition is rebuilt. If you need to install software or make changes to the VM, you must create a startup task and do the work from there. This ensures that when the VM is recreated, your configuration will be executed again.
A: The updates that are installed onto the new Guest OS version are publicly available and thoroughly tested hotfixes, which are also being deployed to servers around the world via Windows Update, so the chance of negative impact to your service is extremely small. However, the root of the question goes back to how you manage OS patches in your on-premises services: do you install directly on the production servers and assume it will work, or do you have a staging environment where you test the patches first? You will follow the same pattern in Azure. If you want a staging environment to test patches prior to production, configure your production service to use a fixed OS version string in the .cscfg file. Then, when a new Guest OS is available, you can deploy your service into the staging slot using the newest Guest OS version. After you have validated that the service works correctly on the latest Guest OS, you can either do a VIP swap or do an in-place upgrade of your production service to use the latest OS.
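Pinning is done with the osVersion attribute on the ServiceConfiguration element of the .cscfg file. osVersion="*" means always use the latest Guest OS, and as far as I know that is also the behavior when the attribute is omitted. The sketch below edits the attribute with LINQ to XML; CscfgGuestOs is a hypothetical helper, and the version string you pass in should come from the Guest OS list linked above.

```csharp
using System.Xml.Linq;

// Hypothetical helper for reading and pinning the Guest OS version in a .cscfg.
public static class CscfgGuestOs
{
    // "*" asks the platform to keep the Guest OS on the latest version.
    public const string AutoUpgrade = "*";

    // Returns the configuration with osVersion set on the root ServiceConfiguration element.
    public static string PinOsVersion(string cscfgXml, string osVersion)
    {
        XDocument doc = XDocument.Parse(cscfgXml);
        // osVersion is an unprefixed attribute, so it lives in no namespace even
        // though the element itself is in the ServiceConfiguration namespace.
        doc.Root.SetAttributeValue("osVersion", osVersion);
        return doc.ToString();
    }

    // True when a specific version is set, i.e. automatic Guest OS updates are off.
    public static bool IsPinned(string cscfgXml)
    {
        string version = (string)XDocument.Parse(cscfgXml).Root.Attribute("osVersion");
        // Treats a missing attribute like "*" (assumed default).
        return version != null && version != AutoUpgrade;
    }
}
```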
Q: How long will the upgrade take? How long will my VM be down?
A: There is a common misconception that the more patches are applied, the longer the update will take. This is based on the belief that the upgrade works like a Windows Update upgrade on your local desktop machine, where a bunch of patches are copied to Windows and installed with subsequent reboots, but this is not how upgrading works in Azure. When a new OS version is being released in Azure, the OS team takes the latest image, applies the patches, and then saves a new VHD with this new base image. This base image is then copied to a repository in Azure. When the fabric is instructed to do an OS upgrade, it first makes a copy pass where it copies this new base image VHD to the hard disks on each server in the datacenter that is going to be upgraded. Once this copy process is finished, the fabric begins the upgrade process, following the normal upgrade domain rules.
When a guest is going to be updated, the fabric does a graceful shutdown of the OS and then starts a new VM using the new base image. The time it takes to upgrade a given VM for a Guest OS update is roughly the time it takes to do a graceful Windows shutdown plus the time it takes to start Windows.
The timing for a Host OS update is a little different. When a Host is being upgraded, it first sends the shutdown message to each Guest OS running on that Host. Each Guest OS is then given the standard OnStop and Windows shutdown time to finish shutting down. Once every Guest OS is shut down, the Host OS does a graceful shutdown and goes through its normal shutdown procedure. Once the Host OS is shut down, the Host is rebooted using the new OS image. Once the Host is up and running, it starts each of the Guest OSes. Typically this Host OS update process takes 15 to 20 minutes, but it can vary depending on how many other Guests are on that Host and how long they take.
Having said that, there will always be exceptions if there is a failure on a particular node and the Azure fabric determines that the Guests on that node need to be moved to a different node.
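The timing described in that answer can be written down as a rough model. This is a hypothetical sketch in which you supply every duration; beyond what is stated above, it assumes that the guests on a host shut down in parallel and restart in parallel. Note that the number of patches is not an input at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical downtime model; nothing here is measured from the platform.
public static class UpdateDowntime
{
    // Guest OS update: graceful Windows shutdown, then Windows start on the new image.
    public static TimeSpan GuestOsUpdate(TimeSpan guestShutdown, TimeSpan guestStartup)
        => guestShutdown + guestStartup;

    // Host OS update, as seen by one guest. guestShutdownsOnHost holds the
    // OnStop + Windows shutdown time of every guest on the host, including this one.
    // Assumes parallel guest shutdown (the host waits for the slowest) and
    // parallel guest startup once the host is back.
    public static TimeSpan HostOsUpdate(
        IEnumerable<TimeSpan> guestShutdownsOnHost,
        TimeSpan hostShutdown,
        TimeSpan hostBoot,
        TimeSpan thisGuestStartup)
    {
        TimeSpan slowestGuestShutdown = guestShutdownsOnHost.Max();
        return slowestGuestShutdown + hostShutdown + hostBoot + thisGuestStartup;
    }
}
```

With illustrative numbers, if the slowest guest on a host needs 6 minutes to shut down, the host needs 2 minutes to shut down and 4 to boot, and your instance needs 5 minutes to start, the model gives 6 + 2 + 4 + 5 = 17 minutes, within the 15 to 20 minute range mentioned above.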
A: When the OS is being updated, the Azure Fabric will perform a graceful shutdown of your role instance. This means that your ASP.NET code will receive the Application_End event, and the Azure service runtime will raise the Stopping and OnStop events. Your code will have 5 minutes to finish cleanup work in OnStop before the process is shut down. After your Azure host process is shut down, Windows will go through a normal graceful shutdown, including raising the standard OnStop and related events for Windows Services. For more information about gracefully handling a shutdown of your instance see https://azure.microsoft.com/en-us/blog/the-right-way-to-handle-azure-onstop-events/, https://msdn.microsoft.com/en-us/library/hh180152.aspx and https://msdn.microsoft.com/en-us/library/windowsazure/microsoft.windowsazure.serviceruntime.roleentrypoint.onstop.aspx.
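A common way to use that 5-minute window is to have the work loop watch a cancellation token, and have OnStop (or a RoleEnvironment.Stopping handler) cancel it and wait for the current item to finish. The sketch below shows that pattern without the Azure SDK so it compiles on its own; GracefulWorker and its members are hypothetical, and the 4-minute budget is an arbitrary margin under the 5-minute limit.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the graceful-shutdown pattern. In a real role this
// logic would sit in a RoleEntryPoint subclass; the SDK type is omitted here.
public sealed class GracefulWorker
{
    // Arbitrary margin under the 5 minutes the runtime allows for OnStop.
    public static readonly TimeSpan StopBudget = TimeSpan.FromMinutes(4);

    private readonly CancellationTokenSource _stopping = new CancellationTokenSource();
    private Task _loop = Task.CompletedTask;
    private int _itemsProcessed;

    public int ItemsProcessed => Volatile.Read(ref _itemsProcessed);

    // Starts the work loop on a background task.
    public void Start(TimeSpan workItemDuration)
    {
        CancellationToken token = _stopping.Token;
        _loop = Task.Run(async () =>
        {
            while (!token.IsCancellationRequested)
            {
                // Simulated unit of work. It is not cancelled midway, so the item
                // in progress always completes before the loop exits.
                await Task.Delay(workItemDuration);
                Interlocked.Increment(ref _itemsProcessed);
            }
        });
    }

    // What OnStop (or a Stopping handler) would do: request shutdown, then wait
    // for the loop, but no longer than the budget. True if it finished in time.
    public bool Stop(TimeSpan budget)
    {
        _stopping.Cancel();
        return _loop.Wait(budget);
    }
}
```

In a real role, the RoleEntryPoint subclass would start the loop from Run and call Stop(GracefulWorker.StopBudget) in its OnStop override before calling base.OnStop().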
Anonymous
September 20, 2012
This explains a lot of my recent headaches with our Azure web roles in an "infinite initialization" state.
Thanks for the information - it is very helpful!
Anonymous
September 20, 2012
Thanks for detailed information :)
Anonymous
September 23, 2012
We have had this issue where roles fail after the OS updates. Reimaging always fixes it.
Thanks,
Matt Watson
http://www.stackify.com
Anonymous
September 23, 2012
Matt, I would encourage you to open a support incident at www.windowsazure.com/.../contact next time this happens and the team can help you investigate why your role fails to start. The root cause is typically pretty easy to find and the fix is usually easy to implement, and this will make your service much more robust.
Kevin
Anonymous
October 10, 2012
Do you have any idea why two reboots are necessary? Once the host has been rebooted why not immediately reimage all the VMs inside it and let them start? What's the need for the second reboot?
Anonymous
March 07, 2013
Dmitry, the two-pass upgrade has been around since Azure started, and I am not positive of the reasoning behind this design decision. My best guess is that it isolates the Host OS upgrade in order to make it faster and get through the datacenter as quickly as possible. During the Host OS upgrade of any specific server, the fabric waits a maximum of 15 minutes for each guest on that host to report Ready before it is able to move to the next upgrade domain for that service. During a Host OS update the Windows partition on the guest OS is preserved, which can shorten the startup time for the hosted service running in that guest OS. During a Guest OS update the Windows partition is wiped out, which means startup tasks that do installations have to run again, increasing the time it takes to reach the Ready state. See blogs.msdn.com/.../windows-azure-disk-partition-preservation.aspx for more info on the disk preservation scenarios.
Anonymous
November 18, 2013
There is an intermittent issue with the certificate path for our SSL web service that occurs, I assume, either when our cloud service on Azure reboots or when it is moved. This occurred on November 18, 2013, and previously on or about September 27, 2013. SSL Checker at www.sslshopper.com/ssl-checker.html reports that the certificate is not trusted in all web browsers. When I add our domain to the IIS site binding, the issue is resolved.
Sometime after the September occurrence, I removed the site binding setting (the binding is a problem in itself because it prevents using staging for testing), and we had no issues until last night, Nov 18. Again I had to add the site binding to resolve the issue (at 08:45 UTC). I have now removed the site binding setting at 13:30 UTC, and the issue remains resolved.
The real problem is that before I changed the site binding setting, requests to our web service could not be made. Salesforce.com allows Apex callouts (GET and POST requests) to SSL web services only for certain specified root certificates, and only when Salesforce can determine the certificate path correctly. When the path cannot be determined, callouts fail with a "PKIX path building failed" error. After adding the domain to the IIS site binding, Salesforce has no problem.
This appears to be a Windows Azure issue where the certificate paths are not re-established promptly when certain changes are made to the server instance. It also seems that having multiple role instances would not avoid this issue: our web service still works when tested with soapUI, but the certificate path is still not correct for Salesforce.
Anonymous
June 01, 2017
It's an old post, but so helpful. Do you have a current link for this section: "Your role needs to adhere to the rules around host OS updates, in particular instances should reach the Ready state within 30 minutes of starting the Startup tasks. For more information about this limitation see http://msdn.microsoft.com/en-us/library/hh543978." The link there does not seem to take me to anything about 30 minutes.
Anonymous
December 05, 2017
Hi Don. Updated the 30 minute timeout link to https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-update-azure-service#how-an-upgrade-proceeds, thanks for the feedback.