Final Update: Monday, October 22nd 2018 23:22 UTC
We’ve confirmed that all systems are back to normal as of 22:50 UTC. Our logs show the incident started at 19:40 UTC; during the 3 hours and 10 minutes it took to resolve the issue, an unknown number of customers experienced build delays of upwards of 20 minutes. We apologize for any inconvenience this may have caused.
- Root Cause: The failure was due to a loss of build capacity following Azure provisioning failures for virtual machines.
- Chance of Recurrence: Low – additional capacity has already been provisioned.
- Lessons Learned: We are working to improve capacity monitoring and add earlier alerting for capacity issues, so that we can maintain availability during temporary Azure resource issues.
- Incident Timeline: 3 hours & 10 minutes – 19:40 UTC through 22:50 UTC.
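The capacity-monitoring improvement described in the lessons learned above could take a form like the following minimal sketch. This is purely illustrative: the function name, pool sizes, and alert threshold are hypothetical and not the team's actual monitoring configuration.

```python
from typing import Optional

def capacity_alert(available_agents: int, total_agents: int,
                   warn_ratio: float = 0.25) -> Optional[str]:
    """Return an alert message when the share of available build
    agents drops below warn_ratio of the pool, else None.
    (warn_ratio of 0.25 is an illustrative threshold.)"""
    if total_agents == 0:
        return "ALERT: pool has no agents provisioned"
    ratio = available_agents / total_agents
    if ratio < warn_ratio:
        return (f"ALERT: only {available_agents}/{total_agents} "
                f"agents available ({ratio:.0%})")
    return None

# A pool at 10% availability would trip the alert; a healthy
# pool at 60% would not.
print(capacity_alert(10, 100))
print(capacity_alert(60, 100))
```

Alerting on the available/total ratio rather than an absolute count is what lets a check like this fire early, before a regional provisioning failure drains the pool entirely.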
Update: Monday, October 22nd 2018 22:01 UTC
We have identified an issue with virtual machine management operations failing for some users in West US (Azure is working on this). This reduced our overall capacity for allocating hosted build agents, causing the increased latency/delays for all users in the US who were using hosted builds. We have since added additional capacity in our East US region to mitigate the impact and have started to see wait times decrease.
Next Update: Before Tuesday, October 23rd 2018 00:15 UTC
Initial notification: Monday, October 22nd 2018 21:00 UTC
We're investigating build delays in allocating hosted build agents in Central US.
We are currently engaged on a bridge call with the MMS DRI for further investigation.
Next Update: Before Monday, October 22nd 2018 23:35 UTC
Sai Madhav Tiruvuri