Final Update: Friday, July 27th 2018 16:29 UTC
We’ve confirmed that all systems are back to normal as of 14:33 UTC. Our logs show that the incident started at 11:39 UTC. During the 2 hours and 54 minutes it took to resolve the issue, fewer than 1% of builds on hosted agents in the West Europe region were slow to start or failed due to VSTS internal errors. We're sorry for any inconvenience this may have caused.
- Root Cause: The failure was due to a sudden performance degradation in the code responsible for agent allocation and release.
- Chance of Recurrence: Low. The code change deployed one week ago has been rolled back, and performance is back to normal.
- Lessons Learned: While we still don't fully understand why the new code performed as expected for almost 7 days before suddenly degrading, we will refine our process for testing similar code changes for performance impact ahead of future deployments. Additionally, we will work to improve our ability to assess impact and to communicate issues such as this one more promptly in the future.
- Incident Timeline: 2 hours & 54 minutes – 11:39 UTC through 14:33 UTC.
Initial notification: Friday, July 27th 2018 15:31 UTC
We're investigating an issue affecting builds on hosted agents in West Europe.
- Builds using hosted agents may experience delayed starts or fail with errors internal to the VSTS services
- Next Update: Before Friday, July 27th 2018 16:05 UTC