We experienced a recurrence of an edge-case bug before the hotfix could be deployed. As of 2/1, 21:45 UTC, all operations are back to normal with no further customer impact. Our logs show that the incident started on 2/1, 20:43 UTC and that, during the 1 hour and 2 minutes it took to resolve the issue, customers may have experienced a loss of availability data for web tests configured through the CA-SanJose, FL-Miami, IL-Chicago, TX-SanAntonio, and VA-Ashburn regions.
- Root Cause: The failure was due to a bug being triggered in our global availability test nodes, which caused them to fail to emit proper telemetry.
- Lessons Learned: There is a continued risk of recurrence until the hotfix is fully tested and deployed.
- Incident Timeline: 1 hour and 2 minutes – 2/1, 20:43 UTC through 2/1, 21:45 UTC.
-Application Insights Service Delivery Team