We've confirmed that all systems are back to normal with no customer impact as of 02/23, 08:30 UTC. Our logs show that the incident started on 02/23, 00:50 UTC and that, during the 7 hours & 40 minutes it took to resolve the issue, a very small percentage of customers experienced alert notification failures.
- Root Cause: The failure was caused by a configuration change. We have rolled out an update to mitigate this configuration issue.
- Lessons Learned: We will be working on a detailed RCA of this incident to avoid recurrence of such issues in the future.
- Incident Timeline: 7 Hours & 40 Minutes - 02/23, 00:50 UTC through 02/23, 08:30 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.
The root cause has been isolated to a configuration issue that impacted alerting for some customers. To address it, we have begun rolling out an update that corrects the configuration. Some customers may continue to experience alerting failures while the update is applied. We estimate 4 hours before all issues are addressed.
- Work Around: Manually modifying individual web tests (any change) will restore alerting for each test.
- Next Update: Before 02/23 10:00 UTC
We are aware of issues within Application Insights and are actively investigating. Some customers may experience alerting failures. The following data types are affected: Availability, Metric.
- Work Around: Making any modification to existing web tests will restore alerting.
- Next Update: Before 02/23 09:00 UTC
We are working hard to resolve this issue and apologize for any inconvenience.