Experiencing Alerting failure for Multiple Functional Areas – 02/23 – Resolved


Final Update: Tuesday, 23 February 2016 08:42 UTC

We've confirmed that all systems are back to normal with no customer impact as of 02/23, 08:30 UTC. Our logs show that the incident started on 02/23, 00:50 UTC and that, during the 7 hours and 40 minutes it took to resolve the issue, a very small percentage of customers experienced alert notification failures.

  • Root Cause: The failure was caused by a configuration change. We have rolled out an update to mitigate this configuration issue.
  • Lessons Learned: We will be working on a detailed root cause analysis (RCA) of this incident to avoid recurrence of such issues in the future.
  • Incident Timeline: 7 Hours & 40 Minutes - 02/23, 00:50 UTC through 02/23, 08:30 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Durga


Update: Tuesday, 23 February 2016 05:33 UTC

The root cause has been isolated to a configuration issue that impacted alerting for some customers. We have begun rolling out an update to correct it. Some customers may continue to experience alerting failures while the update is applied. We estimate that all issues will be addressed within 4 hours.

  • Workaround: Manually modifying an individual web test (any change) will restore alerting for that test.
  • Next Update: Before 02/23 10:00 UTC

-Steve


Initial Update: Tuesday, 23 February 2016 04:31 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience alerting failures. The following data types are affected: Availability, Metric.

  • Workaround: Making any modification to existing web tests will restore alerting.
  • Next Update: Before 02/23 09:00 UTC

We are working hard to resolve this issue and apologize for any inconvenience.
-Steve


