Experiencing Alerting failure for Many Data Types – 10/20 – Resolved

Final Update: Wednesday, 21 October 2015 02:14 UTC

We've confirmed that all systems are back to normal with no customer impact as of 10/21/2015, 01:45 UTC. Our logs show the incident started on 10/20/2015, 01:00 UTC and that during the 24 hours and 45 minutes that it took to resolve the issue, some customers may have experienced delays in receiving availability and metric-based alerts as well as billing quota notifications.
  • Root Cause: The failure was due to a spike in email volumes in the notification system leveraged by Application Insights.
  • Lessons Learned: Our engineering teams are reviewing the available telemetry to ensure this issue does not reoccur.
  • Incident Timeline: 24 Hours & 45 minutes - 10/20/2015, 01:00 UTC through 10/21/2015, 01:45 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Application Insights Service Delivery Team

Update: Tuesday, 20 October 2015 22:34 UTC

Root cause has been isolated to a spike in volume impacting the email notification service leveraged by Application Insights. To address this issue, the service was scaled out and the queues are draining. Some customers may experience delays with email notifications, including alerts and quota usage notifications, until the queues return to normal levels. We estimate 4 hours for the issue to be mitigated.

  • Work Around: None
  • Next Update: Before 10/21 03:00 UTC

-Application Insights Service Delivery Team

Initial Update: Tuesday, 20 October 2015 21:41 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience alerting failures. The following data types are affected: Availability, Metric.
  • Work Around: None
  • Next Update: Before 10/21 00:00 UTC

We are working hard to resolve this issue and apologize for any inconvenience.
-Application Insights Service Delivery Team
