Experiencing Alerting failure for Many Data Types - 9/28 - Resolved


Final Update: Monday, 9/28/2015 21:07 UTC

We’ve confirmed that all systems are back to normal with no customer impact as of 9/28, 20:51 UTC. Our logs show that the incident started on 9/28, 19:11 UTC, and that during the 1 hour and 40 minutes it took to resolve the issue, some customers may have experienced alerting failures and been unable to create new alerts or update existing ones.

Alerting functionality for existing alerts was restored at 20:21 UTC, and all pending alert configuration changes were processed by 9/28, 20:51 UTC.

Root Cause: The failure was due to a deployment bug.
Lessons Learned: The specific bug is understood, and while our monitoring caught the issue right away, the restoration steps took some time to execute. We are investigating ways to speed up service restoration in this failure mode.
Incident Timeline: 1 hour & 40 minutes - 9/28, 19:11 UTC through 9/28, 20:51 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Application Insights Service Delivery Team


Initial Update: Monday, 9/28/2015 20:18 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience alerting failures. The following data types are affected: Availability, Metric, Performance Counter.

Workaround: None
Next Update: Before 22:00 UTC

We are working hard to resolve this issue and apologize for any inconvenience.

-Application Insights Service Delivery Team
