Experiencing Alerting failure for Availability Data Type – 6/15 – Resolved


Final Update: Monday, 6/15/2015 19:35 UTC

We’ve confirmed that all systems are back to normal with no customer impact as of 06/15, 18:57 UTC. Our logs show that the incident started on 06/14, 11:45 UTC, and that during the roughly 31 hours it took to resolve the issue, ~50% of availability tests run from the Amsterdam location were failing. Customers who have the high sensitivity option enabled and run availability tests from the Amsterdam location will no longer receive false alerts.

Root Cause: The failure was due to a bad port on one of the network devices in the Amsterdam datacenter.
Lessons Learned: The network team will perform a full root cause analysis (RCA), which will include why this issue took so long to detect and mitigate.
Incident Timeline: 31 hours & 12 minutes – 06/14, 11:45 UTC through 06/15, 18:57 UTC
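
The duration above can be checked from the two timestamps. The following is a minimal sketch, not part of the service's tooling; the function name `incident_duration` is a hypothetical helper introduced only for illustration.

```python
from datetime import datetime, timezone

# Start and end of the incident as stated in the timeline (UTC).
start = datetime(2015, 6, 14, 11, 45, tzinfo=timezone.utc)
end = datetime(2015, 6, 15, 18, 57, tzinfo=timezone.utc)


def incident_duration(start, end):
    """Return the elapsed time between two datetimes as (hours, minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


hours, minutes = incident_duration(start, end)
print(f"{hours} hours {minutes} minutes")
```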

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Application Insights Service Delivery Team


Initial Update: Monday, 6/15/2015 17:07 UTC

We are aware of issues within Application Insights and are actively investigating. Customers who have the high sensitivity setting and the Amsterdam location selected for their web tests may experience false alerts. The underlying issue is a ~50% failure rate for availability tests running from the Amsterdam location. At this point we suspect network issues in the Amsterdam datacenter, and we are engaging the networking team to investigate further. The following data types are affected: Availability.

Workaround: Change the alerting sensitivity to medium or low, or deselect the Amsterdam location from the location choices for web tests.
Next Update: Before 21:00 UTC

We are working hard to resolve this issue and apologize for any inconvenience.

-Application Insights Service Delivery Team
