Experiencing Alerting failure for Availability Data Type - 11/11 - Resolved


Final Update: Saturday, 12 November 2016 02:30 UTC

We've confirmed that all our systems are back to normal with no customer impact as of 11/12, 02:05 UTC. Our logs show that the incident started on 11/11, 09:43 UTC and that, during the 16 hours and 22 minutes it took to resolve the issue, customers experienced missing availability data for tests running in the Moscow location.
  • Root Cause: The issue was due to an infrastructure failure in the Moscow region.
  • Incident Timeline: 16 hours and 22 minutes - 11/11, 09:43 UTC through 11/12, 02:05 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Sapna


Update: Saturday, 12 November 2016 01:51 UTC

The root cause has been isolated to an infrastructure failure in the Moscow region, which was impacting availability data for tests running in the Moscow location. We are actively working on mitigation steps to address this issue.

Some customers will continue to experience missing availability data for tests running in the Moscow region, and we estimate another 6 hours before all impact is addressed for this region.
  • Work Around: None
  • Next Update: Before 11/12 08:00 UTC

-Sapna


Update: Friday, 11 November 2016 19:32 UTC

We continue to investigate the issue within Application Insights. The root cause is not fully understood at this time and is still under investigation. We will update this blog as we learn more. Customers will continue to experience missing availability data for tests running in the Moscow location. Initial findings indicate that the problem began at 11/11 ~09:00 UTC. We currently have no estimate for resolution.
  • Work Around: None
  • Next Update: Before 11/12 02:00 UTC

-Sapna


Update: Friday, 11 November 2016 15:00 UTC

We continue to investigate issues within Application Insights. The issue is due to a hardware failure in one of the backend components in the Moscow location. Customers continue to experience missing availability data for tests running in the Moscow location. Initial findings indicate that the problem began at 11/11 ~09:00 UTC. We currently have no estimate for resolution.
  • Work Around: None
  • Next Update: Before 11/11 19:00 UTC

-Mohini


Update: Friday, 11 November 2016 12:48 UTC

We continue to investigate issues within Application Insights. The root cause is not fully understood at this time. Customers continue to experience missing availability data for tests running in the Moscow location. Initial findings indicate that the problem began at 11/11 ~09:00 UTC. We currently have no estimate for resolution.
  • Work Around: None
  • Next Update: Before 11/11 15:00 UTC

-Mohini


Initial Update: Friday, 11 November 2016 10:22 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience missing availability data for tests running in the Moscow location. The following data types are affected: Availability.
  • Work Around: None
  • Next Update: Before 11/11 12:30 UTC

We are working hard to resolve this issue and apologize for any inconvenience.
-Mohini

