Experiencing Multiple Issues with Application Insights – 04/10 – Resolved


Final Update: Monday, 11 April 2016 14:16 UTC

We've confirmed that all systems are back to normal with no customer impact as of 4/11, 14:15 UTC. Our logs show that the incident started on 4/10, 20:25 UTC, and that during the 17 hours and 50 minutes it took to resolve the issue, customers would have experienced Availability data gaps and data access issues. Some customers might still see data gaps because our data processing system is still working through the backlog.
  • Root Cause: The failure was caused by service-to-service authorization calls failing after the authorization namespace was disabled.
  • Lessons Learned: We have collected telemetry logs and identified steps to take to avoid this kind of scenario in the future.
  • Incident Timeline: 17 hours & 50 minutes - 4/10, 20:25 UTC through 4/11, 14:15 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Rama


Update: Monday, 11 April 2016 13:08 UTC

The root cause has been isolated to the authorization namespace being disabled, which was impacting service-to-service authorization calls. To address this issue, we re-enabled the Access control namespace. Data access and alerting are now working as expected. Some customers may still experience access issues, and we estimate it will take 4 hours before all data is accessible. However, alerting data from the impact period will be lost, and we are taking steps to prevent this kind of data loss in the future.
  • Work Around: None
  • Next Update: Before 04/11 17:30 UTC

-Durga


Update: Monday, 11 April 2016 09:03 UTC

We are seeing issues with the authorization service that our Application Insights systems rely on for authorization. We are still working to determine the root cause of the authorization failures. The current issues we are facing are:
        1) Data Access - Some customers will experience errors trying to access Application Insights resources in Azure portal.
        2) Alerting Failure - Customers will continue to experience alerting failures and Availability data gaps.

  • Work Around: None
  • Next Update: Before 04/11 13:30 UTC

-Rama


Update: Monday, 11 April 2016 05:04 UTC

We are still working to determine the root cause of the issues in Application Insights. The current issues we are facing are:
        1) Data Access - some customers will experience errors trying to access Application Insights resources in Azure portal.
        2) Alerting Failure - customers will continue to experience alerting failures and Availability data gaps.

  • Work Around: None
  • Next Update: Before 04/11 09:30 UTC

-Rama


Update: Monday, 11 April 2016 03:02 UTC

We are still experiencing several issues within Application Insights and are working to understand the root cause. The current issues we are facing are:
        1) Data Access - some customers will experience errors trying to access Application Insights resources in Azure portal.
        2) Alerting Failure - customers will continue to experience alerting failures and Availability data gaps.

  • Work Around: None
  • Next Update: Before 04/11 05:30 UTC

-Vitaliy


Update: Monday, 11 April 2016 00:47 UTC

Please note that we are currently experiencing several issues within Application Insights and are consolidating our blog communications within this post. The current issues we are facing are:
        1) Data Access - some customers will experience errors trying to access Application Insights resources in Azure portal.
        2) Alerting Failure - customers will continue to see alerting failures and Availability data gaps.

  • Work Around: None
  • Next Update: Before 04/11 03:00 UTC

This is a major outage, and we are fully focused on resolving the impact as soon as possible. As stated above, we'll continue to update this post as the status changes.

-Vitaliy


Update: Sunday, 10 April 2016 23:12 UTC

We continue to investigate issues within Application Insights. The root cause is not fully understood at this time. Some customers continue to experience alerting failures and Availability data gaps. We are working to establish the start time of the issue; initial findings indicate that the problem began at 4/10 20:25 UTC. We currently have no estimate for resolution.
  • Work Around: None
  • Next Update: Before 04/11 01:30 UTC

-Vitaliy


Initial Update: Sunday, 10 April 2016 20:53 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience alerting failures. The following data types are affected: Availability.
  • Work Around: None
  • Next Update: Before 04/10 23:00 UTC

We are working hard to resolve this issue and apologize for any inconvenience.

-Vitaliy

