Experiencing Multiple Issues with Application Insights – 03/24 – Resolved


Final Update: Friday, 25 March 2016 23:38 UTC

We've confirmed that all systems are back to normal with no customer impact as of 03/25, 23:00 UTC. Our logs show the incident started on 03/24 9:00 UTC and that, during the 38 hours it took to resolve the issue, 15-20% of customers experienced partial data loss and data latency. Most of the missing data will backfill within 7 days; once summarized, the data will be visible in the Azure portal.
  • Root Cause: The failure was due to Azure storage and network outages which were impacting multiple Application Insights services.
  • Lessons Learned: Pending RCA from partner teams
  • Incident Timeline: 19 Hours - 03/24 9:00 UTC through 03/25, 4:00 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Vitaliy


Update: Friday, 25 March 2016 10:57 UTC

Root cause has been isolated to Azure storage and network outages which were impacting multiple Application Insights services. Azure teams have addressed underlying issues.
The Application Insights processing service is processing current data as expected and is also working through the backlog at a healthier rate. Some customers may experience data gaps for data sent before 24th March 22:30 UTC, and we estimate another 12 hours before all latent data is processed.
  • Work Around: Data sent after 24th March 22:30 UTC is not delayed.
  • Next Update: Before 03/25 23:00 UTC

-Varun


Update: Thursday, 24 March 2016 23:34 UTC

Root cause has been isolated to Azure storage and network outages which were impacting multiple Application Insights services. Azure teams have addressed underlying issues. Application Insights services are now working as expected. Below is the status for issues mentioned earlier:

  1. Data Latency & Data Loss - Some customers may experience data gaps for data sent before 22:30 UTC, and we estimate 12 hours before all latent data is processed. Data sent after 22:30 UTC is processed without delay. Some customers might see up to 5% data loss for data that was sent before 22:00 UTC.
  2. Azure Portal Failures - This issue is resolved. Customers shouldn't see any more errors when accessing their data in Azure portal.
  3. Availability Data Gaps - This issue is resolved; however, customers who have outside-in webtest running might see a data gap between 20:40 UTC and 22:30 UTC on 03/24.
  • Work Around: Data sent after 22:30 UTC is not delayed.
  • Next Update: Before 03/25 12:00 UTC

-Vitaliy


Update: Thursday, 24 March 2016 22:01 UTC

Please note that we are currently experiencing several issues within Application Insights and are consolidating our blog communications within this post. The current issues we are facing are:

  1. Data Latency & Data Loss - We continue to experience delays in data ingestion due to an underlying issue in Azure Storage (see Azure Service Status Dashboard). Many customers will see data latency that exceeds 8 hours, and some customers may see up to 5% data loss. This impact originally started on Thursday, 24 March 2016 12:39 UTC and was related to issues in US South Central. The scope of customer impact has expanded due to new Azure Storage issues now occurring in US East. We are working closely with our Azure partners to investigate these issues.
  2. Azure Portal Failures - Some customers are seeing failures in the portal when accessing their data. This impact is also due to the underlying Azure Storage issues.
  3. Availability Data Gaps - Customers who have outside-in webtest running may see some gaps in the historical availability data. The alerts related to availability monitoring are currently unaffected.

  • Work Around: No work arounds available at this time
  • Next Update: Before 03/25 00:30 UTC


This is a major outage and we are fully focused on resolving the impact as soon as possible.  As stated above, we'll continue to provide status in the post as the status changes. 

-Tom Moore



Update: Thursday, 24 March 2016 21:34 UTC

We continue to investigate issues within Application Insights. Root cause is not fully understood at this time. Some customers continue to experience issues when accessing data from the portal. We are working to establish the start time for the issue; initial findings indicate that the problem began at 03/24 ~10:00 UTC. We currently have no estimate for resolution. In addition, there appears to be an Azure issue in the East US region. We are working with the Azure team to mitigate the issue.
  • Work Around: None
  • Next Update: Before 03/25 02:00 UTC

-Vamshi


Initial Update: Thursday, 24 March 2016 17:44 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience data access issues in the Azure portal. The following data types are affected: Page Load, Page View.
  • Work Around: None
  • Next Update: Before 03/24 21:00 UTC

We are working hard to resolve this issue and apologize for any inconvenience.
-Vamshi

