Experiencing Data Latency for Many Data Types – 12/07 – Resolved


Final Update: Tuesday, 08 December 2015 03:55 UTC

We’ve confirmed that all systems are back to normal with minimal customer impact as of 12/8, 03:45 UTC. Our logs show that the incident started on 12/6, 23:06 UTC and that during the 28 hours it took to resolve the issue, 25% of customers experienced data latency. Our monitoring system is healthy and all data streams are processing current data; however, some customers will continue to see missing data for a temporary period (~3 days) in the Metrics Explorer and Overview blades. This data is available in our system and can be queried through Search Explorer in the portal. We are applying a hotfix for the bug that caused this issue; the hotfix will also help process the missing data in the reporting system.

Root Cause: The failure was caused by a bug in our throttling rules, which in turn caused a large spike in data streaming through our system. This caused overall data stream processing to go down, resulting in a huge backlog.
Lessons Learned: We fully understand the root-cause bug, and its fix has been applied. Additional monitoring has also been deployed to detect the issue should it recur.
Incident Timeline:  27 hours & 51 minutes - 12/6, 23:06 UTC through 12/8, 03:45 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Application Insights Service Delivery Team


Update: Monday, 07 December 2015 21:45 UTC

Our DevOps team continues to investigate issues within Application Insights. The root cause is not fully understood at this time. Up to 25% of customers may continue to experience data latency outside the SLA for the Metrics Explorer and Overview blades. We are working to restore service health, but we currently have no estimate for resolution.

 
  • Next Update: Before 12/08 04:00 UTC

-Application Insights Service Delivery Team


Update: Monday, 07 December 2015 14:43 UTC

Our DevOps team continues to investigate issues within Application Insights. The initial root cause points to an increase in traffic. Some customers continue to experience data gaps across multiple data types. We are working to establish the start time of the issue; initial findings indicate that the problem began at 12/07 13:30 UTC. We are currently evaluating options to mitigate the current impact.

  • Work Around: none
  • Next Update: Before 12/07 21:00 UTC

-Application Insights Service Delivery Team


Update: Monday, 07 December 2015 08:52 UTC

The mitigation measures put in place by the App Insights team continue to backfill the data gap. This process is taking somewhat longer than we initially expected. All current service parameters are showing healthy.

  • Work Around: none
  • Next Update: Before 12/07 15:00 UTC

-Application Insights Service Delivery Team


Update: Monday, 07 December 2015 05:07 UTC

The root cause has been isolated to an increase in traffic, which caused performance degradation in the processing service. To address this issue, we rebalanced the impacted nodes in the service. The processing service is now working as expected. Approximately 1.5% of customers may experience data gaps, and we estimate around 4 hours before all backlog data is processed. Current data has not been latent since 12/07 04:00 UTC.

  • Work Around: none
  • Next Update: Before 12/07 09:30 UTC

-Application Insights Service Delivery Team


Initial Update: Monday, 07 December 2015 01:16 UTC

We are aware of issues within Application Insights and are actively investigating. Approximately 1.5% of Application Insights customers may experience data latency. The system is processing through the backlog, and we are investigating options to speed up the backlog processing. The following data types are affected: Availability, Customer Event, Dependency, Page View, Performance Counter, Request.

  • Work Around: none
  • Next Update: Before 12/07 05:30 UTC

We are working hard to resolve this issue and apologize for any inconvenience.
-Application Insights Service Delivery Team
