Experiencing Data Latency for Multiple Functional Areas – 03/01 – Resolved


Final Update: Wednesday, 02 March 2016 15:40 UTC

We’ve confirmed that all systems are back to normal with no customer impact as of 03/02, 15:00 UTC. Our logs show that the incident started on 03/01, 16:48 UTC, and that during the 22 hours & 12 minutes it took to resolve the issue, a small percentage of customers experienced data latency.
  • Root Cause: The failure was caused by an exception triggered by invalid data sent by a customer application.
  • Lessons Learned: We have deployed a hotfix that will prevent this issue from recurring.
  • Incident Timeline: 22 hours & 12 minutes – 3/1, 16:48 UTC through 3/2, 15:00 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Ghouse


Update: Wednesday, 02 March 2016 09:45 UTC

Application Insights services are still processing some of the delayed data. Processing the backlog is taking longer than expected, and we are monitoring its progress. Some customers may still experience latency for data sent between 3/1 18:00 UTC and 3/2 00:00 UTC, and we estimate an additional 6 hours before all backlog data is processed.
  • Work Around: None
  • Next Update: Before 03/02 16:00 UTC

-Ghouse


Update: Wednesday, 02 March 2016 04:35 UTC

The root cause has been isolated to a lack of certain key data validations in our pipeline, which was impacting our processing components. To address this issue, we have deployed fixes to bring these processing components back to normal processing rates, and they are now working as expected. However, a subset of our customers will continue to see data gaps for data ingested between 3/1 18:00 UTC and 3/2 00:00 UTC. We expect processing of this data to take another 6-10 hours.
  • Next Update: Before 03/02 11:00 UTC

-Arun Jolly


Update: Wednesday, 02 March 2016 00:14 UTC

We are continuing to investigate issues within Application Insights and are still seeing corruption issues in one of our processing components. A subset of our customers will continue to see data gaps for data ingested between 3/1 18:00 UTC and 3/1 22:00 UTC. We currently have no estimate for resolution.
  • Next Update: Before 03/02 04:30 UTC

-Arun Jolly


Update: Tuesday, 01 March 2016 19:26 UTC

The root cause has been isolated to deployments that were executed prior to this incident. To address this issue, we rolled back those deployments, which has brought our processing components back to a normal state. Customers will now see their latest data in the portal. However, as a residual effect, a subset of our customers may see a gap in data ingested between 3/1 16:00 UTC and 3/1 18:00 UTC. We’ve put additional processing components in place to backfill this data and estimate another 4 hours for this activity to complete.
  • Next Update: Before 03/01 23:30 UTC

-Arun Jolly


Initial Update: Tuesday, 01 March 2016 16:48 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience data latency. The following data types are affected: Availability, Customer Event, Dependency, Exception, Metric, Page Load, Page View, Performance Counter, Request, Trace.
  • Next Update: Before 03/01 19:00 UTC

We are working hard to resolve this issue and apologize for any inconvenience.
-Arun Jolly

