Experiencing Data Gaps for Multiple Functional Areas – 03/16 – Resolved


Final Update: Thursday, 17 March 2016 07:10 UTC

We’ve confirmed that all systems are back to normal with no customer impact as of 3/17, 07:10 UTC. Our logs show that the incident started on 3/16, 21:17 UTC and that, during the 9 hours and 53 minutes it lasted, some customers would have experienced latency in metric data.
  • Root Cause: The root cause was isolated to a network issue in a platform that the Application Insights Processing service depends on.
  • Lessons Learned: We have collected the required telemetry data and will use it to investigate further so that we can prevent similar occurrences in the future.
  • Incident Timeline: 9 hours & 53 minutes – 3/16, 21:17 UTC through 3/17, 07:10 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Rama


Update: Thursday, 17 March 2016 03:10 UTC

We have fully caught up with the backlog data in the US East region. However, Trace data in the South Central region is experiencing latency and is still being processed. We expect it to take a few more hours for this data to catch up. Some customers might still experience latency for data sent between 03/16 20:30 UTC and 03/16 22:01 UTC.

  • Work Around: None
  • Next Update: Before 03/17 07:30 UTC

-Vamshi


Update: Wednesday, 16 March 2016 22:57 UTC

The root cause has been isolated to a networking failure in the Azure US East region that was impacting our services. Azure has repaired the issue, and we are seeing our services recover. Some customers may experience latency for data sent between 03/16 20:30 UTC and 03/16 22:01 UTC. A small subset of customers may have experienced some data loss for data sent during the impact window. We estimate that it will take 4 hours to process all backlog data.

  • Work Around: None
  • Next Update: Before 03/17 03:00 UTC

-Vamshi


Initial Update: Wednesday, 16 March 2016 21:17 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience intermittent failures while sending telemetry data to Application Insights, which may cause data loss during ingestion.

We do not yet fully understand the cause of this transient issue; however, initial investigation points to an underlying networking issue. We’ll update this notification as soon as we understand more about the cause and possible resolution.

The following data types are affected: Availability, Customer Event, Dependency, Exception, Metric, Page Load, Page View, Performance Counter, Request, Trace.

  • Work Around: None
  • Next Update: Before 03/17 01:30 UTC

We are working hard to resolve this issue and apologize for any inconvenience.
-Pankaj

