Experiencing Data Latency for Many Data Types – 04/22 – Resolved


Final Update: Saturday, 23 April 2016 17:44 UTC

We’ve confirmed that all systems are back to normal with no customer impact as of 4/23, 17:40 UTC. Our logs show the incident started on 4/22, 16:50 UTC, and that during the approximately 25 hours it took to resolve the issue, customers experienced multiple windows of data latency outside of the 2-hour SLA.
  • Root Cause: The failure was due to multiple slowdowns in a dependent service. We are doing further root cause analysis in order to make our services more resilient in the future.
  • Lessons Learned: We continue to research ways to make our system more resilient so that incidents like this happen less often in the future.
  • Incident Timeline: 24 hours & 50 minutes – 4/22, 16:50 UTC through 4/23, 17:40 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Randy


Update: Saturday, 23 April 2016 14:09 UTC
We ran into further slowdowns, which have extended the time needed to process the backlog. We are looking into ways to resolve these slowdowns. Backlog processing has made considerable progress through the latent data. Customers will continue to see data gaps that are outside of the SLA. Current data is within the 2-hour SLA.
  • Work Around: none
  • Next Update: Before 04/24 02:30 UTC

-Girish Kalamati


Update: Saturday, 23 April 2016 02:07 UTC

Root cause has been isolated to a slowdown in a dependent service, which was impacting multiple data types. To address this issue, we have taken multiple mitigation steps, including rebooting services and collecting traces for further RCA. We are now current with the latest data. Some customers will continue to experience windows of data that are outside of the SLA. We are currently processing this backlog of data, and the data will become available once processing catches up.
  • Work Around: none
  • Next Update: Before 04/23 14:30 UTC

-Randy


Update: Friday, 22 April 2016 20:21 UTC

During recovery, we experienced another slowdown in the back-end service. We are actively investigating this issue and working on mitigation steps. The initial window of data that was latent has been backfilled and is now current. Customers will see new windows of data outside of the 2-hour SLA.
  • Work Around: none
  • Next Update: Before 04/23 02:30 UTC

-Randy


Update: Friday, 22 April 2016 16:48 UTC

Root cause has been isolated to a slowdown in a back-end service, which was impacting many data types. To address this issue, we rebooted the affected service. Some customers may experience latency outside of the 2-hour SLA for a window of data as we continue to process the backlog.
  • Work Around: none
  • Next Update: Before 04/22 21:00 UTC

-Randy

