Experiencing Data Latency for Many Data Types - 5/19 - Resolved


Final Update: Thursday, 5/21/2015 22:06 UTC

We’ve confirmed that all systems are back to normal with no customer impact as of 05/21/15 21:10 UTC. Telemetry data is now nearly current.

The gap in the data from 5/19 07:30 to 5/20 02:00 is slowly being backfilled, and it may take a few days to catch up fully. Because newer data matters more than older data, we are prioritizing the processing of current data, which may slow down filling the data gap.

Root Cause: The failure was caused by an update to an upstream service, which destabilized our telemetry data processing pipeline. We are working to make our service more resilient to such incidents.

Chance of Recurrence: Low

Incident Timeline: 05/19/15 02:39 UTC through 05/21/15 21:10 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Application Insights Service Delivery Team


Update: Thursday, 5/21/2015 17:43 UTC

The root cause has been isolated to an update of an upstream service. The Application Insights service is now recovering for all data types. Latency is currently about 2.5 hours for various data types and is decreasing as the system catches up.

We will update this blog as soon as data is current.



Work Around: None
Next Update: Before 5/22/15 00:00 UTC
-Application Insights Service Delivery Team


Update: Thursday, 5/21/2015 01:01 UTC

This issue is ongoing. Customers may currently see latency of nearly 6 hours on recent data, along with a gap in telemetry for older data. We don't have an ETA for full recovery of the data.

Work Around: None
Next Update: Before 5/21/15 18:00 UTC

-Application Insights Service Delivery Team


Update: Wednesday, 5/20/2015 13:47 UTC

We are still working to fix this telemetry issue. Customers may currently see latency of nearly 2 hours on recent data, along with a gap in telemetry for older data. We don't have an ETA for full recovery of the data.

Work Around: None
Next Update: Before 5/20/15 22:00 UTC
-Application Insights Service Delivery Team


Update: Tuesday, 5/19/2015 22:42 UTC

Our DevOps team continues to investigate increased latency for all telemetry data types. We don't have an ETA for resolution at this time, but we are working hard to fix the issue.

Current latency is more than 15 hours.

Work Around: None
Next Update: Before 5/20/15 01:00 UTC

-Application Insights Service Delivery Team


Update: Tuesday, 5/19/2015 19:10 UTC

We continue to investigate issues with telemetry data types and are working on a fix. We currently have no estimate for resolution.

Work Around: None
Next Update: Before 5/19 21:00 UTC

-Application Insights Service Delivery Team


Update: Tuesday, 5/19/2015 16:59 UTC

We continue to have issues with telemetry data types and are working on a fix. The issue is caused by an upstream service rebooting the nodes that underlie our service.

Current latency is more than 5 hours.

Work Around: None
Next Update: Before 5/19 19:00 UTC

-Application Insights Service Delivery Team


Update: Tuesday, 5/19/2015 11:04 UTC

While recovering, we lost a few critical nodes, which set back the recovery process. All data types are delayed, and most telemetry-related queries may fail. DevOps is investigating the issue. We don't have an ETA for complete recovery at this time.

Work Around: None
Next Update: Before 5/19 18:00 UTC
-Application Insights Service Delivery Team


Update: Tuesday, 5/19/2015 05:58 UTC

Most data types are now recovering, and customers should be able to query their telemetry data. However, because the system is still recovering, some queries may time out. We continue to work on the issue.

Work Around: None
Next Update: Before 5/19 16:00 UTC
-Application Insights Service Delivery Team


Initial Update: Tuesday, 5/19/2015 03:55 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience data latency.

The following data types are affected: Availability, Custom Event, Dependency, Exception, Page Load, Page View, Performance Counter.

Additionally, some customers may not be able to query telemetry data. DevOps is engaged and actively working to fix the issue.

Work Around: None
Next Update: Before 5/19 06:00 UTC

We are working hard to resolve this issue and apologize for any inconvenience.

-Application Insights Service Delivery Team
