Final Update: Thursday, 16 March 2017 10:05 UTC
We’ve confirmed that all systems are back to normal as of 03/16, 10:00 UTC, with small data gaps that will gradually fill in as the current backlog in the pipeline is processed. At the current rate, we estimate it will take ~4 hours before all the data ingested during the impacted window is fully processed. Our logs show the incident started on 03/15, 23:30 UTC, and that during the 10 hours and 30 minutes it took to resolve the issue, a substantial percentage of customers in East US would have experienced data access issues and data gaps. Once the data is re-processed, customers will also see some data duplication. We have identified an improvement work item on our side to avoid data duplication during backlog processing.
- Root Cause: The failure was due to performance degradation in a backend storage component.
- Incident Timeline: 10 hours & 30 minutes – 03/15, 23:30 UTC through 03/16, 10:00 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.
Update: Thursday, 16 March 2017 07:34 UTC
The platform component that Application Insights depends on has fully recovered, and as a result we are seeing recovery within Application Insights. All access issues are now resolved, and customers can see their currently ingested data without any issues. Backlog data for the export scenario is currently being processed, and at the current recovery rate we estimate it will take 3 hours to fully process the backlog. We will provide updates on mitigation as we see them.
- Work Around: None
- Next Update: Before 03/16 12:00 UTC
Update: Thursday, 16 March 2017 02:41 UTC
We continue to investigate issues within Application Insights. The issue has been isolated to one of our dependent platform components, which is not responding as expected. We are actively working with our partner team to resolve the issue as soon as possible. With the mitigation steps already applied, we are seeing recovery in data access, while currently ingested data continues to be latent. The following data types are affected: Customer Event, Dependency, Exception, Metric, Page Load, Page View, Performance Counter, Request, Trace, Availability.
- Work Around: None
- Next Update: Before 03/16 08:00 UTC
Initial Update: Thursday, 16 March 2017 00:23 UTC
We are aware of issues within Application Insights and are actively investigating. Customers using Application Insights will experience data access issues. The following data types are affected: Customer Event, Dependency, Exception, Metric, Page Load, Page View, Performance Counter, Request, Trace, Availability.
- Work Around: None
- Next Update: Before 03/16 02:30 UTC
We are working hard to resolve this issue and apologize for any inconvenience.