Degraded Experience with Diagnostic Event Correlation - 2/6 - Resolved


Final Update: Friday, 2/6/2015 21:50 UTC

We’ve confirmed that all systems are back to normal with no customer impact as of 2/6, 21:25 UTC. Our logs show that the incident started on 2/3, 01:15 UTC, and that during the 92 hours it took to resolve the issue, approximately 3.7% of customers experienced degradation in their ability to correlate diagnostic events using Session ID.

Root Cause: The failure was due to a mismatch between the SDK and the validation logic that ensures the consistency of session data. The mismatch was introduced in a regularly scheduled service deployment on 2/3.
Chance of Reoccurrence: We consider the likelihood of recurrence to be very low.
Lessons Learned: We are improving our pre-deployment validation processes to ensure this problem does not recur.
Incident Timeline: 92 hours & 10 minutes - 2/3, 01:15 UTC through 2/6, 21:25 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Application Insights Service Delivery Team


Initial Update: Friday, 2/6/2015 20:29 UTC

We are actively investigating issues with diagnostic event correlation. Fewer than 5% of customers may be unable to correlate diagnostic events based on Session ID while troubleshooting issues with their applications. This is limited to a small subset of users who are missing a specific piece of metadata, which causes the Session ID to be removed from their metadata as well. The event data is unaffected.

We have developed a hotfix for this situation and are testing it before deploying it to production.

Workaround: Affected customers can manually correlate diagnostic events using other fields, such as timestamp.
Next Update: Before 21:30 UTC
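
For customers applying the workaround, the following is a minimal sketch of timestamp-based correlation on exported events. It is an illustration only and not part of the service or SDK: the `timestamp` and `name` field names, the 30-minute gap threshold, and the sample events are all assumptions, and grouping by time gap only approximates a session.

```python
from datetime import datetime, timedelta


def group_by_time_gap(events, max_gap=timedelta(minutes=30)):
    """Group events into approximate sessions.

    Events are sorted by timestamp; a new group starts whenever the gap
    to the previous event exceeds max_gap (an assumed threshold).
    """
    groups = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if groups and event["timestamp"] - groups[-1][-1]["timestamp"] <= max_gap:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


# Made-up sample events; the field names are hypothetical.
events = [
    {"name": "request", "timestamp": datetime(2015, 2, 6, 20, 0, 0)},
    {"name": "exception", "timestamp": datetime(2015, 2, 6, 20, 0, 5)},
    {"name": "request", "timestamp": datetime(2015, 2, 6, 21, 10, 0)},
]

sessions = group_by_time_gap(events)
for index, group in enumerate(sessions):
    print(index, [event["name"] for event in group])
```

Events from different users that occur close together would fall into the same group, so combining the timestamp with another available field (for example, a user or device identifier, if present) should give cleaner results.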

We are working hard to resolve this issue and apologize for any inconvenience.

-Application Insights Service Delivery Team

 


