I recently debugged a very interesting process crash. The code was running in the cloud, so I could not attach a debugger as easily as on my dev box. Besides, in my opinion a debugger should be the last resort. So I logged on to the machine. Well, before that I had to go through a process to get permission and get the environment ready, for security and compliance reasons. Using taskmgr, I observed that the process was recycling. So I did the following:
1. Checked the log. No exceptions, no errors, nothing related to the crash.
2. Checked for crash dump files. There were none.
3. Checked the OS event log. No crash events. Normally any process crash leaves at least one event.
I had never seen a process crash like this. But I knew the related code that caused it: a background task kicked off by calling Task.Run(). I suspected an unhandled exception, so I checked the code. There was already a catch block that caught all exceptions. However, it was still possible for the catch block itself to throw. So I put another catch-all inside the catch block to make sure that no unhandled exception could escape.
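For illustration, the nested catch-all looked roughly like the sketch below. This is not the actual production code: DoBackgroundWork and Log are hypothetical stand-ins, and Log is made to throw on purpose to show the case I was worried about.

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    // Hypothetical stand-in for the real background work.
    static void DoBackgroundWork()
    {
        throw new InvalidOperationException("work failed");
    }

    // Hypothetical logger that itself fails, to simulate a throwing catch block.
    static void Log(Exception ex)
    {
        throw new InvalidOperationException("log sink unavailable");
    }

    static void Main()
    {
        Task task = Task.Run(() =>
        {
            try
            {
                DoBackgroundWork();
            }
            catch (Exception ex)
            {
                // The outer catch-all was already in the code.
                try
                {
                    Log(ex);
                }
                catch (Exception)
                {
                    // The inner catch-all I added: swallow anything thrown
                    // while handling the first exception.
                }
            }
        });

        task.Wait();
        // No exception escapes the delegate, so the task completes normally.
        Console.WriteLine(task.Status); // prints "RanToCompletion"
    }
}
```

With both catch blocks in place, no managed exception can escape the delegate passed to Task.Run(), which is why I expected this change to rule out unhandled exceptions as the cause.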
However, it did not work. The process was still crashing. Since I was relatively new to the code base, I asked around about how the process could crash. I got some pointers, but none of them matched what I had observed.
Then I had to turn to the last option. I copied windbg to all the nodes (remember that this is a distributed system with multiple replicas) and attached it to the process. Boom! I got an exception in the debugger and then the process crashed.
I looked at the exception, which was a normal .NET exception. I could not think of how it could cause a crash. Then I looked into the call stack and saw the line below.
000007f9`8538bb27 : 0000001b`9c29cf80 0000001c`2c50c880 0000001c`2c50c888 0000001c`2c50c890 : mscorlib_ni!System.Environment.Exit(Int32)+0x7b
That was really weird. Please check my other blog post to see the end result.