Access Violation dumps and metadata corruption

This is an issue that I had been busy working on these past few days. We were getting AV dumps on the DB, and when I looked at the stack, I found that SQL was calling a function to get the name of a column(given a table name), an index id and key id. This function call resulted in a an exception being generated, which is what caused the AV dump:

Child-SP          RetAddr           Call Site
00000000`249676c8 00000000`76ecc0b0 ntdll!ZwWaitForSingleObject+0xa
00000000`249676d0 00000000`01596369 kernel32!WaitForSingleObjectEx+0x9c
00000000`24967790 00000000`01595d2b sqlservr!CDmpDump::DumpInternal+0x4d9
00000000`24967890 00000000`01f95080 sqlservr!CDmpDump::Dump+0x3b
00000000`249678e0 00000000`0204ebae sqlservr!SQLDumperLibraryInvoke+0x1a0
00000000`24967910 00000000`021968d5 sqlservr!CImageHelper::DoMiniDump+0x3ce
00000000`24967af0 00000000`0219728c sqlservr!ContextDumpNoStackOverflow+0x325
00000000`24968340 00000000`021978ea sqlservr!ContextDump+0x7bc
00000000`24968da0 00000000`01f6db08 sqlservr!stackTraceExceptionFilter+0x24a
00000000`24968df0 00000000`01f820d8 sqlservr!SOS_OS::ExecuteDumpExceptionHandlerRoutine+0x28
00000000`24969080 00000000`0267fda5 sqlservr!GenerateExceptionDump+0x48
00000000`249690b0 00000000`0267fe4c sqlservr!ex_trans_cexcept+0x45
00000000`249690f0 00000000`74f6acf0   sqlservr!SOS_SEHTranslator+0x4c
00000000`24969120 00000000`74f69e0b msvcr80!_CallSETranslator+0x40
00000000`24969190 00000000`74f6a62b msvcr80!FindHandlerForForeignException+0x9b
00000000`24969230 00000000`74f6a86b msvcr80!FindHandler+0x63b
00000000`249697e0 00000000`74f6abe7 msvcr80!__InternalCxxFrameHandler+0x1fb
00000000`24969830 00000000`00d10cf3 msvcr80!__CxxFrameHandler+0x77
00000000`24969880 00000000`770058dd sqlservr!__GSHandlerCheck_EH+0x63
00000000`249698b0 00000000`770096d7 ntdll!RtlpExecuteHandlerForException+0xd
00000000`249698e0 00000000`77016e08 ntdll!RtlDispatchException+0x20c
00000000`24969f80 00000000`025ff563 ntdll!KiUserExceptionDispatch+0x2e
00000000`2496a520 00000000`00bc516b sqlservr!WstrIndkeyWstrI4I4+0x323
00000000`2496a670 00000000`00869f6c sqlservr!CQScanNLJoinNew::GetRowHelper+0x119b
00000000`2496abd0 00000000`00c72c2b sqlservr!CQScanSortNew::BuildSortTable+0x18c
00000000`2496ac90 00000000`0086cc15 sqlservr!CQScanTopSortNew::Open+0x47
00000000`2496acc0 00000000`0086cb2e sqlservr!CQueryScan::Startup+0xcd
00000000`2496ad10 00000000`0086bdea sqlservr!CXStmtQuery::SetupQueryScanAndExpression+0x412
00000000`2496ad70 00000000`0087389b sqlservr!CXStmtQuery::ErsqExecuteQuery+0x2f8
00000000`2496dd80 00000000`0086fe6b sqlservr!CMsqlExecContext::ExecuteStmts<1,1>+0xcc2
00000000`2496e030 00000000`0086f789 sqlservr!CMsqlExecContext::FExecute+0x58b
00000000`2496e1b0 00000000`0245bcfd sqlservr!CSQLSource::Execute+0x319
00000000`2496e2e0 00000000`02460b34 sqlservr!ExecuteSql+0x72d
00000000`2496ed60 00000000`02e43271 sqlservr!CSpecProc::ExecuteSpecial+0x234
00000000`2496ee80 00000000`00871270 sqlservr!CSpecProc::Execute+0x1f1
00000000`2496eff0 00000000`008cf87a sqlservr!process_request+0x370
00000000`2496f2b0 00000000`0080b29b sqlservr!process_commands+0x1ba
00000000`2496f4b0 00000000`0080af5a sqlservr!SOS_Task::Param::Execute+0x11b
00000000`2496f5d0 00000000`0080ac35 sqlservr!SOS_Scheduler::RunTask+0xca
00000000`2496f660 00000000`00dbc560 sqlservr!SOS_Scheduler::ProcessTasks+0x95
00000000`2496f6d0 00000000`00dbaca0 sqlservr!SchedulerManager::WorkerEntryPoint+0x110
00000000`2496f790 00000000`00dba640 sqlservr!SystemThread::RunWorker+0x60
00000000`2496f7c0 00000000`00dbc6ff sqlservr!SystemThreadDispatcher::ProcessWorker+0x12c
00000000`2496f850 00000000`74f337d7 sqlservr!SchedulerManager::ThreadEntryPoint+0x12f
00000000`2496f8e0 00000000`74f33894 msvcr80!_callthreadstartex+0x17
00000000`2496f910 00000000`76ebbe3d msvcr80!_threadstartex+0x84
00000000`2496f940 00000000`76ff6861 kernel32!BaseThreadInitThunk+0xd
00000000`2496f970 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

From the dump, I was able to extract the dbid and the object id, and when we tried to run update statistics on the table in question, it failed with the error:-

Msg 0, Level 11, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.

So definitely there was some Metadata corruption here. However, we were able to update statistics on all the indexes and stats on the table explicitly without any issues, but the update table continued to fail with the same error.

It was then that we stumbled upon another interesting piece of the puzzle. When we ran a “Select * from sysindexes”, the query failed with the same error…!!! Based on this, and some more research, we summed up that there was a statistic against this table (and a system statistic at that), which was present in sysindexes. This makes sense, as for individual index rebuilds, we would not need to scan the sysindexes dmv, but when running update statistics against the table, we would need to scan the dmv based on the object id, which causes an AV. We were able to confirm this by querying only the status, id, name and indid columns of sysindexes for the object id in question. We saw that the statistic was mapped to an index id which was not present in any of the other dmv’s such as sys.stats, sys.stats_columns, sys.sysindexkeys, etc.

We even found a KB explaining the issue, and a fix for it, which would prevent the issue from occurring in the future (see here).

But the question remained, how do we get rid of it now? As you probably guessed, we just need to delete the offending statistic, right? But when trying to run the Drop statistics statement, we were getting the error:-

Msg 3701, Level 11, State 6, Line 1

Cannot drop the statistics 'Fielders._WA_Sys_08000002_1FA46B10', because it does not exist or you do not have permission.

So we connected through the DAC Connection (simple, just type Admin: ServernameInstancename in the connection string in SSMS, but remember you have to be a sysadmin for this), and enclosed the statistic name in square brackets, and ran the drop statistics command, and would you believe it, it worked like a charm.

Hope that the next time you run into a metadata corruption issue like this, you know what to do.

As always, feedback/comments/suggestions are both welcome and solicited.