NDIS Case Study 1 - NDIS Packet Double Completion

Hi, this is Anurag again. Here is a case study of an NDIS driver causing a problem due to double completion of a send packet.

 

A protocol driver allocates a NDIS packet and gives it to the miniport driver to be sent on the wire. A miniport driver is supposed to send or complete the packet, but miniport driver is supposed to complete the packet only once. The moment it tries to complete a packet twice (generally due to a bug in the miniport driver), the machine will bugcheck with a stack like the one shown below. The crash will happen either in a protocol driver or in the NDIS driver. The bugcheck would be the result of a trap on accessing an invalid memory in the packet.

 

Each time a packet gets completed the miniport stamps it as completed by putting a string “COM” just before the packet (somewhere in the packet stack). If a miniport driver tries to complete this packet again, the system would crash. The miniport driver which tried to complete the completed packet again is at fault and should be fixed.

 

When reviewing the bugcheck, !analyze –v will give you the stack.

 

1: kd> !analyze -v

*******************************************************************************

* *

* Bugcheck Analysis *

* *

*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)

An attempt was made to access a pageable (or completely invalid) address at an

interrupt request level (IRQL) that is too high. This is usually

caused by drivers using improper addresses.

If kernel debugger is available get stack backtrace.

Arguments:

Arg1: 00000008, memory referenced

Arg2: d0000002, IRQL

Arg3: 00000000, value 0 = read operation, 1 = write operation

Arg4: f78521e0, address which referenced memory

Debugging Details:

------------------

READ_ADDRESS: 00000008

CURRENT_IRQL: 2

FAULTING_IP:

NDIS!ndisMSendCompleteX+71

f78521e0 8b7808 mov edi,dword ptr [eax+8]

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xD1

PROCESS_NAME: System

TRAP_FRAME: f78b2e18 -- (.trap 0xfffffffff78b2e18)

ErrCode = 00000000

eax=00000000 ebx=902bbab0 ecx=00000002 edx=fffffffe esi=8f3aabf8 edi=901bd440

eip=f78521e0 esp=f78b2e8c ebp=f78b2ea0 iopl=0 nv up ei ng nz na po nc

cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282

NDIS!ndisMSendCompleteX+0x71:

f78521e0 8b7808 mov edi,dword ptr [eax+8] ds:0023:00000008=????????

Resetting default scope

LAST_CONTROL_TRANSFER: from f78521e0 to 8088c963

STACK_TEXT:

f78b2e18 f78521e0 badb0d00 fffffffe 00000000 nt!KiTrap0E+0x2a7

f78b2ea0 b9f47b5a 902bbab0 8f3aabf8 00000000 NDIS!ndisMSendCompleteX+0x71

f78b2ec0 f78521f9 8f411000 013aabf8 00000000 X_Miniport_Network_Teamming_Driver +0x7b5a

f78b2ee4 ba04a0c8 9029b5e8 8f3aabf8 00000000 NDIS!ndisMSendCompleteX+0x8a

f78b2f00 ba052744 904d8eb8 8f3aabf8 00000000 Y_Miniport_NIC_Driver +0x20c8

f78b2f3c ba050339 8f8c5000 00000000 f78b2f68 Y_Miniport_NIC_Driver +0xa744

f78b2f70 ba05065f 008c5000 8f8c5844 f78b2f8c Y_Miniport_NIC_Driver +0x8339

f78b2f80 ba0493f1 8f8c5000 f78b2f9c f783a0e8 Y_Miniport_NIC_Driver +0x865f

f78b2f8c f783a0e8 904d8eb8 8f8c58bc f78b2ff4 Y_Miniport_NIC_Driver +0x13f1

f78b2f9c 808320f0 8f8c58bc 8f8c5844 00000000 NDIS!ndisMDpcEx+0x1f

f78b2ff4 8088db27 b88fe454 00000000 00000000 nt!KiRetireDpcList+0xca

f78b2ff8 b88fe454 00000000 00000000 00000000 nt!KiDispatchInterrupt+0x37

8088db27 00000000 0000000a 0083850f bb830000 0xb88fe454

 

1: kd> .trap 0xfffffffff78b2e18

ErrCode = 00000000

eax=00000000 ebx=902bbab0 ecx=00000002 edx=fffffffe esi=8f3aabf8 edi=901bd440

eip=f78521e0 esp=f78b2e8c ebp=f78b2ea0 iopl=0 nv up ei ng nz na po nc

cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282

NDIS!ndisMSendCompleteX+0x71:

f78521e0 8b7808 mov edi,dword ptr [eax+8] ds:0023:00000008=????????

Looking at the stack, it appears to be a NULL memory address being accessed that is causing problem. This could lead us to believe that this is a memory corruption problem. This is where we go off-track on the investigation.

 

The current NDIS packet status should be checked if you bugcheck on the NDIS stack and in NDIS!ndisMSendCompleteX.

 

Some basic debugging will display the NDIS_PACKET on the stack and in the ESI register (8f3aabf8).

 

Looking at the memory area before the packet shows the string “COM” which is the indication that it was marked as completed.

 

Here is how it looks.

For reference this is how a non-completed packet looks like

So for this issue the X_Miniport_Network_Teamming_Driver was at fault which tried to complete a completed packet. Fortunately the vendor for the driver was aware of the problem and was able to give the customer a fix.

 

For a 64 bit dump we do a “dc <Packet Address>-0x68” to look at the memory area before the packet. Below is an example.

 

BugCheck D1, {0, 2, 0, fffffade57895375}

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000000, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffffade57895375, address which referenced memory

0: kd> r

Last set context:

rax=0000000000000000 rbx=fffffade6f203000 rcx=0000000000000000

rdx=fffffade6f564740 rsi=fffffade6faa3548 rdi=0000000000000000

rip=fffffade57895375 rsp=fffff8000012b990 rbp=fffffade6f5647d0

 r8=0000000000000000 r9=5bcdd13016000000 r10=5bcdd13016110009

r11=0000000000000000 r12=0000000000000000 r13=0000000000000000

r14=0000000000000000 r15=0000000000000000

iopl=0 nv up ei pl nz na pe nc

cs=0010 ss=0018 ds=0002 es=0000 fs=0000 gs=0000 efl=00010202

tcpip!TCPSendComplete+0x124:

fffffade`57895375 488b1b mov rbx,qword ptr [rbx] ds:0002:fffffade`6f203000=fffffade6f52e010

 

0: kd> kvn

  *** Stack trace for last set context - .thread/.cxr resets it

 # Child-SP RetAddr : Args to Child : Call Site

00 fffff800`0012b990 fffffade`578975a7 : 00000000`00000000 00000000`00000000 fffffade`6e8980b0 fffffade`6f564740 : tcpip!TCPSendComplete+0x124

01 fffff800`0012b9e0 fffffade`5789a89e : fffffade`6f1f1700 fffffade`00000000 00000000`00026200 fffffade`6e8980b0 : tcpip!IPSendComplete+0x477

02 fffff800`0012bab0 fffffade`5ae37237 : fffff800`0083f480 fffffade`6f887000 fffffade`6faab228 fffffade`6e8980b0 : tcpip!ARPSendComplete+0x14a

03 fffff800`0012baf0 fffffade`58443c15 : fffffade`6f887000 fffffade`6e8980b0 fffffade`6f887038 fffffade`6c31e3b0 : NDIS!ndisMSendCompleteX+0xda

04 fffff800`0012bb40 fffffade`5ae37237 : fffffade`6faa3010 00000000`00000002 fffffade`6fc16228 fffffade`6c31e3b0 : NDIS_IM_TEAM_DRIVER!po_SendDone+0x75

05 fffff800`0012bb80 fffffade`596f9a4c : fffffade`6faa3010 fffff800`0012bc50 00000000`00000000 00000000`00000000 : NDIS!ndisMSendCompleteX+0xda

06 fffff800`0012bbd0 fffffade`5af5e9cd : fffff800`0012bcb0 fffffade`6f52e010 fffffade`6f203000 00000000`00000006 : NDIS_MP_DRIVER!l2nd_indicate_tx+0xfc

07 fffff800`0012bc30 fffffade`5af64889 : fffffade`6f52e010 fffffade`6f5315c8 00000000`00000000 00000000`00000000 : NDIS_MP_DRIVER!um_bdrv_indicate_tx_done+0x121

08 fffff800`0012bc90 fffffade`5af64bab : fffffade`6f52e010 00000000`00000001 00000000`00000001 fffff800`011b2080 : NDIS_MP_DRIVER!service_tx_intr+0xa9

09 fffff800`0012bce0 fffffade`5af64d63 : fffffade`6f52e010 00000000`0005ffd4 00000000`00000000 0000023e`f26bf2bf : NDIS_MP_DRIVER!service_intr_mp+0xb3

0a fffff800`0012bd10 fffff800`010285a1 : 00000000`00000000 fffffade`5af64cb4 fffff800`0012bd68 00000000`00000018 : NDIS_MP_DRIVER!um_bdrv_dpc+0xaf

0b fffff800`0012bd40 fffff800`01067c10 : fffff800`011b0180 fffff800`011b0180 00000000`0005ffd4 fffff800`011b4500 : nt!KiRetireDpcList+0x150

0c fffff800`0012bdd0 fffff800`014141d1 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x50

0d fffff800`0012be00 00000000`fffff800 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemStartup+0x1bf

Note that in this bugcheck we crash in the transport driver in the above stack. But the problem is that the packet (fffffade`6e8980b0) came from the NDIS driver NDIS_IM_TEAM_DRIVER.

 

0: kd> dc fffffade`6e8980b0 -0x68

fffffade`6e898048 00000000 00000000 4d4f4336 00000000 ........6COM....

fffffade`6e898058 00000000 00000000 00000000 00000000 ................

fffffade`6e898068 00000000 00000000 00000000 00000000 ................

fffffade`6e898078 00000000 00000000 00000000 00000000 ................

fffffade`6e898088 00000000 00000000 00000000 00000000 ................

fffffade`6e898098 00000000 00000000 00000000 00000000 ................

fffffade`6e8980a8 ffffffff ffffffff 6e898430 fffffade ........0..n....

fffffade`6e8980b8 6f564740 fffffade 6d9ee460 fffffade @GVo....`..m....

After reading this, I suspect you would have at least these two questions for me:

 

Q1: How did I find out that “COM” is use as an identifier for a completed packet?

A1: Well, mainly code review. This is just intended as a debug trick to verify that the packet is already completed.

 

Q2: Why did I use 0x38 (for 32 bit) or 0x68 (for 64 bit) as an offset to check the memory contents before the packet address?

A2: Again code review shows the field to be updated with string ‘COM” can seen going 0x38 bytes back from the packet start address.

 

NDIS double packet completion is a common issue that would crash the system. The above trick can certainly be used to isolate the culprit NDIS driver.