Push Locks – What are they?

Pushlocks are a locking primitive first introduced in Windows Server 2003. They are primarily used in place of spinlocks to protect key kernel data structures. Unfortunately, pushlocks are not documented in the WDK and are not available for public use; however, a few internal drivers do use them, so you might see them while debugging a machine. While digging around on MSDN, I also found that the Filter Manager exposes APIs for using pushlocks, so you are in luck if you are developing a filter driver!


Gate objects


Pushlocks are built on gate objects, a primitive defined by the KGATE structure. The gate object is a highly optimized version of the basic event object. Because a gate does not come in both notification and synchronization versions, and because a thread waiting on a gate waits on that one object exclusively, the code for acquiring and releasing a gate is heavily optimized. Gates even have their own dispatcher lock instead of acquiring the lock on the entire dispatcher database.


Unlike spinlocks, which must be acquired exclusively for all operations on a data structure, pushlocks can be shared by multiple “readers” and need only be acquired exclusively when a thread needs to modify the data structure.




When a thread acquires a normal pushlock, the pushlock code marks the pushlock as owned if it is not owned already. If another thread owns the pushlock exclusively, or if the thread wants exclusive ownership while the pushlock is held in shared mode, the thread allocates a wait block on its stack, initializes a gate object in the wait block, and then adds the wait block to the wait list associated with the pushlock. When the thread holding the pushlock finally releases it, it wakes the next waiter by signaling the gate in that waiter's wait block.
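The acquire and release rules described above can be sketched as a toy, single-threaded model. This is only an illustration under stated assumptions, not the kernel implementation: the class name `ToyPushLock`, its fields, and the string mode names are all hypothetical, a plain queue stands in for the stack-allocated wait blocks, and "waking" a waiter simply means handing it the lock. The model follows only the two wait conditions stated in the paragraph above.

```python
from collections import deque


class ToyPushLock:
    """Toy, single-threaded model of the pushlock rules; not kernel code."""

    def __init__(self):
        self.exclusive_owner = False  # True while held exclusively
        self.shared_count = 0         # number of current shared holders
        self.wait_list = deque()      # stand-in for wait blocks: (thread, mode)

    def _grant(self, mode):
        if mode == "shared":
            self.shared_count += 1
        else:
            self.exclusive_owner = True

    def acquire(self, thread, mode):
        """Return True if acquired at once, False if the thread was queued."""
        if mode == "shared":
            must_wait = self.exclusive_owner
        else:
            must_wait = self.exclusive_owner or self.shared_count > 0
        if must_wait:
            self.wait_list.append((thread, mode))  # "add the wait block"
            return False
        self._grant(mode)
        return True

    def release(self, mode):
        """Release one hold; return the thread that was woken, or None."""
        if mode == "shared":
            self.shared_count -= 1
        else:
            self.exclusive_owner = False
        if self.exclusive_owner or self.shared_count or not self.wait_list:
            return None
        thread, wanted = self.wait_list.popleft()  # wake the next waiter
        self._grant(wanted)
        return thread


lock = ToyPushLock()
r1 = lock.acquire("reader1", "shared")     # acquired immediately
r2 = lock.acquire("reader2", "shared")     # shared with reader1
w = lock.acquire("writer", "exclusive")    # queued behind the readers
woken_first = lock.release("shared")       # one reader still holds the lock
woken_second = lock.release("shared")      # last reader out wakes the writer
print(r1, r2, w, woken_first, woken_second)
```

In this run the two readers share the lock, the writer is queued, and the writer is woken only when the second reader releases.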


When debugging a machine, there is no easy way to identify the current owner of a pushlock short of reviewing the code. By looking at the wait list, you can always find the threads trying to acquire it, but because the gate does not keep track of the owner the way a regular mutex does, finding the current owner is much harder.

For more details on the operation and structure of a pushlock, please review the Pushlocks section in the System Mechanisms chapter of the Windows Internals book.


Advantages of using a PushLock


If a pushlock is held by one or more readers, threads that want to modify the data structure are queued for exclusive access. This queuing mechanism provides some of the same benefits as queued spinlocks: for example, FIFO ordering, elimination of race conditions, and reduced cache thrashing when more than one thread is waiting for the pushlock.


Another advantage of using a pushlock is its size. A regular resource object is 56 bytes, whereas a pushlock is the size of a pointer. Apart from giving a small memory footprint, this helps especially in the non-contended case, because pushlocks do not require lengthy operations to acquire or release. Because the pushlock fits in one “machine word”, the CPU can use an atomic compare-and-exchange operation to swap the old lock value for the new one.
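To make the "one machine word plus compare-and-exchange" idea concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the bit layout (`EXCLUSIVE`, `SHARED_UNIT`) is invented and is not the real pushlock layout, and because Python has no atomic compare-and-exchange, the `Word` class emulates one with an ordinary lock. The sketch shows only the non-contended path; it returns False where the real code would queue a wait block.

```python
import threading

EXCLUSIVE = 0x1    # hypothetical layout: bit 0 means "held exclusively"
SHARED_UNIT = 0x2  # each shared holder adds 2, leaving bit 0 untouched


class Word:
    """Stands in for one pointer-sized memory cell with an atomic CAS."""

    def __init__(self):
        self.value = 0
        self._guard = threading.Lock()  # emulates the CPU's atomicity

    def compare_exchange(self, expected, new):
        """Store new only if the cell still holds expected; report success."""
        with self._guard:
            if self.value != expected:
                return False
            self.value = new
            return True


def try_acquire_shared(word):
    while True:
        old = word.value
        if old & EXCLUSIVE:
            return False  # a writer holds the lock
        if word.compare_exchange(old, old + SHARED_UNIT):
            return True   # otherwise the word changed under us; retry


def try_acquire_exclusive(word):
    return word.compare_exchange(0, EXCLUSIVE)  # only from the free state


def release_shared(word):
    while True:
        old = word.value
        if word.compare_exchange(old, old - SHARED_UNIT):
            return


def release_exclusive(word):
    word.compare_exchange(EXCLUSIVE, 0)


word = Word()
a = try_acquire_shared(word)     # first reader
b = try_acquire_shared(word)     # second reader shares the lock
c = try_acquire_exclusive(word)  # writer refused while readers hold it
release_shared(word)
release_shared(word)
d = try_acquire_exclusive(word)  # lock is free again, writer succeeds
e = try_acquire_shared(word)     # reader refused while writer holds it
print(a, b, c, d, e, hex(word.value))
```

The whole lock state lives in `word.value`, so every transition is a single compare-and-exchange on one cell, which is the property the paragraph above attributes to the pointer-sized pushlock.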


Pushlocks are also self-optimizing in the sense that the list of threads waiting on a pushlock will be periodically rearranged to provide fairer behavior when the pushlock is released.


Cache Aware Pushlocks


A cache-aware pushlock builds on the basic pushlock by allocating a normal pushlock for each processor in the system and associating it with the cache-aware pushlock. When a thread wants to acquire a cache-aware pushlock for shared access, it simply acquires the pushlock allocated for the processor it is currently running on; however, if it needs exclusive access, it has to acquire the pushlock for every processor in exclusive mode.
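The per-processor arrangement can be sketched as follows. This is a toy model under stated assumptions: `SlotLock` and `ToyCacheAwarePushLock` are hypothetical names, the processor number is passed in by the caller, and the model backs off and returns False where the real lock would wait.

```python
class SlotLock:
    """Minimal stand-in for one per-processor pushlock (toy, non-blocking)."""

    def __init__(self):
        self.shared = 0
        self.exclusive = False

    def try_shared(self):
        if self.exclusive:
            return False
        self.shared += 1
        return True

    def try_exclusive(self):
        if self.exclusive or self.shared:
            return False
        self.exclusive = True
        return True


class ToyCacheAwarePushLock:
    def __init__(self, processor_count):
        self.slots = [SlotLock() for _ in range(processor_count)]

    def try_acquire_shared(self, cpu):
        # A reader touches only the slot for the processor it runs on.
        return self.slots[cpu].try_shared()

    def release_shared(self, cpu):
        self.slots[cpu].shared -= 1

    def try_acquire_exclusive(self):
        # A writer must take every slot; undo the ones taken on failure.
        taken = []
        for slot in self.slots:
            if not slot.try_exclusive():
                for held in taken:
                    held.exclusive = False
                return False
            taken.append(slot)
        return True

    def release_exclusive(self):
        for slot in self.slots:
            slot.exclusive = False


ca_lock = ToyCacheAwarePushLock(processor_count=4)
reader_ok = ca_lock.try_acquire_shared(cpu=2)    # touches slot 2 only
writer_blocked = ca_lock.try_acquire_exclusive() # stops at slot 2, rolls back
ca_lock.release_shared(cpu=2)
writer_ok = ca_lock.try_acquire_exclusive()      # now owns all four slots
reader_blocked = ca_lock.try_acquire_shared(cpu=0)
print(reader_ok, writer_blocked, writer_ok, reader_blocked)
```

The trade-off is visible in the code: shared acquisition touches a single slot, so readers on different processors do not contend on the same memory, while exclusive acquisition has to visit every slot.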


What does a Pushlock look like?


3: kd> !thread 8c9764c0
THREAD 8c9764c0  Cid 2410.1be4  Teb: 7ff9f000 Win32Thread: e5c6f298 GATEWAIT
Stack Init b386b000 Current b386a978 Base b386b000 Limit b3867000 Call 0
ChildEBP RetAddr  Args to Child
b386a990 80833485 8c9764c0 8c9764e4 00000003 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
b386a9bc 8082ffe0 b06a6a03 e11e0b18 b386aa54 nt!KiSwapThread+0x2e5 (FPO: [Non-Fpo]) (CONV: fastcall)
b386a9e4 8087d722 00000000 e11e0b08 e11e0b18 nt!KeWaitForGate+0x152 (FPO: [Non-Fpo]) (CONV: fastcall)
e11e0b18 00000000 0c050204 7346744e e37b2808 nt!ExfAcquirePushLockExclusive+0x112 (FPO: [Non-Fpo]) (CONV: fastcall)


Above is a snippet of output from a dump that I was looking at recently. From the stack, you can see the ExfAcquirePushLockExclusive call trying to acquire the pushlock, which then calls KeWaitForGate. In this case, the lock was already held, so this thread allocated a wait block on its stack and added itself to the wait list.

The stack trace is also cut short: because of the fastcall frames, the debugger cannot unwind it completely. We can reconstruct the stack manually by passing explicit values to the kb command:

k[b|p|P|v] = BasePtr StackPtr InstructionPtr


To get these arguments, we first dump the stack manually using the dps command, starting at the thread's current stack pointer (the Current value, b386a978, in the !thread output).

3: kd> dps b386a978 l50
b386a978  b386ad40
b386a97c  00000000
b386a980  8088dafe nt!KiSwapContext+0x26
b386a984  b386a9bc
b386a988  b386aa00
b386a98c  f773f120
b386a990  8c9764c0
b386a994  80833485 nt!KiSwapThread+0x2e5
b386a998  8c9764c0
b386a99c  8c9764e4
b386a9a0  00000003
b386a9a4  8c9764c0
b386a9a8  00000003
b386a9ac  00000002
b386a9b0  00000002
b386a9b4  f773fa7c
b386a9b8  008c0030
b386a9bc  b386a9e4
b386a9c0  8082ffe0 nt!KeWaitForGate+0x152
b386a9c4  b06a6a03
b386a9c8  e11e0b18
b386a9cc  b386aa54
b386a9d0  00000000
b386a9d4  8c976504
b386a9d8  00000000
b386a9dc  0000001c
b386a9e0  00000000
b386a9e4  b386aa40
b386a9e8  8087d722 nt!ExfAcquirePushLockExclusive+0x112
b386a9ec  00000000
b386a9f0  e11e0b08
b386a9f4  e11e0b18
b386a9f8  b386aa40
b386a9fc  8096e9a9 nt!SeOpenObjectAuditAlarm+0x1cf
b386aa00  00040007
b386aa04  00000000
b386aa08  8c976568
b386aa0c  8c976568
b386aa10  b06a6a00
b386aa14  b4ee0a00
b386aa18  b127cc10
b386aa1c  00000000
b386aa20  00000001
b386aa24  80a60456 hal!KfLowerIrql+0x62
b386aa28  b386ac04
b386aa2c  8d117800
b386aa30  00000000
b386aa34  00000000
b386aa38  b386aa20
b386aa3c  01943080
b386aa40  b386aa64
b386aa44  808b7a14 nt!CmpCheckRecursionAndRecordThreadInfo+0x2a


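The output above shows the raw stack. To reconstruct the call stack, we take three values for the ExfAcquirePushLockExclusive frame from this dump and pass them to the kb command: the ebp (b386aa40, the saved value stored at b386a9e4), the esp (b386a9e4), and the eip (8087d722, the return address into ExfAcquirePushLockExclusive stored at b386a9e8). Voila!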


3: kd> kb = b386aa40 b386a9e4 8087d722
ChildEBP RetAddr  Args to Child
b386aa40 808b7a14 b386ac04 e11e0b18 e11e0b18 nt!ExfAcquirePushLockExclusive+0x112
b386aa64 808b7b09 e11e0b18 b386aa80 e101bf40 nt!CmpCheckRecursionAndRecordThreadInfo+0x2a
b386aaa4 808da118 0000001c b386ab58 00000001 nt!CmpCallCallBacks+0x6b
b386ab90 80937942 e101bf40 00000000 89f13648 nt!CmpParseKey+0xd4
b386ac10 80933a76 00000000 b386ac50 00000040 nt!ObpLookupObjectName+0x5b0
b386ac64 808bb471 00000000 8e930480 00000d01 nt!ObOpenObjectByName+0xea
b386ad50 808897bc 0243eba0 00020019 0243eb68 nt!NtOpenKey+0x1ad
b386ad50 7c8285ec 0243eba0 00020019 0243eb68 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0243eba4 00000000 00000000 00000000 00000000 0x7c8285ec
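The frame-pointer chain behind this reconstruction can be checked by hand. The sketch below is only an illustration: it assumes standard x86 frames, where the dword at ebp holds the caller's saved ebp and the dword at ebp+4 holds the return address, and it uses a few dwords copied from the dps output earlier in this post. The function name `walk_frames` and the dictionary are hypothetical helpers, not debugger features.

```python
# Selected dwords copied from the dps output (address -> value).
stack = {
    0xb386a9bc: 0xb386a9e4,  # saved ebp
    0xb386a9c0: 0x8082ffe0,  # return address: nt!KeWaitForGate+0x152
    0xb386a9e4: 0xb386aa40,  # saved ebp
    0xb386a9e8: 0x8087d722,  # return address: nt!ExfAcquirePushLockExclusive+0x112
    0xb386aa40: 0xb386aa64,  # saved ebp
    0xb386aa44: 0x808b7a14,  # return address: nt!CmpCheckRecursionAndRecordThreadInfo+0x2a
}


def walk_frames(memory, ebp):
    """Follow saved-ebp links; yield (frame_ebp, return_address) pairs."""
    while ebp in memory and ebp + 4 in memory:
        yield ebp, memory[ebp + 4]
        ebp = memory[ebp]  # step to the caller's frame


frames = list(walk_frames(stack, 0xb386a9bc))
for frame_ebp, return_address in frames:
    print(f"{frame_ebp:08x} {return_address:08x}")
```

Starting from b386a9bc, the walk passes through b386a9e4 and b386aa40, which are the stack pointer and base pointer handed to kb, and the return address 8087d722 found at b386a9e8 is the instruction pointer.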


