The question came up on the NTDEV mailing list a while back about how the “Scheduler” works in Windows … specifically about what thread it ran on and how it got control.
The answer is that you should ignore this idea that there’s a “scheduler” as an independent entity. It doesn’t exist like that in Windows.
In Windows there are basically two functions for scheduling in the dispatcher. The first selects the most appropriate thread to run on a processor at any given time, based on factors such as thread priority, thread affinity, and ideal processor. The second initiates a context switch between the current thread and the new one.
The Ke subsystem implements these functions and uses them to schedule threads in a distributed manner. Basically, any time a Ke function is called that changes the state of a thread (to block it, to wake it up, to change its priority, to mark it as having exceeded its quantum), that function will invoke these two other functions to see if a different thread should be running and, if so, to swap over to it.
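To make that pattern concrete, here is a minimal sketch in Python. It is a toy model, not kernel code: `Dispatcher`, `select_thread`, `switch_to` and the other names are all made up for illustration, and selection looks at priority only (no affinity or ideal processor). The point is that there is no scheduler thread; every state-changing call finishes by running the two dispatcher functions inline.

```python
# Toy model only: names such as Dispatcher, select_thread and switch_to are
# hypothetical and are not Windows kernel APIs.
from dataclasses import dataclass, field


@dataclass
class Thread:
    name: str
    priority: int
    ready: bool = True


@dataclass
class Processor:
    current: Thread
    switches: list = field(default_factory=list)  # log of (from, to) names


class Dispatcher:
    def __init__(self, threads, cpu):
        self.threads = threads
        self.cpu = cpu

    def select_thread(self):
        """Function 1: choose the best ready thread (priority only here)."""
        ready = [t for t in self.threads if t.ready]
        return max(ready, key=lambda t: t.priority)

    def switch_to(self, new):
        """Function 2: swap the processor from the current thread to `new`."""
        old = self.cpu.current
        if new is not old:
            self.cpu.switches.append((old.name, new.name))
            self.cpu.current = new

    def _reschedule(self):
        # Runs inline, on whatever thread made the state-changing call.
        self.switch_to(self.select_thread())

    # State-changing entry points: each one ends by rescheduling.
    def block(self, thread):
        thread.ready = False
        self._reschedule()

    def wake(self, thread):
        thread.ready = True
        self._reschedule()

    def set_priority(self, thread, priority):
        thread.priority = priority
        self._reschedule()


idle = Thread("idle", 0)          # always ready, so selection never comes up empty
a = Thread("A", 8)
b = Thread("B", 8)
cpu = Processor(current=a)
d = Dispatcher([idle, a, b], cpu)

d.block(a)            # A waits; B is picked
d.set_priority(b, 4)  # B is still the best ready thread; no switch
d.wake(a)             # A (8) now beats B (4); switch back
print(cpu.switches)
```

Notice that `_reschedule` is the only place a switch can happen, and it is only ever reached from a call that changed some thread's state.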
So you’ve got some user-mode code on a processor. That code will continue to run until it:
- Calls WaitForSingleObject, which calls the NtWaitForSingleObject system call, which transitions to kernel mode and (eventually) calls KeWaitForSingleObject. KeWaitForSingleObject links the current thread onto the wait list for the object (say, an event), then selects another thread which could be run and does a context switch.
- Gets a page fault. This triggers the kernel trap handler, which calls the memory manager, which will initiate a page-in and then call KeWaitForSingleObject to block the thread until the I/O completes. KeWaitForSingleObject will select a new thread to run and switch to it.
- Is interrupted by a timer interrupt. The kernel’s timer interrupt handler will see if the current thread has run longer than its quantum and, if so, will attempt to select a new thread to run. If the current thread is the highest priority thread in the system then it will be left on the processor. Otherwise the kernel code will select a new one and switch to it.
- Is interrupted by a device interrupt which completes an I/O request initiated by another thread. Often this involves boosting the priority of that other thread by calling a Ke function. This function sets the new priority, then calls the selection function to see which thread should be running on the processor. If there’s a better candidate after the priority boost then Ke will swap to the new thread.
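The cases above can be walked through with another small Python sketch. Again this is a toy under stated assumptions: `wait_for`, `set_event` and `timer_tick` are hypothetical stand-ins for the wait, I/O-completion and timer-interrupt paths, not the real Ke routines or their signatures. The quantum is counted in whole ticks, and the boost here is permanent, which is a simplification. A page fault would take the same path as `wait_for`, so it is not modelled separately.

```python
# Toy model only: Event, wait_for, set_event and timer_tick are hypothetical
# stand-ins, not the real Ke routines or their signatures.
from collections import deque


class Thread:
    def __init__(self, name, priority, quantum=3):
        self.name, self.priority = name, priority
        self.quantum = self.ticks_left = quantum


class Event:
    def __init__(self):
        self.wait_list = deque()


class Cpu:
    def __init__(self, current, ready):
        self.current = current
        self.ready = ready              # ready threads other than `current`
        self.trace = [current.name]     # order in which threads got the CPU

    def _switch(self, new):
        self.ready.remove(new)
        self.current = new
        new.ticks_left = new.quantum
        self.trace.append(new.name)

    def wait_for(self, event):
        """Current thread blocks: link it onto the wait list, run someone else."""
        event.wait_list.append(self.current)
        self._switch(max(self.ready, key=lambda t: t.priority))

    def set_event(self, event, boost=0):
        """Runs in the interrupted context, e.g. on an I/O completion path."""
        if not event.wait_list:
            return
        woken = event.wait_list.popleft()
        woken.priority += boost         # simplification: the boost never decays
        self.ready.append(woken)
        if woken.priority > self.current.priority:  # better candidate: preempt
            self.ready.append(self.current)
            self._switch(woken)

    def timer_tick(self):
        self.current.ticks_left -= 1
        if self.current.ticks_left > 0:
            return                      # quantum not used up yet
        self.current.ticks_left = self.current.quantum
        best = max(self.ready, key=lambda t: t.priority, default=None)
        if best is not None and best.priority >= self.current.priority:
            self.ready.append(self.current)
            self._switch(best)
        # otherwise the current thread is still the best choice and stays put


idle = Thread("idle", 0)
a = Thread("A", 8)
b = Thread("B", 8)
io_done = Event()
cpu = Cpu(current=a, ready=[b, idle])

cpu.wait_for(io_done)             # A blocks on the event; B runs
cpu.timer_tick()                  # B has quantum left; nothing happens
cpu.set_event(io_done, boost=1)   # A wakes at priority 9 and preempts B
for _ in range(3):
    cpu.timer_tick()              # A's quantum expires, but A is still highest
print(cpu.trace)
```

Every switch in the trace happens inside a call made by, or on behalf of, whichever thread was on the processor at the time.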
There are a number of other conditions which can cause a context switch, and I’ve dramatically oversimplified some of them, but hopefully this will help you get the idea. The responsibility for scheduling in NT is distributed across all threads in the system, which eventually end up in a Ke routine.