The default number of threads in an I/O completion port is the number of processors, but is that logical or physical processors?


The Create­Io­Completion­Port function lets you specify how many concurrent threads can be processing work from the completion port. But if you pass a value of 0 for Number­Of­Concurrent­Threads, "the system allows as many concurrently running threads as there are processors in the system."

Are these physical processors or logical processors?

They are logical processors.

One way to figure this out is that the goal of the I/O completion port is to keep CPU usage at exactly 100%. If the I/O completion port consumed only as many threads as physical processors, then you wouldn't quite get to 100% CPU utilization, because there would be extra capacity on the unused logical processors.

Another way to figure this out is to use your understanding of history. I/O completion ports were created long before hyperthreading was invented, so this code treated all logical processors as full-fledged processors.

And a third way to figure it out is to test it.

#include <windows.h>
#include <strsafe.h>

#define THREADS 10

LONG ActiveThreads = 0;

DWORD CALLBACK IoThread(void* Port)
{
 DWORD Bytes;
 ULONG_PTR Key;
 OVERLAPPED* Overlapped;
 while (GetQueuedCompletionStatus(Port, &Bytes,
              &Key, &Overlapped, 1000)) {
  TCHAR msg[64];
  auto count = InterlockedIncrement(&ActiveThreads);
  StringCchPrintf(msg, ARRAYSIZE(msg), TEXT(">%d\r\n"), count);
  OutputDebugString(msg);

  DWORD Tick = GetTickCount();
  while (GetTickCount() - Tick < 1000) { }

  count = InterlockedDecrement(&ActiveThreads);
  StringCchPrintf(msg, ARRAYSIZE(msg), TEXT("<%d\r\n"), count);
  OutputDebugString(msg);
 }
 return 0;
}

int __cdecl main(int, char**)
{
 HANDLE Port = CreateIoCompletionPort(INVALID_HANDLE_VALUE,
                nullptr, 0, 0);

 HANDLE ThreadHandles[THREADS];
 int i;
 for (i = 0; i < THREADS; i++) {
  DWORD Id;
  ThreadHandles[i] = CreateThread(0, 0, IoThread, Port, 0, &Id);
 }

 for (i = 0; i < THREADS * 2; i++) {
  PostQueuedCompletionStatus(Port, 0, 0, nullptr);
 }

 for (i = 0; i < THREADS; i++) {
  WaitForSingleObject(ThreadHandles[i], INFINITE);
 }

 return 0;
}

Pick a value for THREADS that is greater than the number of logical processors.

We start by creating an I/O completion port and a bunch of threads whose job it is to complete work posted to the port. we then post a lot of work to the port and wait for the threads to drain the work.

Each thread grabs a work item, then increments a counter that lets us know how many threads are actively processing work. The thread then goes into a tight spin loop for one second. It has to do this rather than Sleep because the thread needs to be actively doing work for it to be counted against the I/O completion port's concurrency limit.

After wasting some time, the thread decrements the count of active threads, and then goes back to looking for more work.

Along the way, we print the number of active threads.

Run this program, and you'll see that it retires work in chunks, and the number of threads in each chunk is the number of logical processors.

So there, confirmed by experimentation.

Comments (2)
  1. M. Peters says:

    “then you wouldn’t quite get to 100% CPU utilization” that statement is only true if the workloads used are non-optimized.

    Example: Gaming on a 4-Core i7 CPU with or without HyperThreading
    There are benchmarks for highly optimized workloads that show a lower (overall) performance when using 8 Threads instead of 4, because the 4 threads achieve 100% physical core utilization and HT adds switching overhead, slowing everything down.
    For reference: https://www.techpowerup.com/forums/threads/gaming-benchmarks-core-i7-6700k-hyperthreading-test.219417/

    1. I think what Raymond meant is that even on workloads that use close to 100% physical CPU utilization, it’d still show up as something like 50% in your Task Manager with HT enabled.

Comments are closed.

Skip to main content