Windows I/O threads vs. managed I/O threads

A question recently came up on an internal discussion forum, which I'll paraphrase:  The Windows QueueUserWorkItem API has an option to queue to an I/O thread.  Why doesn't the managed ThreadPool.QueueUserWorkItem support this option?
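
For readers who haven't used the native API, the option in question is the WT_EXECUTEINIOTHREAD flag to the Win32 QueueUserWorkItem.  Here's a minimal native sketch; error handling is omitted, and the Sleep at the end is just a crude stand-in for real synchronization.

    #include <windows.h>
    #include <stdio.h>

    // Work-item callback; matches LPTHREAD_START_ROUTINE.
    static DWORD WINAPI MyWorkItem(LPVOID /*context*/)
    {
        printf("Running on thread %lu\n", GetCurrentThreadId());
        return 0;
    }

    int main()
    {
        // Default: the work item runs on a "non-I/O" (completion-port) thread.
        QueueUserWorkItem(MyWorkItem, NULL, WT_EXECUTEDEFAULT);

        // The option in question: the work item runs on an I/O thread, i.e. one
        // that waits alertably and so can process APCs queued by the callback.
        QueueUserWorkItem(MyWorkItem, NULL, WT_EXECUTEINIOTHREAD);

        Sleep(100);   // crude: give the pool threads a chance to run the items
        return 0;
    }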

First, some background:

In the Windows thread pool (the old one, not the new Vista thread pool), an "I/O thread" is one that processes APCs (Asynchronous Procedure Calls) queued by other threads, or by I/O initiated from the I/O threads.  One example of an I/O function that completes via APCs is ReadFileEx.  The "non-I/O" threads get their work from a completion port, either as a result of QueueUserWorkItem or of I/O initiated on a handle bound to the thread pool with BindIoCompletionCallback.  So both kinds of thread are geared toward processing I/O completions; they just use different mechanisms.
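
To make the APC mechanism concrete, here's a rough native sketch of ReadFileEx; the file name and buffer size are placeholders, and error handling is omitted.  The key point is that the completion routine is an APC: it runs on the thread that issued the read, and only while that thread is in an alertable wait.

    #include <windows.h>
    #include <stdio.h>

    // Completion routine for ReadFileEx.  It is delivered as an APC to the
    // thread that called ReadFileEx, the next time that thread waits alertably.
    static VOID CALLBACK OnReadComplete(DWORD errorCode, DWORD bytesRead, LPOVERLAPPED /*overlapped*/)
    {
        printf("Read completed on thread %lu: error=%lu, bytes=%lu\n",
               GetCurrentThreadId(), errorCode, bytesRead);
    }

    int main()
    {
        HANDLE file = CreateFileW(L"example.dat", GENERIC_READ, FILE_SHARE_READ, NULL,
                                  OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
        if (file == INVALID_HANDLE_VALUE)
            return 1;

        char buffer[4096];
        OVERLAPPED overlapped = { 0 };   // read from offset 0

        if (ReadFileEx(file, buffer, sizeof(buffer), &overlapped, OnReadComplete))
        {
            // An alertable wait: this is where the pending APC (the completion
            // routine above) actually runs, on this same thread.
            SleepEx(INFINITE, TRUE);
        }

        CloseHandle(file);
        return 0;
    }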

In the managed ThreadPool, we use the terms "worker thread" and "I/O thread."  In our case, an I/O thread is one that waits on a completion port; i.e., it's exactly equivalent to Windows' non-I/O thread.  How confusing!  Our "worker threads" wait on a simple user-space work queue and never enter an alertable state (unless user code does so), so by design they do not process APCs.  Managed "worker threads" have no equivalent in the Windows thread pool, just as Windows "I/O threads" have no managed equivalent.

The managed QueueUserWorkItem queues work to the "worker threads" only.  UnsafeQueueNativeOverlapped queues to the I/O threads, as do completions for asynchronous I/O performed on kernel objects that have been bound to the ThreadPool via BindHandle.

Why don't we support APCs as a completion mechanism?  Because they're really not a good general-purpose completion mechanism for user code.  Managing the reentrancy they introduce is nearly impossible; any time you block on a lock, for example, some arbitrary I/O completion might take over your thread.  That completion might try to acquire locks of its own, which can introduce lock-ordering problems and thus deadlock.  Preventing this requires meticulous design, and a way to guarantee that someone else's code will never run during your alertable waits, and vice versa.  This greatly limits the usefulness of APCs.
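
Here's a contrived sketch of the reentrancy problem.  QueueUserAPC stands in for an arbitrary I/O completion, and the balance variable is purely hypothetical; the point is that an alertable wait in the middle of an update hands the thread to someone else's code while your invariants are broken.

    #include <windows.h>
    #include <stdio.h>

    // Hypothetical shared state, updated only from one thread.
    static int g_balance = 100;

    // Stand-in for an arbitrary I/O completion delivered as an APC.
    static VOID CALLBACK OnCompletion(ULONG_PTR /*data*/)
    {
        // Runs on the same thread, in the middle of UpdateBalance below.
        printf("APC sees balance = %d (invariant temporarily broken)\n", g_balance);
    }

    static void UpdateBalance()
    {
        g_balance -= 100;    // invariant broken until the matching += below

        // Simulate blocking on a lock or event with an alertable wait; any
        // pending APC is delivered right here, reentering this thread.
        SleepEx(0, TRUE);

        g_balance += 100;    // invariant restored
    }

    int main()
    {
        // Queue a fake "I/O completion" to ourselves, then do the update.
        QueueUserAPC(OnCompletion, GetCurrentThread(), 0);
        UpdateBalance();
        printf("Final balance = %d\n", g_balance);
        return 0;
    }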

And APCs don't scale well, except in certain very constrained scenarios, because there's no load balancing of completions across threads; all I/O initiated by a given thread always completes with an APC queued to that same thread.  You can, of course, implement your own load balancing by using the APC to notify another thread of the completion, but you'll never do better in user space than the kernel does with completion ports.  So we provide a rich async I/O infrastructure based on completion ports, and nothing else.
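
For contrast, here's a bare-bones native sketch of the completion-port pattern that the managed I/O threads sit on top of; handle association and shutdown are omitted.  Every worker parked on the port can dequeue any completion packet, which is exactly the load balancing you don't get with APCs.

    #include <windows.h>
    #include <stdio.h>

    // Worker loop: any thread running this can dequeue any completion packet,
    // so the kernel balances completions across all the workers.
    static DWORD WINAPI CompletionWorker(LPVOID portHandle)
    {
        HANDLE port = (HANDLE)portHandle;
        for (;;)
        {
            DWORD bytes = 0;
            ULONG_PTR key = 0;
            LPOVERLAPPED overlapped = NULL;
            if (!GetQueuedCompletionStatus(port, &bytes, &key, &overlapped, INFINITE))
                break;   // real code would distinguish I/O errors from shutdown
            printf("Thread %lu handled a completion (%lu bytes)\n",
                   GetCurrentThreadId(), bytes);
        }
        return 0;
    }

    int main()
    {
        // One port shared by all workers; file or socket handles would be
        // associated with it via further CreateIoCompletionPort calls.
        HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

        for (int i = 0; i < 4; ++i)
            CloseHandle(CreateThread(NULL, 0, CompletionWorker, port, 0, NULL));

        // Post a packet by hand: *some* worker picks it up, not a specific thread.
        PostQueuedCompletionStatus(port, 0, 0, NULL);

        Sleep(100);
        return 0;
    }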

The real question to answer, then, is: why does Windows offer this option in the first place?  My guess is that there is a lot of unmanaged code out there that uses APCs, so the unmanaged thread pool has to process APCs in order to run that large body of code on the thread pool.  This doesn't apply to managed code; when the managed thread pool was first introduced there was no managed code in existence, so there was no backward-compatibility requirement to support APCs.