Scalable asynchronous I/O

While working on the MSMQ transport for WSE 2.0, my thoughts have drifted to what it takes to do scalable I/O. Creating a thread per operation is an easy programming model, but it's bad behavior: every operation pays for its own thread, so it doesn’t scale well at all. Completion ports, on the other hand, are very scalable, but correspondingly harder to program.

The .NET Framework provides a thread pool that can be drawn upon in a pinch to invoke any method asynchronously. It sits somewhere between the two models above, and probably leans closer to thread-per-operation than to completion ports.
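To make the contrast concrete, here is a minimal sketch of the two simpler models. It is written in Python rather than against the .NET thread pool, purely for illustration, and `handle` is a hypothetical stand-in for a single I/O operation such as receiving one message.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(message):
    # Hypothetical stand-in for one I/O operation (e.g. processing a message).
    return message.upper()


def thread_per_operation(messages):
    # One dedicated thread per operation: simple, but the thread count
    # grows linearly with the number of outstanding operations.
    results = [None] * len(messages)

    def worker(index, message):
        results[index] = handle(message)

    threads = [
        threading.Thread(target=worker, args=(i, m))
        for i, m in enumerate(messages)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def pooled(messages, max_workers=4):
    # A fixed-size pool: at most max_workers threads exist, and extra
    # operations wait in a queue until a worker is free.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle, messages))
```

The pooled version caps the number of threads, but a long-running or blocking operation still ties up a pool thread for its whole duration, which is why a pool can be starved in a way that completion ports avoid.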

Why do I care? Because System.Messaging.MessageQueue provides a BeginReceive method that may or may not use the thread pool. I’m torn between “doing it right”, which means making sure the transport is scalable and doesn’t starve the thread pool, and “getting it done”, which means shipping a working sample now and coming back to tune it later. In all likelihood, I’ll do the latter.

What would be most useful is a note in the .NET Framework documentation, for each asynchronous operation, on how it uses the thread pool. I.e., does it run on thread pool threads, or does it use operating system primitives in the context of its own threads?