Developing the method for taking advantage of the fact that the OVERLAPPED associated with asynchronous I/O is passed by address


You can take advantage of the fact that the OVERLAPPED associated with asynchronous I/O is passed by address, but there was some confusion about how this technique could "work" when kernel mode has no idea that you are playing this trick.

Whether kernel mode is in on the trick is immaterial since it is not part of the trick.

Let's start with a version of the code which does not take advantage of the OVERLAPPED structure address in the way described in the article. This is a technique I found in a book on advanced Windows programming:

#define MAX_OVERLAPPED 10 // let's do 10 I/O's at a time

// data to associate with each OVERLAPPED
struct OTHERDATA { ... };

OVERLAPPED MasterOverlapped[MAX_OVERLAPPED];
OTHERDATA OtherData[MAX_OVERLAPPED];

OTHERDATA* FindOtherDataFromOverlapped(OVERLAPPED *lpOverlapped)
{
 ptrdiff_t index = lpOverlapped - MasterOverlapped;
 return &OtherData[index];
}

// I/O is issued via
// ReadFileEx(hFile, lpBuffer, nNumberOfBytesToRead,
//            &MasterOverlapped[i], CompletionRoutine);

void CALLBACK CompletionRoutine(
    DWORD dwErrorCode,
    DWORD dwNumberOfBytesTransferred,
    LPOVERLAPPED lpOverlapped)
{
 OTHERDATA *lpOtherData =
                       FindOtherDataFromOverlapped(lpOverlapped);
 ... do stuff with lpOverlapped and lpOtherData ...
}

This version of the code uses the address of the OVERLAPPED structure to determine the location in the MasterOverlapped table and uses the corresponding entry in the parallel array at OtherData to hold the other data.

Let's make this code worse before we make it better:

OTHERDATA* FindOtherDataFromOverlapped(OVERLAPPED *lpOverlapped)
{
 for (int index = 0; index < MAX_OVERLAPPED; index++) {
  if (&MasterOverlapped[index] == lpOverlapped) {
   return &OtherData[index];
  }
 }
 FatalError(); // should never be reached
}

Instead of doing simple pointer arithmetic to recover the index, we walk the array testing the pointers. This is naturally worse than doing pointer arithmetic, but watch what this step allows us to do: First, we reorganize the data so that instead of two parallel arrays, we have a single array of a compound structure.

struct OVERLAPPEDEX
{
 OVERLAPPED Overlapped;
 OTHERDATA OtherData;
};

OVERLAPPEDEX Master[MAX_OVERLAPPED];

OTHERDATA* FindOtherDataFromOverlapped(OVERLAPPED *lpOverlapped)
{
 for (int index = 0; index < MAX_OVERLAPPED; index++) {
  if (&Master[index].Overlapped == lpOverlapped) {
   return &Master[index].OtherData;
  }
 }
 FatalError(); // should never be reached
}

// I/O is issued via
// ReadFileEx(hFile, lpBuffer, nNumberOfBytesToRead,
//            &Master[i].Overlapped, CompletionRoutine);

All we did was consolidate the parallel arrays into a single array.

Now that it's an array of compound structures, we don't need to carry two pointers around (one to the OVERLAPPED and one to the OTHERDATA). We can just use a single OVERLAPPEDEX pointer and dereference either the Overlapped or the OtherData part.

OVERLAPPEDEX* FindOverlappedExFromOverlapped(
    OVERLAPPED *lpOverlapped)
{
 for (int index = 0; index < MAX_OVERLAPPED; index++) {
  if (&Master[index].Overlapped == lpOverlapped) {
   return &Master[index];
  }
 }
 FatalError(); // should never be reached
}

void CALLBACK CompletionRoutine(
    DWORD dwErrorCode,
    DWORD dwNumberOfBytesTransferred,
    LPOVERLAPPED lpOverlapped)
{
    OVELRAPPEDEX *lpOverlappedEx =
                    FindOverlappedExFromOverlapped(lpOverlapped);
    ... do stuff with lpOverlappedEx ...
}

Finally, we can optimize the FindOverlappedExFromOverlapped function that we de-optimized earlier. Observe that the de-optimized loop is an example of the "for/if" anti-pattern.

The "for/if" anti-pattern goes like this:

for (int i = 0; i < 100; i++) {
 if (i == 42) do_something(i);
}

This can naturally be simplified to

do_something(42);

Our FindOverlappedExFromOverlapped function is a special case of this anti-pattern. It becomes more evident if we do some rewriting. Start with

&Master[index].Overlapped == lpOverlapped

Apply CONTAINING_RECORD to both sides.

CONTAINING_RECORD(&Master[index].Overlapped, OVERLAPPEDEX, Overlapped) ==
    CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped)

The left-hand side of the comparison simplifies to

    &Master[index]

resulting in

&Master[index] ==
   CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped)

Recall that a[b] is equivalent to *(a+b), and therefore &a[b] is equivalent to a+b.

Master + index ==
   CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped)

Now subtract Master from both sides:

index == CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped) - Master

We have transformed the test into a clear case of the for/if anti-pattern, and the function can be simplified to

OVERLAPPEDEX* FindOverlappedExFromOverlapped(
    OVERLAPPED *lpOverlapped)
{
 ptrdiff_t index =
   CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped) - Master;
 return &Master[index];
}

Again, rewrite &a[b] as a+b:

 return Master + index;

Substitute the value of index computed on the previous line:

 return Master + 
   CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped) - Master;

The two occurrences of Master cancel out, leaving

OVERLAPPEDEX* FindOverlappedExFromOverlapped(
    OVERLAPPED *lpOverlapped)
{
 return CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped);
}

And there you have it. By a series of purely mechanical transformations, we have rediscovered the technique of extending the OVERLAPPED structure.

Comments (26)
  1. Cory Nelson says:

    "The Old New Thing" indeed! hehe

  2. Krunch says:

    This is also how linked list work in the Linux kernel. You embed the list struct in the struct you want to list and given a list pointer, you use container_of() to reach your data.

    kernelnewbies.org/…/LinkedLists

  3. Pavlik says:

    Extenting OVERLAPPED structure is so common pattern but yet people have to rediscover it over and over and over again. I wonder what could be done to make it more discoverable.

  4. Tyler says:

    @Pavlik well, the most common way to make this sort of thing more discoverable is to have Raymond Chen write about it on his blog.

  5. peterchen says:

    @Pavlik: By adding it as community content to the "OVERLAPPED" structure docunmentaiton in MSDN?

    (msdn.microsoft.com/…/ms684342(VS.85).aspx)

    (How can I clickify a link over there?)

  6. zonk says:

    That's crazy.

    I'm passing some pointer to some function. It should read all data it needs and save it to it's internal storage. Then the function is calling me back and passing another pointer. I'm reading all I need to, and return call. This pointer and that pointer must have nothing common, and if they do – that is unreliable feature. Isn

    t it programming ground rules?

  7. Joshua says:

    @zonk: It's a common pattern.

    svn team calls it baton. .NET calls it Tag. Windows calls it LPARAM (see EnumWindows).

  8. zonk says:

    @Joshua:

    Tag is Object. LPARAM is almost void*. They are special fields of structure, not the structure itself.

  9. Simon Buchan says:

    @Joshua: Reserving a user data pointer is fairly different to embedding from a lifetime management POV, remember.

    @zonk: Why on earth would you expect a pointer argument to be copied? This is when ownership rules are useful, instead of a pain in the ass – if you own a pointer and have to reclaim it yourself, then obviously the other party is required to not mess with it, it has to give the same pointer back so you know what to free, at the minimum.

  10. Gabe says:

    zonk: The OVERLAPPED struct isn't immutable. When you pass it to an I/O function, you are expecting that the memory pointed to will be modified. Thus, you should darn well better expect that the LPOVERLAPPED you get back is exactly the one you passed in.

  11. L says:

    @zonk: Please read the descriptions of the API functions first:

    GetQueuedCompletionStatus: "A pointer to a variable that receives the address of the OVERLAPPED structure that was specified when the completed I/O operation was started. "

    GetOverlappedResult: "A pointer to an OVERLAPPED structure that was specified when the overlapped operation was started."

    ReadFile: "Because the read operation starts at the offset that is specified in the OVERLAPPED structure, and ReadFile may return before the system-level read operation is complete (read pending), neither the offset nor any other part of the structure should be modified, freed, or reused by the application until the event is signaled (that is, the read completes)"

    You have to provide this OVERLAPPED structure. It must stay until the operation completes. But you can even access the hEvent handle inside this structure, as documented, so it is not opaque for the application.

  12. Anton Tykhyy says:

    The .NET framework uses this trick when it does asynchronous I/O. Moreover, it uses a completion port to have the kernel invoke callbacks on its thread pool. This is all well and good, but once a handle gets associated with this completion port *any* overlapped operations invoke the CLR's callback. What's more, the CLR's callback does not check whether it was called with CLR's OVERLAPPEDEX and crashes when it gets called as a result of a non-CLR-initiated overlapped operation. There is no way that I know of to *dis*associate a handle from a completion port, so it is impossible to use a single handle (DuplicateHandle does not help, presumably because the completion port is bound at the kernel level) with .NET's asynchronous I/O functions and with homebrew or third-party libraries which do their own overlapped I/O at the same time.

  13. I'm willing to believe that the thread calling WriteFile will be able to access the extended OVERLAPPEDEX members.

    It's a far cry from that to the assertion that the thread on which the completion function is invoked will be able to do the same.

    I've been told that "the documentation says you can" but I don't see such a statement in the documentation.  Can anyone put a finger on a sentence or paragraph?  I do see samples that rely on this behavior.

    I don't find this post any more convincing.  Sure, you can derive that assertion from the book, but this is circular logic: what assurance is there that the book is correct?

    So I'm with zonk on this one.

    [You're making this much harder than it is. If you allocate N bytes of memory at address p, then you pass p to somebody, and that somebody later passes p back to you, then you can access N bytes of memory there. Is this not obvious? -Raymond]
  14. L says:

    There is even the PostQueuedCompletionStatus function (msdn.microsoft.com/…/aa365458%28v=vs.85%29.aspx). The description of its last 3 parameters (the first one specifies the Completion Port) is:

    dwNumberOfBytesTransferred [in]: The value to be returned through the lpNumberOfBytesTransferred parameter of the GetQueuedCompletionStatus function.

    dwCompletionKey [in]: The value to be returned through the lpCompletionKey parameter of the GetQueuedCompletionStatus function.

    lpOverlapped [in, optional]: The value to be returned through the lpOverlapped parameter of the GetQueuedCompletionStatus function.

    This makes it very clear that the completion port mechanism does only return things that are given in before, unchanged.

  15. Ah, there we go – thanks, L.

    msdn.microsoft.com/…/aa365458%28v=vs.85%29.aspx

    Remarks The I/O completion packet will satisfy an outstanding call to the GetQueuedCompletionStatus function. This function returns with the three values passed as the second, third, and fourth parameters of the call to PostQueuedCompletionStatus. The system does not use or validate these values. ***In particular, the lpOverlapped parameter need not point to an OVERLAPPED structure.***

    (my emphasis)

  16. Sunil Joshi says:

    @Maurits

    Perhaps thinking about it another way may help. If for example I allocate an OVERLAPPED structure on the heap or on the stack, it will be followed in memory by other stuff (e.g. the header of the next heap block or some local var). If I pass a POVERLAPPED to a function, I am entitled to expect it won't corrupt the other stuff (if it did that would be a nasty bug). If the other stuff happens to be tracking data for my asynchronous io that doesn't make a difference (the function is still not entitled to write to more than sizeof(OVERLAPPED) bytes.) We do this sort of thing all the time. We pass structures to functions and expect them not to touch surrounding data. So the extra data will still be there when we need it.

  17. Tom says:

    I'm with you, Raymond: there is a lot of over-thinking the problem.  I hate to sound like and old codger, but I suspect people have trouble with this because most people don't do assembly language any more.  Perhaps a more language-centric way of describing the solution might be to use unions, i.e.

    union OVERLAPPEDEX {

       OVERLAPPED ovl;

       struct {

           BYTE reserved[sizeof(OVERLAPPED)];

           OTHERDATA otherdata;

       };

    };

    This is functionally equivalent to your OVERLAPPEDEX structure, but shows in a slightly different way that the pointer to the OVERLAPPEDEX can be cast to an OVERLAPPED without any difficulty.  

  18. Random832 says:

    The confusion I saw wasn't due to whether the structs have compatible layout or whatever, it was due to it not being clear to people that it _is_ passed by address, they were asking if some kernel-mode component might cause a copy of it to be made and have _that_ address passed to the completion function, since it doesn't seem to be explicitly part of the contract. Such behavior would break every single example presented here.

    [That the address passed to the completion function is identical to the address passed to the I/O function is explicitly part of the contract. (How else would you know which I/O completed?) -Raymond]
  19. Tom says:

    @Random832: While you're right about the source of confusion, I still think people are over-thinking the problem.

    The documentation for the asynchronous I/O functions states that the OVERLAPPED structure must exist until the I/O operation completes, and recommends that the OVERLAPPED be in a global variable or allocated on the heap because a stack-allocated OVERLAPPED will likely disappear before the I/O finishes.  Because of this restriction, one can assume that the OVERLAPPED passed to the asynchronous I/O function is the same one passed to the completion function even if that is not explicitly stated in the documentation.  After all, why would the kernel need to make a copy of the OVERLAPPED structure if the kernel already requires you to keep one hanging around until the I/O completes?  If, on the other hand, this restriction were not in place (i.e. the OVERLAPPED could be freed as soon as the asynchronous ReadFile() call returned but before the I/O operation completed), then you should probably assume that the OVERLAPPED is copied.

  20. Alex Grigoriev says:

    @Tom:

    Your union OVERLAPPEDEX is an overkill which only adds confusion.

  21. Pierre B. says:

    @Tom:

    While in this case the documentation is very explicit so there is no question, it is not obivous that any structure passed to the kernel via a pointer will not get copied. Copying parameters to kernel function is a common trick to avoid subtle attack where the data is modified during the call to make the kernel overwrite protected memory. In fact, I suspect that the content of the overlapped structure really do get copied internally before being validate and used for the I/O.

  22. less confused now says:

    Okay, I think what trips people is that somehow, they think when Windows invokes the callback function, the pointer to OVERLAPPED might not be the same one as the pointer from the original call.

    It's explicitly not the case for the APIs here.  But, I suppose such an API design is possible, and might even be desirable in that it doesn't require the caller to maintain the memory to the OVERLAPPED throughout the possible duration of the async I/O.  Of course, if that were the design, it's likely that there would also have been some sort of generic caller-specified void* "context" pointer in the structure that will help the caller identify which OVERLAPPED it is, otherwise you end up with the undesirable situation where it's tricky/impossible for the caller to figure out in the callback which I/O operation it is for.  That sort of "context" pointer could then be used for the caller's secret extra bytes.

  23. confused about why people are confused says:

    Whether or not the kernel makes copies of the structure is totally irrelevant.  Internally the function is free to make billions of copies of the OVERLAPPED structure if it so likes, but at the end of the day, it still has to write back any relevant results back into the *original* structure being pointed to, the one from the caller.  Similarly, the fact that there are extra bytes following the OVERLAPPED parameter has no bearing on the function; the caller is solely responsible for the lifetime management of the full blob of memory, and only the caller knows about the extra bytes; the function only ever access the bytes in the blob that are part of the OVERLAPPED structure.

    The fact that the internal copies made by the function do not include the caller's "secret extra bytes" has no effect on anyone, since the function never knows or cares about the extra bytes to begin with, and the extra bytes from the caller's original structure are never being destroyed in the process since the caller maintains lifetime management on the original structure.

  24. L says:

    The real mysterious APIs are the ones with multi-page long descriptions, where it is nonetheless not clear at all what they are really doing under various circumstances and parameter combinations.

    One of my favorites in this area is the CreateProcess() call. Very old and very basic. Why on earth does it need so many parameters? Why is the documentation still so unclear after so many years?

    For example: What is the true meaning of many options for the dwCreationFlags parameter (Process Creation Flags)? What is the distinction between CREATE_NO_WINDOW and DETACHED_PROCESS? Why is Server 2008 R2 creating a conhost process for a new child process, when the parent process has no console window and the child is created with CREATE_NO_WINDOW (both are compiled as console programs)? I just don't understand.

    Then add to this the inheritance of the stdin/stdout/stderr channels which is just cumbersome the way it must be done.

    THAT one is a real mess.

  25. > If you allocate N bytes of memory at address p, then you pass p to somebody, and that somebody later passes p back to you, then you can access N bytes of memory there. Is this not obvious?

    Yes, it's obvious (even to such as stubborn ox as me) that static_cast<OVERLAPPED_EX *>(p)->foo gives me what I want.

    It's also obvious (I hope) that static_cast<OVERLAPPED_EX *>(new OVERLAPPED(*p))->foo is a Very Bad Thing (TM).

    What is not obvious to me (when relying solely on the signatures of the relevant functions) is, which case are we in?

    Only a close reading of the documentation clears that ambiguity up.  Even then this strikes me as a poor programming practice.

    > How else would you know which I/O completed?

    By relying on something in the OVERLAPPED instance which you were reasonably confident was unique to that OVERLAPPED instances: to wit, the event handle.

    In fact, my implementation of FindOtherDataFromOverlapped would probably go something like this:

    OTHERDATA* FindOtherDataFromOverlapped(OVERLAPPED *lpOverlapped)

    {

       for (int i = 0; i < MAX_OVERLAPPED; i++)

       {

           if (MasterOverlapped[i].hEvent == lpOverlapped->hEvent)

           {

               return &OtherData[i];

           }

       }

       // fatal error… blow up or something

       return NULL;

    }

    [Let me get this straight. I spent two days trying to clarify something that was based on you "relying solely on the signatures of the relevant functions" and not the associated documentation? (BTW, most OVERLAPPED usages do not employ an event. Only if you need to check for completion outside the completion function/completion port do you need an event.) -Raymond]
  26. besides Raymond's point, but anyway says:

    Obviously we are not talking about casting any random OVERLAPPED* to OVERLAPPED_EX*.  In this case you have control over this based on the completion routine you specified for that IO operation.

    And while not a recommended practice, you can specify a NULL event handle in your OVERLAPPED structure, in the case where you know there won't be more than one I/O on the file/pipe/device.  So using the event handle member as a unique identifier also depends on some particular knowledge of how the code is using the OVERLAPPED structure, not any different from knowing that your code will always pass in OVERLAPPED_EX for all I/Os associated with your particular completion routine.

    msdn.microsoft.com/…/ms686358%28VS.85%29.aspx

    I suppose one can claim that the "master table" technique is less prone to AVs in the case where your code unwittingly violated the assumption of always getting OVERLAPPED_EX in your completion routine.  (Assuming no bugs in the code that maintains the master table.)

Comments are closed.