Why is there an Ex and Io work item in WDM?


Have you ever looked at the work item APIs and wondered why there are two different
types of work items? Or for that matter, why are there so many work item APIs?
As Paul wrote
last week, the work item API set has grown for Vista. Today I will try to explain
how we got into this state.

Up until Windows 2000, there was only one type of work item,
WORK_QUEUE_ITEM. You could embed the work item
structure in your own structure and it was quite simple to use. All you had to do
was call ExQueueWorkItem() and you were done. There
was one glaring problem with the way WORK_QUEUE_ITEMs worked, though.


You could not safely unload a driver which had queued a work item.

Safe unload is not possible with this type of work item because there is no outstanding reference on your
device or driver object. A reference on your device or driver object will keep
your driver’s image from unloading. Since there is no reference on either object,
the image can be unloaded before the work item has run or while the work item is executing. But what if you added your own reference
and then released it when the work item ended?

For instance, if you had code that did something like this:


typedef struct _MY_WORK_ITEM {
    WORK_QUEUE_ITEM WorkItem;
    PDEVICE_OBJECT DeviceObject;
} MY_WORK_ITEM, *PMY_WORK_ITEM;

// Forward declaration so QueueWorkItem() can refer to the routine
VOID WorkItemRoutine(PVOID Context);

NTSTATUS QueueWorkItem(PDEVICE_OBJECT DeviceObject)
{
    PMY_WORK_ITEM pItem;

    pItem = (PMY_WORK_ITEM) ExAllocatePoolWithTag(NonPagedPool, sizeof(MY_WORK_ITEM), tag);
    if (pItem == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    ExInitializeWorkItem(&pItem->WorkItem, WorkItemRoutine, pItem);
    pItem->DeviceObject = DeviceObject;

    // Take our own reference to try to keep the driver loaded
    ObReferenceObject(DeviceObject);
    ExQueueWorkItem(&pItem->WorkItem, DelayedWorkQueue);

    return STATUS_SUCCESS;
}

VOID WorkItemRoutine(PVOID Context)
{
    PMY_WORK_ITEM pItem = (PMY_WORK_ITEM) Context;
    PDEVICE_OBJECT pDevice = pItem->DeviceObject;

    // … do work …

    ExFreePool(pItem);

    // Release our reference; driver code still runs after this call returns
    ObDereferenceObject(pDevice);
}

The problem is that there is still code to execute between the call to ObDereferenceObject(pDevice)
and the closing }, as seen in this disassembly, so
there is still a short window of time where your driver could be unloaded while
your driver is still executing code.


0:000> u WorkItemRoutine+0x23
WorkItemRoutine+0x23

// Put the parameter into ecx and call ObDereferenceObject
000843e3 8b4dfc mov ecx,dword ptr [ebp-4]
000843e6 ff1564a00a00 call dword ptr [wdf01000!_imp_ObfDereferenceObject (000aa064)]

// We still have to execute this code to return to the caller! It is during
// these 3 instructions that the driver can unload
000843ec 8be5 mov esp,ebp
000843ee 5d pop ebp
000843ef c20400 ret 4

To address this problem a new work item type, PIO_WORKITEM, was added.
If the management of the reference was taken care of for the driver in another module, the driver
would not have this problem anymore. This is exactly what PIO_WORKITEM and
IoQueueWorkItem() do. Upon queueing the work
item, the I/O manager takes a reference on the device object and then releases it
after the work item routine returns to the I/O manager. This means
that all of your driver’s work item code runs while the reference is held, including
the code to return to the caller, so it is now possible to safely unload a driver
using this new work item type.
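
To make the ordering concrete, here is a minimal user-mode model of that idea. It is only a sketch, not the I/O manager's actual implementation: every name in it (model_device, model_work_item, model_queue_work_item, model_run_work_item, driver_routine) is hypothetical, and a plain counter stands in for the device object's reference count. The point it illustrates is that the reference is dropped by code outside the driver, and only after the driver's routine has completely returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device object: just a reference count. */
typedef struct model_device {
    long ref_count;
} model_device;

typedef void (*model_work_routine)(model_device *device, void *context);

/* Hypothetical stand-in for the opaque work item owned by the "I/O manager". */
typedef struct model_work_item {
    model_device *device;
    model_work_routine routine;
    void *context;
} model_work_item;

/* "I/O manager" side: take the reference at queue time. */
void model_queue_work_item(model_work_item *item, model_device *device,
                           model_work_routine routine, void *context)
{
    item->device = device;
    item->routine = routine;
    item->context = context;
    device->ref_count++;
}

/* "I/O manager" side: what the worker thread would run. */
void model_run_work_item(model_work_item *item)
{
    /* Copy the fields first; the routine is allowed to free the item. */
    model_device *device = item->device;
    model_work_routine routine = item->routine;
    void *context = item->context;

    routine(device, context);

    /* The driver's routine has fully returned, including its epilogue.
     * Only now is the reference released, by code outside the driver. */
    assert(device->ref_count > 0);
    device->ref_count--;
}

/* "Driver" side: records the reference count it observes while running. */
void driver_routine(model_device *device, void *context)
{
    long *seen = (long *) context;
    *seen = device->ref_count;
}
```

In this model the driver's routine never touches the reference count itself, so there is no driver instruction left to run after the release. The model is single threaded, so it uses a plain counter rather than interlocked operations.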

So, the problem is solved, right? Well, technically yes, but the new
PIO_WORKITEM type introduced a regression of sorts. The
actual size of the IO_WORKITEM structure is not
exposed publicly, which means you can no longer embed a work item structure in your
own structure. As a result, you have to allocate a context and allocate the
work item separately. This introduces another point of failure and makes the
initialization and cleanup code more complex. Here is the previous code snippet
modified to use the new work item type:


typedef struct _MY_WORK_ITEM {
    PIO_WORKITEM WorkItem;
    // …other context fields…
} MY_WORK_ITEM, *PMY_WORK_ITEM;

// Forward declaration so QueueWorkItem() can refer to the routine
VOID IoWorkItemRoutine(PDEVICE_OBJECT DeviceObject, PVOID Context);

NTSTATUS QueueWorkItem(PDEVICE_OBJECT DeviceObject)
{
    PMY_WORK_ITEM pItem;

    pItem = (PMY_WORK_ITEM) ExAllocatePoolWithTag(NonPagedPool, sizeof(MY_WORK_ITEM), tag);
    if (pItem == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    // Second allocation, and therefore a second point of failure
    pItem->WorkItem = IoAllocateWorkItem(DeviceObject);
    if (pItem->WorkItem == NULL) {
        ExFreePool(pItem);
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    // …initialize the rest of pItem…
    IoQueueWorkItem(pItem->WorkItem, IoWorkItemRoutine, DelayedWorkQueue, pItem);

    return STATUS_SUCCESS;
}

VOID IoWorkItemRoutine(PDEVICE_OBJECT DeviceObject, PVOID Context)
{
    PMY_WORK_ITEM pItem = (PMY_WORK_ITEM) Context;

    // … do work …

    // Both allocations must be released
    IoFreeWorkItem(pItem->WorkItem);
    ExFreePool(pItem);
}

To address the embedded work item “regression,” Vista introduced
IoSizeofWorkItem() (which you can read about
in Paul’s article, referenced at the top of this entry). In conclusion,
it is not hard to see why there are two different types of work items and so
many work item APIs in WDM. The problem set has grown over time and the OS
has evolved to solve those problems.
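
As a rough illustration of what a run-time size query makes possible, the following user-mode sketch puts the context fields and the storage for an opaque item into a single allocation. It is a model under stated assumptions, not kernel code: model_sizeof_work_item() is a hypothetical stand-in for IoSizeofWorkItem(), the value 48 is an arbitrary placeholder, and my_context, my_context_alloc and my_context_free are made-up names. Zeroing the storage stands in for whatever in-place initialization the real work item requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for IoSizeofWorkItem(): the size of the opaque
 * item is only known at run time. 48 is an arbitrary placeholder. */
size_t model_sizeof_work_item(void)
{
    return 48;
}

typedef struct my_context {
    int request_id;  /* example driver-defined context field */
    /* Storage for the opaque item starts here. Its real length comes from
     * model_sizeof_work_item(), so it is declared as a flexible array and
     * aligned conservatively because the item's layout is unknown. */
    _Alignas(max_align_t) unsigned char work_item[];
} my_context;

/* One allocation covers both the context fields and the opaque item. */
my_context *my_context_alloc(int request_id)
{
    size_t item_size = model_sizeof_work_item();
    size_t total = offsetof(my_context, work_item) + item_size;
    my_context *ctx = (my_context *) malloc(total);

    if (ctx == NULL) {
        return NULL;  /* a single point of failure instead of two */
    }

    ctx->request_id = request_id;
    /* Stand-in for initializing the opaque item in place. */
    memset(ctx->work_item, 0, item_size);
    return ctx;
}

/* One free releases everything. */
void my_context_free(my_context *ctx)
{
    assert(ctx != NULL);
    free(ctx);
}
```

The trade-off is that the total size must be computed by hand from the field offset plus the run-time size, since sizeof on the structure no longer describes the whole allocation.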

Comments (8)

  1. DrPizza says:

    "This results in having to allocate a context and to allocate the work item separately. This introduces another point of failure and makes the initialization and destroy code more complex"

    Sounds to me like more justification for supporting C++ as a first class kernel-mode development language.

  2. nksingh says:

    C++ doesn’t reduce the actual points of failure… it just abstracts them away from the dev and turns understandable failures to less understandable ones.  If the memory allocation fails, it fails regardless of the language.

  3. doronh says:

    nksingh is right. C++ is not a magic bullet.  I could easily abstract the problem in a C function.  

    In fact, I would say that wrapping an io workitem in a C++ would make things more complicated.   operator new() is passed the fixed size of the object, so you must overload operator new() and use IoSizeofWorkItem() to compute the right size.  If you do this, you have now created a "finalized" class, you cannot derive from this class b/c it has a variable length.

  4. Yesterday I wrote about the evolution of work items.  Work items evolved because there was a need to…

  5. DrPizza says:

    "C++ doesn’t reduce the actual points of failure… it just abstracts them away from the dev and turns understandable failures to less understandable ones."

    This is complete nonsense.  C++ allows you to create resources that properly manage their own lifetimes.  C does not.

    "In fact, I would say that wrapping an io workitem in a C++ would make things more complicated.   operator new() is passed the fixed size of the object, so you must overload operator new() and use IoSizeofWorkItem() to compute the right size.  If you do this, you have now created a "finalized" class, you cannot derive from this class b/c it has a variable length. "

    This can’t go in the constructor why?

  6. MajorTom says:

    I don’t understand where is the regression with the new IoWorkItem interface.

    the "previous code snippet modified to use the new work item type" is safe and bug free as far as I can see

  7. doronh says:

    MajorTom:  it is not a regression in behavior, it is a regression in programming patterns.  the pattern has become slightly more complicated.  maybe i am overloading "regression" here, but I have done enough code reviews where developers have screwed up the transition from an Ex to Io work item that I view it as a regression.

    d

  8. doronh says:

    DrPizza:   I can do the same thing in C.  I create a fwd declared pointer type and accessors/constructor/destructor in a header and have the accessors/constructor/destructor in its own C file.   C++ formalizes the pattern slightly, but it is not a magic bullet. You still need to delete the object in C++, so instead of a delete pMyWorkItem you have a MyWorkItemFree() call.

    You can’t handle variable length structures in the constructor, it’s too late.  The memory for the object has already been allocated.  You need to precompute the size of the object in an overloaded operator new(), so you now have to do something like this:

      struct WORK_ITEM_CONTEXT {
           ULONG Size;
           ULONG Context;
           UCHAR WorkItemStart[1];
     
           PIO_WORKITEM GetWorkItem() { return (PIO_WORKITEM) &WorkItemStart[0]; }

           WORK_ITEM_CONTEXT(PDEVICE_OBJECT DeviceObject, ULONG Size) : Size(Size), Context(0)
           {
                 IoInitializeWorkItem(DeviceObject,  GetWorkItem());
           }

           PVOID operator new(size_t Size)
           {
               UNREFERENCED_PARAMETER(Size);
               return ExAllocatePoolWithTag(NonPagedPool, FIELD_OFFSET(WORK_ITEM_CONTEXT, WorkItemStart) + IoSizeofWorkItem(), <tag>);
           }
       };

    Now, when you derive from WORK_ITEM_CONTEXT , the compiler does not know that WORK_ITEM_CONTEXT  is actually variable sized and will start the derived object’s fields at sizeof(WORK_ITEM_CONTEXT), while it really needs to start it at
    FIELD_OFFSET(WORK_ITEM_CONTEXT, WorkItemStart) + IoSizeofWorkItem().  And there is no way to convey to the compiler where to start the derived object, hence you have created what in C# is a finalized class, but way more dangerous b/c you now have a ticking time bomb where a corruption is just waiting to happen…

    …and, yes, you could make the constructor private and then have a static function new a WORK_ITEM_CONTEXT for you.  I personally feel that this is overkill for such a simple task. You can spend more time fighting the language and its patterns than you spend implementing the real guts of the solution.