Why is there an Ex and Io work item in WDM?

Have you ever looked at the work item APIs and wondered why there are two different
types of work items? Or for that matter, why are there so many work item APIs?
As Paul wrote
last week, the work item API set has grown for Vista. Today I will try to explain
how we got into this state.

Up until Windows 2000, there was only one type of work item,
WORK_QUEUE_ITEM. You could embed the work item
structure in your own structure and it was quite simple to use. All you had to do
was call ExQueueWorkItem() and you were done. There
was one glaring problem with the way WORK_QUEUE_ITEMs worked.

 
    You could not safely unload a driver which had queued a work item. 

Safe unload is not possible with this type of work item because there is no outstanding reference on your
device or driver object. A reference on your device or driver object will keep
your driver's image from unloading. Since there is no reference on either object,
the image can be unloaded before the work item has run or while the work item is executing. But what if you added your own reference
and then released it when the work item ended?

For instance, if you had code that did something like this:

 
    typedef struct _MY_WORK_ITEM {
        WORK_QUEUE_ITEM WorkItem;
        PDEVICE_OBJECT DeviceObject;
    } MY_WORK_ITEM, *PMY_WORK_ITEM;

    NTSTATUS QueueWorkItem(PDEVICE_OBJECT DeviceObject)
    {
        PMY_WORK_ITEM pItem;

        pItem = (PMY_WORK_ITEM) ExAllocatePoolWithTag(NonPagedPool, sizeof(MY_WORK_ITEM), tag);
        if (pItem == NULL) {
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        ExInitializeWorkItem(&pItem->WorkItem, WorkItemRoutine, pItem);
        pItem->DeviceObject = DeviceObject;
        ObReferenceObject(DeviceObject);
        ExQueueWorkItem(&pItem->WorkItem, DelayedWorkQueue);

        return STATUS_SUCCESS;
    }

    VOID WorkItemRoutine(PVOID Context)
    {
        PMY_WORK_ITEM pItem = (PMY_WORK_ITEM) Context;
        PDEVICE_OBJECT pDevice = pItem->DeviceObject;

        // ... do work ...

        ExFreePool(pItem);
        ObDereferenceObject(pDevice);
    }

The problem is that there is still code to execute between the call to
ObDereferenceObject(pDevice); and the closing }, as seen in this disassembly, so
there is still a short window of time in which your driver could be unloaded while
it is still executing code.

 
    0:000> u WorkItemRoutine+0x23
    WorkItemRoutine+0x23

    // Put the parameter into ecx and call ObDereferenceObject
    000843e3 8b4dfc          mov     ecx,dword ptr [ebp-4]
    000843e6 ff1564a00a00    call    dword ptr [wdf01000!_imp_ObfDereferenceObject (000aa064)]

    // We still have to execute this code to return to the caller!  It is during
    // these 3 instructions that the driver can unload
    000843ec 8be5            mov     esp,ebp
    000843ee 5d              pop     ebp
    000843ef c20400          ret     4

To address this problem a new work item type, PIO_WORKITEM, was added.
If the management of the reference was taken care of for the driver in another module, the driver
would not have this problem anymore. This is exactly what PIO_WORKITEM and
IoQueueWorkItem() do. Upon queueing the work
item, the I/O manager takes a reference on the device object and then releases it
after the work item routine returns to the I/O manager. This means
that all of your driver's work item code runs while the reference is held, including
the code that returns to the caller, so it is now possible to safely unload a driver
using this new work item type.
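
To make the ordering concrete, here is a small user-mode model of that idea. It is only a sketch: every name in it (MODEL_DEVICE, ModelQueueWorkItem, ModelRunWorkItem and so on) is made up for illustration, and it does not claim to show how the I/O manager is actually implemented. The point is simply that the reference is dropped by code living outside the driver, after the driver's routine has completely returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device object: just a reference count. */
typedef struct {
    long RefCount;
} MODEL_DEVICE;

typedef void (*MODEL_ROUTINE)(MODEL_DEVICE *Device, void *Context);

/* Hypothetical stand-in for the opaque work item. */
typedef struct {
    MODEL_DEVICE *Device;
    MODEL_ROUTINE Routine;
    void *Context;
} MODEL_WORKITEM;

static void ModelReference(MODEL_DEVICE *Device)
{
    Device->RefCount++;
}

static void ModelDereference(MODEL_DEVICE *Device)
{
    assert(Device->RefCount > 0);
    Device->RefCount--;
}

/* Queue side: the reference is taken on the driver's behalf. */
void ModelQueueWorkItem(MODEL_WORKITEM *Item, MODEL_DEVICE *Device,
                        MODEL_ROUTINE Routine, void *Context)
{
    Item->Device = Device;
    Item->Routine = Routine;
    Item->Context = Context;
    ModelReference(Device);
}

/* Worker side: this wrapper stands for code outside the driver's image. */
void ModelRunWorkItem(MODEL_WORKITEM *Item)
{
    /* Copy everything out first; the routine is allowed to free the item. */
    MODEL_DEVICE *device = Item->Device;
    MODEL_ROUTINE routine = Item->Routine;
    void *context = Item->Context;

    routine(device, context);

    /* The driver's routine, including its return sequence, has finished.
       Only now is the reference released. */
    ModelDereference(device);
}

/* Example "driver" routine: records the reference count it observes. */
void ExampleRoutine(MODEL_DEVICE *Device, void *Context)
{
    *(long *) Context = Device->RefCount;
}
```

In this model the routine always observes the extra reference, and the count only drops back once control is inside ModelRunWorkItem again, which is the property the hand-rolled ObReferenceObject/ObDereferenceObject version could not provide.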

So, the problem is solved, right? Well, technically yes, but the new
PIO_WORKITEM type introduced a regression of sorts. The
actual size of the IO_WORKITEM structure is not
exposed publicly, which means you can no longer embed a work item structure in your
own structure. As a result, you have to allocate a context and allocate the
work item separately. This introduces another point of failure and makes the
initialization and cleanup code more complex. Here is the previous code snippet
modified to use the new work item type:

 
    typedef struct _MY_WORK_ITEM {
        PIO_WORKITEM WorkItem;
        // ...other context fields...
    } MY_WORK_ITEM, *PMY_WORK_ITEM;

    NTSTATUS QueueWorkItem(PDEVICE_OBJECT DeviceObject)
    {
        PMY_WORK_ITEM pItem;

        pItem = (PMY_WORK_ITEM) ExAllocatePoolWithTag(NonPagedPool, sizeof(MY_WORK_ITEM), tag);
        if (pItem == NULL) {
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        pItem->WorkItem = IoAllocateWorkItem(DeviceObject);
        if (pItem->WorkItem == NULL) {
            ExFreePool(pItem);
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        // ...initialize the rest of pItem...
        IoQueueWorkItem(pItem->WorkItem, IoWorkItemRoutine, DelayedWorkQueue, pItem);

        return STATUS_SUCCESS;
    }

    VOID IoWorkItemRoutine(PDEVICE_OBJECT DeviceObject, PVOID Context)
    {
        PMY_WORK_ITEM pItem = (PMY_WORK_ITEM) Context;

        // ... do work ...

        IoFreeWorkItem(pItem->WorkItem);
        ExFreePool(pItem);
    }

To address the embedded work item "regression," Vista introduced
IoSizeofWorkItem() (which you can read about
in Paul's article which I referenced at the top of this entry). In conclusion,
it is not hard to see why there are two different types of work items and so
many work item APIs in WDM. The problem set has grown over time and the OS
has evolved to solve those problems.
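
For completeness, here is a rough sketch of what the single-allocation version might look like on Vista and later, using IoSizeofWorkItem(), IoInitializeWorkItem() and IoUninitializeWorkItem(). Treat it as an illustration under assumptions rather than a reference implementation: the pool tag, the function names and the decision to place the work item storage directly after the context header are all choices made for this example, and it needs the WDK to build.

```c
#include <wdm.h>

#define MY_TAG 'tIyM'   /* hypothetical pool tag for this example */

typedef struct _MY_WORK_ITEM {
    PIO_WORKITEM WorkItem;   /* points at the storage that follows this header */
    /* ...other context fields... */
} MY_WORK_ITEM, *PMY_WORK_ITEM;

IO_WORKITEM_ROUTINE EmbeddedWorkItemRoutine;

NTSTATUS QueueEmbeddedWorkItem(PDEVICE_OBJECT DeviceObject)
{
    PMY_WORK_ITEM pItem;
    SIZE_T size;

    /* One allocation holds both the context and the opaque work item. */
    size = sizeof(MY_WORK_ITEM) + IoSizeofWorkItem();

    pItem = (PMY_WORK_ITEM) ExAllocatePoolWithTag(NonPagedPool, size, MY_TAG);
    if (pItem == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    /* The work item lives immediately after the header. The header contains
       a pointer, so its size keeps the trailing storage pointer-aligned. */
    pItem->WorkItem = (PIO_WORKITEM) (pItem + 1);
    IoInitializeWorkItem(DeviceObject, pItem->WorkItem);

    // ...initialize the rest of pItem...
    IoQueueWorkItem(pItem->WorkItem, EmbeddedWorkItemRoutine, DelayedWorkQueue, pItem);

    return STATUS_SUCCESS;
}

VOID EmbeddedWorkItemRoutine(PDEVICE_OBJECT DeviceObject, PVOID Context)
{
    PMY_WORK_ITEM pItem = (PMY_WORK_ITEM) Context;

    UNREFERENCED_PARAMETER(DeviceObject);

    // ... do work ...

    /* Uninitialize instead of IoFreeWorkItem(); the memory belongs to pItem. */
    IoUninitializeWorkItem(pItem->WorkItem);
    ExFreePool(pItem);
}
```

Compared with the IoAllocateWorkItem() version, there is only one allocation that can fail, and cleanup is a single ExFreePool() once the work item has been uninitialized.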