Why Your User Mode Pointer Captures Are Probably Broken

Article
03/31/2008

There is a problem that I suspect is widespread in driver code: the improper capturing of user mode pointers. I decided to write a blog post about it to get a feel for whether I am right or not. :) I figure that if people comment with “of course we knew that, you moron!” then I’ll assume that I am totally wrong. If not, then I hope this will help at least one person.

User mode pointers passed to kernel mode must point to data that is wholly contained in the user mode address space. Checking this property of a user mode pointer is called probing. Contrary to popular belief, a probe does not touch the memory pointed to by the pointer; it just does an address range calculation on it. The calculation is basically “(pointer + LengthOfDataPointedTo) must be less than the highest legal user-mode (UM) address”. The reason we need to probe is to make sure that a UM component can’t trick us into reading or writing kernel space on its behalf.

When a user passes a pointer into a kernel mode component, the pointer is copied onto the kernel stack as part of the calling mechanism (variously called a “system call”, sysenter, a “trap”, etc.). The user has no way to change the value of that variable once it is passed in, so we can validate the pointer with confidence, since we know that its value won’t change underneath us.

However, if that pointer is a pointer to a structure with embedded pointers, those embedded pointers can be changed asynchronously by another thread. This is problematic because we need to validate every embedded pointer in a passed-in structure. This is where capturing comes into play. We capture the embedded pointers by reading them through the already validated pointer and storing their values in kernel mode space, usually the stack. Once an embedded pointer is captured, we probe it. Lather, rinse, repeat for the entire depth of the embedded pointer tree.
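To make the "a probe is only arithmetic" point concrete, here is a minimal sketch of the range calculation in portable C. It is not the real ProbeForRead: the function name and the limit constant are made up for illustration, the alignment check is left out, and a zero length is simply treated as trivially valid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit, for illustration only. The real boundary is
   platform-specific and owned by the kernel. */
#define SKETCH_HIGHEST_USER_ADDRESS UINT64_C(0x00007FFFFFFEFFFF)

/* Returns true when every byte of [address, address + length) lies at
   or below the limit. Nothing is dereferenced; this is arithmetic only. */
static bool sketch_probe_range(uint64_t address, uint64_t length)
{
    if (length == 0) {
        return true;                /* nothing will be accessed */
    }

    uint64_t last = address + (length - 1);   /* last byte touched */

    if (last < address) {
        return false;               /* the range wrapped around */
    }

    return last <= SKETCH_HIGHEST_USER_ADDRESS;
}
```

Note the wrap-around test: without it, a huge length could make the end of the range look like a small, perfectly legal address.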

Consider this code:

typedef struct _USER_DATA {
    PULONG_PTR Data1;
    PULONG_PTR Data2;
} USER_DATA, *PUSER_DATA;

NTSTATUS
Foo(
    PUSER_DATA Data
    )
{
    PULONG_PTR CapturedData1;
    ULONG_PTR Data1Value;
    PULONG_PTR CapturedData2;
    ULONG_PTR Data2Value;

    // See if this is user mode. In a driver, PreviousMode would
    // normally be read from the IRP (Irp->RequestorMode) and the
    // pointer would come from the Type3InputBuffer field of the
    // current I/O stack location
    if (ExGetPreviousMode() != KernelMode) {
        __try {
            // Probe the passed in structure
            ProbeForRead(Data,
                         sizeof(USER_DATA),
                         __alignof(USER_DATA));

            // Capture the embedded pointers
            CapturedData1 = Data->Data1;
            CapturedData2 = Data->Data2;

            // Probe the first captured pointer
            ProbeForRead(CapturedData1,
                         sizeof(ULONG_PTR),
                         __alignof(ULONG_PTR));

            // Probe the second captured pointer
            ProbeForRead(CapturedData2,
                         sizeof(ULONG_PTR),
                         __alignof(ULONG_PTR));

            // Read through the first captured pointer
            Data1Value = *CapturedData1;

            // Read through the second captured pointer
            Data2Value = *CapturedData2;

            // More of your code here that does really cool stuff…
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            return GetExceptionCode();
        }
    }

    return STATUS_SUCCESS;
}

Everything seems to be OK with this code. We probe the structure pointer, capture the embedded pointers to local variables and then probe them. But wait, let’s think a little more deeply about our ever-important capture code. The most important attribute of our capture code is that it stores the embedded pointer in a location where the user can’t modify it. If it didn’t, then we would be in really bad shape. So the question is: does our capture code in fact guarantee that the embedded pointers will be in a location such that they can’t be modified from user mode?

Unfortunately, the answer is NO! How is this possible? We told the compiler that we wanted to store the pointers locally. However, we didn’t do anything to tell the compiler that it was critical that they were stored locally. So the compiler can freely skip the local storage of the embedded pointers and just refetch them from user mode through the original pointer upon each reference, if it so chooses. That means the pointer we probed and the pointer we later dereference may not be the same value. This is potentially disastrous for our kernel mode code and not at all what we expected or intended.
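Here is a small user-mode simulation of that hazard, as a sketch only. All names are hypothetical, and the "race" is faked deterministically by an ordinary function call placed between the check and the use. The second routine is hand-written to model what an optimizer is allowed to do to the first one when nothing marks the memory as volatile; it is not the output of any particular compiler.

```c
#include <stddef.h>

/* Stand-in for a structure that lives in user-writable memory. */
struct shared_request {
    const int *target;
};

static const int safe_value = 1;
static const int forbidden_value = 2;   /* pretend this is kernel data */

/* Simulates another user-mode thread winning the race. */
static void racing_writer(struct shared_request *req)
{
    req->target = &forbidden_value;
}

/* Stand-in for the probe: only the safe location is acceptable. */
static int is_allowed(const int *p)
{
    return p == &safe_value;
}

/* The source as written: fetch once into a local, check it, use it. */
static int use_captured(struct shared_request *req)
{
    const int *captured = req->target;      /* single fetch */

    if (!is_allowed(captured)) {
        return -1;
    }
    racing_writer(req);                     /* the race window */
    return *captured;                       /* uses the checked value */
}

/* The same routine with the local elided and the field re-read at
   each use, which is the transformation the text warns about. */
static int use_refetched(struct shared_request *req)
{
    if (!is_allowed(req->target)) {         /* fetch 1: the check */
        return -1;
    }
    racing_writer(req);                     /* the race window */
    return *req->target;                    /* fetch 2: the use */
}
```

With a request that initially points at safe_value, use_captured returns 1 while use_refetched returns 2: the check passed on one pointer and the read went through another.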

So what can we do to get the behavior we expect? Well, we have to tell the compiler the truth about the code that we are writing. That’s right, the truth. We are lying in our code. Our code has implicitly told the compiler that our embedded pointers can’t change asynchronously. This is a lie. They can change. So in order for our code to be correct, we need to change it to a truthful representation of itself.

How do we tell the compiler that our pointed-to structure can change? Well, there are a couple of ways. The most straightforward way is to mark the passed-in parameter with the keyword volatile. When applied to a parameter or variable definition, volatile tells the compiler that the memory location’s contents can change asynchronously, so every read and write to it in the source must really happen, and in the order specified relative to other volatile accesses. This facility was put into the C language to deal with code that reads and writes memory that can change outside the compiler’s view (e.g. hardware device registers, device memory, shared memory), and we can take advantage of its semantics for our user mode pointer captures. With hardware, a reordered, omitted or combined read or write could lead to real-life disasters. For hardware, reads as well as writes have side effects; this is completely analogous to our code. A read can have the side effect of violating our security mechanism.

So we can fix our code by changing our routine like so:

NTSTATUS
Foo(
    volatile USER_DATA* Data
    );

By changing Data to be a pointer to a volatile structure, we force all reads and writes to the structure to really happen in our code (bonus points for explaining why we can't use "volatile PUSER_DATA" as our parameter type instead of "volatile USER_DATA*". Aren't they the same thing? :D ). However, if we have existing interfaces that we must maintain, we may not be able to change the signature. What to do? Well, there is another way to get the behavior we want. We can cast at the capture site. This technique is called using “volatile glasses”. Here is an example:

// Capture the embedded pointers
CapturedData1 = ((volatile USER_DATA*)Data)->Data1;
CapturedData2 = ((volatile USER_DATA*)Data)->Data2;

This will cause the compiler to perform the capture as if the variable Data had been declared volatile. Using this technique prevents the compiler from re-fetching from the passed in pointer because we told the compiler the truth. We said “Hey compiler – this thing that Data points to can change asynchronously, so you’d better not be playing any funny games with it”. And the compiler will honor that. It has to if it honors the volatile keyword. We would then have to do the same thing for the internal reads as well:

// Read through the first captured pointer
Data1Value = *(volatile ULONG_PTR*)CapturedData1;

// Read through the second captured pointer
Data2Value = *(volatile ULONG_PTR*)CapturedData2;

Again, we are telling the compiler the truth here – that the ULONG_PTR value can change asynchronously and it needs to really capture it locally.
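If you end up writing these casts in many places, one option is to wrap them in a small helper so the intent is visible at each capture site. The following is a portable C sketch, not driver code: the macro and structure names are hypothetical (this is not a WDK macro), uintptr_t stands in for ULONG_PTR, and the probes are omitted.

```c
#include <stdint.h>

/* Hypothetical helper: read *ptr through a volatile-qualified lvalue,
   so the read must really be performed where it is written. */
#define SKETCH_CAPTURE_UINTPTR(ptr) (*(const volatile uintptr_t *)(ptr))

/* Stand-in for USER_DATA. */
struct sketch_user_data {
    uintptr_t *data1;
    uintptr_t *data2;
};

static uintptr_t sketch_sum(struct sketch_user_data *data)
{
    /* Capture the embedded pointers through volatile glasses. */
    uintptr_t *captured1 = ((volatile struct sketch_user_data *)data)->data1;
    uintptr_t *captured2 = ((volatile struct sketch_user_data *)data)->data2;

    /* In a driver, each captured pointer would be probed here. */

    /* Capture the pointed-to values the same way. */
    uintptr_t value1 = SKETCH_CAPTURE_UINTPTR(captured1);
    uintptr_t value2 = SKETCH_CAPTURE_UINTPTR(captured2);

    /* From here on, only the local copies are used. */
    return value1 + value2;
}
```

The design point is that after the capture lines, nothing in the routine refers to data, captured1 or captured2 again; all later logic works on the local copies.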

This is a really esoteric topic – but very important IMHO. Please let me know your thoughts on this – I am highly interested. Thanks!
