Memory marshalling in Windows CE

Posted by: Sue Loh 

This article explains how memory access and memory passing are implemented in Windows CE 6 as well as previous versions of the OS.  My intention is to explain the significant differences in CE6 by contrasting it with earlier OS versions.  I structured this explanation to talk mostly about drivers: how drivers used to work in CE5 and how they will work in CE6.  That’s because it’s most urgent for our BSP and driver developers to understand how their code is going to have to change.  But these explanations also cover system servers: how the implementations of APIs and services work.  Drivers and servers work the same way.

Let’s begin with some quick definitions related to passing a pointer from client to server.  Each term will be covered in more detail as we go.

  • Pointer parameter:  A pointer that’s passed as a parameter to an API.
  • Embedded pointer:  A pointer that’s passed to an API by storing it inside a buffer.
  • Access Checking: Verifying that the caller process has privilege to access a buffer.
  • Marshalling: Preparing a pointer that a server can use to access a caller’s buffer.
  • Secure-copy: Making a copy of a buffer to protect against asynchronous modification by the caller.
  • Synchronous: Accesses during an API call, on the caller’s thread.

A pointer parameter is a pointer that’s passed as a parameter to an API.  For example, the pBuffer parameter to the ReadFile() API is a pointer parameter.

ReadFile (hFile, pBuffer, dwBufferSize, ...);

An embedded pointer is a pointer that’s passed to an API by storing it inside the buffer that a pointer parameter refers to, or nested inside the buffer of another embedded pointer.  For example, while the pMyStruct parameter to the following DeviceIoControl() call is a pointer parameter, the pEmbedded pointer that is stored inside MyStruct is an embedded pointer.

struct MyStruct {
  BYTE *pEmbedded;
  DWORD dwSize;
};

DeviceIoControl (hFile, dwIoControlCode, pMyStruct, sizeof(MyStruct), ...);

Pointers that are passed by other means, for example by storing them inside shared memory or by using SetEventData() to attach them to an event, end up having all the same properties as embedded pointers and so should be treated as such.

Access checking is verifying that the caller of an API has enough privilege to access a buffer that it passed to the API.  (Access checking is not limited to memory, but in this case I’m only defining it with regard to memory.)  Access checking is necessary to prevent malicious applications from inducing driver code to perform actions on their behalf.  Drivers have a lot of privilege, and can access a lot of system data.  Applications cannot.  If a malicious application could cause a driver to read or write system memory on its behalf, then that driver would essentially be granting the malicious application access to data it should not have.  Proper access checking inside the driver can protect system memory.

In CE5:

  • Drivers used MapCallerPtr() to access-check pointer parameters and embedded pointers.  The CE5 kernel also redundantly access-checked pointer parameters, but had no way to know the size of the buffers being passed.  So it only checked the caller’s access to a single byte of the buffer.
  • The access was granted or denied based on the “trust level” of the caller process.

In CE6:

  • The API call definitions were changed to also include the sizes of pointer parameters.  So the kernel now performs a full access check on pointer parameters.  (I will explain this in more detail when I post about how API calls are implemented in CE6.)
  • Drivers only need to access check embedded pointers, and they do this using the new API CeOpenCallerBuffer().  This API is also responsible for marshalling the data, as explained below.
  • The access is granted or denied based on whether the caller is the kernel or a user-mode process.  (It may change to a more granular determination in the future, based on privilege levels.)
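To make the difference between the two kernel checks concrete, here is a minimal sketch in portable C.  It assumes, purely for illustration, that user-mode addresses occupy the low 2 GB and that everything above is off limits to applications; the function names and the limit constant are hypothetical, not part of any CE header, and the real kernel logic also considers who the caller is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: user-mode addresses occupy the low 2 GB. */
#define USER_SPACE_LIMIT 0x80000000u

/* CE5-style kernel check: only the first byte of the buffer is examined. */
static bool first_byte_is_user(uint32_t addr)
{
    return addr < USER_SPACE_LIMIT;
}

/* CE6-style check: the whole range [addr, addr + size) must be user memory. */
static bool range_is_user(uint32_t addr, uint32_t size)
{
    if (size == 0)
        return true;                    /* nothing will be touched */
    if (addr >= USER_SPACE_LIMIT)
        return false;
    /* Compare against the remaining room so addr + size cannot overflow. */
    return size <= USER_SPACE_LIMIT - addr;
}

/* A buffer that starts in user space but runs past the end of it. */
void access_check_example(void)
{
    uint32_t addr = USER_SPACE_LIMIT - 4;   /* last 4 user-mode bytes */
    assert(first_byte_is_user(addr));       /* single-byte check passes */
    assert(!range_is_user(addr, 64));       /* full-range check rejects it */
}
```

The example function shows the gap: a buffer whose first byte is legitimate can still reach memory the caller has no right to touch, which is why knowing the size matters.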

Synchronous memory access is done during an API call, on the caller’s thread.  If a driver has a thread which accesses the other process’ memory after the API call returns, that’s asynchronous access.  But just as significantly, if the driver has a thread which is guaranteed to access the other process’ memory during the course of the API call – before it returns – for the purpose of this discussion, that access is asynchronous too.

Pointer mapping or marshalling is the preparation of a pointer that a driver can use to access a caller’s buffer.  Drivers run inside a different process than the application which calls them.  The virtual memory space of every process is, by default, protected against access by other processes.  A driver must do some work in order to access a buffer inside another process’ memory.

In CE5, all processes shared a common address space.  To obtain a pointer to its caller’s memory, a driver would have to “map” the pointer into the caller process’ part of that address space.  “Mapping” was a simple transformation of the pointer value, to make it point at the other process’ “slot” inside the common address space.  The following picture shows device.exe accessing data in-place inside its caller.
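In code terms, the CE5 transformation amounts to a little address arithmetic.  The sketch below assumes 32 MB slots, with slot 0 acting as the alias for whichever process is currently running; that is my simplified reading of the CE5 layout, and the function and constant names are hypothetical stand-ins rather than the real kernel macros.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SHIFT 25u                  /* 2^25 bytes = 32 MB per slot (assumed) */

/* Base address of a process slot.  slot_index is hypothetical bookkeeping. */
static uint32_t slot_base(uint32_t slot_index)
{
    assert(slot_index < 64);            /* keep the result inside the low 2 GB */
    return slot_index << SLOT_SHIFT;
}

/* "Map" a caller pointer so it names the caller's own slot explicitly. */
uint32_t sim_map_ptr(uint32_t ptr, uint32_t caller_slot)
{
    if ((ptr >> SLOT_SHIFT) != 0)
        return ptr;                         /* already slot-qualified: unchanged */
    return ptr + slot_base(caller_slot);    /* slot-0 address: rebase it */
}
```

For example, under these assumptions a slot-0 pointer 0x00012345 from a caller living in slot 3 maps to 0x06012345, while a pointer that already carries a slot number is returned unchanged.  No memory is copied and nothing needs to be freed afterwards, which is exactly what changes in CE6.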

In CE6, each process has its own unique address space.  Marshalling memory cannot be as simple as a pointer transformation.  Either the memory must be copied from one process to another (duplication) or a new virtual address must be allocated in the driver process and pointed at the same physical memory the caller was using (aliasing).  Either way, resources are allocated inside the driver process, and must be freed when the driver is done with them.  The following pictures show a marshalled version of the caller’s buffer being created inside the kernel (for kernel-mode drivers) or udevice.exe (for user-mode drivers).

The CE6 marshalling is also more formalized about declaring whether the buffer is in-only, in/out or out-only.  Based on these settings, the marshalling helpers will ensure that copy-in and copy-out happen at the appropriate times.  The settings are also used for access checking; for example, a user-mode application cannot pass a shared heap address (which is read-only to applications) as an in/out or out-only parameter.
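Here is a minimal sketch of what marshalling by duplication looks like when the direction is declared up front.  It is a desktop simulation, not the CE6 implementation: the Direction enum and the sim_ functions are hypothetical stand-ins for the in-only, out-only and in/out descriptors and for the open/close pair a real driver would call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical direction flags standing in for in / out / in-out descriptors. */
typedef enum { DIR_IN = 1, DIR_OUT = 2, DIR_INOUT = 3 } Direction;

/* Create the driver-side copy.  Copy-in happens only if the buffer carries input. */
void *sim_open_buffer(const void *caller_buf, size_t size, Direction dir)
{
    assert(caller_buf != NULL && size > 0);
    void *copy = calloc(1, size);       /* zeroed, so an out-only copy starts clean */
    if (copy != NULL && (dir & DIR_IN))
        memcpy(copy, caller_buf, size);
    return copy;
}

/* Release the copy.  Copy-out happens only if the buffer carries output. */
void sim_close_buffer(void *copy, void *caller_buf, size_t size, Direction dir)
{
    if (copy == NULL)
        return;
    if (dir & DIR_OUT)
        memcpy(caller_buf, copy, size);
    free(copy);
}
```

With an in-only buffer, anything the driver scribbles into its copy never reaches the caller; with an out-only buffer, the driver never sees the caller’s original contents.  Either way, the open must be paired with a close, because the copy lives in the driver’s process.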

To explain what drivers must do to marshal memory, it is simpler to examine synchronous and asynchronous accesses separately.  First, for synchronous access:

  • The kernel automatically maps or marshals pointer parameters.
  • The driver must take care of embedded pointers.  In CE5, drivers used MapCallerPtr() for this.  In CE6, drivers use CeOpenCallerBuffer() to marshal embedded pointers, and CeCloseCallerBuffer() when they are done.

Both MapCallerPtr and CeOpenCallerBuffer have the added benefit that they access-check the buffer as they prepare it for use.

Asynchronous accesses are more complicated.  In CE5, additional work had to be done to access the caller’s memory on a different thread.  Each process “slot” was protected from access by other processes.  Each thread had its own set of “permissions” to access the various process slots.  As the caller’s thread jumped into the driver, it carried with it permission to access its owner process slot.  So accesses to the caller’s memory would succeed as long as they were done on that thread.  Other threads would first have to obtain permission to access the other process slot.

In CE6, like CE5, additional work must be done to access the caller’s memory on a different thread.  The reasons are different, and not as easy to explain.  The way memory is marshalled differs between kernel mode and user mode, and differs between pointer parameters and embedded pointers.  The only way to guarantee that the driver code is going to work properly in all modes is to prepare buffers for asynchronous access before accessing them on another thread.

For asynchronous access, pointer parameters and embedded pointers are handled the same way.  Assuming that we start with a buffer that is already mapped or marshalled for synchronous access, the steps a driver must take in order to access it asynchronously are:

  • In CE5, a driver must call SetProcPermissions() on its asynchronous thread, in order to access a buffer in a different process.
  • In CE6, a driver must call CeAllocAsynchronousBuffer() to prepare an “asynchronous ready” version of the buffer that is already prepared for synchronous use.  That call must be made synchronously, before passing the buffer to the asynchronous thread.  When the thread is done with the buffer, it calls CeFreeAsynchronousBuffer() to release the resources associated with it.
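The ordering in the CE6 case is the part that is easy to get wrong, so here is a small simulation of it.  It uses no threads and no CE APIs: the WorkItem type and the sim_ functions are hypothetical, and the "asynchronous ready" buffer is modelled as a plain copy of an in-only buffer, whereas the real call may alias memory and also handles write-back.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical work item handed from the API call to a worker thread. */
typedef struct {
    unsigned char *data;    /* async-ready buffer, owned by the work item */
    size_t         size;
} WorkItem;

/* Runs on the caller's thread, inside the API call: prepare before handing off. */
int sim_prepare_async(WorkItem *item, const void *sync_buf, size_t size)
{
    assert(item != NULL && sync_buf != NULL && size > 0);
    item->data = (unsigned char *)malloc(size);
    if (item->data == NULL)
        return -1;
    memcpy(item->data, sync_buf, size);
    item->size = size;
    return 0;
}

/* Runs later on the worker: touches only the async-ready buffer, then frees it. */
unsigned sim_worker_checksum(WorkItem *item)
{
    unsigned sum = 0;
    for (size_t i = 0; i < item->size; i++)
        sum += item->data[i];
    free(item->data);                   /* release what the prepare step allocated */
    item->data = NULL;
    item->size = 0;
    return sum;
}
```

The point of the split is that the worker never sees the pointer the API call received.  By the time it runs, the call may have returned and the original buffer may be gone or reused, so everything it needs has to be prepared while the call is still in progress.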

Also, unfortunately, not all asynchronous cases are supported for user-mode drivers.  What a user-mode driver cannot do is asynchronously write back to a pointer parameter.  Kernel-mode drivers always work, embedded pointers always work, and read-only pointers (no write-back to the caller) always work fine too.  I personally feel more comfortable saying that we simply don’t support asynchronous access in user-mode drivers.  If people listen to that, they can never get into trouble.  If your driver needs asynchronous access to caller buffers, in CE6 you should run it in kernel mode.  (Or if it’s an option, rearchitect your protocol so that caller memory access is never asynchronous, e.g. notify the caller that data is ready and have them call back into your driver to retrieve it.)

Other details for production quality drivers

You may say that the following two topics, secure copy and exception handling, are not part of memory marshalling.  But they are required in today’s world for safely receiving memory from other processes, and I believe that any discussion of memory passing is not complete without covering them.

There is a security risk a lot of developers are not aware of: callers can modify the buffers they pass while a driver is still using them.  The caller application could have a secondary thread which manipulates the data in a buffer while the primary thread is inside a driver call.  Malicious applications could manipulate embedded pointers to get access to memory they shouldn’t, or cause buffer overruns by manipulating buffer sizes, or cause other problems like exceptions and leaks.  To protect against this class of attacks, drivers must make a copy of the caller’s data, called a secure copy, so that the caller cannot modify it asynchronously.

For my first example of an attack that can be prevented using secure copies, imagine that the caller passes an embedded pointer to a driver.  The driver uses MapCallerPtr (in CE5) or CeOpenCallerBuffer (in CE6) to access check the pointer and map/marshal it for use.  If the driver stores the mapped pointer back into the caller’s buffer, or keeps reading the pointer from there, the caller could later manipulate it to point at other memory, and the driver would access the wrong memory.  Drivers must make copies of the pointers they receive from callers to prevent asynchronous modification.  Similarly, drivers must make copies of buffer size values they get from callers.

So, always copy embedded pointers to a local variable.  This is easily accomplished as part of mapping/marshalling since you have to call MapCallerPtr or CeOpenCallerBuffer anyway.  Never store the mapped/marshalled pointer back to the caller’s buffer.  Never use the pointer in the caller’s buffer after it has been mapped/marshalled.  Treat buffer size and length variables with the same caution, so that callers cannot manipulate sizes any more than they can manipulate pointers.
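As a rough sketch of that rule, the function below reads the embedded pointer and the size exactly once, into a driver-owned local, and validates only the local.  MyStruct mirrors the earlier example with portable types; capture_request and MAX_PAYLOAD are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAYLOAD 256u    /* hypothetical driver limit */

/* Caller-owned request.  The caller may rewrite either field at any time. */
typedef struct {
    unsigned char *pEmbedded;
    unsigned long  dwSize;
} MyStruct;

/* Read each field once into driver-owned storage, then validate that copy. */
bool capture_request(const volatile MyStruct *shared, MyStruct *local)
{
    assert(shared != NULL && local != NULL);
    local->pEmbedded = shared->pEmbedded;   /* single fetch of the pointer */
    local->dwSize    = shared->dwSize;      /* single fetch of the size */
    return local->pEmbedded != NULL
        && local->dwSize > 0
        && local->dwSize <= MAX_PAYLOAD;
}
```

After a successful capture, a driver would marshal local.pEmbedded using local.dwSize and never look at the caller’s structure again.  Whatever the caller does to the shared structure afterwards has no effect on the values the driver validated.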

My second example of why secure copy is necessary involves file names.  The CreateFile API, which takes a file name, validates that the caller is allowed to access that file.  Suppose CreateFile read the file name, checked access, then used the file name to open the file when the access check passed.  If the caller passes the name of a file it can access, then asynchronously changes it to a file name the caller is NOT supposed to be able to access, then there is a small window of time in which the caller could trick CreateFile into opening a file it’s not supposed to.  Perhaps it would only be able to get access 1% of the tries, but a hacker program could keep trying and trying until the trick worked.  It only has to work once in order to compromise system security.  The way to protect against this type of attack is that CreateFile must make a copy of the filename, in memory that the caller cannot access, before validating the caller’s access to that file.  (By the way, the OS already does a secure-copy of the file name before passing it to a driver’s CreateFile in CE6; this is just a thought experiment.)

You should make a copy of any data that requires validation, to prevent asynchronous modification after the validation is done.  Making a secure copy can be as simple as copying a buffer or pointer into a stack variable.  Or you could make a temporary heap allocation to copy the caller’s data into.  You will notice that CeOpenCallerBuffer has a ForceDuplicate parameter you can use to guarantee that you get a secure copy of an embedded buffer.  We’ve also created a CeAllocDuplicateBuffer helper function that you can choose to use.  (It is basically a heap alloc, with memcpy as necessary for copy-in or copy-out.)  It does not matter how you make the secure copy, as long as you do something to protect the data you take from callers.
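The file name thought experiment can be sketched the same way.  This is a portable illustration, not how the OS does it: it uses narrow char strings rather than the wide strings CE APIs take, and the name limit, the "\Public\" policy and all three functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NAME_MAX_CHARS 64   /* hypothetical limit for the sketch */

/* Copy the caller's name into driver-owned storage; fail if it does not fit. */
bool copy_name(char dst[NAME_MAX_CHARS], const char *caller_name)
{
    assert(dst != NULL && caller_name != NULL);
    for (size_t i = 0; i < NAME_MAX_CHARS; i++) {
        dst[i] = caller_name[i];        /* each caller byte is read exactly once */
        if (dst[i] == '\0')
            return true;
    }
    dst[NAME_MAX_CHARS - 1] = '\0';     /* too long: terminate the copy and reject */
    return false;
}

/* Hypothetical policy: callers may only open names under "\Public\". */
static bool name_is_allowed(const char *name)
{
    return strncmp(name, "\\Public\\", 8) == 0;
}

/* Validate and "open" using the same private copy, never the caller's buffer. */
bool open_checked(const char *caller_name, char opened[NAME_MAX_CHARS])
{
    if (!copy_name(opened, caller_name))
        return false;
    return name_is_allowed(opened);
}
```

Note that the copy loop is deliberately not a strlen followed by a memcpy: measuring the caller’s string and then copying it would read the caller’s memory twice, which reopens the same window the copy is meant to close.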

Closely related to secure copy is exception handling: drivers must use it to protect their accesses to caller memory.  It is important to note that, even if a caller has access to an address, that address may not refer to valid memory.  An application can pass a pointer to a user-mode address that was never allocated.  Or it could asynchronously free the buffer.  So, drivers should always surround user buffer accesses with __try/__except blocks, and clean up resources during __except or __finally.  For example, make sure to free memory that was allocated during the call, and release any critical sections, before returning to the caller.

In Summary

As you can see, passing memory between processes is a complicated matter.  But don’t despair.  There are relatively simple rules governing drivers, as covered in the following table.

… and remember, always use try/except so you can clean up properly if you get exceptions on caller memory!

One other tip: CE6 has some helper C++ classes to simplify your usage of these APIs.  In public\common\oak\inc\marshal.hpp you will find:

  • MarshalledBuffer_t: wrapper for CeOpenCallerBuffer, CeAllocAsynchronousBuffer, and their cleanup functions.  Use for all of your embedded pointers.
  • DuplicatedBuffer_t: wrapper for CeAllocDuplicateBuffer and its free.  Use for pointer parameters that need a secure copy.
  • AsynchronousBuffer_t: wrapper for CeAllocAsynchronousBuffer and its free.  Use for pointer parameters you need to access asynchronously.

The C++ version of the table then becomes:

Use Case                         | What the driver must do in CE6
---------------------------------+---------------------------------------------------------------
Parameter – used synchronously   | If a secure copy is necessary, use DuplicatedBuffer_t.  Otherwise just use the pointer.
Parameter – used asynchronously  | If a secure copy is necessary, use DuplicatedBuffer_t.  Otherwise use AsynchronousBuffer_t.
Embedded Pointer                 | Use MarshalledBuffer_t.

… and always use try/except!

Juggs Ravalia did a Channel 9 interview on this topic – if you don’t like my explanation, maybe you’ll like his better.  https://channel9.msdn.com/Showpost.aspx?postid=233119