How can a NULL terminated string be misinterpreted in a UNICODE_STRING?

A NULL terminated string can be misinterpreted if the Length field counts the terminating NULL as part of the string.  Let’s look at the definition of DECLARE_CONST_UNICODE_STRING again before I go into how the misinterpretation can happen.

#define DECLARE_CONST_UNICODE_STRING(_var, _string) \
const WCHAR _var ## _buffer[] = _string; \
const UNICODE_STRING _var = { sizeof(_string) - sizeof(WCHAR), sizeof(_string), (PWCH) _var ## _buffer }

Note the - sizeof(WCHAR) in the initialization of the Length field:

const UNICODE_STRING _var = { sizeof(_string) - sizeof(WCHAR), …

For clarity’s sake, it should really be sizeof(UNICODE_NULL), but the two are functionally the same.  sizeof(_string) returns the size in bytes of the string, including the terminating NULL.  By subtracting off the size in bytes of one Unicode character, we initialize the Length field to the size in bytes of the string without counting the NULL itself.

So, how can the misinterpretation occur?  Since the UNICODE_STRING gives the length of the Buffer, there is no need to stop at a terminating NULL when comparing two UNICODE_STRING Buffers against each other.  This means that by including the NULL in the Length, you change how much of the Buffer is evaluated (*).  For instance, the registry stores the name of a value as a UNICODE_STRING (e.g. “MyValue”).  If you passed in a UNICODE_STRING for the value name which was initialized like this

UNICODE_STRING name = { sizeof(L"MyValue"), sizeof(L"MyValue"), L"MyValue" };

to retrieve the value contained by “MyValue”, the query would fail because the registry API would be comparing “MyValue” (the value name in the registry) against “MyValue\0” (the name you passed in).  I know from personal experience that you have to be exact with this, especially when creating a UNICODE_STRING by hand that is not based on a constant string.  I have been bitten by this issue and have spent a lot of time figuring out what I did wrong.

(*) A side effect of not stopping evaluation of the Buffer when you encounter a NULL is that you can create a multi-sz value without having to compute the length of each substring to find the total length of the multi-sz.

Comments (2)

  1. mattd says:

    Off topic, but I’m very interested in library design. Can you explain how the handles are generated in the DDIs? Are the pointers to your data structures just XORed with some random number (similar to the Win32 EncodePointer/DecodePointer) and then cast to the handle type? Or is it more complicated than that? BTW great stuff Doron, keep it up.

  2. doronh says:

    Yes, currently it is a structure pointer that is XOR’ed with a value, like EncodePointer/DecodePointer.  A handle table was considered, but for kernel mode we would have synchronization issues due to the caller’s IRQL.  The synch mechanism would have to be the lowest common denominator of PASSIVE_LEVEL, DISPATCH_LEVEL, and DIRQL (device interrupt IRQL)…so we would have to use DIRQL.  But we can’t use DIRQL because we don’t always have an interrupt object to sync against, and performance would be awful if every time you called a KMDF DDI we sync’ed at DIRQL (even for a short period of time).

    The goal of our handles was to have an opaque type, so that the underlying type could change from version to version.  It was also a goal to have a typesafe handle in terms of the compiler validating the type (vs everything just being a HANDLE) at compile time.  It was a non-goal to provide a safe pointer that could be validated and rejected on entry into the DDI (although at one point in time this was considered a real goal of the algorithm).