On generating sentinel pointer values in Windows


Suppose you have a need for sentinel pointer values. Let's say that your function operates on pointers to Widget objects, but you need a few special values that convey special meaning, like "There is no widget" or "Please use a default widget" or "Inherit the widget from the parent object" or "All the widgets."

Well, many languages give you a sentinel pointer value up front, typically called null or Nothing or nullptr or something emptyish like that. If you need only one sentinel value, then that's a pretty simple choice.

On the other hand, this emptyish sentinel value is a common thing that could be generated by mistake. Some languages use it as the default value for pointers. And the emptyish value might come out of an earlier failed operation, like an allocation. So you might want to avoid using the emptyish value as a sentinel because it is too easy to pass by mistake.

If you need a small number of sentinel values, you could just allocate a few objects for the sole purpose of providing an address. Some classes in the C++ standard library do this. For example, std::map might allocate a sentinel value to represent end(). (That sentinel value serves other purposes, too.)
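
Here is a minimal sketch of that idea, not a definitive implementation. The Widget type, the tag objects, and the Classify and Deref helpers are all hypothetical names invented for the illustration.

```cpp
#include <cassert>

struct Widget { int id = 0; };  // hypothetical stand-in for the real Widget

// These objects exist only so that their addresses can serve as sentinels.
// They are never handed out as real widgets.
static Widget g_defaultWidgetTag;
static Widget g_allWidgetsTag;

static Widget* const kUseDefaultWidget = &g_defaultWidgetTag;
static Widget* const kAllWidgets = &g_allWidgetsTag;

enum class WidgetKind { None, Default, All, Real };

// Classify a pointer by comparing it against the sentinel addresses.
inline WidgetKind Classify(const Widget* w) {
    if (w == nullptr) return WidgetKind::None;
    if (w == kUseDefaultWidget) return WidgetKind::Default;
    if (w == kAllWidgets) return WidgetKind::All;
    return WidgetKind::Real;
}

// Only pointers classified as Real should ever be dereferenced.
inline Widget& Deref(Widget* w) {
    assert(Classify(w) == WidgetKind::Real);
    return *w;
}
```

The cost is one Widget-sized object per sentinel, which is usually negligible when you need only a handful of them.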

Another idea is to create a bunch of addresses for your own use and carve your sentinel values out of them. You can VirtualAlloc(MEM_RESERVE) some address space, and nothing will go into that address space unless you put it there. If you reserve the address space and intentionally put nothing in it, then all the addresses in that reserved region are potentially usable as sentinels.
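
A rough sketch of reserving a region and carving sentinels out of it might look like the following. The Widget type, the 64KB region size, and every helper name are assumptions made for the example. The non-Windows branch, which uses mmap with PROT_NONE, is there only so the sketch compiles on other systems; it is not part of the technique described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

struct Widget { int id = 0; };  // hypothetical stand-in for the real Widget

constexpr std::size_t kReservedBytes = 64 * 1024;  // arbitrary size for the sketch

// Reserve address space without committing any memory to it.
// The region is intentionally never released.
inline void* ReserveSentinelRegion() {
#ifdef _WIN32
    return VirtualAlloc(nullptr, kReservedBytes, MEM_RESERVE, PAGE_NOACCESS);
#else
    void* p = mmap(nullptr, kReservedBytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// The region is reserved once, on first use.
inline char* SentinelBase() {
    static char* const base = static_cast<char*>(ReserveSentinelRegion());
    return base;
}

// Every address inside the reserved region can act as a distinct sentinel.
inline Widget* MakeReservedSentinel(std::size_t n) {
    assert(n < kReservedBytes && SentinelBase() != nullptr);
    return reinterpret_cast<Widget*>(SentinelBase() + n);
}

inline bool IsReservedSentinel(const Widget* w) {
    auto p = reinterpret_cast<std::uintptr_t>(w);
    auto b = reinterpret_cast<std::uintptr_t>(SentinelBase());
    return b != 0 && p >= b && p - b < kReservedBytes;
}
```

Since nothing is ever committed in the region, accidentally dereferencing one of these sentinels faults immediately rather than silently reading garbage.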

Windows itself does this for you: As part of setting up the process address space, the kernel reserves the bottom 64KB of address space, so no valid objects will be allocated there. That gives you 65536 sentinel values, although one of them matches nullptr, so it's 65535 new sentinel values. This is the technique used by the MAKEINTRESOURCE and MAKEINTATOM macros to allow an integer to be smuggled inside a string pointer.
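
As a rough illustration of the integer-in-a-pointer trick, here is a portable imitation of the MAKEINTRESOURCE and IS_INTRESOURCE macros. The Widget type and the three helper names are made up for the example, and the code assumes that no real object ever lives in the bottom 64KB of the address space.

```cpp
#include <cassert>
#include <cstdint>

struct Widget { int id = 0; };  // hypothetical stand-in for the real Widget

// Smuggle a 16-bit integer inside a pointer, in the spirit of MAKEINTRESOURCE.
// On mainstream platforms, n == 0 produces a null pointer.
inline Widget* MakeLowSentinel(std::uint16_t n) {
    return reinterpret_cast<Widget*>(static_cast<std::uintptr_t>(n));
}

// A pointer is a low sentinel if everything above the low 16 bits is zero,
// which is the same test that IS_INTRESOURCE performs.
inline bool IsLowSentinel(const Widget* w) {
    return (reinterpret_cast<std::uintptr_t>(w) >> 16) == 0;
}

// Recover the integer that was smuggled inside the pointer.
inline std::uint16_t LowSentinelValue(const Widget* w) {
    assert(IsLowSentinel(w));
    return static_cast<std::uint16_t>(reinterpret_cast<std::uintptr_t>(w));
}
```

Note that IsLowSentinel also reports true for a null pointer, so callers that treat null differently need to check for it first.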

Prior to Windows 8, applications could unreserve the bottom 64KB of address space and allocate actual memory there, which created the opportunity for mass confusion. Windows 8 put a stop to that.

If your widget object has alignment requirements (and if it consists of anything other than raw bytes, it probably does), you can use any pointer value that does not conform to those requirements. For example, if widgets must be 4-byte aligned, then any pointer value which is not divisible by four can be used as a sentinel, since it will never match the address of a valid widget.

If your widget object has no alignment requirements, you could always invent one by using a declaration appropriate to your toolset, such as __declspec(align(2)) or __attribute__((aligned(2))) or whatever.

Even if your alignment requirements are only word-alignment, that gives you two billion possible 32-bit sentinel values, which is quite a lot. You can use the encoding f(n) = n × 2 + 1 to create a sentinel and its inverse g(n) = (n − 1) / 2 to convert a sentinel back to its magic number.
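
The encoding and its inverse can be sketched as follows. The Widget type and the function names are hypothetical, and the sketch uses the standard alignas specifier in place of the toolset-specific declarations. It also leans on the implementation-defined conversion between integers and pointers, which behaves as expected on mainstream compilers but is not guaranteed by the language.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical widget that would otherwise have no alignment requirement,
// forced to 2-byte alignment so that real widgets never have odd addresses.
struct alignas(2) Widget { char data[3]; };
static_assert(alignof(Widget) % 2 == 0, "real widgets must have even addresses");

// f(n) = n * 2 + 1: every sentinel is an odd pointer value.
// n must fit in one bit fewer than a pointer, or the result wraps around.
inline Widget* EncodeSentinel(std::uintptr_t n) {
    return reinterpret_cast<Widget*>(n * 2 + 1);
}

// Odd pointer values can never be the address of a real widget.
inline bool IsSentinel(const Widget* w) {
    return (reinterpret_cast<std::uintptr_t>(w) & 1) != 0;
}

// g(p) = (p - 1) / 2: convert a sentinel back to its magic number.
inline std::uintptr_t DecodeSentinel(const Widget* w) {
    assert(IsSentinel(w));
    return (reinterpret_cast<std::uintptr_t>(w) - 1) / 2;
}
```

A caller would test IsSentinel first and dereference the pointer only when the test fails.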

(And if you're using 64-bit pointers, then the number of possible sentinel values is staggering.)

Exercise: Critique the following suggestion: "You can pick any value greater than 0x80000000 to use as a sentinel value."

Comments (30)

  1. SimonRev says:

    You are lobbing us an easy one today Raymond. 15 years ago that would have been a tricky exercise, but today hopefully anyone who does programming that involves pointers is cognizant of LARGEADDRESSAWARE and the distinct possibility that a 32 bit user mode app may have valid memory above the 2gb barrier.

  2. xcomcmdr says:

    “generat3ed ” : typo

    1. Brian_EE says:

      No, Raymond was smuggling the sentinel value 3 inside a string. If you don’t know what the 3 means, I’m afraid there’s no hope for you.

    2. Andrew says:

      Not a typo. The “3” was silent you see.

      Homage to Tom Lehrer, We Will All Go Together When We Go: “I am reminded at this point of a fellow I used to know whose name was Henry, only to give you an idea of what an individualist he was he spelt it HEN3RY. The 3 was silent, you see.”

  3. Cesar says:

    Raymond mentioned the bottom of the address space, but what about the top?

    If your widget object has at least x bytes, the last x-1 bytes of the address space can be used as sentinel values. In particular, unless your widget object is a single byte, -1 (the last byte of the address space) is a good sentinel value.

    (The Linux kernel does something similar: if the return value of a system call is a negative value above -4096, it’s an error value. This works even for system calls like mmap which returns a pointer, since its successful return value points to a 4096-byte (or greater) page.)

  4. Irek Zielinski says:

    Values above 0x80000000 are typically not used in 32 bit applications, unless one is marked as “Large Address Aware” and running on 64 bit windows.

    1. Eric Wilson says:

      I think you mean “OR running on win64”. If you are large address aware on win32, you have to be prepared for addresses larger than 0x80000000.

      1. Yuhong Bao says:

        I think they mean “running on 64-bit Windows or booted with the /3GB switch”

  5. kantos says:

    This reminds me of what LLVM does with bit-packing values into pointers. Chandler Carruth goes into detail: https://youtu.be/vElZc6zSIXM?t=22m

  6. Ken Hagan says:

    As well as not being largeaddressaware, 0x80000000 is, after suitable casting, going to compile perfectly happily in the 64-bit port of your software that will be done several years after everyone has forgotten how cleverly the sentinel values were done.

  7. Antonio Rodríguez says:

    Even in 32-bit software without LARGEADDRESSAWARE, most of the addresses above 0x80000000 aren’t usable as sentinel values. IIRC, Win32 reserves two sentinel areas, 64 KB in size, just above and below the border, but any address above the upper sentinel area is kernel space and, thus, can be returned by a function which references kernel memory. So, of the values above 0x80000000, just a fraction of them can be safely used as sentinel values, and just under some conditions. Too variable to be useful.

    1. Eric Wilson says:

      Why would a user mode function return a pointer to an address in kernel space? You can’t read or write to it from user space, so that doesn’t sound especially useful.

      1. poizan42 says:

        Why are there kernel mode pointers in the TEB? – Well actually I’m not sure I want to know the answer to that…

      2. Antonio Rodríguez says:

        It’s not that a user function could return a kernel space pointer. It’s that, AFAIK, a kernel space pointer can, under some circumstances, be valid in user mode. And this opens the possibility that the pointer gets mistaken for a sentinel value.

  8. Yuhong Bao says:

    “Prior to Windows 8, applications could unreserve the bottom 64KB of address space and allocate actual memory there, which created the opportunity for mass confusion.”
    And security problems. Of course, you can still enable NTVDM and get the support back.

    1. Joshua says:

      Oh man. Something I discussed once to make .NET code untraceable was to allocate that bottom 64k and fill it with zeros. Dereferencing nulls in C# yielded zeroed objects. Usually this would try to follow a virtual function call and crash, but if it wasn’t a virtual call it would actually call the object code with null this, which might succeed on doing something interesting if the code is just so.

  9. voo says:

    There is also the easiest of all solutions although it may waste some space depending on how large your widgets are:

    Widget *my_great_sentinal = new Widget;

    If your widget is only a few bytes large that’s a nice, easy to understand, hard to get wrong solution. Although it’s a pity that you can’t demonstrate how clever you are by using it.

    1. pc says:

      Raymond does even suggest that in the article: “If you need a small number of sentinel values, you could just allocate a few objects for the sole purpose of providing an address.”

      If one wanted to be fancy, one can be all object oriented, with a CouldBeAWidget base class/interface, and SentinelWidget and RealWidget implementations. I’m sure that that’s all I’d ever need to do on the systems I generally deal with, but Raymond (and presumably others) sometimes deal with the lower levels of things where I can imagine it’d be useful to overload pointer values in these various ways.

    2. kme says:

      You can also do things like:

      static const char sentinels[3];

      const Widget * const widget_default = (const Widget *)&sentinels[0];
      const Widget * const widget_all = (const Widget *)&sentinels[1];
      const Widget * const widget_inherit = (const Widget *)&sentinels[2];

      1. David Haim says:

        I’m not sure how valid it is. It definitely breaks the strict aliasing rule, and I’m not sure about the alignment...

        1. David Haim says:

          Although now that I think about it, it could work with an array of unions where each union contains T and char c. This will fix both the strict aliasing and the alignment issues. Just don’t dereference that pointer...

        2. Karellen says:

          The strict aliasing rule only affects dereferenced pointers. If you never dereference the sentinel pointers (and you shouldn’t if you’re using most of the other ideas in this post/comments) then you’re OK. Ditto on the alignment. It doesn’t matter. The pointers are only sentinels to indicate special cases; like NULL, they’re not *meant* to be pointers to valid objects, or to be dereferenced. They’re just values that can be identified, and won’t ever be used as pointers to valid objects.

          The GP’s solution has the advantage of only using 1 byte per sentinel (plus the sentinel value itself), whereas your other suggestion uses sizeof(Widget) per sentinel. At that point, I don’t see the advantage in using an array of unions over a plain array of Widgets.

          Although I think the GP’s suggestion falls under Raymond’s “you could just allocate a few objects for the sole purpose of providing an address.” (with the caveat that the objects you’ve allocated (chars) aren’t of the expected type, but not in any way that matters) – because that’s the reason I didn’t post the exact same comment myself a few hours before!

          1. Ben Voigt (Visual Studio and Development Technologies MVP with C++ focus) says:

            Alignment absolutely does still matter. The only rule that guarantees that widget_all != widget_inherit is the one that makes them round-trip when converted back to the original pointer type (char*), and that rule doesn’t apply unless alignment is respected.

  10. Martin Bonner says:

    Critique: Presumably the idea is that 32-bit Windows uses the top half of the address space for the kernel so user objects won’t go there. Problems:
    64-bit totally messes this up
    Large-address-aware frees up everything below 3G for the user.
    Shared memory goes above the limit anyway.

  11. MV says:

    I once wrote an application that kept important data in the low-order 2 bits of 4-byte-aligned pointers. Essentially the pointer could point to any of 4 different kinds of things, without requiring the pointed-to things to inherit from a common base. I still can’t decide if it was an elegant solution, or a horrible hack.

    1. Electron Shepherd says:

      Well, elegant or hack, the kernel people were thinking along the same lines.

      see https://blogs.msdn.microsoft.com/oldnewthing/20050121-00/?p=36633

  12. Tor says:

    Does using misaligned pointers as sentinels not violate some C (and C++) rule about pointers?

    1. laonianren says:

      Some processors have used pointer registers that validate a pointer when they are loaded, so a bad pointer will immediately fault. Thus it is undefined behaviour in C/C++ to even form an invalid pointer. (At least it used to be. I don’t keep up with this kind of thing any more.)

      Of course, you are free to use generally undefined behaviour that actually is defined on your target platform.

      1. laonianren says:

        Though it might still trip you up. A clever compiler could easily decide that a test like “if (1 & (int)pointer) DoSentinelStuff();” could never succeed and throw it away. Though if msvc did this I suspect Windows would break long before I was troubled by it.

        1. Tor says:

          Yes, “your target platform” here has to include all of CPU, OS, compiler and C library. Even if your CPU does a well-defined thing the other parts may still assume your C program is free of undefined behavior.
