How to check if a pointer is in a range of memory


Suppose you have a range of memory described by two variables, say,

byte* regionStart;
size_t regionSize;

And suppose you want to check whether a pointer lies within that region. You might be tempted to write

if (p >= regionStart && p < regionStart + regionSize)

but is this actually guaranteed according to the standard?

The relevant portion of the C standard (6.5.8 Relational Operators)¹ says

If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.

Now remember that the C language was defined to cover a large range of computer architectures, including many which would be considered museum relics today. It therefore takes a very conservative view of what is permitted, so that it remains possible to write C programs for those ancient systems. (Which weren't quite so ancient at the time.)

Bearing that in mind, it is still possible for an allocation to generate a pointer that satisfies the condition despite the pointer not pointing into the region. This will happen, for example, on an 80286 in protected mode, which is used by Windows 3.x in Standard mode and OS/2 1.x.

In this system, pointers are 32-bit values, split into two 16-bit parts, traditionally written as XXXX:YYYY. The first 16-bit part (XXXX) is the "selector", which chooses a bank of 64KB. The second 16-bit part (YYYY) is the "offset", which chooses a byte within that 64KB bank. (It's more complicated than this, but let's just leave it at that for the purpose of this discussion.)

Memory blocks larger than 64KB are broken up into 64KB chunks. To move from one chunk to the next, you add 8 to the selector. For example, the byte after 0101:FFFF is 0109:0000.

But why do you add 8 to move to the next selector? Why not just increment the selector? Because the bottom three bits of the selector are used for other things. In particular, the bottom bit of the selector is used to choose the selector table. Let's ignore bits 1 and 2 since they are not relevant to the discussion. Assume for convenience that they are always zero.²

There are two tables which describe how selectors correspond to physical memory, the Global Descriptor Table (for memory shared across all processes) and the Local Descriptor Table (for memory private to a single process). Therefore, the selectors available for process private memory are 0001, 0009, 0011, 0019, etc. Meanwhile, the selectors available for global memory are 0008, 0010, 0018, 0020, etc. (Selector 0000 is reserved.)

Okay, now we can set up our counter-example. Suppose regionStart = 0101:0000 and regionSize = 0x00020000. This means that the guarded addresses are 0101:0000 through 0101:FFFF and 0109:0000 through 0109:FFFF. Furthermore, regionStart + regionSize = 0111:0000.

Meanwhile, suppose there is some global memory that happens to be allocated at 0108:0000. This is a global memory allocation because the selector is an even number.

Observe that the global memory allocation is not part of the guarded region, but its pointer value does satisfy the numeric inequalities 0101:0000 ≤ 0108:0000 < 0111:0000.
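
To make the arithmetic concrete, here is a small illustrative sketch that models the 16:16 pointers as plain 32-bit integers (selector in the high word, offset in the low word). The end-of-region value is written out directly, because it is the huge-pointer arithmetic described above, not flat integer addition, that produces 0111:0000.

    /* Illustrative sketch: model 16:16 protected-mode pointers as 32-bit
       integers and show that the numeric range check passes for an
       address that is not part of the guarded region. */
    #include <stdint.h>
    #include <stdio.h>

    #define FARPTR(sel, off) (((uint32_t)(sel) << 16) | (uint32_t)(off))

    int main(void)
    {
        uint32_t regionStart = FARPTR(0x0101, 0x0000); /* guarded selectors: 0101 and 0109 */
        uint32_t regionEnd   = FARPTR(0x0111, 0x0000); /* regionStart + 0x20000, per huge-pointer rules */
        uint32_t global      = FARPTR(0x0108, 0x0000); /* unrelated global allocation */

        if (global >= regionStart && global < regionEnd) {
            printf("0108:0000 passes the range check but is not in the region\n");
        }
        return 0;
    }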

Bonus chatter: Even on CPU architectures with a flat memory model, the test can fail. Modern compilers take advantage of undefined behavior and optimize accordingly. If they see a relational comparison between pointers, they are permitted to assume that the pointers point into the same aggregate or array (or one past the last element of that array), because any other type of relational comparison is undefined. Specifically, if regionStart points to the start of an array or aggregate, then the only pointers that can legally be relationally compared with regionStart are the ones of the form regionStart, regionStart + 1, regionStart + 2, …, regionStart + regionSize. For all of these pointers, the condition p >= regionStart is true and can therefore be optimized out, reducing the test to

if (p < regionStart + regionSize)

which will now be satisfied for pointers that are numerically less than regionStart.

(You might run into this scenario if, as in the original question that inspired this answer, you allocated the region with regionStart = malloc(n), or if your region is a "quick access" pool of preallocated items and you want to decide whether you need to free the pointer.)
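
For concreteness, here is a hypothetical sketch of that "quick access pool" pattern; the names are invented for the example, and the range test is exactly the kind of relational comparison the compiler is entitled to mis-optimize.

    #include <stdlib.h>

    typedef struct item { int value; } item;

    static item quick_pool[16];       /* preallocated "quick access" items */

    void pool_free(item* p)
    {
        /* If p came from malloc, it does not point into quick_pool, so this
           relational comparison is undefined, and the compiler may discard
           the p >= quick_pool half of the test. */
        if (p >= quick_pool && p < quick_pool + 16) {
            return;                   /* pool item: nothing to free */
        }
        free(p);                      /* heap item: release it */
    }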

Moral of the story: This code is not safe, not even on flat architectures.

But all is not lost: The pointer-to-integer conversion is implementation-defined, which means that your implementation must document how it works. If your implementation defines the pointer-to-integer conversion as producing the numeric value of the linear address of the object referenced by the pointer, and you know that you are on a flat architecture, then what you can do is compare integers rather than pointers. Integer comparisons are not constrained in the same way that pointer comparisons are.

    if ((uintptr_t)p >= (uintptr_t)regionStart &&
        (uintptr_t)p < (uintptr_t)regionStart + (uintptr_t)regionSize)
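
Wrapped up as a self-contained helper (valid, again, only on implementations that document the pointer-to-integer conversion as producing a linear address), it might look like this:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Assumes (uintptr_t)p is the linear address of what p points to. */
    bool pointer_in_region(const void* p, const void* regionStart, size_t regionSize)
    {
        uintptr_t ip    = (uintptr_t)p;
        uintptr_t start = (uintptr_t)regionStart;
        return ip >= start && ip < start + (uintptr_t)regionSize;
    }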

¹ Note that comparison for equality and inequality are not considered relational comparisons.

² I know that in practice they aren't. I'm assuming they are zero for convenience.

(This article was adapted from my answer on StackOverflow.)

Update: Clarification that the "start of region" optimization is available only when regionStart points to the start of an array or aggregate.

Comments (33)
  1. JB says:

    If you can’t compare two general pointers to see which one is “bigger” then it’s broken.
    I’ll leave others to argue if it’s the compiler or the standard that’s broken, but something certainly is.

    1. Joshua says:

      If you’re on a flat architecture you can write portable code to suppress that optimization. Pass the address of the pointer to a function that does nothing but can’t be inlined.

      If you’re not on a flat architecture then there’s no way to write this. This is one of the reasons memmove is a platform function. You cannot write memmove in portable C.
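
      A rough sketch of that trick, with illustrative names; note that it defeats the optimization in practice but does not make the comparison any less undefined on paper:

          #include <stddef.h>

          /* Compiled in a separate translation unit (or otherwise kept from
             being inlined), so the compiler cannot see that it does nothing. */
          void launder_pointer(void** pp)
          {
              (void)pp;
          }

          int in_region(unsigned char* p, unsigned char* regionStart, size_t regionSize)
          {
              launder_pointer((void**)&p);  /* hide p's provenance from the optimizer */
              return p >= regionStart && p < regionStart + regionSize;
          }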

    2. Antonio Rodríguez says:

      You may say vintage architectures are broken from our modern point of view, now that non-flat models are little more than relics. But good luck taking the time machine and convincing designers of former architectures, which had to balance compatibility, performance, and memory economy. The original 8086, for example, introduced the segmented model so it could both run 8-bit software with little modification (if you use the “tiny memory model”, where all segments point to the same 64 KB area, you get something pretty similar to a “virtual 8080” mode) and allow newer software to address up to 1 MB of memory.

      Of course, the 8086 is not binary compatible with the 8080. But Intel specifically designed its architecture and instruction set so each 8080 opcode had an 8086 equivalent. That way, you could use a special “translation assembler” to generate 8086 binaries from 8080 assembly files, which would run fine in the tiny memory model. This made it easy to port CP/M software to MS-DOS, which was very important back in the day.

  2. Chris says:

    There’s a very inefficient method that is 100% portable and doesn’t rely on implementation-defined behavior: Use == rather than <. Walk the region one byte at a time and compare the address of each byte to p with ==. In theory the compiler could convert this into the efficient form in assembly, but I’ve never seen this optimization occur in practice.
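
    A sketch of that approach (names illustrative); equality comparison between pointers to different objects is well-defined, which is what makes it portable:

        #include <stdbool.h>
        #include <stddef.h>

        bool pointer_in_region_slow(const unsigned char* p,
                                    const unsigned char* regionStart,
                                    size_t regionSize)
        {
            /* Compare p for equality against the address of every byte. */
            for (size_t i = 0; i < regionSize; i++) {
                if (p == regionStart + i) {
                    return true;
                }
            }
            return false;
        }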

    1. Dave Harris says:

      That’s not portable. It will crash if you make a pointer to memory that doesn’t exist, and if the hardware checks for such pointers when they are loaded into registers rather than when the register is dereferenced. (Checking on load may be more efficient if a pointer is loaded once and dereferenced many times.)

  3. Andre says:

    > If they see a relational comparison between pointers, they are permitted to assume that the pointers point into the same aggregate or array
    Okay
    > Specifically, the only pointers that can legally be relationally compared with regionStart are the ones of the form regionStart, regionStart + 1, regionStart + 2, …

    How does the second follow from the first? If the compiler can’t see where regionStart comes from, it may be possible regionStart itself points inside an array, but not to the first element. For example:

    byte data[50];
    byte* regionStart = data+10;
    size_t regionSize = 20;
    non_inline_function_doing_the_comparison(regionStart, regionSize, data+5);

    Maybe I missed something.

    1. You’re right. The optimization is available only if the compiler can observe that regionStart is the start of an array. (Which might be something people do. “If the pointer is in my ‘quick access’ array, then don’t call free.”)

      1. Andre says:

        Ah, I see. That “local pool” scenario indeed seems very realistic and a hard-to-catch bug.

      2. Also: Lost in editing was that the original code initialized regionStart as regionStart = malloc(n);, and that puts regionStart at the start of an aggregate.

        1. Joshua says:

          And the compiler knows I didn’t replace malloc with an arena implementation how?

          1. Because you’re not allowed to do that. 7.1.3 Reserved Identifiers: “[A]ll external identifiers defined by the library are reserved in a hosted environment. This means, in effect, that no user-supplied external names may match library names, not even if the user function has the same specification.” (Emphasis in original.) This is what allows, for example, the compiler to inline standard library functions.

          2. skSdnW says:

            @Raymond VC will sometimes convert a loop into a call to memset and it does this even when you use /Zl. How are you supposed to provide your own memset (to work around a VC bug) if it is reserved?

          3. It should be clear that the standard cannot provide standard-compliant ways of working around bugs in the implementation. It’s not like the standard can say “If the implementation implements integer division incorrectly, a conforming program can use this workaround to force the implementation to use an alternate division algorithm,” (And repeat for each possible thing an implementation could have a bug in.) Working around implementation bugs is clearly an implementation-specific matter, outside the scope of the standard.

  4. Matt says:

    So the only way to do it “properly” is to check each pointer in the region one-by-one?

    1. Antonio Rodríguez says:

      Basically, the specification boils down to “only comparisons of pointers to the same object [array, structure, etc.] are defined”. As noted by Raymond, the only way to do it “properly” depends on the conversion from pointer to integer. And, as that operation is machine-dependent, there is no way to do it in a portable fashion.

      Not that it matters much. It could be argued that code which compares pointers to different objects and, thus, depends on memory allocation order is broken. Range-checking only makes sense in the cases defined by the C specification.

      1. Viila says:

        Sometimes depending on the memory allocation order is exactly what you want. I have a few times used memory addresses to provide a total ordering for a group of objects. It’s especially handy for breaking ties (two objects compare equal in the ordering criteria? Fine, the one with the lower address is less), or sometimes the objects don’t even have any useful way to compare them (in which case you probably don’t care in what order they are, just that they are in some stable order, so memory order is fine).
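
        A small illustrative sketch of that tie-breaking idea, for sorting an array of record pointers with qsort (names invented; the addresses are converted to integers to sidestep the relational-comparison restriction):

            #include <stdint.h>
            #include <stdlib.h>

            typedef struct record { int key; } record;

            /* Comparator for an array of record*: order by key, break ties by address. */
            static int compare_records(const void* a, const void* b)
            {
                const record* ra = *(const record* const*)a;
                const record* rb = *(const record* const*)b;
                if (ra->key != rb->key)
                    return ra->key < rb->key ? -1 : 1;
                uintptr_t ia = (uintptr_t)ra, ib = (uintptr_t)rb;
                return ia < ib ? -1 : (ia > ib ? 1 : 0);
            }

            /* Usage: qsort(ptrs, count, sizeof(record*), compare_records); */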

  5. Antonio Rodríguez says:

    One for the nitpicker’s corner: the selector bit which toggled between the GDT and the LDT was bit 2, not bit 0. See http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection/ . This doesn’t change Raymond’s exposition, as GDT and LDT selectors are still interleaved.

    Bits 0 and 1 contain the “requested privilege level” (RPL) for the segment. Basically, it’s a two-bit integer (0, 1, 2 or 3) telling which privilege level you are requesting when accessing the segment. If your requested privilege level is wrong (it is compared against the descriptor’s privilege level on segment register load), the request will be denied, and you won’t go to space today :-) .

    1. parkrrrr says:

      Thank you for making me feel slightly less insane. I had a vague thought that that was the case, from a youth misspent doing assembly language for the 80286 on an operating system that actually used call gates and such, but I was feeling too lazy to look it up.

  6. ranta says:

    C++ has std::less, which returns consistent results and does not cause undefined behavior when comparing pointers into different arrays; but AFAICT it still isn’t required to be useful for checking whether an address is within an array. For instance, with far pointers, it could treat offsets as more significant than selectors. And unlike casting to uintptr_t, the behavior isn’t even implementation-defined.

    1. Good point. std::less merely extends the existing partial order to a total order, so it removes the undefined behavior, but you could still have pointers p that satisfy x ≤ p < (x+n) even though p is not in the array of length n pointed to by x.

      1. ranta says:

        Thinking a bit further…

        You could first use std::less etc. to check whether the pointer seems to be between the beginning and the end of the array. If the result is negative, then that is reliable. If the result is positive, then run a binary search to find the apparent position in the array, and finally use == to make sure. The complexity would be O(log(size of array)).

        OTOH, this kind of code would be useful only as a fallback for (architecture, compiler) pairs on which the uintptr_t cast doesn’t work. So if you don’t support customers building your code on random DeathStations, then there’s hardly a reason to optimize the fallback.

  7. DonH says:

    It was even worse than that on OS/2 1.x, because you couldn’t just add 8 to the selector. You needed to call DosGetHugeShift(&usShiftCount), then shift 1 left by usShiftCount, and then use the resulting number as the selector increment. I can’t imagine why this system never caught on.

  8. James Curran says:

    I recall being at a C++ standardization meeting where they had a similar debate. They wanted to say something (an array or vector or such) was in “contiguous” memory. But what exactly is “contiguous” memory? When one byte is “next” to another? So the diodes have to be adjacent on the chip? And an algorithmic method of defining it runs into problems when you remember that in C++ you can redefine the meaning of ++, [ ], ==, in reference to your object.

  9. Fred says:

    This is the kind of optimisation that would make me consider the compiler “broken”, because any compiler smart enough to see that it’s “UB that can be optimised away” should also be smart enough to see that the test was put there for a reason (much like, say, an attempt to detect overflow in the result of a mathematical operation).

    Or at least, it should output a warning that the comparison relies on UB.

    1. Fred says:

      In fact, range checking and after-the-fact overflow checks seem to be the main occurrences of the “ha ha it’s actually UB” gotcha, but are there other similar cases?

      1. The premature downcast and calling a never-called function are two examples that aren’t array-out-of-bounds or overflow.

  10. Anon says:

    “But why do you add 8 to move to the next selector? Why not just increment the selector? Because the bottom three bits of the selector are used for other things. In particular, the bottom bit of the selector is used to choose the selector table. Let’s ignore bits 1 and 2 since they are not relevant to the discussion. Assume for convenience that they are always zero.²”

    Back in the days of 16-bit Windows you, or more likely a C compiler, could write code which referenced a magic Windows kernel variable called (IIRC) _AHINCR. That told you ‘how much to increment a selector by to move 64 KB forward’. So in 16-bit protected mode it would be 8 and in 16-bit real mode it would be 1000h (65536>>4).

    So an n*64K buffer would have n 64K selectors assigned to it, and you add _AHINCR to the value in the segment register to move between them.

    Once 32-bit protected mode was possible, the kernel would set the limit of the first selector to span the whole buffer on a 386+. So if the application knew about 32-bit mode it could just use a 32-bit offset from the first selector to access the whole buffer and didn’t need to reload segment values. Which was no bad thing, because loading a selector value into a segment register was not a fast process.

    I always thought the whole scheme was rather neat – you could write gnarly segment arithmetic code for huge pointer access which was portable between real and protected mode and ran on the 8086, 80286, and 80386+. And you could optionally write fast code that avoided reloading segment values if you knew you were on a 386+.

  11. Stephen Hewitt says:

    As so often happens when I read your blog I realise I’ve been doing it wrong for years.

  12. Dave Harris says:

    As an aside, range checks are much easier to read if the ‘<' signs all point the same way, like (low <= value && value = low && value < high).

    1. It also works best when you use actual comparison operators (==) instead of assignment operators (=).

      1. Dave Harris says:

        Looks like the text editor got confused by the < and & characters. What I typed had “this rather than that”, with two expressions for this and that, but the text editor combined them to look like one monster expression. It’s pure luck it compiles at all.

  13. Andrew Rogers (ex-MSFT) says:

    It’s also interesting to note that there is a similar problem with UB in (signed) integer overflow if you’re trying to validate “(pointer + size)” where “pointer” and “size” are incorrect / untrusted and may cause integer overflow.
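
    A hedged sketch of that pattern, with invented names (the actual GCC report involved a signed length, but the shape is the same): the broken version forms a possibly-out-of-range pointer, which is undefined behavior, so the compiler may drop the wrap check; the safer version compares sizes instead.

        #include <stdbool.h>
        #include <stddef.h>

        /* Broken: if size is untrusted, buf + size may land outside the buffer,
           which is undefined, so the compiler may assume the second test never fails. */
        bool check_broken(const char* buf, const char* bufEnd, size_t size)
        {
            return buf + size <= bufEnd && buf + size >= buf;
        }

        /* Safer: never form the questionable pointer; assumes bufEnd points
           one past the end of the buffer that buf points into. */
        bool check_safer(const char* buf, const char* bufEnd, size_t size)
        {
            return size <= (size_t)(bufEnd - buf);
        }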

    There is an interesting security corollary to these facts: pointer tests that can only fail if they trigger undefined behaviour can be validly optimised away. An example of this with integer overflow UB was reported on GCC way back in 2006:

    http://www.kb.cert.org/vuls/id/162289
    http://gcc.gnu.org/ml/gcc-bugs/2006-04/msg01297.html

    Though MSVC at the time wasn’t using this UB as an optimisation opportunity, it sparked a purge of such UB on integer overflow within MSFT. Interestingly, some 10 years later, the codegen (compiler) team have created a new optimiser that can and does take advantage of integer overflow UB (c.f. “Taking advantage of signed integer overflow being undefined”):

    https://blogs.msdn.microsoft.com/vcblog/2016/05/04/new-code-optimizer/

    While they are still specifically trying to avoid optimising away anything that looks like a pointer test, the moral is that you need to avoid relying on UB in any integer arithmetic; in particular, if you do rely on it, you may find parts of your code changed in ways that you didn’t expect (like having security checks removed)!

  14. Neil says:

    Of course, an 80286 OS might decide not to use GDT entries for “compatibility” so that no valid pointer value can occur in the middle of a huge allocation. (I understand that Windows Standard Mode made very sparse use of GDT entries and it wouldn’t surprise me if this was a contributing factor.)

Comments are closed.
