The memcmp function reports the result of the comparison at the point of the first difference, but it can still read past that point


This story originally involved a more complex data structure, but that would have required too much explaining (with relatively little benefit since the data structure was not related to the moral of the story), so I'm going to retell it with double null-terminated strings as the data structure instead.

Consider the following code to compare two double-null-terminated strings for equality:

size_t SizeOfDoubleNullTerminatedString(const char *s)
{
  const char *start = s;
  for (; *s; s += strlen(s) + 1) { }
  return s - start + 1;
}

BOOL AreDoubleNullTerminatedStringsEqual(
    const char *s, const char *t)
{
 size_t slen = SizeOfDoubleNullTerminatedString(s);
 size_t tlen = SizeOfDoubleNullTerminatedString(t);
 return slen == tlen && memcmp(s, t, slen) == 0;
}

"Aha, this code is inefficient. Since the memcmp function stops comparing as soon as it finds a difference, I can skip the call to SizeOfDoubleNullTerminatedString(t) and simply write

BOOL AreDoubleNullTerminatedStringsEqual(
    const char *s, const char *t)
{
 return memcmp(s, t, SizeOfDoubleNullTerminatedString(s)) == 0;
}

because we can never read past the end of t: If the strings are equal, then tlen will be equal to slen anyway, so the buffer size is correct. And if the strings are different, the difference will be found at or before the end of t, since it is not possible for a double-null-terminated string to be a prefix of another double-null-terminated string. In both cases, we never read past the end of t."

This analysis is based on a flawed assumption, namely, that memcmp compares byte-by-byte and does not look at bytes beyond the first point of difference. The memcmp function makes no such guarantee. It is permitted to read all the bytes from both buffers before reporting the result of the comparison.

In fact, most implementations of memcmp do read past the point of first difference. Your typical library will try to compare the two buffers in register-sized chunks rather than byte-by-byte. (This is particularly convenient on x86 thanks to the block comparison instruction rep cmpsd which compares two memory blocks in DWORD-sized chunks, and x64 doubles your fun with rep cmpsq.) Once it finds two chunks which differ, it then studies the bytes within the chunks to determine what the return value should be.

(Indeed, people with free time on their hands or simply enjoy a challenge will try to outdo the runtime library with fancy-pants memcmp algorithms which compare the buffers in larger-than-normal chunks by doing things like comparing via SIMD registers.)

To illustrate, consider an implementation of memcmp which uses 4-byte chunks. Typically, memory comparison functions do some preliminary work to get the buffers aligned, but let's ignore that part since it isn't interesting. The inner loop goes like this:

while (length >= 4)
{
 int32 schunk = *(int32*)s;
 int32 tchunk = *(int32*)t;
 if (schunk != tchunk) {
   -- difference found - calculate and return result
 }
 length -= 4;
 s += 4;
 t += 4;
}

Let's compare the strings s = "a\0b\0\0" and t = "a\0\0". The size of the double-null-terminated string s is 4, so the memory comparison goes like this: First we read four bytes from s into schunk, resulting in (on a little-endian machine) 0x00620061. Next, we read four bytes from t into tchunk, resulting in 0x??000061. Oops, we read one byte past the end of the buffer.

If t happened to sit right at the end of a page, and the next page was uncommitted memory, then you take an access violation while trying to read tchunk. Your optimization turned into a crash.

Remember, when you say that a buffer is a particular size, the basic ground rules of programming say that it really has to be that size.

Comments (27)
  1. Anonymous says:

    It turns into a very unlikely crash. Very bad situation.

  2. Anonymous says:

    "we read four bytes from t into tchunk, resulting in 0x??000061"

    mmm … no. Correct me if i am wrong, but the t string is "a", so it has another 0 in the end. You probably wanted to type t = "a" ???

    [More precisely, t is char t[3] = "a". Don't make me bring back the nitpicker's corner. -Raymond]
  3. Anonymous says:

    That read 4 bytes at a time trick is used in the standard strlen() implementation in my copy of Visual Studio, and even though it can read off the end of the string by up to 3 bytes it won't crash as the pointer it reads from is 4 byte aligned, which means the extra 3 bytes have to come from the same page.

    Of course if the memcmp algorithm does misaligned 4 byte reads in some cases (for example it might be quicker to do misaligned reads if the offset between the two inputs isn't a multiple of 4 bytes) then that could cause a crash.

    You could easily write a custom memcmp function that did read 4 (or more) bytes at a time and wouldn't crash in this case though – you just need to just avoid misaligned reads. If it's worth doing that to speed things up depends on how performance critical that bit of code is.

  4. Anonymous says:

    I was going to make the same comment as Adam: I would assume that the 4-byte reads are memory-aligned, in which case all four bytes in a single read must be in the same page and no access violation can occur.

    If that assumption isn't true, then what was Raymond's comment "Typically, memory comparison functions do some preliminary work to get the buffers aligned, but let's ignore that part since it isn't interesting" about?

    [Sometimes you can't align both pointers simultaneously. It's in those cases where you get the misaligned read from t. -Raymond]
  5. You probably wanted to type t = "a"

    Dangerous waters… most ways to write this are confusing. How about:

    char s[] = { 'a', '', 'b', '', '' }; // ARRAYSIZE(s) == 5

    char t[] = { 'a', '', '' }; // ARRAYSIZE(t) == 3

  6. Anonymous says:

    I'd be incredibly surprised if this implementation of AreDoubleNullTerminatedStringsEqual ever caused a crash.  Yes, it's undefined behavior, but I doubt any memcmp implementations could cause a crash in this situation.

    If you're optimizing by doing word-sized (or double- or quad-word sized) reads, those reads are going to be aligned; you would only get a crash by reading across a page boundary if those reads were unaligned.

    And Raymond, thanks for the link to the "Exploring memcmp" article, which linked to an intriguing benchmark of various algorithms for the needle-in-a-haystack problem (news.ycombinator.com/item).

    [Then I guess you're incredibly surprised, because I wrote up this article in response to a crash I was asked to investigate. -Raymond]
  7. Anonymous says:

    Oh, excellent point about comparing pointers of different alignments.  I guess I'd assumed that the implementation would still do aligned reads and bit-twiddle before comparing words, as opposed to doing an aligned read and an unaligned read (letting the processor due the bit twiddling).  I apparently guessed wrong, and you can color me surprised.

  8. Anonymous says:

    Incidentally, on a big-endian architecture, I think the optimized function could skip the byte-per-byte comparison altogether and simply return the result of the chunk comparison…

    I like putting buffers at the end of a page. I did it once to safely play with the dreaded gets() function…

  9. Anonymous says:

    memcmp returns 0 if the blocks of memory are identical, so the return statement in the original function (for example) should be:

     return slen == tlen && memcmp(s, t, slen) == 0;

    [Fixed, thanks. -Raymond]
  10. Anonymous says:

    > Incidentally, on a big-endian architecture, I think the optimized function could skip the byte-per-byte comparison altogether and simply return the result of the chunk comparison… <<

    Be careful with that – I don't think it would hold for situations where the plain-old-char type is signed.

  11. DWalker59 says:

    "In fact, most implementations of memcmp do read past the point of first difference": I would have guessed that most implementations wouldn't read past the declared length of the strings, but it sounds like I might be wrong here.

    IIRC, IBM mainframes have a single instruction that will compare strings of any length, and return the first place where the strings differ.  The instruction is documented as "interruptible" every time data was fetched from a new 4096-byte memory page (which means that the underlying data can be changed by a different process if it's not properly locked).  There's a different (faster) instruction that can compare strings up to 255 or 256 bytes in length.

  12. Anonymous says:

    DWalker59:

    It won't read past the declared length of the string, in cases where the length isn't a multiple of the chunk size then it will read byte by byte for a small part of the buffer. So if it is 17 bytes long on a 32 bit system, it will read 4 32 bit values and then it reads the last byte on it's own.

  13. Anonymous says:

    > I would have guessed that most implementations wouldn't read past the declared length of the strings, but it sounds like I might be wrong here. <<

    In Raymond's example, the 'declared' length of the buffer(s) is passed as the last parameter to memcmp().  The problem is that the caller is counting on the fact that if the second buffer is smaller than the first that memcmp() will find the difference before accessing past the end of that buffer.

    However, library implementations that might read past the end of a declared buffer – if they can determine that it's safe to do so (as Adam Rosenfield indicated, if the library can determine that the few bytes past the end of a buffer would have to be on the same page, for example).  However, for the library to know whether it's safe or not, you cannot lie to it about the actual size of the buffers.

  14. Anonymous says:

    Indeed, an implementation of memcmp that does stop at the point of first difference is insecure for comparing crypto keys, because of timing attacks. See Nate Lawson's blog, for example, for more info.

  15. Anonymous says:

    I remember there was a bug on some gcc version where the memcmp() function didn't set the x86 direction flag. As almost nothing sets that flag it would work ok most of the time, but weird bugs would appear on occasion.

    There is nothing that says a memcmp() must be from lower address to higher address. It will need to do that when it finds a difference (<0 or >0 return value), but if they are equal the result is the same.

  16. Anonymous says:

    Once again null terminated strings displays an obvious flaw compared to pascal strings.

    [As I noted, the original problem was unrelated to strings. And you can make this same mistake with pascal strings:

    bool ArePascalStringsEqual(PascalString s, PascalString t)
    {
     return memcmp(s.characters, t.characters, s.length) == 0 && s.length == t.length;
    }

    -Raymond]

  17. Anonymous says:

    * If s and t are aligned on register-size boundaries, all it good.

    * If s and t are similarly-unaligned, code can compensate for it

    * If s and t are differently-aligned, what would the code do? Make aligned reads from both buffer and do partial/shifted comparisons of registers? Sounds risky business to me.

  18. an implementation of memcmp that does stop at the point of first difference is insecure for comparing crypto keys

    Yeah.

    Um… what?

  19. Anonymous says:

    Maurits; timing attacks

  20. Anonymous says:

    RE: "try to outdo the runtime library" using SIMD. Some runtime libraries do this for you, e.g. sourceware.org/git

    Warning: may infect your brain with viral license.

  21. Anonymous says:

    @mikeb: >>Be careful with that – I don't think it would hold for situations where the plain-old-char type is signed.<<

    You may be right. Comparison with signed chars seems to have some weird results, regardless of endianness. Interestingly, I tried comparison functions under Microsoft Windows 32-bit on x86, and both strcmp() and memcmp() seem to assume chars are unsigned, and consider accented letters to be higher than ASCII.

    @Yuhong Bao: Thanks, that was instructive (rdist.root.org/…/exploiting-remote-timing-attacks).

  22. Anonymous says:

    @nlucas: If I recall correctly, it was even more exotic. The direction flag is supposed to be always clear, so memcmp() did not clear it. However, there was a bug in the Linux kernel where it did not clear the direction flag when calling a signal handler. So, if you had a memcmp() call within a signal handler, AND there was something elsewhere in your program which temporarily set the direction flag (perhaps to do a backwards memmove()) and cleared it afterwards, AND that specific signal happened to hit in that small window where the direction flag was set, AND the code was compiled with a newer gcc version which did not neurotically clear the direction flag every time before doing a memcmp(), then the memcmp() code (which expected the direction flag clear) ran with the direction flag set.

    No wonder it took a long time for that Linux kernel bug to be found.

  23. Ah… I see… timing attacks allow you to guess how many bytes you have right based on how quickly it takes to fail.

  24. Anonymous says:

    @DWalker:

    "IIRC, IBM mainframes have a single instruction that will compare strings of any length, and return the first place where the strings differ"

    Um… Just like old good REP CMPS in x86?

  25. Anonymous says:

    [I'm sure I posted this up before, but for some reason my post hasn't appeared … blogs.msdn.com doesn't seem to like me :/ Just as well I cut and pasted it into Notepad before submission…]

    Hah! The direction flag problem reminds me of a similarly obscure problem with DLLs compiled with some versions of the Borland runtime, which would cause random crashes in programs which loaded the DLL. In this case, it was a keyboard hook DLL, so the program had no choice but to load it once you pressed a key. [Hint, Raymond: I am sure Microsoft must have had cases like this and it'd make a good story ;)]

    Once that was done, the trap was set so to speak. You see, there's a floating point control word which controls the behaviour of the floating point coprocessor. And in there is a single bit, which if you set it tells the processor 'raise an exception if I try and do something impossible'. By default, Microsoft leaves this unset, as do virtually all applications; after all IIRC the IEEE standard expects the application to do error checking.

    Here's the insidious bit. Say you have a calculation, a = x+y, and earlier in the code, you have x=b/c. If c=0, you get a divide by zero, the code merrily continues and the floating point processor returns the answer NaN (Not a Number). Yes, the floating point processor doesn't consider divide by zero to be an exception worthy of signalling a fault. It just returns the special NaN value. But, what happens when you try to calculate a = x+y…? Well, NaN+anything raises an exception. You see, it's fine to return NaN as a value, but using it in a function will trigger the bug.

    And that was where the DLL in question caused the problem. By flipping that bit, it destabilised operations which had been fine before. But why would you operate on invalid values in the first place? Well, say you had to add up twenty values, divide them by the sum of another five values, raise it to the power of N, then take its sine … it's far more efficient to just chuck the lot in there and check if you get a NaN out of the other end than to check all the values you're using!

    Nothing you can do about it. Your code is now unstable thanks to a design decision someone else took when designing their runtime libraries. I believe this problem was fixed in later versions of the Borland runtimes. My writeup's somewhere in gtk's Bugzilla system…

    [Yup, I had an example of a DLL which messed with the host process's floating point state a few years ago. -Raymond]
  26. Anonymous says:

    We don't say "if are double null terminated strings equal", we say "if double null terminated strings are equal", so it should be DoubleNullTerminatedStringsAreEqual.

    It's a freaking PREDICATE, not a QUESTION!

    [By convention in Win32, predicates begin with Is or Are (e.g. IsWindow, AreFileAPIsANSI). -Raymond]
  27. Anonymous says:

    Regarding the direction flag, there is a bug in Windows XP and 2003 regarding vectored exception handlers: if you do a memcmp() inside a vectored exception handler, and the exception occurred during a backward memmove() that sets the direction flag, your memcmp() will go the wrong direction; KiUserExceptionDispatcher doesn't clear the direction flag like it should according to the x86 ABI.

    The fix for this is to do __asm cld at the beginning of a vectored exception handler.  The same bug applies for x86-64; to work around it on that, since you can't use __asm, try __writeeflags(__readeflags() & ~((__int64) 0x400)).

    The bug was fixed in Vista.  The reason that normal exception handlers don't have this problem is because _except_handler4 executes a cld opcode.

Comments are closed.