Can memcpy go into an infinite loop? Why is it being blamed for a busy hang?

A customer reported that their program locks up once or twice each day. They took a full dump of the process while it was hung and asked windbg to analyze it. Here's a redacted version of what it said:



The Buffer::Compact method shifts some memory around inside the buffer:

void Buffer::Compact()
{
  if (m_bytesRead > 0) {
    // Slide the unread bytes down to the start of the buffer.
    memmove(m_buffer, m_buffer + m_bytesRead, m_capacity - m_bytesRead);
    m_capacity -= m_bytesRead;
    m_bytesRead = 0;
  }
}

"Is it possible that memmove has a busy wait? What could it be waiting for?"

The memmove function doesn't have a busy loop where it waits for something. It just moves the memory from one location to another.
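
To make that concrete, here is a minimal sketch of the kind of overlapping shift that Buffer::Compact performs. The helper name ShiftDown and its use of std::string are made up for illustration; only the memmove call mirrors the code above. The point is that the call does an amount of work bounded by the byte count and then returns. There is nothing in it to wait on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not from the customer's code): discard the first
// `consumed` bytes of `data` by sliding the remaining bytes to the front.
// Source and destination overlap, which is why memmove is used here
// rather than memcpy.
std::string ShiftDown(std::string data, std::size_t consumed)
{
    assert(consumed <= data.size());
    std::size_t remaining = data.size() - consumed;
    if (consumed > 0 && remaining > 0) {
        // One bounded copy of `remaining` bytes, then the call returns.
        std::memmove(&data[0], &data[0] + consumed, remaining);
    }
    data.resize(remaining);
    return data;
}
```

Under these assumptions, ShiftDown("abcdefgh", 3) yields "defgh".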

What's probably happening is that there is a busy loop higher up the stack. Maybe BufferReader::Parse has gotten into a loop, or (my guess) handle_events is stuck in a loop processing a huge number of incoming events.

When you take the memory dump, you are capturing the program at a moment in time. All you know is that the thread is probably in a busy wait, but the source of the busy wait need not be the thing at the top of the stack.

If memcpy is consistently at the top of the stack, then it means that the thread is spending most of its time copying memory. But that doesn't necessarily mean that memcpy is stuck in a loop. The more likely reason is that the thread is busy doing some larger operation, and that larger operation entails a lot of memcpy operations.
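
As a purely hypothetical illustration of how a larger operation can end up spending nearly all of its time copying memory, consider a caller that consumes one byte at a time and compacts after every byte. The MiniBuffer type, the bytesMoved counter, and DrainOneByteAtATime are invented for this sketch and are not the customer's code; the sketch only shows that the copying can add up even though each individual copy finishes promptly.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the customer's buffer; not their actual code.
struct MiniBuffer {
    std::vector<char> bytes;     // buffered data; the first bytesRead bytes are consumed
    std::size_t bytesRead = 0;   // how many leading bytes have been consumed
    std::size_t bytesMoved = 0;  // instrumentation: total bytes copied so far

    void Compact() {
        if (bytesRead > 0) {
            std::size_t remaining = bytes.size() - bytesRead;
            if (remaining > 0) {
                std::memmove(bytes.data(), bytes.data() + bytesRead, remaining);
            }
            bytes.resize(remaining);
            bytesMoved += remaining;
            bytesRead = 0;
        }
    }
};

// Hypothetical caller: consume one byte per iteration and compact each time.
// Returns the total number of bytes that memmove was asked to copy.
std::size_t DrainOneByteAtATime(std::size_t size)
{
    MiniBuffer buffer;
    buffer.bytes.assign(size, 'x');
    while (!buffer.bytes.empty()) {
        buffer.bytesRead = 1;  // pretend the parser consumed a single byte
        buffer.Compact();
    }
    return buffer.bytesMoved;
}
```

In this sketch, draining n bytes copies n(n-1)/2 bytes in total, so a 1000-byte buffer results in 499500 bytes of copying. Every call to memmove returns, yet a dump taken at a random moment would most likely catch the thread inside the copy, because that is where the time goes. The loop that matters is in the caller.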

Though in extreme edge cases it might be a busy loop.

Sort-of related.

Exercise: The customer's code calls memmove, so why is the memcpy function the one at the top of the stack? What happened to memmove?

Comments (18)
  1. kantos says:

    Exercise answer: This would seem to be a case of the C++ “as-if” optimization rule striking, where the compiler knows that _memcpy is safe to use in this instance for various reasons and substitutes it in. The restrictions on non-overlapping and non-aliased buffers for memcpy are a standard detail that the implementation is free to ignore as it likes.

    1. Myria says:

      It’s clearly unsafe to substitute memcpy for memmove in this instance – the destination pointer and source pointer are derived from the same pointer. It’s more likely that memmove itself noticed that the default memcpy (forward mode) worked for the pointer values it was given and jumped to memcpy.

      1. Kevin says:

        Linus wrote a lengthy rant about this on a thread with some glibc developers.

        The very short version is that the memcpy implementation cannot be materially faster than the memmove implementation, so the former should just be an alias for the latter on most “reasonable” libc implementations. Or equivalently, you fold the latter’s guarantees into the former and do the aliasing the other way around.

        1. Zan Lynx' says:

          I remember some oprofile runs on Linux around 2005 that disagree with you and Linus. memmove() had a consistent branch prediction failure.

          Maybe more modern CPU designs have improved the branch prediction to avoid this. I don’t see how, but maybe. But I’d bet that if I construct a benchmark that builds an array of non-overlapping, randomized memory blocks and I run a few million memmove vs memcpy, memcpy will come out faster.

          1. Kevin says:

            If the branch mispredict is consistent, and therefore predictable, then maybe the branch predictor sucks. Just a thought.

        2. Dave says:

          @Kevin: The thread is here, Linus’ comments start about 20% of the way down.

          Oh, and in terms of it being a rant: That wasn’t a Linus rant, it never made the evening news, there were no overflowing morgues, no cleanup crews had to go in with shovels and quicklime, I mean it was practically a love bite.

          1. Kevin says:

            The glibc bug had the potential to turn into an epic flame war when Ulrich Drepper rather brusquely disagreed with Linus, but surprisingly they were able to work it out like civilized adults. I would not have called that outcome.

  2. Brian_EE says:

    Exercise: The compiler was able to detect that the source is always higher than the destination. So it inserted an optimization that substitutes memcpy.

  3. Darran Rowe says:

    The most logical explanation would be that this is how memmove is coded.
    It tests to see if there are any overlapped regions of memory, and if there are not, it jumps straight to memcpy. If memory serves me correctly, it was written in assembly all the way up to VC2013; VC2015 had the great CRT refactor and changed things.

    1. Ar says:

      This! You can easily see it by statically linking the CRT and disassembling your program. You’ll see that memmove and memcpy are internally the same handwritten assembly function, with just some checks and handling for the overlap. You’ll also see that the compiler tries to inline many common variants of memcpy and memmove, so it can leave out many of these checks and handling. Lastly, you’ll see that memcpy’s code has changed with almost every CRT release to account for the changing capabilities of newer Intel and AMD chips. As such, the exact call stack of such a memcpy/memmove hang can differ depending on which Visual Studio version or optimisation settings were used.

      Who knew memcpy could be so complex :-)

  4. mc says:

    Exercise answer: perhaps memmove is implemented as a pre-processor macro that gets expanded into inline code which does a memcpy?

  5. The MAZZTer says:

    I imagine memmove is implemented as a macro that calls memcpy followed by ZeroMemory or something.

    1. Lars Viklund says:

      Contrary to what the name may lead you to think, it has nothing to do with erasing the old storage.
      memmove is like memcpy in that it copies one range of memory to another range, but it also correctly handles the case where the two ranges overlap.

  6. DoomMuffins says:

    I’m more inclined to say it’s a tail-call optimization rather than a compiler optimization (i.e. memmove performs a jmp rather than a call to memcpy in certain cases).

  7. Mike says:

    Exercise: memcpy and memmove are the same function.

    Microsoft noticed early on that many developers failed to realize that memcpy couldn’t handle overlapping memory blocks reliably, so they simply put the memmove functionality into memcpy.

    The CRT (C RunTime) source is included in MSVC, and has been for at least two decades.
    The only thing that surprises me about it is: why isn’t this a simple ALIAS? Why the song-and-dance to assemble it twice, only with different names?

  8. Dave says:

    There’s another explanation that immediately came to mind: the memcpy() is going across one or more special-status pages, e.g. nonresident or a guard page or something. This suspends the thread the memcpy() is on while possibly lengthy fixups occur in the background, leading to the same misleading reporting of where the problem lies.

  9. Medinoc says:

    When I last checked, memmove() and memcpy() in the CRT shared the exact same source code file (with some #if around the function prototype), so they got ICF’d.
