You can’t rule out a total breakdown of normal functioning, because a total breakdown of normal functioning could manifest itself as anything


A customer was investigating a problem that their analysis traced back to the malloc function returning NULL.

They asked: is it valid to conclude that there is no heap corruption?

While heap corruption may not be the avenue of investigation you'd first pursue, you can't rule it out. In the presence of a total breakdown of normal functioning, anything can happen, including appearing to be some other type of failure entirely.

For example, the heap corruption might have corrupted the bookkeeping data in such a way as to make the heap behave as if it were a fixed-sized heap, say by corrupting the location where the heap manager remembered the dwMaximumSize parameter and changing it from zero to nonzero. Now, the next time the heap manager wants to expand the heap, it sees that the heap is no longer expandable and returns NULL.

Or maybe the heap corruption tricked the heap manager into thinking that it was operating under low resource simulation, so it returned NULL even though there was plenty of memory available.
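
Just to make the first scenario concrete, here is a deliberately contrived sketch. The structure, field name, and adjacency are all invented for illustration; the real heap manager's bookkeeping looks nothing like this. The point is only that bytes written past the end of an allocation land somewhere, and that somewhere may be data the heap manager trusts.

    /* Contrived illustration: the layout below is invented, not the
       actual Windows heap bookkeeping. */
    #include <stdio.h>
    #include <string.h>

    struct fake_heap {
        char          user_block[16];  /* what the caller thinks it owns  */
        unsigned long dwMaximumSize;   /* 0 = expandable, nonzero = fixed */
    };

    int main(void)
    {
        struct fake_heap h = { { 0 }, 0 };    /* heap starts out expandable */

        /* The bug: the caller writes 20 bytes into a 16-byte block.  The
           four extra bytes spill into the adjacent bookkeeping field...   */
        memset(h.user_block, 'A', 20);

        /* ...and now the "heap" claims to be fixed-size, so the next time
           it needs to expand it reports failure and malloc returns NULL.  */
        printf("dwMaximumSize is now %#lx\n", h.dwMaximumSize);
        return 0;
    }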

Remember, once you've entered the realm of undefined behavior, anything is possible. Heck, one possible response to heap corruption is the installation of a rootkit.

After all, that's how more advanced classes of malware work. They exploit a vulnerability to nudge a process into a subtle failure mode, and then push the failure mode over the edge into a breakdown, and then exploit the breakdown to get themselves installed onto your system, and then cover their tracks so you don't realize you've been pwned.

Maybe the heap was corrupted in a way that caused a rootkit to become installed, and the rootkit patched the malloc function so that it returned NULL.

Like I said earlier, the possibility of heap corruption is probably not the avenue I would investigate first. But you can't rule it out either.

Bonus chatter: Since heap corruption can in principle lead to anything, any bug that results in heap corruption automatically gets a default classification of Arbitrary Code Execution, and if the heap corruption can be triggered via the network, it gets an automatic default classification of Remote Code Execution (RCE). Even if the likelihood of transforming the heap corruption into remote code execution is exceedingly low, you still have to classify it as RCE until you can rule out all possibility of code execution. (And it is extremely rare that one can successfully prove that a heap overflow is not exploitable under any possible conditions.)

Comments (16)
  1. I read this attempting to be as open-minded as possible, because it sounds foolish to start with such a terrible assumption.  It's like starting a logical analysis of military options after an ambush with the presumption that the apocalypse is occurring.

    But, whenever there's a bug in my code I always find it tempting to believe that the provided resources are not functioning as defined, when in fact 99 times out of 100 they are and it's my fault.

  2. Adam Rosenfield says:

    Even something like writing a single zero byte out-of-bounds, doing something like a printf, and then restoring that byte to what it was before can be a serious security vulnerability.  See CVE-2001-0279 and the analysis in section 2 of this article: http://www.phrack.org/…/p57_0x08_Vudo%20malloc%20tricks_by_MaXX.txt .
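
    A minimal sketch of that pattern, with arbitrary sizes (the real vulnerability involves the allocator's internal chunk headers, so treat this purely as an illustration of why the "restore" doesn't save you):

        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
            char *p = malloc(24);
            if (!p) return 1;

            char saved = p[24];   /* out of bounds even to read it         */
            p[24] = 0;            /* single zero byte written past the end */

            /* Anything that touches the heap in here (printf may allocate
               internally) can act on the corrupted byte...                */
            printf("doing some unrelated work\n");

            p[24] = saved;        /* ...so "putting it back" comes too late */
            free(p);
            return 0;
        }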

  3. henke37 says:

    In fact, it is more common that people prove that a "less harmful" vulnerability can be used for code execution than the opposite.

  4. Brian says:

    When debugging, I always assume the problem is caused by cosmic radiation until I'm proven otherwise.

  5. DaveR says:

    I am always amazed at how many programmers don't understand the implications of "undefined behaviour".  They don't get that anything can happen.  While it might appear to do the same thing each time, it doesn't mean that they can depend on that behaviour.
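
    A tiny sketch of the trap (the values are arbitrary; the fact that this prints the same thing run after run on one machine guarantees nothing about the next compiler version, optimization level, or heap layout):

        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
            int *a = malloc(4 * sizeof *a);
            if (!a) return 1;

            a[4] = 42;                           /* out of bounds: undefined behavior */
            printf("seems to work: %d\n", a[4]); /* "works on my machine"             */

            free(a);   /* the heap may notice the damage here, much later, or never */
            return 0;
        }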

  6. [Since heap corruption can in principle lead to anything, any bug that results in heap corruption automatically gets a default classification of Arbitrary Code Execution]

    How do you classify stack overflows? IE8 and IE9 are giving me stack overflows in MSXML.DLL (runaway recursion).

  7. Matt says:

    @Raymond:

    "And it is extremely rare that one can successfully prove that a heap overflow is not exploitable under any possible conditions"

    That's because under almost any conditions a heap overflow actually IS an exploitable bug that actually CAN be turned into installation of a rootkit. This isn't security experts being namby-pamby and hedging their bets by saying "it MIGHT cause code-execution". Chances are, if there's a heap overflow involved, someone at some point actually WILL turn it into an exploit to install rootkits on your unpatched systems.

    Memory corruption of any sort, any time is always a critical bug that you must stop everything right now to fix. It is never acceptable to leave memory corruption bugs as "acceptable risk" or to say "we'll fix that, right after we finish this feature".

  8. Matt says:

    @alegr1: Stack buffer overwrites are "Arbitrary Code Execution". Stack exhaustion bugs, in code compiled with the default Visual Studio options and in Microsoft libraries since Windows Vista, are "Denial of Service". Other stack overflows might be "Arbitrary Code Execution".

  9. Joshua says:

    @alegr1: Stack overflow is just a crash, thanks to the guard page.

    We had a stack buffer overflow we classified as not exploitable (just a crash) because there was no way to write past the NUL sentinel without being caught and the program falling on its sword immediately (in fact, that's what was happening when we discovered it; one of our customers managed to hit the bug).

  10. Henning Makholm says:

    @Matt: Sure — as soon as we figure out where the memory corruption happens. If all we're seeing is unreproducible weird effects that we attribute to "prior memory corruption by unknown causes" by a process of elimination, then often there is not much more to do than shrug and hope for better clues to turn up later.

    After all, it is awfully hard to rule out cosmic radiation, in the technical sense of "anything that randomly alters a program's memory behind its back", such as an already-present malware infection that (for its own murky reasons) attaches to our process as a debugger, changes a random byte somewhere, and then scoots away. Or, for that matter, actual cosmic rays.

  11. Matt says:

    @Joshua: It's only a DoS if you're smart and are working with all of the security options on a modern compiler like Visual Studio. Not all stack exhaustion vulnerabilities are unexploitable (http://www.exploit-db.com/…/17784.pdf). In Visual Studio it is the "chkstk" function that turns what would be an RCE into a DoS (see the sketch after this comment).

    @Henning: That is why you design your programs to fail fast when they go wrong. Cosmic radiation is a vastly less likely cause of your bugs than crappy code.

    [You're not allowed to jump any stack pages (guard page or otherwise) in Win32, so any compiler that didn't do the chkstk would have crashed even under normal operation. -Raymond]
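
    A minimal sketch of the stack-probing point being discussed here. The buffer size is arbitrary; the detail that matters is that MSVC emits a __chkstk call for any frame larger than a page, so the stack is touched one page at a time and the guard page is always hit first:

        #include <string.h>

        void build_message(const char *src)
        {
            /* For a frame this large, MSVC's prologue calls __chkstk:
               instead of moving the stack pointer down 128 KB in one jump,
               it touches the stack one page at a time, so the guard page
               always faults first and the stack grows (or the thread fails
               cleanly with a stack overflow exception). */
            char big[128 * 1024];   /* far more than one 4 KB page of locals */

            strncpy(big, src, sizeof big - 1);
            big[sizeof big - 1] = '\0';
        }

        int main(void)
        {
            build_message("hello");
            return 0;
        }
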
  12. JDF says:

    This puts me in mind of Tilton's Law: Solve the First Problem.

    Story here: smuglispweeny.blogspot.com/…/tiltons-law-solve-first-problem.html

    Favorite quote: "This stuff is hard enough to get right when things are working nominally, but once they go wrong we no longer have a system that even should work."

    I have referred support techs to this article over and over in the last few years, usually after hearing "This failed, then that failed, and now X isn't working. I'm trying to solve X by frobulating the sprocket…" No, stop right there, I won't help you frobulate, you need to fix the first problem!

  13. Alexey says:

    A post most appropriate in view of the recent release of Mark Russinovich's new novel.

  14. Cheong says:

    I think "cosmic radiation" is a bit over… "memory chip went bad" is more plausible explanation.

  15. Matt says:

    @Raymond

    >>> "[You're not allowed to jump any stack pages (guard page or otherwise) in Win32, so any compiler that didn't do the chkstk would have crashed even under normal operation. -Raymond]"

    The key there being Win32. Stack exhaustion bugs on other platforms are not always safe, so an assertion that stack exhaustion bugs are always a DoS isn't true in the general case.

    cansecwest.com/…/memory_vulns_delalleau.pdf:

    ► Linux 2.4 (SAFE)

    ► Linux 2.6 (UNSAFE)

    ► FreeBSD 5.3 (MMAP UNSAFE)

    ► OpenBSD 3.6 (SAFE but…)

    ► Linux emulation on FreeBSD 5.3 (UNSAFE)

    ► Linux emulation on OpenBSD 3.6 (SAFE but…)

    ► Solaris 10 / x86 (SAFE)

    ► Solaris 9 / Sparc (SAFE)

    ► Windows XP SP1 (SAFE)

    ► Any OS with certain uncommon threading libraries (UNSAFE)

    Even in Win32, if you're using a library that does stack-copying (e.g. some implementations of fork on Win32), they just allocate a lot of stack and don't bother with a stack-guard page. So even on Win32 these might be exploitable.

    As a general rule – if it's a memory corruption, just fix it. Don't argue about whether it's exploitable. Just fix it.

    [My point was that chkstk is not a "security option" in Visual Studio. There is no way to turn it off (it's not an "option") because it is required for correct functioning. -Raymond]
  16. Gabe says:

    The worst thing about heap corruption is that it usually manifests itself as normal operation. I would suggest that it's actually quite rare for heap corruption to ever manifest as a bug.

    In fact, maybe that's the second-worst thing about heap corruption. I guess the worst thing is when heap corruption is what *causes* normal functioning, as in fixing the heap corruption causes the program to stop functioning normally. For example, you might be writing something out of bounds of an array that causes the same memory block to get returned from malloc over and over again, masking the fact that you never free it.
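
    Here's a toy sketch of that last scenario. The free-list allocator below is invented purely for illustration (the CRT heap is nothing like this), and the corruption is written directly where a real out-of-bounds write might happen to land:

        #include <stdio.h>

        #define BLOCK_SIZE 32
        #define NBLOCKS    4

        struct block {
            char          data[BLOCK_SIZE];
            struct block *next_free;   /* bookkeeping lives right after the data */
        };

        static struct block  pool[NBLOCKS];
        static struct block *free_list;

        static void pool_init(void)
        {
            for (int i = 0; i < NBLOCKS - 1; i++)
                pool[i].next_free = &pool[i + 1];
            pool[NBLOCKS - 1].next_free = NULL;
            free_list = pool;
        }

        static char *toy_alloc(void)
        {
            struct block *b = free_list;
            if (!b) return NULL;
            free_list = b->next_free;   /* trusts the (possibly corrupted) link */
            return b->data;
        }

        int main(void)
        {
            pool_init();

            /* Simulate an out-of-bounds write from a neighboring allocation
               that happens to land on the head block's free-list link and
               makes it point back at itself...                              */
            free_list->next_free = free_list;

            /* ...so every "allocation" now hands back the same block, and
               the program never notices that it leaks everything it asks
               for.                                                          */
            char *a = toy_alloc();
            char *b = toy_alloc();
            printf("a=%p\nb=%p\n", (void *)a, (void *)b);
            return 0;
        }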

Comments are closed.