Just follow the rules and nobody gets hurt


You may have been lazy and not bothered calling VirtualProtect(PAGE_EXECUTE) when you generated some code on the fly. You got away with it because the i386 processor page protections do not have a "read but don't execute" mode, so anything you could read you could also execute.

Until now.

Starting with Windows XP Service Pack 2, on processors which support it (according to the web page, currently AMD K8, Itanium, and AMD64), the stack and heap will not be executable. If you try to execute the stack or the heap, an exception will be raised and the code will not execute. In other words, execute page protection will soon be enforced, now that processors exist that support it. (Actually, I believe Windows XP for Itanium already used this new protection level, so those of you who have been playing around with your Itanium may have seen this already.)

If you were a good developer and followed the rules on page protections, then this has no effect on you. But if you cheated the rules and took advantage of specific hardware implementation details, you may find yourself in trouble. Consider yourselves warned.
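
For the record, following the rules looks roughly like the sketch below: write the generated bytes into pages you can write to, then ask for execute permission before anything jumps there. This is an illustration under assumptions, not code from any shipping product. The helper names `publish_generated_code` and `release_generated_code` are made up for this example, the sketch asks for `PAGE_EXECUTE_READ` rather than plain `PAGE_EXECUTE` so the bytes can still be read back, and the `mmap`/`mprotect` branch is there only so the sketch builds on non-Windows systems.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on strict glibc builds */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Hypothetical helper: copy `len` bytes of generated code into fresh pages,
 * then mark those pages read+execute. Returns NULL on failure. */
static void *publish_generated_code(const void *code, size_t len)
{
    assert(code != NULL && len > 0);
#ifdef _WIN32
    /* Step 1: allocate pages that are writable but not executable. */
    void *p = VirtualAlloc(NULL, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (p == NULL)
        return NULL;
    memcpy(p, code, len);

    /* Step 2: explicitly request execute permission. */
    DWORD old_protect;
    if (!VirtualProtect(p, len, PAGE_EXECUTE_READ, &old_protect)) {
        VirtualFree(p, 0, MEM_RELEASE);
        return NULL;
    }
    /* Make sure the CPU does not run stale instruction bytes. */
    FlushInstructionCache(GetCurrentProcess(), p, len);
#else
    /* Same two steps expressed with the POSIX calls (portability assumption). */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, code, len);
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, len);
        return NULL;
    }
#endif
    return p;
}

/* Hypothetical helper: give the pages back. */
static void release_generated_code(void *p, size_t len)
{
#ifdef _WIN32
    (void)len;                   /* MEM_RELEASE frees the whole allocation */
    VirtualFree(p, 0, MEM_RELEASE);
#else
    munmap(p, len);
#endif
}
```

On an x86 or x64 build, passing the six bytes `B8 2A 00 00 00 C3` (`mov eax, 42` followed by `ret`) and casting the returned pointer to `int (*)(void)` should give you a callable function. Skip the protection change and the same call is what raises the exception on a processor that enforces execute protection.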

Comments (16)
  1. Karl says:

    I updated all of my code that creates and executes other code dynamically the other day, so I’m already ready for this.

    I considered writing you a few weeks ago wondering why PAGE_READWRITE acted as PAGE_EXECUTE_READWRITE, but I discovered the sad truth myself: the x86 doesn’t support it.

    However, I’ve come across several tricks to work around this issue. The first is to take advantage of the x86 segment registers, by making the code segment not cover the entire 4 GB address space, and putting the stack and default heap above this boundary, so it can’t be executed.

    The second is to take advantage of an undocumented quirk of Intel and AMD processors: there are actually two TLBs, one for code, and one for data. The trick is to make the data page table entries unavailable most of the time. When the CPU tries to read/write to the page, it causes a page fault. The OS then determines what operation caused the page fault. If it was an instruction fetch, it terminates the process unless the page was marked as executable. If it was a read or write operation, it creates the page table entry and resumes execution. Then, a few milliseconds later, it removes the page table entry. The PTE is cached by the data TLB, so further reads/writes to the page won’t cause a page fault, but instruction fetches will (since it’s not cached by the instruction TLB).

    Any chance SP2 will use tricks like those mentioned above to enforce page execute protection on CPUs where it isn’t explicitly supported?

    Oh, and how will backwards compatibility be dealt with? I imagine this will break a LOT of apps.

  2. Smack says:

    Are there actually programmers that follow the rules? I thought everyone just screwed with their code until it compiled and worked well enough.

  3. Florian W. says:

    >Oh, and how will backwards compatibility be dealt with? I imagine this will break a LOT of apps.

    Maybe I’m wrong, but every ATL-based window application would fail, because MS does such a trick inside message dispatching.

  4. runtime says:

    OpenBSD 3.4 just added a similar feature for x86 processors. Since Intel x86 does not support per-page "read but don’t execute" mode, they changed their system compiler to fake it using segment protection.

    i386 binaries have their executable segments rearranged to support isolating code from data, and the cpu CS limit is used to impose a best effort limit on code execution.

    http://www.openbsd.org/34.html

  5. Sean McLeod says:

    Any idea when Intel x86 CPUs will start supporting the execute permission bit? Maybe the Prescott cpus?

  6. Joe says:

    Intel CPUs do support execute protection, but only if the OS writers don’t cheat :) by using flat memory model. By deciding that VA == PA always (except when using PAE) you lose a lot of the security that the x86 gives you (but not for free, segmented memory models are a pain). Of course all modern operating systems go the flat memory-model route (although some have created various hacks and patches to make a non-executable stack that get rather complex — like the above OpenBSD mention, Solar Designer’s patch for Linux, or RedHat’s Exec-Shield).

    But I just have to say "Yay! Good on you for ignoring backwards compatibility!" I mean that seriously. Windows has suffered too much at the hands of backwards compatibility.

  7. Joe says:

    Oh, look. I just read that article and have found that the changes to RPC will kick my ass. Why must you give up on backwards compatibility?!

    I guess I can’t have it both ways. :)

  8. Mike Dimmick says:

    I posted recently on Chris Brumme’s blog about this.

    ATL uses a thunk code block allocated as part of the window object (ATL 3.1 and earlier on x86) or on the heap (ATL 7.0 and later) as the window procedure for a window. The thunk block replaces the first parameter on the stack (an HWND) with the value of the window object’s this pointer, then calls the class’s WindowProc (returned from the virtual GetWindowProc function). As such, ATL UI programs will break if the heap is protected. I expect that MS will include an AppCompat patch for this issue.

    The problem with code segments in a flat memory model is basically that your code segment would need to be contiguous, which Windows does not guarantee. Indeed, Windows fills up your process address space from the bottom (low addresses) with your executable, but from the top (high addresses) with system DLLs, with the stacks and heaps in the middle. Windows never knows how many threads you’re going to create and hence how many stacks you will have, so it can’t easily reserve space for them.

    The two TLBs solution will perform really poorly, IMO, and your ‘few milliseconds’ could be enough to still allow an attacker through. The only way you could be sure would be to single-step the processor – ouch. I’m also not sure you can rely on there being separate instruction and data TLBs.

  9. Raymond Chen says:

    Yes, ATL is probably the biggest offender. We’ve got a good amount of app compat work ahead of us. If a service pack breaks a program you use every day, your take-away is "Don’t install service packs!" which is not the message we are trying to get across.

    Taking advantage of undocumented behavior of the Intel and AMD processors is hardly something an OS should be doing. (Imagine the flamage if Windows XP were incompatible with AMD’s next CPU – but worked on Intel’s – because it relied on some undocumented CPU quirk. Conspiracy theorists would have a field day.) It’s this sort of "well that’s how the chip works today so I’ll rely on it" shortcutting that is creating this app compat mess in the first place!

  10. SP2 is going to break *every* app that creates a window using ATL? Wow–that will be a lot of broken programs. No matter how many AppCompats you do, you are going to miss a lot of them. End users will not know why their applications have started crashing, and they will blame Microsoft for putting out a buggy SP.

  11. Frederik Slijkerman says:

    This will also break all programs created with Borland RAD tools such as Delphi and C++Builder. The VCL library creates small routines on the fly to route window procedures to method calls on objects. No doubt ATL does something similar.

    Adding the execute protection to SP2 might be something you/Microsoft may want to reconsider, since this will break a *lot* of applications.

  12. Mike Dimmick says:

    I got confused here – AMD never used K8 as a codename, but it’s typically used to refer to the first generation Opteron and Athlon 64 processors.

    According to the documentation, these processors do support a no-execute bit, but only when run in Physical Address Extension mode. In this mode, Page Table Entries and Page Directory Entries are 64 bits in size; a PDE can describe the location of a Page Table (of PTEs) for 4Kbyte pages, or it can describe the location of a 2Mbyte page. AMD have chosen to use bit 63 (the Most Significant Bit) to represent ‘No Execute’ – set to 1 for execution not allowed, 0 for execution allowed. This should therefore be compatible with PAE mode on Intel processors (these bits were supposed to be set to zero).

    I can’t remember if Windows runs itself in PAE mode if the processor supports it, or only if there is more than 4 GB of memory actually fitted. There are more structures to negotiate (one more level) in PAE mode, which I imagine slows down TLB fills slightly (although x86 processors do this in hardware, if the PTE is valid). I assume on Opteron/Athlon 64 32-bit Windows will now always run in PAE mode. It’s not entirely clear to me whether the AMD64 architecture supports DOS and Win16 – IIRC, Itanium does not support Virtual 8086 mode at all.

    Intel should be encouraged to add No Execute to Prescott (currently expected to be called Pentium 5) or the following generation, in exactly the same way as AMD have done.

    Maybe no-execute needs to work on an opt-in basis, rather than opt-out: the app tells the loader if it’s no-execute safe. Is there room in the image header for a bit to represent this? As far as I can see, bit 7 in the Characteristics WORD in IMAGE_FILE_HEADER appears to be unused.

    All images in the process would need to have the bit set in order for the stack and heap to be protected with NoExecute. You’d have to be careful with LoadLibrary still: if you loaded a DLL with this bit not set, and the process was running with No Execute turned on, you’d have to turn No Execute off as the process tried to execute pages (taking a page fault each time it tries to execute a new page). That would of course enable the attacker to get access to the stack by forcing the process to load a ‘no execute unsafe’ DLL, though…

    You’d have to extend the linker to support the new flag, and probably provide a tool for setting it as well (enhancement to EDITBIN, I expect). I would expect all binaries shipped with XP SP2 to be compiled with this flag turned on.

    Will there be some way to tell the Heap code that you want to execute from a heap? A new option on HeapSetInformation seems appropriate.

  13. Jordan Russell says:

    Frederik: It won’t break Delphi and C++Builder because they already include the EXECUTE flag (and always have).

  14. Frederik Slijkerman says:

    Jordan: You are correct, sorry for the interruption. :-)

  15. Jonathan Wilson says:

    I suspect the answer for ATL is to ship a new atl.dll with XPSP2 that contains the proper call to VirtualProtect (and to document it for those who have for whatever reasons decided to build their own atl.dll file)

  16. Raymond Chen says:

    Commenting on this article has been closed.
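
The no-execute bit described in comment 12 (bit 63 of a 64-bit PAE page table entry: 1 means execution is not allowed, 0 means it is) can be sketched in a few lines. This is a toy model for illustration only. The names `make_pte` and `pte_allows_execute` are invented here, only the present, writable and no-execute bits are modeled, and real page table management involves far more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of a 64-bit (PAE-style) page table entry used by this sketch. */
#define PTE_PRESENT    (UINT64_C(1) << 0)
#define PTE_WRITABLE   (UINT64_C(1) << 1)
#define PTE_NO_EXECUTE (UINT64_C(1) << 63)   /* 1 = execution not allowed */

/* Hypothetical helper: build an entry for a 4 KB page at `frame_addr`. */
static uint64_t make_pte(uint64_t frame_addr, bool writable, bool executable)
{
    assert((frame_addr & 0xFFF) == 0);       /* frame must be 4 KB aligned */
    uint64_t pte = frame_addr | PTE_PRESENT;
    if (writable)
        pte |= PTE_WRITABLE;
    if (!executable)
        pte |= PTE_NO_EXECUTE;               /* stack and heap pages land here */
    return pte;
}

/* An instruction fetch is permitted only when bit 63 is clear. */
static bool pte_allows_execute(uint64_t pte)
{
    return (pte & PTE_NO_EXECUTE) == 0;
}
```

In this model a heap page would be built with `make_pte(addr, true, false)` and a code page with `make_pte(addr, false, true)`. An entry with bit 63 left at zero, which is what a 32-bit page table entry effectively is since it has no such bit, stays executable. That is the "anything you could read you could also execute" behavior the article starts from.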

