If relocated DLLs cannot share pages, then doesn’t ASLR cause all pages to be non-shared?

Commenter Medinoc wonders whether it's still the case that relocated DLLs can't be shared in memory. If so, then doesn't ASLR cause all pages to become non-sharable?

There are multiple things in play here. We'll take them up in historical order, but I'll start with Windows NT 3.1 instead of Windows 95 because I already discussed Windows 95 a while back.

Windows NT 3.1 tried to load DLLs at their preferred address. If that happened, then the pages were demand-paged from the executable on disk, and if multiple processes loaded the DLL at the preferred address, then the memory was physically shared.¹ On the other hand, if the DLL could not load at its preferred address, then fixups were applied to the entire DLL to relocate it, and the relocated DLL was dumped into the pagefile, and not only did further demand paging come from there, but that relocated copy was not shared between processes.

In other words, if two processes both loaded a DLL, and the DLL got relocated in both of the processes, and it got relocated to the same address in each process, there would nonetheless be two copies of the DLL in the page file, not one copy that was shared between the two processes.

The reason for not sharing the pages in this case is that the likelihood of all the stars aligning is relatively low. Under the Windows NT 3.1 model, each process did its own relocating, and each process chose where the DLL would get relocated to. The likelihood that two processes would both load the same DLL and have virtual memory layouts similar enough that they would choose the same relocation destination was low, so the benefit of getting the processes to coordinate among themselves was not worth the effort.

And then ASLR showed up and changed the cost/benefit calculations. With ASLR, DLLs are being relocated constantly, and if the old rules were followed, there would be as many copies of a DLL in the page file as there were processes that used the DLL. This was clearly not a good thing.

The solution is that when a DLL is loaded, ASLR chooses a random destination address, but it then remembers that address for future use, and if another process loads the DLL, the kernel will try to use the same destination address for the DLL in that other process. This means that if two processes load a DLL, that DLL will probably get the same destination address in both processes, which establishes one of the prerequisites for sharing.

ASLR goes further. The kernel doesn't even bother fixing up the entire DLL and dumping it into the page file. Instead, it fixes up the DLL on the fly as it is loaded (stealing a trick from Windows 95), and shares the fixed-up pages.

Another way of looking at this is that the kernel is pretending that the preferred address of the file on disk happens to have matched the ASLR-chosen address all along. It carries out this ruse by patching the bytes of the file as they are read off the disk.

¹ For simplicity of exposition, let's assume that nobody changes page protection. If you are smart enough to ask, "What if somebody changes the page protection?" then you are smart enough to know the answer.

Comments (21)
  1. jspenguin says:

    Does it still have to do fixups on x86-64, which was designed for position-independent code? On Linux, all shared objects are PIC, so they can share physical pages at different addresses.

    The Windows approach has the downside that if one service that loads a DLL has a vulnerability that reveals the base address of that DLL, it can be used to help compromise another service that loads the same DLL.

    1. There are still fixups in the data segment.

      1. Joshua says:

        But I compile without fixups. I wonder if the loader is smart enough to figure out it can share pages across addresses in that case.

      2. Mike says:

        You mean just the writable data segment, right? On most ELF systems, the text and read-only data pages are shared regardless of mapping address. Obviously the writable data is going to be unique to every process. The blog says things like “entire DLL,” which makes me think DLLs aren’t even PIC, which would be crazy (as jspenguin highlights).

        1. The read-only data segment can contain fixups. (And Windows tries to share writable data as much as possible. See copy-on-write.)

  2. Myria says:

    Yep. I reverse-engineered this and noticed how there are two relocation engines, one in ntoskrnl.exe and one in ntdll.dll. If the ASLR-chosen address is unavailable at load time, the user-mode (ntdll.dll) relocation engine relocates the DLL, marking the pages writable, which triggers a copy-on-write.

    An interesting side effect of the kernel-mode relocation engine is that when the kernel does ASLR relocation, it updates IMAGE_OPTIONAL_HEADER::ImageBase in the loaded image, simulating that the image wasn’t relocated at all. ntdll.dll doesn’t update ImageBase. Thus, you can distinguish the two types of relocation at runtime by checking ImageBase against the disk file.
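    Myria's check can be sketched as follows. This is a hypothetical, portable illustration of the comparison, not real loader code: a stub struct stands in for `IMAGE_OPTIONAL_HEADER`, and a real implementation would walk the PE headers of the mapped module and of the file on disk.

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* Minimal stand-in for the one PE field involved. */
    struct optional_header { uint64_t ImageBase; };

    /* Classify how a module was relocated, per Myria's observation:
     * kernel ASLR rewrites ImageBase to the chosen base, while the
     * user-mode (ntdll.dll) relocator leaves the on-disk value alone. */
    static const char *relocation_kind(uint64_t actual_base,
                                       const struct optional_header *in_memory,
                                       const struct optional_header *on_disk)
    {
        if (in_memory->ImageBase == on_disk->ImageBase) {
            if (actual_base == on_disk->ImageBase)
                return "not relocated";
            return "relocated by user mode (ntdll)";
        }
        return "relocated by kernel ASLR";
    }

    int main(void)
    {
        struct optional_header disk = { 0x180000000ULL };
        struct optional_header mem  = { 0x7FF712340000ULL }; /* kernel patched it */
        puts(relocation_kind(0x7FF712340000ULL, &mem, &disk));
        return 0;
    }
    ```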

    One thing I haven’t figured out about ASLR is how the demand-paging system Raymond mentioned handles relocations that straddle a page boundary. Relocations themselves belong to a page, but the size of a relocation can be up to 8 bytes (since Itanic sank, anyway), and there’s no restriction about a relocation crossing into the next page.

    1. Joshua says:

      What you do is you fuzz your fixup range. If the previous page was already paged in you take addresses that straddle the boundary. If the previous page was not already paged in you don’t. Same for next page.

      The Windows 95 solution is inefficient.

      1. smf says:

        x86 is little endian so the values in the previous page are very important to fixing up the bytes in the faulted page.

        i.e. you won’t know xx…. $7fxx+$1234=??xx

        The potential inefficiency in windows 95 is that they then fixup the previous page as well. It’s a page that hasn’t been faulted in yet, so it may not be worth keeping in memory. Fixing it up may be pointless as it may never be used, but as the disk is slower than cpu and memory then it’s better to keep it fixed up than throwing it away.

        1. Joshua says:

          The trick is to defer that particular fixup until paging in both pages.

          1. How do you defer a fixup? Do you just leave it not-fixed-up and hope nobody reads from that byte of memory?

          2. smf says:

            I think he would leave the page set to fault so you could catch it again and again. I don’t think he realises that the overhead of faulting on every access to the page and tracking whether it’s been fixed up yet is going to be much higher than the time to read the previous page.

          3. smf says:

            Hoping that nobody will read the straddled fixup until the previous page is read in could work in a lot of circumstances, code especially is unlikely to be executed in the middle of an instruction. For data it’s like Russian roulette. Another example of “I can speed this program up if it doesn’t have to work properly”, you can make all programs run in zero time if they don’t have to work properly.

          4. > I think he would leave the page set to fault so you could catch it again and again.

            But how would you execute from the page? There’s no such thing as execute-no-read. If you grant execute, then you implicitly grant read, at which point you lose the ability to trap accesses outside the fixup range.

      2. smf says:

        If you’re suggesting only fixing up the spanning address when that is the address being read, then not only would you have to keep track of which parts of the page have been fixed up, but you would also have to fault repeatedly on the page until it is completely fixed up. That would be very inefficient.

    2. alegr1 says:

      Can you explain what the problem is? Does it happen with 32-bit executables, or with x64 only?
      Windows 8+ (or even 7+) requires a processor that can do atomic 64-bit writes. A write that crosses a page boundary will not begin until both pages are present. A write across two cache lines will not be atomic anyway, with paging or not, so the point is moot.

      1. smf says:

        There are (or were) cache coherency issues if you were writing to different entries of the same cache line. You’re thinking about writing to the fixed up data that straddles the boundary, but you could have written to the byte that followed.

        For data you have no guarantee that the program will fault the previous page in before reading part of the value that crossed the boundary. You can assume nobody would do anything like that, but you can’t know.

        i.e. if x crossed the boundary, then doing something like this would be non-deterministic.

        void *x = (void *)&main;
        printf( "%02x\n", ((unsigned char *)(&x))[3]);

  3. Joshua says:

    smf guesses more or less correctly. I’ve never seen a case of either trying to read half an instruction offset (I’ve seen trying to pull the offset out but that’s guaranteed to fault anyway) and I’ve never seen a case of a misaligned fixup in a data page (can you even do that?).

    1. smf says:

      I don’t see why fixed-up data couldn’t straddle a page boundary (e.g. you’re using #pragma pack(1) and have some pointers in your data).

      There may be some data cache issues, especially with multiple cores with doing what you suggest. It feels very icky.

    2. Here’s an example of reading half an instruction: Checksumming pages.

  4. Peter Newman says:

    That seems like a flaw in the ASLR implementation. An unprivileged program could load a DLL (which respects the program’s privileges), then use the address information when exploiting a locally executing privileged program.

    Small attack surface that’s probably worth the trade-off, though.

  5. Medinoc says:

    Thanks for the answer.

Comments are closed.
