Why does GetFileVersionInfo map the whole image into memory instead of just parsing out the pieces it needs?


Commenter acq responds (with expletive deleted), "the whole file is mapped into the process' memory only for version info that's certainly only a few kilobytes to be read?" Why not map only the parts that are needed? "I don't understand the necessity to map the whole file except that it was easier to write that code without thinking too much."

That was exactly the reason. But not because it was to avoid thinking. It was to make things more secure.

Back in the old days, the GetFileVersionInfo function did exactly what acq suggested: It parsed the executable file format manually looking for the file version information. (In other words, the original authors did it the hard way.) And it was the source of security vulnerabilities because malformed executables would cause the parser to "behave erratically".

This is a common problem: Parsing is hard, and parsing bugs are so common that there's an entire category of software testing focused on throwing malformed data at parsers to try to trip them up. The general solution for this sort of thing is to establish one "standard parser" and make everybody use that one rather than rolling their own. That way, the security efforts can be focused on making that one standard parser resilient to malformed data. Otherwise, you have a whole bunch of parsers all over the place, and a bad guy can just shop around looking for the buggiest one.
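
To make the difficulty concrete, here is a minimal sketch in C of one check a hand-rolled parser has to get right every time it follows an offset stored in the file. Raymond's reply in comment 37 names two such flaws: an offset that reaches beyond the end of the binary, and an offset that is not properly aligned. The helper `range_is_valid` and its parameters are hypothetical, invented for illustration; this is not the code that GetFileVersionInfo or the loader uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of any Windows API.
 * Returns true only if the byte range [offset, offset + length) lies
 * entirely inside a file of file_size bytes and offset is a multiple
 * of alignment.
 */
bool range_is_valid(size_t file_size, uint32_t offset, uint32_t length,
                    uint32_t alignment)
{
    if (alignment == 0 || offset % alignment != 0) {
        return false;   /* misaligned offset */
    }
    if (offset > file_size) {
        return false;   /* range starts past the end of the file */
    }
    /* Compare against the bytes that remain rather than computing
       offset + length, which could wrap around for hostile values. */
    return length <= file_size - offset;
}
```

For example, `range_is_valid(4096, 4000, 200, 4)` is false because only 96 bytes remain after offset 4000. Every custom parser has to repeat checks like this at every offset it follows, which is the argument for having one parser do it.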

And it so happens that there is already a standard parser for resources. It's known as the loader.

The function GetFileVersionInfo therefore got out of the file parsing business (it wasn't profitable anyway) and subcontracted the work to the loader.

Pre-emptive xpclient rant: "Removing the icon extractor for 16-bit DLLs was a mistake of the highest order, even worse than Component Based Servicing."

Comments (46)
  1. 7client says:

    <Thing of the day> is the worst idea EVER! Worse than <old thing>. Bring back <older thing>.

  2. Joshua says:

    Which is why GetFileVersionInfo fails when called on a binary from another architecture.

  3. Anon says:

    I lost it at "Pre-emptive xpclient rant".

  4. JackTripper says:

    And i thought that version checking was bringing my entire 7MB executable across the network because my executable has the `IMAGE_FILE_NET_RUN_FROM_SWAP` PE flag.

    The change we made to improve performance (a few thousand-fold) was not to read the version info of my file (i.e. GetCommandLine), but rather to read the version of my instance (HInstance). It causes the loader to simply use the already running "me", rather than loading another copy of me (across the network).

    Checking version info at startup went from 8 seconds to 0 seconds.

    [Mapping an image does not read the entire file, so that doesn't explain why the entire image is being read. My guess is that it's the "run from swap" flag that's doing it. -Raymond]
  5. Joshua says:

    > And i thought that version checking was bringing my entire 7MB executable across the network because my executable has the `IMAGE_FILE_NET_RUN_FROM_SWAP` PE flag.

    GAAAAAAAAAAAAAAAAAAA.

    Ok that settles it. Using the loader to implement GetFileVersionInfo is now known to be a terrible idea.

  6. Chris Crowther says:

    There's no point in reinventing the wheel; especially when someone else has already done it better than you will.

  7. sajjad says:

    Mapping a file from the network (through samba) will read it to a local file before mapping it to the memory, as far as I know and tested anyway.

    NFS doesn't have the same problem/mechanism.

  8. Deduplicator says:

    @Joshua: Writing things only once whenever you can get away with it is a wonderful idea. One of the base tenets of software development is avoiding duplication, keeping only one authoritative definition of everything / only one implementation if possible. The trade-off Raymond mentioned for having less and better debugged code is certainly worth it.

  9. Clovis says:

    I very much admire that design decision. And I'm not being sarcastic, I promise.

  10. Joshua says:

    [My guess is that it's the "run from swap" flag that's doing it. -Raymond]

    No point in guessing. He said that's what was causing it (and it's set for some reason that would be obviously necessary given context).

    [No, he said "And I thought version checking was bringing my 7MB executable across the network because my executable has the IMAGE_FILE_NET_RUN_FROM_SWAP PE flag." The implied other half of the paragraph is "But I was wrong. The reason version checking brings in the entire file is because version checking maps the entire file." The phrasing "And I thought…" implies that the thought was incorrect. -Raymond]
  11. Joker_vD says:

    @Mike Dimmick: You know what, strictly speaking, hard drives too should count as removable media. I had a wonderful experience just a couple of months ago: the SATA-cable connecting the system disk to the mainboard was defective, so every two minutes all disk operations would halt for half a minute. The results were spectacular: everything except the mouse cursor froze. Even the programs that had no business touching the disk at all… maybe not, maybe the window manager just couldn't re-draw their windows properly, no idea.

  12. Anon says:

    @Joker_vD

    Actually, SATA drives are, on some boards/drivers, treated as "Removable." I know at home, the "Remove Hardware" button includes all of my non-OS drives.

    The above is always true for eSATA.

    Unfortunately, this does mean that those drives will run vastly slower — no write caching on removable drives.

  13. Joshua says:

    @Myria: I'll bet it uses LOAD_LIBRARY_AS_DATAFILE which doesn't.

    @Anon: You know you can turn write-behind caching back on, right?

  14. Joker_vD says:

    "The file is warning the loader that it's rather transient"

    Actually, how on Earth would the file know that? The exact nature of the file's location and transport layer between that location and local RAM is completely independent of the file's contents.

  15. alegr1 says:

    Here is my take on the whole SWAPRUN controversy.

    Some executables (mostly installers) are designed to run while the source media is being swapped. Also, they may not be designed to handle paging failures while they read the embedded data, and may produce a corrupted install if that happens. So Microsoft came up with the flag that guaranteed that the page-in failure will never happen. The usage pattern that needs that flag is completely up to the program itself.

    Then, during Vista development, some guy (perhaps from the crowd who is overly concerned with FILE_ATTRIBUTE_OFFLINE) thought that the SWAPRUN flag needs to be honored even if the file is being mapped by some third party for resource extraction. Which is a bad idea, because such mapping is typically done for short periods of time anyway, and the caller can very well (and MUST anyway) handle page-in failures.

    Microsoft, please, reverse that nonsense and ignore SWAPRUN if the file is not loaded for execution.

  16. alegr1 says:

    @IanBoyd:

    "So who cares if an image is simply "mapped" into my 2 GB address space? Is the use of 7 MB of virtual memory for 300 ms so terrible?"

    What if it's a 700MB setup.exe being opened for icon extraction by Explorer?

  17. alegr1 says:

    Note that use of mapping instead of ReadFile does not save you from exploits. On the contrary, it allows for quite exotic exploits, such as touching random memory (other thread stack guard page, for example), if the parser is not written carefully.

    [Of course. But now there's only one parser you need to debug instead of two. (And the mapped parser is needed anyway for other reasons.) -Raymond]
  18. yuhong2 says:

    Reminds me of why they restricted loading of keyboard layouts to system32 for Vista, which they also later backported to XP/Server 2003 in a security update.

  19. Joshua says:

    > What if it's a 700MB setup.exe being opened for icon extraction by Explorer?

    If you're using 32 bit Windows & stock resource segments, no icon for you.

    This is one of the reasons I much prefer my custom resource format for this kind of thing (very large resources in EXE for whatever reason). They aren't in any PE section and so don't get mapped. In case you were wondering how this works, I used spare bytes in the MS-DOS stub program (not the MZ header — the program) to contain the pointer; therefore we can trivially say there are no compatibility issues with this as the stub program will never be executed anyway.

  20. Crescens2k says:

    @alegr1:

    "What if it's a 700MB setup.exe being opened for icon extraction by Explorer?"

    It's ok, I use Windows 8.1 x64, Explorer has 128TB user address space.

    But on a more serious note, you'd expect the loader to be able to handle things like that and these particular functions could deal with it. MapViewOfFile and the underlying Sections don't require the entire file be mapped all the time. So the loader could be told, "I want this information", it then parses the executable only mapping in what it needs if the file is too large and then extracts the information.

    Remember a few years ago with one of the Visual Studio updates that came as a large Windows Installer package? Because Windows Installer tried to map the entire package into memory at once to do a hash check, it was failing. That was fixed in the next version of Windows Installer though. But would the fix be to stop the hash check? No, the simplest way would be to only map smaller sections of the file.

    Either way, the executable loader has these things available to it, and because it isn't being loaded for code, it can take some steps to reduce address space usage if needed.

    Of course this is just speculation that I made up right now, but I have downloaded patches that were distributed as self extracting archives, and these never failed to extract the icons, even on 32 bit systems.

  21. Mike Dimmick says:

    Normally, mapping the whole file would not cause the whole file to be read. Only the touched pages would be read. However, if it's on removable media (and networks should be counted as removable!), you can get page fault exceptions if the media has been removed and a previously unread (or discarded) page is touched.

    IMAGE_FILE_NET_RUN_FROM_SWAP says 'hey, this file might go away, so read it all into (swap-file backed) memory in case it does'. So the loader does what it was told.

    It might be nice if there was a flag value for LoadLibraryEx that told it 'I don't care about this file going away, ignore that switch' but, to be honest, that sounds like it wouldn't meet the bar for new features. You could do that if one of the LOAD_LIBRARY_AS_DATAFILE or LOAD_LIBRARY_AS_IMAGE_RESOURCE flags is set, but that's potentially a behaviour change that could crash an application depending on the feature, so a separate flag would be better.

    It's not particularly clear why you might use LOAD_LIBRARY_AS_IMAGE_RESOURCE instead of/as well as LOAD_LIBRARY_AS_DATAFILE. I *think* it means that Windows doesn't bother aligning each section of the file to the appropriate offset, it just maps the file as a single view. PE files omit unnecessary chunks of zeroes – sections start on a 512-byte boundary, but the processor requires that code + data with different access permissions start on 4KB boundaries.
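
The two granularities mentioned in the comment above (512 bytes on disk, 4 KB in memory) can be illustrated with a small rounding helper. This is a sketch only: `align_up` is a hypothetical name, and the two alignment values are the ones quoted in the comment, not values read from any real PE header.

```c
#include <stdint.h>

/*
 * Hypothetical helper: round value up to the next multiple of alignment.
 * Assumes alignment is a nonzero power of two and that
 * value + alignment - 1 does not overflow 32 bits.
 */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}
```

With these values, a section holding 0x1234 bytes of data takes `align_up(0x1234, 0x200)` = 0x1400 bytes in the file but `align_up(0x1234, 0x1000)` = 0x2000 bytes once laid out on 4 KB boundaries, so the same bytes end up at different offsets in a flat file view and in an image view.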

  22. alegr1 says:

    >"What if it's a 700MB setup.exe being opened for icon extraction by Explorer?"

    >It's ok, I use Windows 8.1 x64, Explorer has 128TB user address space.

    And then wait while the whole file gets read from network or DVD, because Vista+ wants so.

  23. Myria says:

    Does it apply relocations to the image?  Because *that* would be dangerous.

  24. Euro Micelli says:

    @Mike Dimmick: It might be nice if there was a flag value for LoadLibraryEx that told it 'I don't care about this file going away, ignore that switch'

    I don't agree. If the file says it might go away, the loader has to assume it might go away. What if it *does* go away?

    The file is warning the loader that it's rather transient. There is a lost-optimization cost to that fact, and you're going to have to pay for it. If you don't like the cost, you need to make sure the file is "non-transient", not allow remote code to override the warning.

    Besides, that would just start the settings race again: "I need a setting in the PE that says 'no, really, I truly might go away, so ignore requests to ignore IMAGE_FILE_NET_RUN_FROM_SWAP'"

  25. IanBoyd says:

    I *used* to think that "IMAGE_FILE_NET_RUN_FROM_SWAP" was the cause of the version check loading the entire executable across the network. I used to think that until this morning, when i read today's blog entry. Today's blog entry indicates that an entire executable is mapped into my address space in order to do a version check.

    "Oh", i thought. "Perhaps it wasn't my use of swap-run after all. I guess it's actually because the loader insists on loading the entire image".

    So, i *used* to think that it was the fault of IMAGE_FILE_NET_RUN_FROM_SWAP. But after this morning, i assumed it is the loader's fault. Case closed, right? No:

    > Mapping an image does not read the entire file, so that doesn't explain why the entire image is being read.

    What? But you just said….!

    So now i'm left trying to reconcile what this blog entry means:

    – the loader maps the whole image into process memory

    – mapping an image does not read the entire file

    Oh, wait. It *maps* the image into my address space, but does not actually need to access the network until there is a fault on pages. On the other hand, the use of IMAGE_FILE_NET_RUN_FROM_SWAP causes the entire image to be copied across the network, stored in swapfile space, and then the image is mapped into my process's virtual address space. So that makes sense, and seems reasonable.

    So who cares if an image is simply "mapped" into my 2 GB address space? Is the use of 7 MB of virtual memory for 300 ms so terrible?

  26. alt-92_ says:

    Or, like any sane person would (those with /dev/brain mounted rw) copy that 700MB file to local disk and THEN execute actions.

  27. Muzer_ says:

    Joshua is the new xpclient.

  28. alegr1 says:

    @alt-92:

    So to just browse a network/CD folder, you'd need to copy it to the local drive first? Why not just use the command line, like in the good old times?

  29. voo says:

    @alegr1: "And then wait while the whole file gets read from network or DVD, because Vista+ wants so."

    You mean because the executable itself specified it wanted so? There's a really easy solution.. don't set the flag if you don't want it! Or are we complaining now about giving programmers too many options?

  30. Dylan says:

    @Euro Micelli

    >If you don't like the cost, you need to make sure the file is "non-transient", not allow remote code to override the warning.

    >Besides, that would just start the settings race again: "I need a setting in the PE that says 'no, really, I truly might go away, so ignore requests to ignore IMAGE_FILE_NET_RUN_FROM_SWAP'"

    Overriding swaprun wouldn't be an assertion of "the file won't go away".  It would be an assertion of "I don't care that the file might go away".  There would be no arms race.  Oh, the file really truly might go away?  STILL don't care, don't load the entire thing just because it was mapped.

    And as far as cost, take into account that the file might go away while it's being copied into memory, so if you scanned version info and unmapped, it would actually have a smaller risk window than copying megabytes.

  31. alegr1 says:

    @voo:

    >You mean because the executable itself specified it wanted so?

    Let's refresh our knowledge. The option is called /SWAPRUN, which means *run* from swap. The documentation for the option says the executable will be loaded to the swap if it's *run* from a CD or a remote share. A mapping of the file for an icon extraction doesn't constitute running.

  32. nksingh [msft] says:

    @Mike Dimmick

    > It's not particularly clear why you might use LOAD_LIBRARY_AS_IMAGE_RESOURCE instead of/as well as LOAD_LIBRARY_AS_DATAFILE.

    I'm pretty sure that there's an advantage to using LOAD_LIBRARY_AS_IMAGE_RESOURCE if the relevant module also contains code that is likely to be loaded at some point. As you mentioned, the section alignment is different for a flat-mapped file versus an image-mapped file. The memory manager will be able to reuse existing image pages if you pass at least LOAD_LIBRARY_AS_IMAGE_RESOURCE. It seems like using both flags together only has a special meaning in a Protected Process.

  33. Crescens2k says:

    @alegr1:

    But isn't that also a case where /SWAPRUNANDGETMETADATA and IMAGE_FILE_NET_RUN_AND_GET_METADATA_FROM_SWAP are overly large?

    While it is true that it does state run, it really doesn't state what the side effects of that flag actually are. I feel this is like complaining that copying a file copies the file metadata too, or doing a full format also checks for bad sectors. These commands don't state that they do them after all.

    But I guess the change is what causes you to complain since if Windows always did it, you would have just dealt with it.

  34. Neil says:

    I know someone who wanted to run a 16-bit application on Windows 7 64-bit, so we dutifully installed XP Mode.

    XP Mode wasn't able to extract the application's icon and use it in the XP Mode Applications section of the Start menu. Instead we got a generic XP Mode icon.

    We went on to find that if the XP shortcut specified a custom icon from a .ico file then XP Mode would simply fail to integrate the shortcut.

    I considered writing a 32-bit "loader" application with the icon in its resources that would load the 16-bit application but starting 16-bit applications is apparently nontrivial and we didn't think it was worth the effort just to see the right icon.

  35. 640k says:

    @Joker_vD

    If you treat local storage as removable, then why not treat all storage as removable, like ram disks. I've seen it happen. But then the flag is useless because all things are special. What should be done instead is to optimize for the common case. That's the correct approach.

    Then of course, as usual, there is always the problem of sloppy developers who only want to develop for permanently attached storage and don't want to spend a second on, or write a single line of code to account for, removable media. I think that's the biggest problem. As usual.

    The problems with applications that are freezing could probably be blamed on sloppy developers in general, and probably do not depend on SATA drives being treated as removable or not. Explorer.exe is a joke when it comes to robust programming; it depends on perfect hardware way too much.

    @Raymond

    DRY is generally a good principle, but you don't fool anyone into believing that Windows binaries don't get 10x more bloated with every refactoring, no matter which design pattern is used. Please show me smaller binaries and smaller memory footprint, then I believe you. It will not happen.

  36. John Doe says:

    I don't get it, couldn't the parser be the same without having to map a whole file in memory?  Now, really, Isn't It Obvious™, since you chose to have a single parser, that you could have gone to the trouble of keeping the streaming/file positioning, or even making one, just because of this case?

    [Sure, you could write a parser that supported both mapped mode and ReadFile mode, but then you basically created two parsers. -Raymond]
  37. acq says:

    Thanks Raymond. I understand fully now. I'm still curious though if "the hard way" initial implementation was really "beyond repairing" or was it more a "we now do things differently by policy" decision. I can imagine that the initial implementation was from the times before the security initiative and nobody wanted to spend the time analyzing the "classic" code and implementing all the SafeXXX calls.

    [The old way wasn't beyond repair, but it created an ongoing support burden. Somebody has to update it to support 64-bit binaries, for example. And leaving the custom parser means risking that there is some security flaw still lurking. (The code was already converted to use SafeXXX. But there are other flaws that could occur, like an offset that reaches beyond the end of the binary, or an offset that is not properly aligned.) -Raymond]
  38. carbon twelve says:

    Thanks Raymond! I really enjoyed reading your post. It is baffling to me how the comments thread has become a flame war!

    It surprises me that so many people think that this is an issue in practice; surely they have used 32bit Windows and recall seeing icons and .exe details in explorer without any issues?

    This post appears to have evoked hysteria. For example:

    Professor: "Stars have finite lifetime"

    Student: "But, I know the Earth orbits a star, and the Earth is /very/ old — so our star must not exist anymore, or be about to die — we're all going to die!"

    See what I mean? People, having heard about the mechanism, are going nuts about associated problems that somehow fail to manifest themselves.

    Anyway, Raymond, I hope you don't stop posting on such "controversial" topics, it's very interesting. On that subject, I miss the DRM posts. :(

  39. Anon says:

    @Joshua

    I know write-behind can be re-enabled. I'd rather have my data not go missing when a Removable device is Removed.

  40. John Doe says:

    @Raymond, not exactly.  The same parser could be used, only the reader had to be abstracted.  The loader's reader would fetch bytes from memory and update a memory pointer, while the GetFileVersionInfo's reader would use ReadFile[Ex] and SetFilePointer[Ex].

    Actually, if the loader, or any other component for that matter, will ever need to know version info before or without actually mapping the file, it's more than a good enough reason to only keep a file-scanning reader (remember, the parser would still be the same, only the in-memory reader would be gone).  And there's no need for a time machine to reason up to this point, but it might be needed if someone wants to fix tight schedules or laziness.

    I share the feeling that this kind of post is quite clarifying and indicative of the line of thought behind implicit historic reasons which we must keep dealing with.

    I don't share the feeling that they were always good calls, even for their time.

    [It's a little messier than just mapping pointer access to ReadFile. You have to double-buffer reads, so you would have to double-buffer everything, at which point you lose the benefit of memory-mapping. (Aside from the fact that if the DLL is loaded for execution, it is already mapped.) It would also be quite a disruptive change to the loader, changing the way it accesses memory. -Raymond]
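
The abstraction proposed in comment 40 can be sketched roughly as follows, in portable C rather than Win32 calls. Everything here is hypothetical and invented for illustration (`Reader`, `MemorySource`, `memory_reader`, `file_reader`, `has_mz_signature`); it is not how the loader or GetFileVersionInfo is written. The file-backed reader uses standard `fseek`/`fread` as a stand-in for the ReadFile and SetFilePointer calls the comment mentions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical interface: copy length bytes found at offset into out. */
typedef struct Reader {
    bool (*read_at)(void *ctx, size_t offset, void *out, size_t length);
    void *ctx;
} Reader;

/* An image that is already in memory (for example, a mapped view). */
typedef struct MemorySource {
    const unsigned char *base;
    size_t size;
} MemorySource;

static bool memory_read_at(void *ctx, size_t offset, void *out, size_t length)
{
    const MemorySource *src = (const MemorySource *)ctx;
    if (offset > src->size || length > src->size - offset) {
        return false;   /* request runs past the end of the image */
    }
    memcpy(out, src->base + offset, length);
    return true;
}

static bool file_read_at(void *ctx, size_t offset, void *out, size_t length)
{
    FILE *fp = (FILE *)ctx;
    if (fseek(fp, (long)offset, SEEK_SET) != 0) {
        return false;
    }
    return fread(out, 1, length, fp) == length;
}

Reader memory_reader(const MemorySource *src)
{
    Reader r = { memory_read_at, (void *)src };
    return r;
}

Reader file_reader(FILE *fp)
{
    Reader r = { file_read_at, fp };
    return r;
}

/* One parsing step written once against the interface:
   does the image start with the "MZ" signature? */
bool has_mz_signature(const Reader *r)
{
    unsigned char sig[2];
    return r->read_at(r->ctx, 0, sig, sizeof sig)
        && sig[0] == 'M' && sig[1] == 'Z';
}
```

The sketch also shows the cost raised in the reply: even the in-memory reader now copies bytes into a caller-supplied buffer instead of handing back a pointer into the mapping, so code that used to dereference the image directly would have to be rewritten around these copies.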
  41. Myria says:

    @Crescens2k: I'm impressed; you're one of the few who's also noticed that Windows 8.1 x86-64 increased the user-mode address space from 0x800`00000000 to 0x8000`00000000.

    There's a benefit to not supporting machines that don't support cmpxchg16b. =^-^=

  42. Klimax says:

    @640k

    "DRY is generally a good principle, but you don't fool anyone into believing that Windows binaries don't get 10x more bloated with every refactoring, no matter which design pattern is used. Please show me smaller binaries and smaller memory footprint, then I believe you. It will not happen."

    And the basis for your assertion is what? Feelings? Windows 8.0 and onward. That should be enough. (Although I am pretty sure I could start with Windows 7…)

  43. yuhong2 says:

    @Myria: What is funny is that support.microsoft.com/…/lifecycle-Windows81-faq claims that "Windows 8.1 does not change any hardware requirements compared with Windows 8 or Windows 7"

  44. 640k says:

    @Yuhong: For starters, Windows 8 requires the hardware to be Windows 8 certified to be able to use some features.

  45. John Doe says:

    @640k, that's almost like saying that if you want to use a computer, you need a computer.  Or a closer call, to use a <foo>, you need a <foo reader/interface device>, like "to hear sound or music, you need a sound card connected to a headset or a couple of external speakers," or "to have a multi-touch interface experience, you need a multi-touch screen."

    But at the end of the day, if your machine runs Windows 7, it runs Windows 8.1 as well.  Now, you could say that the modern interface is quite a hurdle for a keyboard & mouse setup, but that's beyond the main point, particularly for servers.

  46. EricLaw [ex-MSFT] says:

    The IE team used to get burned by the SWAPRUN flag all the time, although it took us quite a while to realize why. We used to keep all of our installers on a network file share; directly invoking one from a hyperlink was fast, but if you opened Explorer to the share it took *forever*. Nobody bothered to look at why for over a year (blaming our poor lab team). When we finally looked, we realized that Explorer was extracting the icon from every one of the 20 MB to 45 MB installers in the folder. It was generating hundreds of megabytes of network traffic just to display the default Shell View.

    We considered removing the SWAPRUN flag from our installers and replacing it with some code we'd found in Microsoft Systems Journal from the 90s (which predated the SWAPRUN flag) but ultimately chickened out.

Comments are closed.