How can I detect programmatically whether the /3GB switch is enabled?


A customer was doing some diagnostic work and wanted a way to detect whether the /3GB switch was enabled. (Remember that the /3GB switch is meaningful only for 32-bit versions of Windows.)

The way to detect the setting is to call GetSystemInfo and look at the lpMaximumApplicationAddress member of the SYSTEM_INFO structure it fills in.

#include <windows.h>
#include <stdio.h>

int __cdecl main(int, char **)
{
 SYSTEM_INFO si;
 GetSystemInfo(&si);
 printf("%p\n", si.lpMaximumApplicationAddress);
 return 0;
}

Compile this as a 32-bit program and run it.

Configuration                          LARGEADDRESSAWARE?  Result    Meaning
32-bit Windows, standard configuration Any                 7FFEFFFF  2GB minus 64KB
32-bit Windows, /3GB                   Any                 BFFFFFFF  3GB
32-bit Windows, increaseuserva = 2995  Any                 BB3EFFFF  2995 MB
64-bit Windows                         No                  7FFEFFFF  2GB minus 64KB
64-bit Windows                         Yes                 FFFEFFFF  4GB minus 64KB

On 32-bit systems, this reports the system-wide setting that specifies the maximum user-mode address space, regardless of how your application is marked. Note, however, that your application must be marked LARGEADDRESSAWARE in order to take advantage of the space above 2GB.
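
The decision implied by the table boils down to a single comparison against the 2GB boundary. Here is a minimal, portable sketch of that test; the helper name is ours, and on Windows you would feed it `(uintptr_t)si.lpMaximumApplicationAddress` after calling GetSystemInfo.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if the reported maximum application address indicates
   that the user-mode address space extends beyond the classic 2GB
   boundary (i.e., /3GB, increaseuserva, or a large-address-aware
   process on 64-bit Windows). */
static int has_large_address_space(uint64_t max_app_address)
{
    return max_app_address >= 0x80000000ull;
}
```

With the values from the table: 7FFEFFFF reports no extra space, while BFFFFFFF, BB3EFFFF, and FFFEFFFF all report space above 2GB.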

On the other hand, when you run a 32-bit application on 64-bit Windows, the system runs it inside an emulation layer. Therefore, 64-bit Windows can give each application a different view of the system. In particular, depending on how your application is marked, Windows can emulate a 32-bit system with or without the /3GB switch enabled, based on what the application prefers.

Armed with this knowledge, perhaps you can help this customer. Remember, you sometimes need to go beyond simply answering the question and actually solve the customer's problem.

We would like to know how to detect from our 32-bit application whether the host operating system is 64-bit or 32-bit.

We need to know this because our program does some data processing, and we have to choose an appropriate algorithm. We have written one algorithm that is faster but uses 1½GB of address space, and we have also written a fallback algorithm that is slower but does not use anywhere near as much address space. When running on a native 32-bit system, there is typically not 1½GB of address space available, so we have to use the slow algorithm. But when running on a native 64-bit system (or a native 32-bit system with the /3GB switch enabled), our program can use the fast algorithm. Therefore, we would like to detect whether the native operating system is 64-bit so that we can decide whether to use the fast or slow algorithm.

Here's another customer question you can now answer:

We have a 64-bit program, and since we know that Windows currently does not use the full 64-bit address space, we would like to steal the upper bits of the pointer to hold additional information: If there are at least 8 bits available, we can use a more efficient data format. Otherwise, we fall back to a less efficient format. How can we detect whether the upper 8 bits are being used for addressing?

Update: Clarified the table based on a misunderstanding in the comments.

Comments (42)
  1. VinDuv says:

    Don’t applications also need to be marked LARGEADDRESSAWARE on 32-bit Windows with /3GB to get 3GB of address space?

  2. Mark says:

    VinDuv: yes, "Note, however, that your application must be marked LARGEADDRESSAWARE in order to take advantage of the space above 2GB." The "marked LARGEADDRESSAWARE" in the table is only to distinguish the two 64-bit scenarios, not an exhaustive list of executable attributes.

  3. Joshua says:

    Writing 32 bit programs that use > 4GB RAM gets interesting. It's not clear what happens with AWE calls on 64 bit processors. Anyway, there's other ways.

  4. Mark says:

    If you like, read "standard configuration" as "standard configuration or application not marked LARGEADDRESSAWARE".

  5. Henke37 says:

    The first one shouldn't check for large address space, just available memory.

    The second one should use a struct that contains a based pointer and the additional data they want to store.

  6. Jon says:

    The correct solution is to use VirtualAlloc to request the 1.5Gb you want.  If you get it,  use it.  If you don't,  use the fallback algorithm.

    Easier than faffing about with system configuration and O/S detection.

  7. Jon says:

    This generation of processors use 48-bits of the 64 available. So you have a short's worth to use.   You have to put the bits back though - the CPU may not use them,  but it checks them.  If the pointer isn't canonical it will raise an exception.

  8. Paul Z says:

    I'm pretty sure the answers to these questions are 1) "Just ask for your 1.5GB. If you got it, you can use it. Otherwise, probably you shouldn't." (as per Jon above) and 2) "You are a terrible person for trying to do this. Even if Windows doesn't use these bits right now, it will almost certainly want to use them one day. Please stop before you make Raymond Chen cry."

  9. Eric says:

    @Paul Z:  I think we need a "Keep your code sane" PSA in the style of en.wikipedia.org/.../Keep_America_Beautiful, with a picture of Raymond crying.

  10. Dan Bugglin says:

    Wouldn't a single 1.5gb allocation try for a single, huge block? Or am I mistaken? This may fail even if enough total memory is available. If the algorithm doesn't need a single contiguous block it shouldn't ask for one. It should allocate all its memory up front, though, and deallocate if it is failing before it finishes.

    @Jon that will work great until a customer uses a brand new processor that uses all 64-bits.

  11. ErikF says:

    For the second person, simply set your alignment to 256 bytes and voila! You have 8 bits that you can steal for pointers! ;-) Honestly though, storing things in unused pointer bits seems like a recipe for bad things happening, regardless of where you are putting the extra data.

  12. alegr1 says:

    >This generation of processors use 48-bits of the 64 available.

    Of physical address. The question was about virtual address.

  13. Gabe says:

    Paul Z: Read the question more closely. They want to detect how many bits Windows isn't using so that they can use the otherwise-unused ones. If Windows starts using those top 8 bits, they will simply use a less-efficient storage mechanism.

    The problem with their plan isn't that it will cause them to trample on valid bits; it's that they will have code that won't run in production (and hence won't be fully tested) until the day CPUs start supporting more bits of virtual address space.

  14. Chris Crowther says:

    I'm going to echo the "just ask for the 1.5GiB of memory and make your choice accordingly" and "why the hell are you even thinking about doing that?"

    The second one sounds like a really terrible idea and inviting a whole world of future pain.

    The MAZZTer: A virtual memory allocation may not map to a contiguous set of physical memory pages.

  15. ZLB says:

    Answer to customer 1:

    Use VirtualAlloc() to try to reserve (not commit) 1.5Gb of address space. If the alloc is ok, use it and do the fast algorithm. If you want 1.5Gb of address space, just ask for it!

    Answer to customer 2:

    Do you really have to do that? Tricks like that are just asking for trouble! (and future compatibility issues, and porting issues!)
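
A minimal sketch of the probe ZLB describes: reserve (not commit) the address space, and release it immediately if you only wanted the answer. The function name is ours; the non-Windows stub exists only so the sketch compiles anywhere.

```c
#include <stdio.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
/* Probe for 1.5GB of contiguous address space by reserving it without
   committing any pages, then releasing the reservation. */
static int can_use_fast_algorithm(void)
{
    const SIZE_T needed = (SIZE_T)1536 * 1024 * 1024; /* 1.5GB */
    void *probe = VirtualAlloc(NULL, needed, MEM_RESERVE, PAGE_NOACCESS);
    if (probe == NULL) return 0;              /* address space too tight */
    VirtualFree(probe, 0, MEM_RELEASE);       /* give the reservation back */
    return 1;
}
#else
/* Non-Windows stub so the sketch is portable; always reports "no". */
static int can_use_fast_algorithm(void) { return 0; }
#endif
```

In a real program you would more likely keep the reservation and commit into it as the fast algorithm runs, rather than releasing and re-allocating.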

  16. ZLB says:

    An extension to my previous answer to customer 1: It may be appropriate to just alloc the 1.5Gb of address space as early as possible when the program starts if the problem is address space fragmentation rather than address space exhaustion!

  17. KyleJ61782 says:

    @Chris Crowther -

    Re your comment @The MAZZTer:

    The contiguous set of physical memory pages isn't the issue.  The issue is one of whether or not there's a large enough contiguous virtual memory span.  A single 1.5GiB allocation needs contiguous VM address space within the process, which may not be available.

  18. KyleJ61782 says:

    RE customer 2's idea:

    This is an issue that I'm running into currently.  An expression evaluator that my software is moving away from (to one that I just wrote in house) actually uses the top bit of pointers for some reason.  Oh the fun of random access violations occurring in a 3rd party DLL when /3GB is enabled.

  19. Jon says:

    @alegr1

    >>This generation of processors use 48-bits of the 64 available.

    >Of physical address. The question was about virtual address.

    Only 48 bits are used for virtual addresses too.  Bits 48->63 must be the same as bit 47 or the processor will raise #GP when the pointer is used
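
A portable sketch of the canonical-address rule Jon describes (helper names are ours): bits 48-63 must replicate bit 47, so any data smuggled into the top 16 bits has to be sign-extended away before the pointer is dereferenced. This assumes two's-complement arithmetic right shift, which holds on mainstream compilers.

```c
#include <stdint.h>

/* Restore canonical form: shift the low 48 bits up, then arithmetic-
   shift back down, replicating bit 47 into bits 48..63. */
static uint64_t make_canonical(uint64_t p)
{
    return (uint64_t)(((int64_t)(p << 16)) >> 16);
}

/* A pointer is canonical iff restoring canonical form is a no-op. */
static int is_canonical(uint64_t p)
{
    return make_canonical(p) == p;
}
```

Forgetting the `make_canonical` step before a dereference is exactly what produces the #GP fault Jon mentions.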

  20. Adrian says:

    I generally agree with the strategy of trying the allocation and then falling back if you don't get it.

    But you should also consider the circumstances.  If your calculation is in a library and it will be run in a separate thread, then you might want to think about leaving significant memory for the other threads.  The point of using more memory here is to do something faster.  If your 1.5 GB allocation leaves very little VM for other threads, you might just be slowing everything down.  So the broader context may still be a relevant factor in your decision, especially if it was a more borderline case, e.g., using 1 GB on a 3GB system is more plausible than 1 GB on a 2 GB system.

  21. Don't Do That! says:

    > We have a 64-bit program, and since we know that Windows currently does not use the full 64-bit address space, we would like to steal the upper bits of the pointer to hold additional information: If there are at least 8 bits available, we can use a more efficient data format. Otherwise, we fall back to a less efficient format. How can we detect whether the upper 8 bits are being used for addressing?

    > Only 48 bits are used for virtual addresses too.  Bits 48->63 must be the same as bit 47 or the processor will raise #GP when the pointer is used

    Does it strike you that the AMD64 designers were intentionally seeking to break harebrained schemes such as Customer #2's when they came up with the 'canonical pointers' concept?

    [This scheme is quite common in implementations of Scheme. -Raymond]
  22. Justin says:

    Apple did the latter, in the days of the 68k macs, using the top 8 bits for flags.  This obviously caused problems when the 68020 chips were used, which had 32 address lines (early ones had 24 lines).  This led to the development of the "32-bit clean" OS moniker.  (See en.wikipedia.org/.../Mac_OS_memory_management for more details).

    In short: Don't do it.

  23. SimonRev says:

    Given you are working on a 64 bit platform, it is reasonably safe to assume that you want to multiplex your data in the upper bits of the pointers for (dubious) convenience rather than space savings.  The BSTR storage scheme sounds like a better choice -- just allocate a few extra bytes and put your extra data in the first few bytes of the allocation, then increment the pointer by a few bytes and use that pointer (subject to alignment requirements, of course)
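
A minimal sketch of the BSTR-style scheme SimonRev suggests, with illustrative names of our own: the extra data lives in a header in front of the allocation, and the caller only ever sees a pointer just past it.

```c
#include <stdlib.h>

/* Header stored in front of the payload, the way a BSTR stores its
   length before the string data. */
typedef struct {
    unsigned char tag;      /* the extra data we wanted to carry */
    unsigned char pad[15];  /* keep the payload pointer 16-byte aligned */
} Header;

/* Allocate size payload bytes plus a hidden header. */
static void *tagged_alloc(size_t size, unsigned char tag)
{
    Header *h = malloc(sizeof(Header) + size);
    if (h == NULL) return NULL;
    h->tag = tag;
    return h + 1;           /* caller sees only the payload */
}

/* Recover the hidden tag by stepping back over the header. */
static unsigned char tagged_get_tag(void *p)
{
    return ((Header *)p - 1)->tag;
}

static void tagged_free(void *p)
{
    free((Header *)p - 1);
}
```

Unlike pointer-bit tricks, this keeps every pointer a plain, dereferenceable pointer, at the cost of a few bytes per allocation.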

  24. Don't Do That! says:

    > [This scheme is quite common in implementations of Scheme. -Raymond]

    Yes, I know about tagged pointer types and their long history: it doesn't mean that they're a good idea, especially stuffing tags into the high bits of the pointer itself.  On the other hand, if you only need a few tag bits, encoding them into the lowest-order bits and masking on access is much less boneheaded, considering that the CPU implementation is much less likely to steal the low order bits for its own purposes.

    > The BSTR storage scheme sounds like a better choice -- just allocate a few extra bytes and put your extra data in the first few bytes of the allocation, then increment the pointer by a few bytes and use that pointer (subject to alignment requirements, of course)

    This is the way I would do type-tagged data; it also means you can't get into an inconsistent situation where you have a (object, type-a) and an (object, type-b) tagged-reference pointing at the same object, unlike tagged pointers/references.
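
For completeness, a sketch of the low-order-bit tagging that "Don't Do That!" prefers over high-bit tagging (names are ours). It relies only on allocation alignment, which the program controls, rather than on how many address bits the CPU happens to decode.

```c
#include <assert.h>
#include <stdint.h>

/* With allocations aligned to 8 bytes, the low 3 bits of a pointer are
   always zero and can carry a small tag, masked off on every access. */
#define TAG_BITS 3u
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1))

static void *tag_ptr(void *p, unsigned tag)
{
    assert(((uintptr_t)p & TAG_MASK) == 0); /* requires aligned pointer */
    assert(tag <= TAG_MASK);
    return (void *)((uintptr_t)p | tag);
}

static unsigned get_tag(void *p)
{
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

static void *strip_tag(void *p)
{
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```

This is the conventional tag layout in many dynamic-language runtimes: tag 0 for real pointers means they can be dereferenced without any masking at all.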

  25. James says:

    The MAZZTer, DLLs with fixed load addresses and other fixed addresses within the program's address space mean that you might want to expect an unacceptable failure rate for contiguous requests for more than around 1.2 gigabytes on 32 bit Windows. That's the threshold that MySQL uses to switch from contiguous to split requests for its main large allocation on 32 bit Windows. You probably wouldn't want to get into discussions about rebasing DLLs to free up a larger contiguous address space.

  26. John Doe says:

    @Don't Do That!, don't do that!

    Seriously, think about the garbage collector: how can it tell, on the stack or otherwise, which is a pointer and which is a number?  You can't feasibly generalize by tagging only the object's memory location.  You may do that, but only as a complement to some tagging on the value/pointer itself.

    Or, are you suggesting that value types should be objects?  That will make any computer look like a primitive crank gear machine.

  27. mikeb says:

    "increaseuserva = 2995"

    The BCDEdit docs say that the value can be as high as 3072, so your table raises the question of: why 2995?

    [Because it's not 3072. -Raymond]
  28. Myria says:

    Windows 8.1 already increased the 64-bit user-mode address space by 4 bits versus Windows 8 - any programs that assumed that Windows wouldn't use address bits 43-46 in user mode surely broke with Windows 8.1.

    Incidentally, the reason Windows 8.1 was able to increase the 64-bit address space is because Windows 8.1 x64 won't run on CPUs that don't have cmpxchg16b.

  29. Silly says:

    For customer number one, MessageBox is your friend. Just ask before running the step. Or if the step runs unattended then add a checkbox to the appropriate sub-tab in the config UI Options area, or an entry in the appropriate xml/ini/custom config file (and remember to seed the value with an invalid option so as to abort the application with a numerical error code if the user hasn't explicitly chosen an algo. And also optionally reprint and distribute new  error index lookup cards).

    MB Title: Choose an appropriate algorithm. Text: We have written one algorithm that is faster but uses 1½GB of address space...

  30. Cody Gray says:

    RE: Silly

    A message box? Surely you're joking. Raymond already covered this silliness several years ago: blogs.msdn.com/.../120193.aspx

  31. ZLB says:

    I was reading about a how another OS on 64bit systems uses the top 9 bits of certain types of pointer to store the Retain/Release count for an object.

    Of course, it all falls down if the retain count is higher than 511!!!!

  32. JGG says:

    For question 1, "just trying the allocation" is a bad approach IMO. First, it has never been mentioned if they're using a contiguous buffer or not. If it is the case, just because the allocation succeeds doesn't mean it won't cause other problems later on, such as leaving too little space for the rest of the program to run with. A random allocation might end up failing with OoM exception or null pointer later down the line, and it will be very annoying to deal with. Also, depending on the nature of the program, the condition for the allocation success may also be random, which makes it even more annoying to debug or understand. If it's not contiguous, then it will fail in the middle of the algorithm with some of the allocations done, where backing out on OoM will be just as annoying.

    I think it is much better to use your upfront knowledge of the matter and do as they suggest, detect the >= 3GB addressing space and branch according on that. So checking if lpMaximumApplicationAddress >= 0xB0000000 (you could always go with the exact values but it wouldn't really matter) should be fine.

    For question 2, it seems a lot of people in the comments have not had to do hardcore optimizations on very low-level code or limited hardware, as it is a very useful trick to use in these conditions, and can cause significant speed-ups for example. Just because it is potentially dangerous if used wrong does not mean you should just blindly dismiss it altogether when the right circumstance shows up.

    And in this context, it is actually perfectly safe to use. For example, I ran Raymond's code in 64 bit, and I obtain 0x000007FFFFFEFFFF, which means you can count the number of 0 upper bits at runtime, and thus safely assume they will never be used because Windows just told you it won't use them. If you have enough, then you can use the faster branch, otherwise you stick to the slow branch. Doing so is entirely forward compatible as you make no hardcoded assumptions, you just query Windows itself for the number of free bits.

  33. Anon says:

    @JGG

    You mean unless Windows is lying to your app, which happens frequently. Maybe you're in WINE. Maybe you're in some compatibility mode.

    What happens when you write wildly popular Enterprise code which requires seven free bits, then the next version of Windows  eats three of them? Microsoft ruins everyone else's day by limiting OS capabilities after testing and discovering that Massive Customers X, Y, and Z are all using your software and refuse to upgrade their systems if it no longer works.

  34. JGG says:

    @Anon

    What happens if the next Windows version uses more bits? You fallback to the slow version of the algorithm. You can detect this at runtime, as I mentioned.

    The whole "but Windows can lie to you" argument is a slippery slope that just won't end. You have to trust something at some point. I would rather go with the innocent until proven guilty road, rather than assuming any API call can potentially lie to me or be bugged. Because the vast majority is working just fine and as intended.

    And if you run production apps on WINE, well you're asking for a whole bunch of troubles I won't get into. You're worrying about the reliability of a software running on a vastly more unreliable emulation layer...

  35. SimonRev says:

    @JGG -- the problem with your take on #1 -- there still is no guarantee that 1.5GB of RAM is available, even if the /3GB switch is on -- so you still have to deal with falling back to the slow algorithm if you cannot allocate memory. Since you have to do the one anyway, why bother with the other? Also what if you happen to get lucky on a machine without /3GB and can get your 1.5GB? Why not let the fast algorithm run?

    Now the contiguous memory question is valid -- if the algorithm doesn't require contiguous memory then you may be doing a disservice to your application to try and get a 1.5GB contiguous block. However, you could just spread that out over multiple VirtualAlloc calls.

    As to the problem #2 -- You call out "hardcore optimizations on very low-level code or limited hardware". It is safe to say that an app that requires a 64 bit desktop machine is not running on "limited hardware". This brings us back to optimizing low-level code. Since this is discussed in the context of an application and not the OS kernel or driver, we can assume that this would refer to needing to optimize code for speed. At that point, we would need to measure whether the extra memory/cache footprint required for something like the BSTR allocation scheme exceeds the cost of always having to mask off the extra bits whenever you dereference the pointer. It is far from clear to me which would be faster.

    [In the case of #2, it is a scientific data processing application, which explains why it's a 64-bit application: They are processing huge quantities of data. The tight packing lets them operate on large datasets without thrashing. This can make the difference between a run take minutes instead of hours.-Raymond]
  36. Anon says:

    @JGG

    An API lying to you doesn't mean there's anything wrong with the API. In nearly all cases, Windows APIs which are lying to you are lying to you on purpose, due to precisely what I was saying.

  37. Brian_EE says:

    Raymond, How appropriate the timing on this article. I've been off work on a medical leave and decided to start from the beginning of your blog earlier this week (I came upon this about 2012-ish). I just read about 2-weeks worth of /3GB switch article from the June 2004 archive the other night.

    During that series though, you had a lot of complaint comments of the variety "enough of the /3GB articles already!"

  38. j b says:

    Re. problem #2:

    "64 kbytes should be enough for everyone..."

    In the 1970s, there was a Lisp interpreter for the 16 bit Nord-1 mini. The developers could not in their wildest dreams imagine that any machine would have more than 32K 16-bit words, so they used the upper address bit as a flag: One more level of indirection, please!

    The next machine generation, Nord-10, came with a fancy memory management system (for its time and machine class), interpreting all 16 bits. The Lisp interpreter was never ported to the new generation....

    A closely related note: The Univac 1108 mainframe had a similar addressing mode which was supported by hardware: If the top address bit (of 36) was set, the addressing hardware would indirect one level more. You could make the world's tightest infinite loop by setting a pointer (with this bit set) to point to itself. All that was required for the machine to lock up was an operand fetch - not a single instruction was executed! And there was more: On the 1108, an operand fetch was uninterruptible. (At my university, they did everything to keep this secret; the majority of reboots of that huge mainframe was caused by students who wanted to try out if it really was true that you could block the machine completely with a single, unprivileged machine instruction. You could.)

  39. Evan says:

    @jb: "All that was required for the machine to lock up was an operand fetch - not a single instruction was executed!"

    It is possible to do arbitrary computation on the x86 MMU: you can get the equivalent of a "decrement and branch if less than" or something, which is sufficient for Turing completeness: http://www.youtube.com/watch?v=NGXvJ1GKBKM

    (That's a long video [hopefully the link is right], but you can skip about 25 minutes of it if you're familiar with, e.g., how the PaX NX-bit emulation works.)

  40. JGG says:

    @SimonRev : I am assuming they are already able to run the full program well within 2GB, otherwise the 3GB would have been a requirement and the question would be irrelevant. I am also assuming they are using enough memory that 1.5gb allocation is likely to fail, otherwise they would probably not ask either. Sure, there is no hard guarantee the alloc won't fail with 3GB, but in any case, it's far more likely to succeed than in 2GB.

    I have already explained why I don't like just "trying the alloc". In most cases it is unlikely a contiguous buffer, or everything being allocated upfront, making error handling quite difficult. Even if you end up being lucky, it can cause other, more insidious OoM on seemingly harmless allocations (say, 1MB) since you will be running low on memory in a 2GB addressing space. Given it can be very hard to recover from OoM scenarios, most cases usually ending up in crashes and thus data losses, I would prefer erring on the careful side for the sake of the application's user.

    Raymond has already provided a good answer for #2. Of course, you always need to profile this kind of optimization in the appropriate context, as it is very difficult to predict. But there *are* cases where it is significantly better, so the question remains valid. Given we do not have enough information, we have to assume they did their profiling well, and the optimization is justified, they just want to have a safe way to apply it. Answering them that they should not do this anyway, without further information or context, is kind of disrespectful of their work, and does not help.

    @Anon : If they are lying to you on purpose to preserve backward compatibility, then it is equally safe to assume they will also properly emulate memory allocation to respect the given memory addressing range of the query. Otherwise the problem is in the inconsistency of the emulation layer, not you relying on that information to make an algorithm decision.

  41. Don't Do That! says:

    > Seriously, think about the garbage collector: how can it tell, on the stack or otherwise, which is a pointer and which is a number?  You can't feasibly generalize with tagging on the object's memory location.  You may, but complementary to some tagging on the value/pointer.

    Tagging in the least significant bits is sufficient to do this; the best part is you can do that transparently to the memory access mechanism (0 in the tag field denotes a pointer/reference, non-zero a primitive type)

    > Or, are you suggesting that value types should be objects?  That will make any computer look like a primitive crank gear machine.

    Ask the Java folks.  *chuckles*  (FWIW, I agree with you that giving everything reference semantics is boneheaded.)

  42. Silly says:

    @Cody. Well of course the message box would use MB_ICONQUESTION to clearly indicate its intent. And the error index cards would be *laminated*.

Comments are closed.
