PAE and /3GB and AWE oh my...

Many customers (and non-customers) are often confused about all the different memory configurations and options available on 32bit systems (64bit systems do not have so many considerations thanks to the large flat memory space and a VAS size of 16 terabytes as of my typing this)...so, here's an attempt to address each of these a bit and simplify everyone's understanding of them.

Before we get started, much of what is going to be discussed is vastly over-simplified for sake of brevity and sanity, as trying to explain everything about everything would make for a very, very, very, very long blog posting...this one will be long enough trying to keep things as simplified as possible without leaving out any important parts (which I'm sure will still be missed somewhere, so by all means post a comment if something is missing or unclear)...

First, a bit of a 32bit memory architecture primer for Windows:

   Windows implements a virtual memory system, which means the OS virtualizes all access to physical memory by
   user-mode processes (Sql Server is a user-mode process, as is all commodity software that runs on windows with
   the exception of things like device drivers, filter drivers, etc.) that run on the system and make use of standard
   memory allocation API's. A kernel mode process referred to as the "Virtual Memory Manager" is the sub-system
   responsible for handling direct memory access for requests to memory, translating virtual memory addresses from an
   application into physical addresses, and handling paging operations (moving data to/from physical RAM to disk, i.e .
   the page/swap file) as necessary based on many different considerations. By default, the 32bit OS can only 'see' and
   use up to 4gb of memory, as it uses a 32bit range to map physical memory space (with a valid range of 0x00000000 up
   to 0xFFFFFFFF, or 0 thru 4,294,967,295 in decimal format, which correlates to a maximum high range pointer of 4GB
   (divide 4294967296 slots (don't forget the 0 slot counts) by 1073741824 (the # of bytes per GB)).  Yes, this means
   that on a default 32bit system no matter how much physical memory you install on the system, the system would only
   ever use at most 4gb of it (kind of a waste if you have a 32bit server with 64GB of memory, huh?)...we'll get into
   how you can make use of more than 4GB on the system later.

   When a process is created on a 32bit Windows system, the OS allocates what is referred to as VAS (Virtual Address
   Space) to that process, and all user-mode processes have their own VAS. The processes' VAS is basically what that
   process can 'see' as memory - for all intents and purposes, the process thinks that this is a map of physical memory
   that it has access to - and this is ALL the memory it thinks it has access to (note that I didn't say that this is
   the only physical memory it can potentially access...more on that later...).  A processes' VAS in 32bit windows is 4GB
   in size, and a VAS in 32bit Windows is typically divided into 2 primary sections: 1 section that is available to the
   process, and 1 section reserved for the system/kernel.  In Windows, this range of memory is split up into the 2 following
   ranges by default:

      Low 2GB Range (0x00000000 through 0x7FFFFFFF) - available to the user-mode process
      High 2GB Range (0x80000000 through 0xFFFFFFFF) - reserved for the kernel/system
 
   So, though a VAS is 4GB in size, a user-mode process by default only has access to up to 2GB of that space - yes,
   this means that by default, a 32bit user-mode process can only ever access 2GB of memory at the most using standard API's.

   This is where a question usually arises - "If every user-mode process gets 4GB of VAS, and I have a machine with only
   2GB of memory, how in the world can I run so many applications concurrently on this machine????".  Good question - it all
   comes back to the fact that Windows implements a virtual memory system, and is also where the page/swap file comes into
   play. First off, just because a user-mode application is given 2GB of VAS space to use doesn't mean that amount of space
   has to be physically backed by memory/storage - this gets into the difference between reserving and committing space,
   which I'm not going to discuss here, but most OTS applications on a standard user computer use memory in the 10's of
   megabytes, not in the hundreds of megabytes or even gigabytes range. Second, if there isn't enough physical memory space
   available for the OS to store all the committed memory for all applications running on the machine, this is where the
   page/swap file comes in - the OS pushes data that is currently not in use by some application to the page file to make
   room in physical memory for currently needed data.  If data that has been pushed to the page file is later requested to
   be read by a given application, the VMM pulls the data back from the page file into physical memory, possibly causing
   other data in physical memory to be pushed out to the page file, updates page entries, and then grants the read request.
   This process of moving data to/from the page file in/out of physical memory is referred to as page faulting, of which
   there are many different types (hard, soft, etc.).

Ok, so that was lots of fun, yes?  Now, let's try and tackle the remaining options in a logical order - I'm going to start with the /3GB switch (and it's related cousin the /USERVA switch).

   The /3GB and /USERVA switches are specified within the boot.ini file, and to take effect require a restart of the system.
   The /3GB and /USERVA switches affect the sizes of the 2 sections of the VAS as mentioned above - by default these 2
   sections are each 2GB in size each, the low range of which is used by the user-mode process, the high range used by
   the kernel system. Again, this means that a user-mode process (such as Sql Server) has the ability to directly
   address only up to 2GB of memory. The /3GB switch changes the split of the VAS from 2GB for each the user and
   system process to be instead 3GB for user processes and leaving only 1GB for the kernel/system processes. The /USERVA
   switch does the same exact thing, however you define the split by specifying a value between 2048 and 3096, which in
   effect becomes the size of the user-mode addressable portion of the VAS (for example, if you specify /USERVA 2560, you'll
   end up with a VAS split of 2.5 GB for the user-mode process and 1.5 GB for the system/kernel). Use of the /3GB and
   /USERVA switches does NOT depend on using /PAE or AWE - you do NOT need /PAE enabled to use these switches, and you
   don't need AWE enabled either - these options are mutually exclusive. You can use either of these switches on servers
   with <4GB of physical memory and still affect the addressable memory space for user mode applications and the kernel.
  
   So, using /3GB effectively changes the partition split for the VAS from what is mentioned above to the following:
  
      Low 3GB Range (0x00000000 through 0xBFFFFFFF) - available to the user-mode process
      High 1GB Range (0xC0000000 through 0xFFFFFFFF) - reserved for the kernel/system
     
   Using the /3GB switch does have some impact to be aware of - one important side-effect being that at most a server with
   /3GB enabled will be able to use 16GB of physical memory in total (due to the limited 1GB of kernel space limiting
   space to be able to store internal memory mapping structures). Additionally, an application must be compiled and linked
   using a special switch to take advantage of the increased addressable memory space. For more information on the /3GB
   switch and related requirements, see the following articles:
  
      Available switch options for the Windows XP and the Windows Server 2003 Boot.ini files
      https://support.microsoft.com/kb/833721
     
      Information on Application Use of 4GT RAM Tuning
      https://support.microsoft.com/kb/171793/

Great, more fun.  Ok, next we'll discuss PAE -

   Similar to the /3GB and /USERVA switches, /PAE is an OS level boot.ini switch/option, not a Sql Server technology of
   any kind. PAE in a nutshell allows the OS to see more than 4GB of physical memory - remember above in the memory
   architecture overview how I mentioned that a 32bit Windows system by default would NEVER be able to see more than 4GB
   of memory due to the limitation of a 32bit pointer size? Well, PAE is the mechanism by which 32bit Windows systems can
   'see' and address above 4GB of physical memory - this is achieved by system level changes that allow mapping of 32bit
   pointers through to an equivalent 36bit physical memory location (allowing up to 64GB of physical memory, 2^36). One
   such system level change is the addition of an extra level in the MMU called the page directory pointer table. Adding
   the PAE switch has NO effect whatsoever on the VAS size in a 32bit system - it remains at 4GB in total, and the sizes of
   the 2 portions of the VAS remain exactly the same also (2GB each by default, unless /3GB or /USERVA is used as mentioned
   above). User-mode applications do not need to do anything special to take advantage of the increased visible memory
   available to the OS, because due to the fact that memory management is virtualized anyhow, nothing changes from the
   application perspective. If nothing else was changed on a given system, what enabling PAE would in effect do for a system
   (assuming >4GB of memory exists before and after the change that is) would be to allow the OS more physical memory space
   for storing data instead of having to page this data to disk (i.e. to the page/swap file). And again, similar to /3GB, PAE
   is NOT dependent on any of the other options (3GB, USERVA, AWE, etc.) - although they can be used together by all means
   with certain distinctions (like for example those mentioned above with /3GB).
  
Ok, and finally, how does AWE fit in here -

   AWE is a bit different from /3GB and PAE in multiple respects. First, AWE is simply a mechanism to access/map memory
   from outside a processes' VAS into it's VAS and vice versa - it is not a type of memory, it is not an extension of
   memory, and it doesn't change any memory structures (this is a bit over-simplified, but again, not going to go much
   deeper than that here). Many people also think AWE is a Sql Server technology - it is not...in fact, it is a
   technology that Sql Server uses, just as many other applications can and do. Second, it is specific to a single
   process, it is not a shared/system wide change (multiple processes can use the AWE mechanism, but they do so
   independently of one another if they do so).
  
   So, what is AWE - AWE is basically a set of system API's that allow a process to access memory outside/larger than
   it's VAS; however, using AWE isn't direct memory access - AWE is a windowing technology by which a process basically
   uses physical memory outside it's VAS for storage of data, however to access this data, it has to map that data into
   it's VAS before accessing it - this is what the AWE API's do, they provide an interface by which a process can reserve
   memory outside it's VAS for storage, however to actually 'touch' the data stored in these memory locations, the process
   has to map that data into it's VAS to read it. For all intents and purposes, you can think of using AWE memory
   management API's as very similar to what happens in the kernel level memory manager when a soft-page fault occurs. Using
   the AWE mechanisms allow a process to allocate physical memory, then using these same API's allocate space within the
   processes' VAS to map to these physical memory locations. So although AWE allows you to access/use memory outside
   the processes' VAS, the process is still confined by the VAS in terms of being able to read/write data into memory
   mapped by AWE mechanisms.
  
Phew...there, we made it through. Now, how does this all affect Sql Server? Well, I'm glad you asked, cause I'm not sick of typing yet at all...

/3GB can benefit Sql Server in many scenarios where you are incurring VAS memory pressure. There are a couple different types of memory pressure (physical memory pressure and VAS memory pressure are the 2 biggest I guess), and you'll typically encounter VAS memory pressure in workloads that generate lots of hashing/sorting operations and/or very heavy hashing/sorting operations (Data Warehouse workloads for example). You can also encounter VAS memory pressure in scenarios where plan cache miss-use is occurring (perhaps do to ad-hoc or poor T-Sql), servers with large numbers of user connections, or cursor-based operations are occurring frequently. You also typically see this type of pressure more pronounced on servers with large data-caches that are using PAE/AWE mechanisms - the reason for this is simple: the only memory consumer in any version of Sql Server that can make use of AWE-mapped memory is the data cache. No other memory consumer can make use of AWE mapped memory, including your plan cache, connection cache, etc. The reason this becomes more profound on these large-cache systems is typically because the data sizes are larger in addition to the things mentioned above.  There are many other considerations to take into account before simply enabling a system with /3GB or /USERVA, this is just a start...here are some VERY GENERAL pointers you can use as guides for determining if /3GB or /USERVA will be useful in your environment, and also some things to look out for (again, by no means an end-all, be-all list of things to consider):

    1) Using /3GB limits the total physical memory that can used on a server to 16GB - this is a side-effect of/due to the
    decreased VAS space for the system/kernel, which is where memory-mapping structures are stored by the VMM. Therefore, if
    you plan to/want to use more than 16GB of memory on your server, you can't use /3GB.
   
    2) Though 16GB is a hard upper limit, most workloads will actually show decreased throughput on systems with 12GB of
    memory, and many on systems/workloads with as low as 8GB of memory. The use of /3GB is an advanced configuration that
    should be tested, understood, and documented within your environment/workload prior to implementing to ensure increased
    throughput.
   
    3) If /3GB or /USERVA is used, monitor "Memory:Free System Page Table Entries" perfmon counter closely - PTE's are one of
    the memory management structures used by the VMM to manage virtual-physical memory mappings, and typically requires an
    absolute minimum of 7,000 free entries for efficient system operation. Systems operating with insufficient Free PTE's are
    prone to major problems such as STOP errors and blue-screens. This is only one of the considerations for monitoring a system
    using /3GB - the kernel is responsible for many, many critical operations on a system. Typically if the kernel is starved
    for memory as a whole, some of the first systems to incur side-effects include network operations (dropped/lost packets) and
    general IO operations (network IO, storage IO).
   
    4) /3GB or /USERVA should only be used if you can properly determine that your workload is VAS-pressured (and none of the
    other considerations preclude the use as well) - if your workload is not under VAS pressure, or if the VAS is not your
    bottleneck, you will probably see minimal (if any) increased throughput.
   
PAE can benefit Sql Server in obvious ways - if the OS can see more memory, than so can Sql Server (both directly via AWE mechanisms, and also indirectly via use of additional physical memory by the OS for multiple system processes). By far, the largest benefit here is for larger Sql systems that can benefit from additional memory space via AWE mechanisms.

Similarly, AWE can benefit Sql Server in obvious ways - the more memory the server has to store data in (even if it isn't 'direct' memory access in a sense, it is still infinitely faster than disk-based access) the better things will typically operate. There are also additional considerations to take into account when working with a Sql Server system that implements AWE (especially with Sql 2000 and/or clustered systems), so ensure you have a good understanding of those prior to sprinting down this path as well.

Finally, thanks goes out to many of my colleagues for assistance with reviewing/recommending/adding/collaborating on content to this particular post:

 Dwayne Lanclos
 Uttam Parui
 Ernie DeVore
 Pam Lahoud
 David Sheaffer
 Shen Xu
 Curtis Krumel

Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm.