A look at Windows Virtual Memory mechanisms (continuation of "A look at Virtual Address Space - VAS")

As I promised last time here comes next post on memory J.  Remember, my eventual goal is to reveal how memory management works in SQL Server but for you to really appreciate it, I think, you do need to get good feeling on how Windows manages memory. Understanding details is great however at this point I want you to understand the concepts! In some cases I on purpose skip the details because right now they are not important and many of them can be changing from one release to another.  So lets continue!

 

In my previous post I mentioned that VAS regions could be bound to physical memory right a way or latter on using VirtualAlloc API. As most of the operating system do Windows binds physical memory to the VAS on demand, the first time page in a region is touched. (The story is a bit different when you are running without paging file, then VAS page is bound to physical page right a way) The binding happens only one page at a time. When memory first touched hardware generates exception, called page fault. The exception is handled by Windows. The OS verifies if current VAS region is committed by looking into the VAD structure representing this region. If region is committed and it is accessed for the first time OS will find a physical page in RAM it can use,(Keep in mind that this page will be zeroed out before given away due to security reasons) Finally it binds VAS region to the page by filling appropriate data structures and loads this information into CPU then execution continues right at the point where page fault occurred.

 

Due to shortage of RAM Windows might decide to take a way physical page from a process. Using a reclaiming policy the OS finds a page that it will free. Then OS makes sure that page file contains up to date image of the page on disk by spilling page to paging file if necessary. Once the page is on disk, OS can fix up all required data structures so that next time the page is touched it will know where to find it. When finished, the physical page is zeroed out and then put on the free list to be used for subsequent memory requests.

 

As I said previously a CPU generates exception, a page fault, when it can’t find a physical page corresponding to a virtual address it tries to access. An application can generate page fault by two ways first when it touches VAS just committed region for the first time and second if the OS previously paged out given page to disk.  (when application touches a VAS page that hasn’t been committed yet Windows will generate exception). The type of fault when page needs to be brought from disk is called hard page fault. If application touches just committed VAS region for the first time it will generate a a demand zero page fault. There is also another type of page fault – soft page fault. Soft page fault occurs when Windows can still find the page in physical memory and serve it without bringing the page from disk.

Using Windows virtual memory mechanism the same physical pages can be mapped to different VASes at different places and multiple times. Physical pages that can only be mapped to single VAS are called private, because they can’t be shared across different VASes. Physical pages that are mapped to different VASes at the same time are called shared physical pages.  

 

All VAS regions committed using VirtualAlloc* interfaces bound to physical pages that can’t be shared across different VASes. One can think about them as private physical pages. A sum of all private physical pages, in RAM and on disk is called Private Bytes, using perfmon terminology or VM Size according to Task Manager’s terminology. A sum of all private physical pages that are only in RAM is called working set, or Memory Usage according to Task Manager.

 

As I mention before, depending on physical memory demands Windows might remove physical pages from process’s working set.  This activity is usually referred as paging. The OS provides a way to avoid paging of a given VAS regions. It provides mechanism to “lock” VAS regions in physical RAM. As you would expect an application that tries to lock its VAS regions might destabilized the behavior of the whole system.  To mitigate this behavior Windows has “Lock pages in memory” privilege turned off by default so that only applications that a specifically given this permission by administrator can lock pages in memory.  In addition the OS still might page out the whole process’s working set if needed.

 

Even though the size of x86 registers is 32 bit the platform enables an OS to manipulate with 64GB of RAM using Physical Address Extensions, PAE. On Windows PAE can be turned on from boot.ini file. If PAE is not turned on, Windows will only be able to manipulate with up to 4GB of RAM even though box might have more of RAM installed. 

 

Some user applications require more VAS than 2GB. Windows have ability to configure VAS of user application up to 3GB. This feature has significant drawback. It limits amount of VAS available to kernel down to 1GB. Increasing size of user VAS can be done by adding /3GB switch to boot.ini file. Once modification is complete system will require reboot.

 

Limiting kernel VAS to 1GB affects the whole box not only the application that needs larger size VAS. For example ones 3GB switch is turned on amount of RAM that the OS can support, if PAE is enabled, drops down from 64GB to16GB. 3GB switch affect all kernel components including all drivers. Having 3GB switch turned on can cause effects from drop in performance and memory allocation failures to system stalls. My suggestion is to avoid usage of 3GB switch unless you really really need it.

 

As I described, on x86 platforms VAS is limited to 2GB or 3GB depending on configuration in boot.ini file. On AMD64, 32 bit application can have up to 4GB.   Such small VAS could be a significant drawback for high end servers manipulating with gigabytes and terabytes of data, think about SQL Server. I also said that PAE switch enables Windows to manipulate with up to 64GB of RAM. But how one process does access so much memory? In order to enable single process to manipulate with amount of RAM bigger than size of VAS, Windows provides Address Window Extension, AWE, mechanism (Remember your DOS yearsJ). Be careful many authors use term AWE memory. There is no such thing as AWE memory, one can’t go and buy AWE memory at a store J. . Since there is no AWE memory, there is no either low AWE nor high AWE memory. There is AWE mechanism that enables one to manipulate with amounts of RAM larger than VAS! The principle is simple. Using AWE API one can allocate physical memory, then using VirtualAlloc API allocate VAS regions and then using AWE mechanism bind/ubind VAS regions and physical memory. As with locked pages in order for an application to use AWE mechanism it needs to have Locked pages in memory privilege enabled.

 

 AWE API can be used even on the boxes with RAM below of a process’s VAS.  In fact AWE could be used to avoid any type of paging (Remember that when locking pages in memory using VirtualLock mechanism Windows still can page out the whole process)  When using AWE mechanism, the OS can’t interfere at all. With such flexibility comes a difficulty. Misuse of AWE mechanism can stall the whole box so that only way to recover from such state could be a reboot.

 

Well I think it is enough info for today J From this and previous posts all I want you is to grasp the concepts. They are really important!  Here is the quick summary:

 

      - Every process has its own VAS

- VAS is often neglected by developers.

- VAS is limited resource even on 64 bit platform

- VAS is managed by Windows the same way as one would manage a heap

- VAS's smallest region is 64kb

- VAS regions is managed by corresponding VAD in kernel

- VAD describes state of VAS region - allocated, committed, uncommitted

- A VAS page bound to physical page in RAM on demand, the first time page is accessed, unless running without page file).

- Physical pages can be private or shared

- Set of all private physical pages for a process is called private bytes (PM), or virtual memory size (TM)

- Set of private physical pages in RAM is called process working set.

- Physical pages bound to specific VAS region can be locked in memory

- PAE switch enables Windows’s support for 64GB of RAM

- On x86 VAS is limited to 2GB, 3GB or 4GB

- /3GB switch increases user VAS to 3GB and decreased kernel VAS to 1GB.

- /3GB switch limits amount of physical memory enabled through PAE switch down to 16GB

- /3GB switch is not recommended

- AWE is mechanism, it is not memory.

- AWE enables single process to use memory outside of VAS limits.

      

-     /3GB swith is not required to enable AWE mechanism

-     /PAE switch is not required to enabel AWE mechansim

Some interesting gotchas:

- Allocating physical memory using AWE mechanism is slow on Windows 2000. (This is a reason why SQL Server 2000 doesn’t manage memory dynamically when AWE is enabled). The problem is fixed in Windows 2003 server so that Yukon does allocates physical memory dynamically

- Physical pages allocated through AWE mechanism are not part of process working set neither large pages (I will cover large pages sometime latter). This is exact reason why neither PerfMon or Task Manager show them

 

Next time we will talk about memory pressures and then we will be ready to dive into SQL Server Memory Manager.

 

For now enjoy the weekend!