We have already talked about Windows AWE mechanism on 32 bit and how SQL Server utilizes it. Today I would like to go over AWE & related mechanism on 64 bit platforms.
To some people it comes as a surprise that AWE mechanism is still present and actually could be useful on 64 bit platforms. As you remember the mechanism consists of two parts allocating physical memory and mapping it to the given process’s VAS. The advantage of allocation mechanism is that once physical memory is allocated operating system can’t reclaim it until either the process is terminated or the process frees memory back to the OS. This feature allows an application to control and even avoid paging altogether. Advantage of mapping/unmapping mechanism is that the same physical page could be mapped into different VAS’s regions. As you imaging unmapping is not necessary on 64 bit platforms since we have enough VAS to accommodate all of existing physical memory.
From Operating System theory, OS implements a page table entry, PTE, to describe a mapping of a page in VAS to physical page. Internally physical page is described by page frame number, PFN. Given PFN one can derive complete information about physical page it represents. For example PFN shows to which NUMA node the particular page belongs. OS has a database, collection of PFNs that it manages. If page in VAS is committed, it has PTE which might or might not point to given PFN. Conceptually, page that PTE represents can be either in memory or not, for example swapped out to disk. In the former case it is bound to a given PFN and in latter it is not. In its turn, once a physical page is bound to page in VAS, its PFN points back to PTE.
When OS commits, frees, pages out/in a given PTE or needs to derive some information about it, for example NUMA residency, it has to acquire process’s working set lock – to guarantee stability of PTE to PFN binding. This lock is a rather expensive and might hurt scalability of the process. Latter versions of Windows made this lock as light as possible but avoiding still will benefit application’s scalability..
When allocating physical pages utilizing AWE mechanism we are given a set of PFN entries directly from PFN database – remember that you should not manipulate or modify set of entries you get back nor can you rely on values you get back. OS is required to take a PFN database lock when allocating PFN entries. Using AWE map mechanism you can map allocated PFN entries to the process’s VAS. When mapping occurs PTEs are allocated, bound to PFNs and marked as locked. In this case OS needs to acquire process’s working set lock only ones. When mapping regular pages, OS does it on demand and hence will have to acquire both working set and PFN database lock for every page. Since pages are locked in memory, OS will ignore these PTEs during paging process.
On 64 bit platforms it is better to refer to such pages as locked pages – please don’t confuse them with pages locked through VirtualLock API. As described above locked pages have two major properties – they are not considered for paging by OS and during allocation they acquire both working set and PFN database lock only ones.
The first property has implicit implication on high end hardware such as NUMA. It provides explicit memory residency. Remember that OS commits a page on demand. To allocate physical memory, it will use a node on which a thread touching memory is running. Latter on, the page can be swapped out by OS. Next time it will be brought up into memory, OS will again allocate physical page from the node a thread touching memory is running on. In this case a node could be completely different from original one. Such behavior makes hard for applications to rely on page’s NUMA residency. Locked pages solve this problem by removing themselves from paging altogether. Moreover Windows 2003 SP1 introduced a new API – QueryWorkingSetEx. It allows to query extended information about PTE’s PFN. In order to find out real page residency you should use this API. When pages are locked you need to it only ones. Otherwise you will have to do it periodically since residency of the page can actually change.
The second property – taking both working set and PFN’s database lock only ones enables applications to perform faster and better scalable ramp up.
On NUMA SQL Server’ Buffer Pool marks each allocated page with its node residency. It leverages QueryWorkingSetEx to accomplish it. Once page is allocated it calls the API to find out page residency. It does it only once. Therefore enabling locked pages for SQL server on 64 bit platform will improve SQL Server ramp up time and will improve performance & scalability over longer period of time. When running SQL Server with locked pages enabled you shouldn’t be worried about overall system performance due to memory starvation – SQL Server participates in OS’s paging mechanism by listening on OS’s memory notification API’s and shrinks its working set accordingly.
Let us summarize now – on 64 bit platform, locked pages, awe mechanism, enable better application’s scalability and performance both during ramp up time and over long period of time. However, keep in mind that an application is still required to implement a way of responding to memory pressure to avoid starving the whole system for memory.
Let me know your comments!