Kernel address space consequences of the /3GB switch


One of the adverse consequences of the /3GB switch is that it forces the kernel to operate inside a much smaller space: one gigabyte instead of the usual two.

One of the biggest casualties of the limited address space is the video driver. To manage the memory on the video card, the driver needs to be able to address it, and the apertures required are typically quite large. When the video driver requests a 256MB aperture, the call is likely to fail since there simply isn't that much address space available to spare.

All of the kernel's bookkeeping needs to fit inside that one gigabyte: page tables, page directories, bitmaps, video driver apertures. It's a very tight squeeze, but if you're willing to cut back (for example by not requiring such a large video aperture), you can barely squeak it through. (A later entry will discuss another casualty of the reduced address space.)
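To put rough numbers on the squeeze, here is a back-of-the-envelope sketch. It only divides the aperture size by the kernel's share of a 32-bit address space; it ignores fragmentation and everything else competing for that space, and the function names are made up for illustration.

```python
GB = 1 << 30
MB = 1 << 20

TOTAL_VA = 4 * GB  # size of a 32-bit virtual address space


def kernel_space(user_space_bytes):
    """Kernel address space left once the user-mode share is carved out."""
    return TOTAL_VA - user_space_bytes


def aperture_share(aperture_bytes, user_space_bytes):
    """Fraction of kernel address space that one aperture would consume."""
    return aperture_bytes / kernel_space(user_space_bytes)


default_share = aperture_share(256 * MB, 2 * GB)  # default 2GB/2GB split
switch_share = aperture_share(256 * MB, 3 * GB)   # with the /3GB switch

print(f"default split: {default_share:.1%} of kernel address space")
print(f"/3GB split:    {switch_share:.1%} of kernel address space")
```

Under /3GB, a single 256MB aperture would eat a quarter of everything the kernel has, and it needs to be contiguous on top of that.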

It's like trying to change your clothes inside a small closet. You can do it, but it's a real struggle, you're going to have to make sacrifices, and the results aren't always very pretty.

Comments (22)
  1. asdf says:

    What I’ve always wondered is whether each process could get all 4 gigs of addressable memory, with the kernel getting its own 4 gigs too, instead of the current arrangement where the lower 2 gigs are unique per process and the upper 2 gigs are always the same and belong to the kernel (ignoring PAE and /3GB here). If that is possible, how come the current design was chosen over that one?

  2. winden says:

    asdf: For speed reasons. Giving 4 gigs to each process would involve changing the cr3 register (the main page table pointer) on each kernel call, which can also involve a TLB flush on some systems… Another operating system calls this the "4g/4g split mode", just in case you want to google it up.

  3. Cooney says:

    Yeah, if you really need that much memory, you should probably look at the AMD64, or maybe a G5.

    As a semi-related aside, I wonder if MS would benefit from porting its server-side apps to Linux. I view the porting of Office as one of the signs of the apocalypse, but MSSQL on Linux might work – after all, sybase did it.

  4. Mike Swaim says:

    > As a semi-related aside, I wonder if MS would benefit from porting its server-side apps to Linux. I view the porting of Office as one of the signs of the apocalypse, but MSSQL on Linux might work – after all, sybase did it.

    But Sybase SQL Server was already running on Unix variants back when SQL Server ran on OS/2, so it was a relatively easy port. I suspect that SQL Server 2000 and Yukon are pretty deeply tied to how Windows does things, and a port would be pretty difficult.

  5. Mike Dimmick says:

    I note that the KB article you linked to refers to Windows XP. Does XP still only give 2GB VA to user processes, regardless of the setting of /3GB and the large-address-aware switch? Russinovich and Solomon, in ‘Inside Windows 2000’, state that Windows 2000 Professional and Windows 2000 Server (as opposed to Advanced Server or Datacenter Server) did this.

    The switch does still seem to restrict kernel VA space to 1GB – you just end up with a 1GB hole between user and kernel space.

  6. Mike: No, they changed XP so that you can get the full 3GB of VA space. This was for engineering applications like CAD software, which want as much RAM for modeling as they can get.

  7. Cooney says:

    Mike:

    > But Sybase SQL Server was already running on Unix variants back when SQL Server ran on OS/2, so it was a relatively easy port.

    I mentioned Sybase specifically because it was the base for SQL Server, IIRC. This means that MS has the potential to get a relatively rapid port for the core bits, provided that it has the interest and Sybase wants to play ball.

  8. Michel says:

    Cooney:

    > I mentioned Sybase specifically because it was the base for SQL Server, IIRC. This means that MS has the potential to get a relatively rapid port for the core bits, provided that it has the interest and Sybase wants to play ball.

    If you read Inside SQL Server 2000, you can see from the background and history overview that pretty much all of the Sybase code has been removed. The current SQL Server relies heavily on the operating system services supplied with Windows, and judging from the comments, those services don’t even seem to be insulated from the core of the code. The author even mentions that portability layers, lowest-common-denominator design, and the reimplementation you have to do for different platforms are not good for the kind of performance SQL Server strives to achieve on its native platform. So my guess is that porting SQL Server to Linux/Unix would be no walk in the park, and why in the world would MS want to do it from a business standpoint anyway?

    Anyway, this may not be a relevant topic for a blog entry about the functionality of the /3GB switch.

    /M

  9. Another problem with /3GB is the lack of virtual address space for mapping views into the cache manager. As disk sizes grow, this becomes a real issue. Remember that the cache manager doesn’t actually keep all of its mapped views resident, so a 1GB virtual block cache isn’t that outrageous today.

    OSR had a good article on AMD64 recently that lays out the new virtual address space layout. Suffice it to say, we should be good for a few more years.

  10. tim says:

    System PTEs, Paged Pool, and NonPaged Pool share the address space at 0xe1000000-0xffbe0000 (491.875MB), laid out as follows:

    ------------ 0xe1000000
    Paged Pool
    System PTEs (7000-50000 pages)
    NonPaged Pool (max 128MB)
    ------------ 0xffbe0000

    In a 2GB kernel space, there is an additional region at 0xa4000000-0xbfffffff (448MB) for additional system PTEs (or the large system cache, if so configured) and an additional 128MB region for NonPaged Pool at around 0x8???????.

    With /3GB, everything is confined to that 491.875MB region. However, you may still be able to tune the individual sizes within it by adjusting the "Session Manager\Memory Management" registry values.
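The sizes quoted in the comment above can be checked with a little arithmetic. This is only a sanity check on the quoted address ranges, not a description of how the memory manager computes anything; the helper name is made up for illustration.

```python
MB = 1 << 20


def span_mb(start, end_exclusive):
    """Size in megabytes of the address range [start, end_exclusive)."""
    return (end_exclusive - start) / MB


# Region shared by Paged Pool, System PTEs, and NonPaged Pool
shared_mb = span_mb(0xE1000000, 0xFFBE0000)

# Extra region available only with a 2GB kernel space;
# 0xbfffffff is the last byte, so the exclusive end is one past it
extra_mb = span_mb(0xA4000000, 0xBFFFFFFF + 1)

print(f"shared region: {shared_mb} MB")
print(f"extra region:  {extra_mb} MB")
```

Both figures match the ones quoted: 491.875MB for the shared region and 448MB for the extra one.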

  11. Petr Kadlec says:

    "It’s like … inside a small closet"

    Well…a few years ago…who would guess that 1GB of RAM address space would look like a "small closet"? :-)

  12. dave stokes says:

    Well, you have basically a rather primitive operating system running on primitive hardware (and no, this is no troll;-). With IBM’s z/OS pretty well every system function runs in its own address space. But z/OS (and the zSeries Hardware) allow what’s called synchronous cross-memory processing and extended addressability. A work unit (thread, task, whatever) can synchronously start executing code in another address space, there’s no real context switch. Data addressability can be in one of three address spaces, primary (wherever the code is), secondary (typically where the data is in cross memory mode) and home (the work unit’s originating address space), so there’s no (necessary) copying of data between address spaces. And there are additionally so-called Access Registers which are registers which address address spaces, and qualify the normal address registers. With this mechanism code can directly address any data in any address space (subject to certain authorisation checks, of course). Access registers also allow so-called Data Spaces, which are address spaces containing only data, probably great for things like giant video buffers. AFAIK this sort of stuff’s not available with any Intel chip, or at least if it’s possible Windows doesn’t seem to make much use of it.

    OTOH it’s much more fun writing Windows code with VS than for z/OS with any IBM tool I’ve encountered…

  13. winden says:

    dave, it sounds very nice, could you point to some IBM docs about these things?

  14. Raymond Chen says:

    Sorry, Chris, but all the selectors point into the same underlying address space. More on this on August 16.

  15. Chris Becke says:

    Very nice? That sounds horribly like what Win16 did with the 386’s segment registers.

    I still have nightmares about near pointers, and far pointers (and damnit HUGE pointers!).

    Anyway, as I understand it (and the last time I even thought I *understood* it was when the 386 was state of the art) the x86 processors still have all the segment registers, each of which can point to its own 4GB address space. This means that, OS willing, a single x86 instruction could choose to access memory in any of CS, DS, ES, FS, GS or SS. 4x 4GB of data and 4GB each for code and stack? Interesting, but ultimately less hassle to just switch to Win64 and use only near pointers there :P

  16. dave stokes says:

    Hi Chris, no, it really has nothing to do with some old 386 architecture. The 370/ESA/Z architecture has always used a pure linear addressing scheme for virtual memory, just like Intel (nowadays).

    And winden, hardware is described in SA22-7832 Principles of Operation, and z/OS extended addressability in SA22-7614 Programming Extended Addressability, both available at http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/r5pdf/mvs.html

    but it’ll be hard going without some previous familiarity with IBM mainframes…

    Of course, these features are hardly all that distinguishes the zSeries hardware and software…

  17. dave stokes says:

    Kind of a PS to Chris Becke’s final comment…

    >>Interesting but ultimately less hassle to just switch to Win64 and use only near pointers there :P<<

    Well, it’s probably a good point. From the pure addressability point of view I’d say 64 Bit (z/OS is a 64 Bit OS BTW) could make things a lot simpler. Of course IBM can’t just remove extended addressability since there’s vast quantities of code which use it (it’s been around since the late eighties).

    However there’s maybe another aspect, which is that running system services in their own address spaces can make things like storage protection, error recovery and restart of failing system services much cleaner architecturally. There may well be aspects of the Windows kernel architecture which compensate for the lack of such facilities, I know rather more about z/OS than Windows at this level, and I’d be interested in comments from the experts.

    However, Windows recovery certainly still sucks, as anyone who’s cancelled a process probably knows (well, most people just notice they can’t access files and stuff without rebooting, I guess, without knowing really why). z/OS OTOH…well anyway, I was really more interested in the technical aspects, not praising z/OS to the skies.

  18. Bill H. Gates IV says:

    "640GB of memory should be enough for just about anyone."

    (This had to be posted early because comments will be closed before the target date, 1 April 2007.)

  19. As Evan already mentioned on his blog, Raymond Chen has a great series on the /3GB switch on his blog. What is really cool is that Raymond takes on some myths about the /3GB switch and the fact that he…

  20. Piotr's blog says:

    Some time ago, we decided to optimize server settings, as recommended by MVPs and other industry gurus….
