Why is the virtual address space 4GB anyway?


The size of the address space is capped by the number of unique pointer values. For a 32-bit processor, a 32-bit value can represent 2^32 distinct values. If you allow each such value to address a different byte of memory, you get 2^32 bytes, which equals four gigabytes.

If you were willing to forgo the flat memory model and deal with selectors, then you can combine a 16-bit selector value with a 32-bit offset for a combined 48-bit pointer value. This creates a theoretical maximum of 2^48 distinct pointer values, which, if you allowed each such value to address a different byte of memory, yields 256TB of memory.
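The arithmetic in the two paragraphs above is just powers of two. A minimal sketch, treating one pointer value as one byte exactly as the text does, and reading "GB" and "TB" as binary units (2^30 and 2^40):

```python
GiB = 1 << 30
TiB = 1 << 40


def address_space_bytes(pointer_bits: int) -> int:
    """Distinct pointer values (= addressable bytes, one byte per value)."""
    return 1 << pointer_bits


flat_32 = address_space_bytes(32)        # flat 32-bit model
seg_48 = address_space_bytes(16 + 32)    # 16-bit selector + 32-bit offset

print(flat_32 // GiB, "GiB")   # 4 GiB
print(seg_48 // TiB, "TiB")    # 256 TiB
```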

This theoretical maximum cannot be achieved on the Pentium class of processors, however. One reason is that the lower bits of the selector value encode information about the type of selector. As a result, of the 65536 possible selector values, only 8191 of them are usable to access user-mode data. This drops you to 32TB.
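To see where the "lower bits" go, here is a sketch that decodes a 16-bit selector using the standard x86 layout (2-bit requested privilege level, 1-bit table indicator, 13-bit descriptor index) and then multiplies the article's count of usable selectors by the 4GB each one can span. The 8191 figure is taken from the text rather than derived here; `decode_selector` is a hypothetical helper for illustration.

```python
TiB = 1 << 40


def decode_selector(selector: int) -> dict:
    """Split a 16-bit x86 protected-mode selector into its fields."""
    assert 0 <= selector <= 0xFFFF
    return {
        "index": selector >> 3,                        # bits 3-15: descriptor index
        "table": "LDT" if selector & 0b100 else "GDT",  # bit 2: table indicator
        "rpl": selector & 0b11,                         # bits 0-1: requested privilege
    }


INDEX_SLOTS = 1 << 13     # 13 index bits -> 8192 descriptors per table
USABLE = 8191             # the article's count of usable user-mode selectors

total = USABLE * (1 << 32)   # each selector can span at most 4GB
print(decode_selector(0x0F))   # index 1, LDT, RPL 3
print(round(total / TiB, 3), "TiB")
```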

The real limitation on the address space using the selector:offset model is that each selector merely describes a subset of a flat 32-bit address space. So even if you could get to use all 8191 selectors, they would all just be views on the same underlying 32-bit address space.
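The aliasing problem can be modeled in a few lines: the processor adds the segment base to the offset and keeps only 32 bits, so every selector:offset pair lands in the same 4GB linear space that paging then translates. In this sketch the segment bases are made-up values, and the limit and access-rights checks a real descriptor would impose are deliberately omitted.

```python
MASK_32 = (1 << 32) - 1


def linear_address(base: int, offset: int) -> int:
    # base + offset is truncated to 32 bits, so the result is always
    # somewhere in the one shared 4GB linear address space.
    return (base + offset) & MASK_32


# Hypothetical descriptor table: selector index -> segment base.
segment_bases = {1: 0x00000000, 2: 0x40000000, 3: 0xC0000000}

# Three different selector:offset pairs...
a = linear_address(segment_bases[1], 0x50000000)
b = linear_address(segment_bases[2], 0x10000000)
c = linear_address(segment_bases[3], 0x90000000)
# ...all name the same linear byte.
print(hex(a), hex(b), hex(c))
```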

(Besides, I seriously doubt people would be willing to return to the exciting days of segmented programming.)

In 64-bit Windows, the 2GB limit is gone; the user-mode virtual address space is now a stunning 8 terabytes. Even if you allocated a megabyte of address space per second, it would take you three months to run out. (Notice however that you can set /LARGEADDRESSAWARE:NO on your 64-bit program to tell the operating system to force the program to live below the 2GB boundary. It’s unclear why you would ever want to do this, though, since you’re missing out on the 64-bit address space while still paying for it in pointer size. It’s like paying extra for cable television and then not watching.)
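A quick check of the "three months" figure, assuming binary units (8TB = 8 × 2^40 bytes, one megabyte = 2^20 bytes):

```python
TiB = 1 << 40
MiB = 1 << 20
SECONDS_PER_DAY = 24 * 60 * 60

user_va = 8 * TiB      # user-mode address space in 64-bit Windows, per the text
rate = 1 * MiB         # address space allocated per second

seconds = user_va // rate
days = seconds / SECONDS_PER_DAY
print(seconds, "seconds, about", round(days), "days")
```

That works out to about 97 days, a little over three months.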

Armed with what you have learned so far, maybe you can respond to this request that came in from a customer:

One of our boot.ini files has a /7GB switch. Our consultant told us that we should set it to 1GB less than the system memory. Since we have 8GB, 8GB – 1GB = 7GB. The consultant said that setting this value allows an application to allocate more than 2GB of memory. We would like Microsoft to comment on this analysis.

Comments (58)
  1. Jonathan Payne says:

    Why is the split between user mode and the kernel at 8 TB rather than halfway through the 64-bit address space? I realize that Intel and AMD processors have 44 and 40 address lines respectively, so the architecture currently wouldn't allow more than 16 TB or 1 TB in total, but I don't understand why the software shouldn't be ready.

  2. Besides, this whole series of articles has been telling us that physical limits like the number of address lines have nothing to do with the size of the virtual address space.

    I'd also like to know why the limit is so low. I'm sure there are already applications where terabytes of memory are required. What we call a supercomputing application today is somebody's screen saver tomorrow…

  3. Raymond Chen says:

    I’ll cover the technical reasons later. Hint: ABA.

  4. ToddM says:

    I’d like to hear about any gotchas in the 64-bit model, such as the off-limits 64KB block just below 2GB in the 32-bit model. And what about system DLLs? Have they been rebased in 64-bit Windows to insanely high addresses?

    In other words, w.r.t. virtual address space, what’s the largest contiguous block one can practically allocate under 64-bit Windows? (Even with some gotchas, I bet it’s still pretty big…>7TB, perhaps?)

  5. RJ says:

    "I’ll cover the technical reasons later"

    I, for one, am looking forward to that article. ABA is one tough word to google for. Maybe the ABA race-condition?

  6. DrPizza says:

    "and, in the SIS world "

    Er, SSI, rather.

  7. DrPizza says:

    "I, for one, am looking forward to that article. ABA is one tough word to google for. Maybe the ABA race-condition? "

    That’s the only thing I can think of, though I can’t immediately see how it applies here.

  8. One thing to keep in mind is that on the 64bit platform, those limits are not hard-set. In future OS revisions, we can change the layout of VA space as needed, and in fact, they probably will…but only when 8TB of physical memory becomes a reality.

    Until then, it doesn’t MATTER if it’s part of your data set, because all you’re going to be doing is pulling it off one disk to put it onto another (the paging file)

  9. mikeb says:

    Setting /LARGEADDRESSAWARE:NO on 64-bit programs would be done for essentially the same reason that you don't set /LARGEADDRESSAWARE:YES on 32-bit programs to take advantage of /3GB.

    Basically, you'd do it if you've 'ported' your 32-bit code to compile with a 64-bit compiler but have decided not to fix the bugs that assume pointers are 32-bit entities that can be passed around in an int.

    There are a lot of programs which don’t really need the huge address space. I wouldn’t be surprised if a 64-bit version of some MS Office applications used this option, if and when Office is made available in a 64-bit edition.

  10. DrPizza says:

    "(Notice however that you can set /LARGEADDRESSAWARE:NO on your 64-bit program to tell the operating system to force the program to live below the 2GB boundary. It’s unclear why you would ever want to do this, though, since you’re missing out on the 64-bit address space while still paying for it in pointer size. It’s like paying extra for cable television and then not watching.)"

    I imagine it means you don’t have to run under WOW64 (so no thunking) but can still safely truncate pointers (so no tedious porting), which for an application such as WinWord would presumably be quite a useful thing.

  11. DrPizza: Being a regular reader of Raymond’s blog, you should know the theme of tradeoffs pretty well by now. Since it looks like Raymond will cover it, I’ll let him do it. Don’t assume decisions like that are made trivially though. There was a lot of deliberation behind it.

  12. Carl says:

    Could you comment on the choice of having a 32-bit "int" even in 64-bit code? Most code I know uses "int" to index into arrays, so this basically limits the usable array size to 2^31. Isn't that going to be a problem some day, when we will be able to allocate arrays that can't be accessed in their entirety?

    Why hasn’t the "int" been promoted to 64 bits like pointers? Backward compatibility issues? Is it a good decision in the long term?

  13. DrPizza says:

    "DrPizza: Being a regular reader of Raymond’s blog, you should know the theme of tradeoffs pretty well by now."

    Indeed. Given the choice between doing the right thing and not doing the right thing, the decision will almost invariably be "not doing the right thing".

    "Since it looks like Raymond will cover it, I’ll let him do it. Don’t assume decisions like that are made trivially though. There was a lot of deliberation behind it."

    And lots of deliberation in applications that have got to work around it too. Glorious.

  14. Raymond Chen says:

    The choice is often between "doing the right thing X and doing the right thing Y – you can't do both". Those are the hard choices.

  15. DrPizza says:

    "So, my guess is that the kernel folks realized that to map exabytes of data it’ll take terabytes of virtual address space, and optimized in that direction."

    They can still have their terabytes of virtual address space, though. That’s the thing about a 64-bit address space; it’s bloody big, and you can dedicate thousands of terabytes to both user- and kernel-mode.

    Except that Win64 doesn’t do this. User-mode is left without a simple and effective way of dealing with the vast datasets that exist /today/. Let alone future, even vaster, datasets.

  16. DrPizza says:

    "Why hasn’t the "int" been promoted to 64 bits like pointers? Backward compatibility issues? Is it a good decision in the long term? "

    int probably should be 32-bit.

    long probably shouldn’t.

    Most unixes are LP64 (so it’d ease porting between unix and Win32–which for the kind of scientific applications that benefit from 64-bit even today would be a bonus), and LP64 is simpler to deal with in C89 and C++98, as it doesn’t require the use of non-standard integral types. You leave char as 8, short as 16, and int as 32, and make long 64. Then all is well in the world.

    If people insist on truncating their pointers to ints, (a) they're idiots, (b) the compiler can issue pretty good diagnostics, and (c) there's always the image option to force all pointers to have the top 32 bits zeroed out. This makes such broken programs generally safe, and, unlike the decision to go "LLP64", can be done without hurting properly written programs.

  17. asdf says:

    carl: the only guarantee you have is that ints are at least 16 bits. If you want other guarantees, either create your own typedefs or use stdint.h/inttypes.h. I mostly use size_t for accessing an array but if you want to use a signed type, the correct way to do it is ptrdiff_t because that’s what the compiler implicitly casts it to when you add/subtract arithmetic types from pointers (and do array access).

  18. Mike Dimmick says:

    Mapping a huge file in one go wastes page table entries, which obviously take up physical memory if they’re swapped in, and swap space if swapped out. Map enough of the file to work with at the moment, then switch the mappings to work with another part of the file.

  19. DrPizza says:

    "In 64-bit Windows, the 2GB limit is gone; the user-mode virtual address space is now a stunning 8 terabytes. "

    Which is a remarkably short-sighted decision.

    Even today, people have to deal with data sets of *around* that kind of size. Big databases, for example (consider the largest TPC-H data set–10 TB).

    It would afford a certain convenience if these large datasets could be dealt with without having to implement some kind of application-level paging.

    After all, the entire reason we don’t want to use AWE is because it requires us to implement application-level paging (we have to move our window about to pick and choose the chunk of our big block of memory that we can actually address).

    Current 64-bit processors do support a 63- or 64-bit virtual address space (they don’t have enough address lines, of course, because the physical RAM to realize such a space would be vast). This wasn’t always the case (you’d instead have 64-bit pointers with a 4x- or 5x-bit virtual address space), so why not make it bigger?

    A 50:50 (or perhaps 75:25 kernel:user) split would put off these concerns more or less indefinitely; we’d be already at the effective limit of 64-bit addressing, and the only way forward would be 128-bit. The current 8TB split, on the other hand, means that applications with large data sets are already pushing that limit.

    It’s particularly surprising given that MS has stated an interest in moving more into the HPC business, where vast data sets are relatively common, and, in the SIS world (which Windows can’t operate in today, but who knows what the future holds) large address spaces also.

  20. Ian Boyd says:

    "One of our boot.ini files has a /7GB switch."

    Well, I'd say that he's lying. He is mistaken if he thinks he sees a /7GB switch in his boot.ini file.

    Since there is an 8TB user mode address space, Microsoft didn’t invent a /7GB switch.

  21. josh says:

    I’m pretty sure you can access more than 4GB with selectors, it just gets *really* ugly: mark entire selectors as "not present" and do huge amounts of paging when someone starts using a different one.

  22. DrPizza says:

    "they probably will…but only when 8TB of physical memory becomes a reality. "

    Why on earth should virtual address space be constrained in this way by physical address space? One can experience benefits from expanding the address space long before one has the physical memory to realize that address space. The whole Exchange /3GB w/1 GiB RAM issue should have made that one amply clear….

    "Until then, it doesn’t MATTER if it’s part of your data set, because all you’re going to be doing is pulling it off one disk to put it onto another (the paging file) "

    Yes, it does matter. It matters a lot. Some smart OS guy has already written a complex system for pulling things in from disk and flushing things back out to disk. If I can just map a huge file, I can make use of this guy’s hard work, which leaves me free to deal with the problem at hand (which may be lots of things, but writing a paging mechanism ain’t it).

    If I don’t have enough address space to map that huge file, I have to *waste my time* implementing a mechanism that *already exists* within the OS.

    So, yes, it does matter.

  23. Carl says:

    asdf: On the project I’m currently working on, we are using "long int" hidden behind a typedef, so we are covered.

    But my argument was more a general about the majority of codes/libraries using int and what it implies for the future. I guess that only the applications needing huge arrays (e.g. Finite Elements Method programs for storing large sparse matrices) will have to be modified. I don’t see my buddy list going over 2^32 entries any time soon ;-)

    The funny thing is that many of the libraries I know that manipulate those huge arrays (e.g. PETSc) didn't see the need to use a typedef to access more than 2^31 entries. I guess the community didn't expect "int" to stay at 32 bits. At least, that's my take on it.

    DrPizza: Of course, I wasn’t advocating to have 64 bits "int" to store pointers ;-)

  24. Ian, I know the email thread Raymond’s talking about. The customer HAD a /7GB switch in their boot.ini file.

    The question was how do you respond to that customer :)

  25. Remember that one thing we’re optimizing for is the manipulation of huge amounts of storage space. There are arrays out there with many many terabytes of information on-line. In order for the virtual block cache that NT uses to be effective, it has to have enough virtual memory handy to adequately address the files it’s trying to cache. This is not a physical memory thing, but rather has to do with the way the virtual block cache works: virtual memory is mapped to views of files, and as the physical memory available fluctuates, the memory manager pages data in and out as necessary. The cache competes with the rest of the processes in the system for access to page frames, and the natural working set balancing algorithms in mm make it all work perfectly.

    Virtual memory is already very tight when you think about using 512MB of *virtual* cache to map to large disks today. The memory manager has to start kicking pages out to disk even though it has physical memory to spare, because it is out of virtual memory to map the views.

    So, my guess is that the kernel folks realized that to map exabytes of data it’ll take terabytes of virtual address space, and optimized in that direction.

  26. Norman Diamond says:

    Regarding limitations on virtual address spaces: Remember that Windows 9x had trouble with page tables when physical memory exceeded sizes like 64MB or 512MB. One potential way to avoid trouble could have included limiting virtual address spaces to 512MB (not a complete solution, but better than no solution). This doesn’t mean that a fixed 8TB limit is a good choice though, it just means that an option should be available to set a limit.

    Regarding the size of int: In the spirit of old C, int would be 64 bits on 64-bit machines. In the spirit of new C, int is the same size as everyone else’s int. You might be allowed to notice that some of those everyone elses have different sizes of int from the rest of the everyone elses, but only for a moment and then you have to ignore them.

    Regarding the customer who was told by a consultant to have a /7GB switch: When talking to the customer, inform them that consultants are not exceptions to Sturgeon’s law. If that doesn’t settle it, then there’s only one thing to do. Despite my personality, I do remember that the customer is the customer. The old saw "The customer is always right" is somewhat exaggerated, but if the customer knows what they want then the customer is right about what they want. Either offer to sell them a working /7GB switch (if they want to pay the development expenses) or let them just keep it as is.

  27. RJ says:

    "how do you respond to that customer"

    MS Knowledge Base Article – 1048576

    Boot loader fails to validate boot.ini switches

    CAUSE

    The use of college sophomores as consultants.

    RESOLUTION

    A Hotfix is now available which will issue bug check 0x1337 when invalid switches are applied. We recommend that you wait for the next service pack that contains this hotfix.

    BACKGROUND

    The /7GB switch was deprecated as of Windows 3.11. We recommend using the /InfiniteGB switch. Or, for X-treme performance requirements, use the /OnlyForBillAndNSA switch.

    Note: Those two switches will be disabled by this hotfix due to the fact that our agreement with the US government reserves those hidden resources for use by Carnivore.

  28. James Day says:

    How nice does Microsoft want to be to consultants who may be product champions?:) I suppose we all end up with completely wrong-headed ideas sometimes, though that consultant was pretty far gone.

    If viable I might try contacting the consultant, so the consultant can correct their own error once they understand it. That may still result in a former consultant for that company but it may at least stop the future misinformation spread from that source.

    The simplest solution is to tell the customer and make the consultant a former consultant with a chip on their shoulder.

  29. asdf says:

    4 gigs of his physical memory will never be used (I'm not sure how the kernel does things, but I think there is no way for the page lookup mechanism to map to a physical address larger than 4 gigs [I'd love to see an article about PAE here]). By allocating more than 2 gigs, I assume he means being able to allocate at least a 2 gig contiguous block of virtual memory, which you cannot do because there are holes in your address space (system DLLs, those 64KB blocks, and the kernel address space).

    Also, isn’t it less than 7TB? 0x6FC00000000 = 7152 gigs.

  30. timchen says:

    On AMD64 it’s 0x80000000000. Why is it 0x6FC00000000 on IA64?

  31. DrPizza says:

    "Mapping a huge file in one go wastes page table entries, which obviously take up physical memory if they’re swapped in, and swap space if swapped out. Map enough of the file to work with at the moment, then switch the mappings to work with another part of the file. "

    Wasn’t the whole file mapping thing reworked in WinXP so that fewer resources were needed?

    In any case, the size of the page tables is going to be much less than the amount of RAM in a system dealing with such files, so it’s no big deal.

  32. mahdi says:

    @carl: int is compiler dependent, but via one of the C(++) standards defined to be 32 bits on 32bit machines and 16 bits on 16 bit machines. On 64 bit machines, only the long value is pumped up to 64 bits, on 128 bit machines long long would be 128 bits, etc. But this is not a general standard and e.g. holds not true for certain embedded platforms.

    In the Windows world 32 bits is a "longword", and a "word" is 16 bits.

  33. Mike Dimmick says:

    OK, you need one PTE per 4KB page (per 2MB if you use large mappings with the MEM_LARGE_PAGE switch on Windows Server 2003) on x64. x64 uses PAE-style 64-bit PTEs in long (64-bit) mode – it’s a four-level hierarchy. Mapping an 8TB file using 4KB pages would need 2Gpages and hence 2G PTEs. Each PTE is 8 bytes so you can get 512 of them onto a page and we need 4Mpages for the lowest level (level 3) page tables, or 16GB of address space. Add 32MB for level 2 and 64KB for level 1.

    x86 not running in PAE mode only uses three levels and has 1024 PTEs per page (32-bit PTE), so the relative cost is lower.

    Having said that, mapped files don’t use valid PTEs directly, PTEs for shared sections are prototype PTEs which point to real PTEs, so there’s at least one extra level of mapping there. I’ve never quite squared away what Windows does when you actually reference a prototype PTE – I think it explicitly loads the mapping into the TLB, leaving the prototype PTE in the actual page table.

  34. Ben Hutchings says:

    DrPizza wrote: "Current 64-bit processors do support a 63- or 64-bit virtual address space…"

    Maybe some of them do, but AMD64 doesn’t; currently it’s limited to 48-bit virtual addresses using 4 levels of page tables (as opposed to 2 on x86). Supporting 64-bit virtual addresses would presumably require 5 or 6 levels, which is bordering on the ridiculous. Every extra level adds complexity to the MMU and the OS’s memory manager.

  35. Ben Hutchings says:

    mahdi wrote: "int is compiler dependent, but via one of the C(++) standards defined to be 32 bits on 32bit machines and 16 bits on 16 bit machines. … But this is not a general standard and e.g. holds not true for certain embedded platforms."

    Yes, exactly, it’s not a standard at all. All the standard says is int has at least 16 significant bits and long has at least 32 (and at least as many as int).

    mahdi wrote: "In the Windows world 32 bits is a "longword", and a "word" is 16 bits."

    I believe those are DWORD (double word) and WORD and they are a holdover from the original implementation of Windows in 16-bit assembly where these terms were actually meaningful. They ought to have been deprecated in the transition to Win32.

  36. timchen says:

    "Wasn’t the whole file mapping thing reworked in WinXP so that fewer resources were needed?"

    I remember this was done at win2000 sp2. Prior to that doing buffered i/o on a file (mapped by cc) would build up the entire section at the very beginning, and every 4k requires 4 bytes in paged pool. Therefore the biggest file you can normally use on such system is around 180GB on NT4 (paged pool maximum at 192MB). Copying big file also fails because CopyFile uses buffered i/o.

    This was changed in SP2 so that the allocation occurs only when that part of the file is actually used.

  37. DrPizza says:

    "Maybe some of them do, but AMD64 doesn’t; currently it’s limited to 48-bit virtual addresses using 4 levels of page tables (as opposed to 2 on x86). Supporting 64-bit virtual addresses would presumably require 5 or 6 levels, which is bordering on the ridiculous. Every extra level adds complexity to the MMU and the OS’s memory manager."

    I don’t pay much attention to that abomination. x86 is crufty enough. We don’t need yet more hacks applied to it.

    Itanium2 is 64-bit virtual and 50-bit physical (and can do weird segmentation things on top of that). I think new POWER family processors are 64-bit virtual and 42-bit physical (and can do weird bank-switching segmentation things to take that up to about 80-bit). Alpha went up to, what, 53-bit? Fujitsu’s SPARC64 V line are 64-bit virtual, 43-bit physical.

    These families have fancier page tables (hashed PTEs in the case of Itanium and POWER for example) than x86 does, though, particularly Itanium, so perhaps x86’s crapness is prohibitive in its case.

  38. asdf says:

    0000000000000000                     User-Mode User Space
    000006FC00000000                     Kernel-Mode User Space
    1FFFFF0000000000                     User Page Tables
    2000000000000000                     Session Space
    3FFFFF0000000000                     Session Space Page Tables
    E000000000000000-E000060000000000    System Space
    FFFFFF0000000000                     Session Space Page Tables

    That's where the 0x6FC00000000 comes from (let's hope it formats correctly when I submit this).

  39. Greg Page says:

    You may have seen this, Raymond… the 7 GB switch is either a typo or the work of an insane person. I suspect someone was munging the 3GB switch with PAE, had once heard a passing reference to the userva switch, popped them all in a blender, and out popped the 7 GB switch.

    That, or it was a typo.

  40. darwou says:

    On 32-bit x86, you have an 8-byte atomic compare-and-swap (CAS) instruction, "cmpxchg8b".

    The ABA problem occurs when you have a lock-free algorithm that tests a value to make sure it is A and then uses cmpxchg to change the value to something else… but in the interval between the test and the cmpxchg, someone else changes A to B and then back to A again, hence "ABA".

    Interlocked SList is the Windows implementation of a lock-free singly linked-list. These lists are really fast because they do not have to take out a lock when things are added or removed from them. Hence they’re used in a lot of places in the OS.

    Interlocked SList solves the ABA problem by (basically) keeping a counter and bumping it up whenever an operation happens. This ensures that the second A in ABA is different from the first.

    On 32-bit x86, since you can cmpxchg 8-bytes at a time, everything is easy. You keep the pointer in the first 32-bits, and the other information in the other 32-bits, and life is grand.

    But on both AMD64 and Itanium, you run into a little problem: there is nothing bigger than an 8-byte atomic cmpxchg. No 16-byte atomic cmpxchg instruction exists in either Itanium or AMD64.

    Since pointers on those platforms are also 8-bytes, you have no room to store your extra information.

    The solution is to sacrifice some of the 64 bits of the address.

    Hence the limitation of the address space on AMD64 to 40 bits. The rest of the bits are used to store other information.

  41. DrPizza says:

    "The ABA problem occurs when you have an lock-free algorithm that tests a value to make sure it is A and then uses cmpxchg to change the value to something else… but in the interval between the test and the cmpxchg, someone else changes A to B and then back to A again, hence "ABA". "

    But most of the time you don’t actually need lock-free; "nearly" lock-free is "good enough". By saying that your locked pointers must be aligned on an 8-byte boundary (which is not unreasonable on a 64-bit platform; Itanium for example requires it anyway for its CAS instructions) you can use the bottom 3 bits as your count and a mechanism such as the one described here: http://www.hpl.hp.com/techreports/2004/HPL-2004-105.pdf . That algorithm backs off to spin locks if there’s more than N concurrent accesses, but since N is normally low that’s OK. The particularly troublesome situation (signals/interrupts) needs only to cope with 2. For greater contention spinlocks can be as good (or even better) than CAS.

    I believe that Intel’s x86-64 has a 16-byte CAS anyway, and Itanium’s cmp8xchg16 is probably good enough too.

    In any case, as long as one has pointer-width CAS one can fabricate arbitrarily large CAS.

    As such, removing such a huge chunk of address space seems really rather gratuitous and unnecessary. It’s a solution–of sorts–but it’s hardly a good solution. The ABA rationale seems rather weak.

  42. Raymond Chen says:

    Okay, so explain how the existing SList functions can be implemented using the technique in the paper. Remember, you can’t change the existing interface.

  43. DrPizza says:

    Am I allowed to force alignment of slist pointers to 4 bytes on 32-bit platforms and 8 bytes on 64-bit platforms?

  44. DrPizza says:

    And what do I win?

    Am I allowed to use techniques described elsewhere, or must I use the ones from that article?

  45. Raymond Chen says:

    You can use whatever techniques you want. (And you are allowed to impose alignment restrictions.) Don't forget to implement QueryDepthSList. My point is that the technique in that paper isn't the slam dunk you portray it as.

    Oh, and "nearly lock-free" isn't good enough. Suppose you can handle up to N concurrent operations and the (N+1)th has to block. Say you have a single-processor machine. The first N operations get under way and a hardware interrupt comes in and needs to perform an (N+1)th operation. If you make the hardware interrupt wait for one of the other N to complete, your machine will hang. The other N are waiting for the hardware interrupt to complete, but the hardware interrupt is waiting for the other N.

  46. DrPizza says:

    What do the routines use the spinlock for if not to lock?

  47. Raymond Chen says:

    What spinlock? The SList functions are lock-free. That’s the whole point.

  48. DrPizza says:

    The spinlock in the APIs. e.g. ExInterlockedPopEntrySList, ExInterlockedPushEntrySList

  49. DrPizza says:

    Of course, I’m assuming the new kernel32 exports use those behind the scenes. I don’t have XP or 2K3 immediately available to hand, so perhaps they don’t.

  50. Raymond Chen says:

    Sorry, I was talking about the non-Ex versions. (Those Ex functions aren’t Win32.) The non-Ex functions don’t have a spinlock parameter.

    And no, the kernel32 exports don’t use the spinlocks. (They can’t anyway since spinlocks are kernel objects not user-mode objects.)

  51. Raymond Chen says:

    I wasn’t aware that kernel mode had a special version that took a spinlock. Strikes me as very odd.

    If the user-mode version secretly took a spinlock, then you would end up with all the slists in the process sharing the same spinlock, which would create lock contention – exactly the opposite goal of the slist functions.

  52.   As Evan already mentioned on his blog, Raymond Chen has a great series on /3GB switch on his blog. What is really cool is that Raymond takes on some myths about the /3GB switch and  the fact that he…

  53. darwou says:

    The spinlock parameter is for compatibility with older versions of Windows. The later ones ignore the spinlock parameter because they switched to a lock-free SList implementation.

    If you want a single driver binary that runs on 2K/XP/2K3, you should define _WIN2K_COMPAT_SLIST_USAGE. This will revert to the older version of the API.

  54. DrPizza says:

    "Sorry, I was talking about the non-Ex versions. (Those Ex functions aren’t Win32.) The non-Ex functions don’t have a spinlock parameter. "

    But equally, they’re allowed to let other threads run, because they’re user-mode, in which case nearly-no-lock is "good enough".

    If you’re talking about *Win32* you’re talking about *user-mode* in which case you can *always* be pre-empted, so your hardware interrupt situation *doesn’t matter*. One of the other threads will *eventually* get a cycle and unblock, allowing the N+1th to continue.

    If you’re talking about *kernel* functions, you can’t always be pre-empted, so your hardware interrupt situation *does* matter–but those functions take a spinlock (and so presumably *can* lock and wait) *anyway*.

    "And no, the kernel32 exports don’t use the spinlocks. (They can’t anyway since spinlocks are kernel objects not user-mode objects.) "

    No, but they could use a critical section behind the scenes, for much the same effect (as critical sections are just spinlocks which can escalate to events if they grow tired of spinning).

  55. DrPizza says:

    When did the SList go lock-free?
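darwou's description of the ABA problem (comment 40) and of the counter-based fix can be replayed deterministically. This is a single-threaded simulation of one bad interleaving, not real concurrent code: `Word64`, `pack`, `unpack`, the node addresses, and the 43/21 address/counter bit split are all hypothetical illustrations, not the actual SLIST_HEADER layout, and a real implementation would rely on a hardware CAS such as cmpxchg.

```python
ADDR_BITS = 43                            # hypothetical: 43 address bits (8TB)
ADDR_MASK = (1 << ADDR_BITS) - 1
TAG_MASK = (1 << (64 - ADDR_BITS)) - 1    # the other 21 bits hold a counter


def pack(addr: int, tag: int) -> int:
    """Combine an address and a wrap-around counter into one 64-bit word."""
    assert 0 <= addr <= ADDR_MASK
    return ((tag & TAG_MASK) << ADDR_BITS) | addr


def unpack(word: int) -> tuple[int, int]:
    return word & ADDR_MASK, word >> ADDR_BITS


class Word64:
    """One memory cell with a simulated compare-and-swap."""

    def __init__(self, value: int) -> None:
        self.value = value

    def cas(self, expected: int, new: int) -> bool:
        if self.value != expected:
            return False
        self.value = new
        return True


A, B, C = 0x1000, 0x2000, 0x3000          # made-up node addresses


def aba_scenario(tagged: bool) -> bool:
    """Return True if thread 1's stale CAS succeeds (i.e. ABA bites)."""
    next_of = {A: B, B: C, C: 0}
    head = Word64(pack(A, 0) if tagged else A)

    def addr_of(word: int) -> int:
        return unpack(word)[0] if tagged else word

    def make(addr: int, old_word: int) -> int:
        # In tagged mode every update bumps the counter.
        return pack(addr, unpack(old_word)[1] + 1) if tagged else addr

    # Thread 1 starts a pop: reads head (A) and A's successor (B), then stalls.
    snapshot = head.value
    stale_next = next_of[addr_of(snapshot)]

    # Thread 2 runs to completion: pop A, pop B (B is now free), push A back.
    for new_top in (B, C, A):
        current = head.value
        head.cas(current, make(new_top, current))
    next_of[A] = C                        # A was pushed back on top of C

    # Thread 1 resumes and tries to swing head from A to its stale successor B.
    return head.cas(snapshot, make(stale_next, snapshot))


print(aba_scenario(tagged=False))  # True: head now points at the freed node B
print(aba_scenario(tagged=True))   # False: the counter changed, so the CAS fails
```

The sketch also shows the cost being debated in the comments: every bit given to the counter is a bit taken from the address.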

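Mike Dimmick's page-table arithmetic (comment 33) can be checked with a short calculation. This sketch assumes 4KB pages only (no large pages), 8-byte entries, and a four-level hierarchy, and it ignores the prototype PTEs he mentions for shared sections. It lists the lowest level first, so it reproduces his 16GB, 32MB, and 64KB figures plus the single 4KB top-level page the comment leaves out.

```python
PAGE_SIZE = 4 * 1024                       # 4KB pages
PTE_SIZE = 8                               # x64 page-table entries are 8 bytes
ENTRIES_PER_PAGE = PAGE_SIZE // PTE_SIZE   # 512


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def page_table_bytes(mapped_bytes: int, levels: int = 4) -> list[int]:
    """Bytes of page-table pages needed at each level, lowest level first."""
    sizes = []
    entries = ceil_div(mapped_bytes, PAGE_SIZE)    # one PTE per mapped page
    for _ in range(levels):
        table_pages = ceil_div(entries, ENTRIES_PER_PAGE)
        sizes.append(table_pages * PAGE_SIZE)
        entries = table_pages      # each table page needs one entry a level up
    return sizes


TiB = 1 << 40
print(page_table_bytes(8 * TiB))
# [17179869184, 33554432, 65536, 4096] -> 16GB, 32MB, 64KB, 4KB
```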