It’s the address space, stupid


Nowadays, computers have so much memory that running out of RAM is rarely the cause of an "out of memory" error.

Actually, let's try that again. For over a decade, hard drives have been so large (and cheap) that running out of swap space is rarely the cause of an "out of memory" error.

In user mode, the term "memory" refers to virtual memory, not physical RAM chips. The amount of physical RAM doesn't affect how much memory a user-mode application can allocate; it's all about commit and swap space.¹ But swap space is disk space, and that runs to hundreds of gigabytes for hard drives. (Significantly less for SSDs, but even then, it's far more than 4GB.)

The limiting factor these days is address space.

Each thread's stack takes a megabyte, and if you're creating a lot of threads, that can add up to a lot of address space consumed just for stacks. And then you have to include the address space for the DLLs you've loaded (which quickly adds up). And then there's the address space for all the memory you allocated. (Even if you don't end up using it, it still occupies address space until you free it.)
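
To make "occupies address space" concrete, here is a minimal sketch (Win32 C; illustrative rather than production code) that reserves address space one megabyte at a time without committing any of it. Built as a 32-bit program, the loop fails somewhere below the 2GB mark no matter how much RAM or paging file the machine has, because it is address space, not memory, that runs out.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T blocks = 0;
        // MEM_RESERVE claims address space without charging commit:
        // no RAM and no paging-file space back these reservations.
        while (VirtualAlloc(NULL, 1 << 20, MEM_RESERVE, PAGE_NOACCESS))
            blocks++;
        printf("Reserved %Iu MB of address space before failing\n", blocks);
        return 0;
    }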

Typically, when you get an ERROR_OUT_OF_MEMORY error, the problem isn't physical memory, and it isn't commit either. It's address space.

This is one of the main benefits of moving to 64-bit computing. It's not that you actually are going to use or need all that memory. But it relieves pressure on the address space: The user-mode address space in 64-bit Windows is eight terabytes.
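
(As a quick sketch of the difference, again in Win32 C and assuming a 64-bit build: a 64-bit process can reserve a terabyte of address space without consuming any RAM or commit at all, something no 32-bit process could dream of.)

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        // Reserve 1TB of address space: no commit charge, no RAM used.
        void *p = VirtualAlloc(NULL, 1ULL << 40, MEM_RESERVE, PAGE_NOACCESS);
        printf("1TB reservation %s\n", p ? "succeeded" : "failed");
        if (p) VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }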

When the day comes that eight terabytes is not enough, we at least won't have to redesign the application model to expand the address space. The current x86-64 hardware has support for address spaces of up to 256TB, and the theoretical address space for a 64-bit processor is sixteen exabytes.

¹ Of course, physical RAM is a factor if the application is explicitly allocating physical memory, but that's the exception rather than the rule.

Exercise: Help this customer clear up their confusion: They reported that processes were failing to start with STATUS_DLL_INIT_FAILED (0xC0000142), and our diagnosis was that the desktop heap was exhausted. "The system has 8GB of RAM installed, and Task Manager reports that only 2GB of it is being used, so it is unlikely that I am running out of any kind of heap/memory."

Comments (78)
  1. Dan Bugglin says:

    I regularly see out of memory errors in IE8 when I have had its developer tools open even once… it just leaks the memory of every page as I navigate to the next one until it hits 1.5 or 1.6GB, then it starts to die.

    Exercise: 2GB is the address space for a 32-bit process.  Customer will need a 64-bit version if they wish to use up the entire 8GB+ (including paging file) and crash their PC.  Or they can try and fix the actual problem, if that's how they roll.

  2. And still, with the giant physical memory, Microsoft gets all obsessed about the <10 MB they can save by using paged code in kernel drivers. "It reduces memory pressure," they say. While not caring a bit about preventing the file cache from pushing executable pages and data pages out. Try to copy a bunch of large files and watch your applications churn like crazy. I guess the tests they run in their ivory towers don't show that.

  3. Barbie says:

    @The MAZZTer, they're still at process startup… That's a hell of a process they're starting up, to already use 2GB!

  4. Ben says:

    In real life, for historical reasons to do with the specification of "malloc()", the most common cause of "out of memory" is a null pointer being returned from a function which normally returns a block of memory. This is in many cases interpreted as "out of memory" even if that is not the actual cause.

    For example GDI+ will throw an out of memory exception if you attempt to load certain malformed image files.

  5. The desktop heap is a legacy of the "everybody can screw everybody" USER orgy. Windows needs to change that to "only top-level windows have desktop scope" (with a CreateProcess flag to override it).

    [We already did that and more. Windows Store applications cannot access each other's windows at all. -Raymond]
  6. Gizen says:

    When will a Windows version come out that can actually use the current hardware?

    Using virtual software limits is evil.

    Like limiting a car capable of speeds above 200 mph to barely only 30 mph.

    If a car manufacturer tried to pull that stunt, people would get pissed.

    Why aren't Windows users getting angry?

    Has Microsoft already sheepified the majority of them?

    Customer question: Why can't I use all my sixteen exabytes of address space? Since I cannot turn off the Metro UI, I really need more address space. I have so little address space that the start menu isn't even working or showing up.

    In Windows 8.1 the start menu shows up but is still broken and unusable.

    It boggles my mind that Windows XP could run on a computer with 512 megabytes of RAM. Where did things go so wrong? Does Microsoft have a secret deal with hardware manufacturers beyond those I know about?

    Perhaps NSA knows…

    I joke and satirize, but I really do wonder why Windows isn't taking advantage of the hardware. People really did get upset when they heard about the 3.5-gigabyte RAM issue on 32-bit Windows. It went so far that computer manufacturers ordered Microsoft to make Windows lie about the usable RAM. Apparently educating the users was out of the question.

  7. Mark says:

    @Gizen: Mark Russinovich (yes, the Microsoft genius) explains once and for all why x86 (32-bit) versions of Windows cap memory at 3.5GB even though, mathematically, the addressable space should be 4GB. I read this article a few years ago, and I must have referred others to it a dozen times by now. One of the best reads for any IT pro or developer, aside from Raymond's blog, of course. :)

    blogs.technet.com/…/3092070.aspx

  8. Bruce says:

    @Gizen: complain to AMD, not Microsoft – current CPUs only support a 48-bit address space.

  9. floyd says:

    @The MAZZTer: Suggested heading for the Exercise: "It's the desktop heap, stupid".

    .f

  10. DWalker59 says:

    "Each thread's stack takes a megabyte".  Reminds me of when I wondered about the size of page tables back when I worked on mainframe computers running VM.  It is an interesting area of operating system design.  

    In Wintel-land, each thread's stack should be dynamically allocated out of virtual memory! :-)  Everything should be dynamically allocated out of the total address space, even the address space tables.
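
    (Windows does let you dial the reservation down per thread. A minimal sketch in Win32 C: the STACK_SIZE_PARAM_IS_A_RESERVATION flag makes the size parameter an address-space reservation rather than an initial commit.)

        #include <windows.h>

        static DWORD WINAPI Worker(LPVOID arg)
        {
            return 0;  // trivial thread body
        }

        int main(void)
        {
            // Reserve only 64KB of address space for this thread's stack,
            // instead of the executable's default (typically 1MB).
            HANDLE h = CreateThread(NULL, 64 * 1024, Worker, NULL,
                                    STACK_SIZE_PARAM_IS_A_RESERVATION, NULL);
            if (h) { WaitForSingleObject(h, INFINITE); CloseHandle(h); }
            return 0;
        }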

  11. Joshua says:

    Incidentally, I've reached real out-of-memory before: exhausted RAM + page file. Down it goes. The Win XP kernel doesn't like not being able to allocate RAM very much.

    > I joke, satire but i really do wonder why windows isn't taking advantage of the hardware. People did really get upset when they heard of the 3.5 gigabyte ram issue on windows 32-bit. It went so far as computer manufacturers ordering microsoft to make windows lie about the used ram. Apparently educating the users was out of the question.

    What's funny is that Windows Server 2003 Enterprise Edition x86 doesn't have the limit. Microsoft could have released an XP built from the 5.2 branch, completely uncapped, and did not. Which is why, for Vista and up, there are patches floating around that change only a few bytes of the kernel to remove the 4GB limit. The "educating users" argument was bankrupt when anybody looking at the specs for Windows Server 2003 could see they were lying.

    [And then your sound card driver corrupts memory due to a truncated DMA address and you blame Windows. (This problem doesn't exist on Server because servers don't have sound cards.) -Raymond]
  12. AsmGuru62 says:

    Fragmentation is also an issue.

    The app in Task Manager shows about 140MB, but can't allocate a contiguous block of ~50MB.
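
    (You can watch the fragmentation from inside the process. A minimal sketch in Win32 C that walks the address space with VirtualQuery and reports the largest free region:)

        #include <windows.h>
        #include <stdio.h>

        int main(void)
        {
            MEMORY_BASIC_INFORMATION mbi;
            BYTE *p = NULL;
            SIZE_T largest = 0;
            // Walk every region of the process's address space.
            while (VirtualQuery(p, &mbi, sizeof(mbi)) != 0) {
                if (mbi.State == MEM_FREE && mbi.RegionSize > largest)
                    largest = mbi.RegionSize;
                p = (BYTE *)mbi.BaseAddress + mbi.RegionSize;
            }
            printf("Largest free block: %Iu KB\n", largest / 1024);
            return 0;
        }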

  13. Joshua,

    Beating a dead horse is a thankless job, but I need to remind you again and again that consumer-grade drivers were never tested in those days to verify that they support 64-bit physical addresses.

    These days it's no longer an issue. Everybody just runs x64.

    @DW:

    >In Wintel-land, each thread's stack should be dynamically allocated out of virtual memory!

    Remember that the virtual space for it has to be reserved for the life of the thread.

  14. Circle of support (life) says:

    @Bruce:

    Here is how it goes:

    You complain to Microsoft. Microsoft blames the hardware and tells you to go complain to them.

    Hardware makers blame Microsoft for not providing the support and the market (economic incentive), and tell you to go complain to Microsoft.

    They blame each other up the wazoo while you stand there with your unsolved problem and watch them fight and blame each other like a kid watching its parents fight.

    The result: you get scarred for life. You stop trusting companies and their "support". You become a bitter old man while your problems still exist and new ones are created.

  15. Fred says:

    > [And then your sound card driver corrupts memory due to a truncated DMA address and you blame Windows. (This problem doesn't exist on Server because servers don't have sound cards.) -Raymond]

    Isn't that technically a bug?

    The memory allocator should know its hardware's limits, like how programs check at startup whether SSE is supported and take different code paths.

    Servers might not have sound cards, but aren't other areas affected by this 'truncated DMA address' bug instead?

    [The only interesting drivers that servers run are storage and networking. (They don't have a sound card, and they use the Plain VGA video driver.) And the person setting up the server is darn well going to ask the driver vendor, "So this driver is certified to run on servers, right?" -Raymond]
  16. Alois Kraus says:

    The more interesting question about the desktop heap is how to debug it. How can I find out which objects are allocated on the desktop heap, so I can check which process depleted it?

  17. Joshua says:

    [And then your sound card driver corrupts memory due to a truncated DMA address and you blame Windows. (This problem doesn't exist on Server because servers don't have sound cards.) -Raymond]

    That's what the boot.ini switch to not use the first 4GB of RAM is FOR. So this can be trivially tested, and you'd know full well which driver screwed up. Oh wait. This was meant for a different target audience that can't understand.

    @Fred: The bug is in the driver. It thought that physical addresses could fit in pointer types (they can't) and the cast sheared off the top bits.

    ["I put 4GB of memory in my computer, and I get massive data corruption. Obviously, this is a driver bug, and the clear course of action is to edit some file called "shoes.innie" and type some magic beans, and now my computer has only 0.5GB of memory, so I can't actually use it for anything, but gosh darn it I'm isolating a driver bug people! On second though, I'll just buy a Mac." -Raymond]
  18. Myria says:

    @AsmGuru62: 32-bit builds of our product often go down trying to allocate 3 MB with 200 MB left.  >.<

  19. Silly says:

    Heh. Unrelated, but I remember in ye olde days (<2003) when a mate used DWORD timestamps (millisecond resolution) as his time type. Adding months to that revealed the limitations pretty quickly.
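
    (For reference: a DWORD tops out at 2^32 milliseconds, which is about 49.7 days, so even a single month at millisecond resolution won't fit.)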

  20. j b says:

    I rarely (read: never) see anyone pointing out that the 32-bit x86 architecture provides _12_, not 4, gibibytes of virtual address space. Stack space, code space and data space are distinct address spaces, unless deliberately set up to overlap each other (fully or partially).

    It is not that a factor of three makes an ocean of difference (particularly when the OS does not support it…), but if you want a thorough understanding of the addressing mechanisms, you should understand that the theoretical limit is 12 rather than 4.

    [The linear address space is still 32 bits. Selector base addresses are 32-bit linear values. (See Figure 3-8 in the Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A.) In other words, selectors just select a subset of the existing 32-bit linear address space. You don't get three page tables (one each for ss, cs, and ds). -Raymond]
  21. Myria says:

    Has Windows x64 gotten past the 8-TB address space limit?  I know that previously, the address space was limited to 44 bits because the extra bits were used to avoid the ABA problem with [Ex]InterlockedPopEntrySList when only cmpxchg8b was available on early AMD64s.  Now that Windows 8.1 x64 requires your CPU to support cmpxchg16b, does that mean Windows 8.1 will support more than 8 TB of address space?

    (44 bits = 8 TB for kernel, 8 TB for user)

    [Do you have an app that is bumping into the 8TB limit? Or is this a purely theoretical problem with no practical impact? -Raymond]
  22. Joshua says:

    [The only interesting drivers that servers run are storage and networking. (They don't have a sound card, and they use the Plain VGA video driver.) And the person setting up the server is darn well going to ask the driver vendor, "So this driver is certified to run on servers, right?" -Raymond]

    Did you /see/ the number of people installing Windows Server 2003 on their gaming PCs because it was benchmarking a 20% higher frame rate than XP on the same hardware?

    OK I'll shut up now. (And no I wasn't trying to get the last word or I'd reply to something more vital.)

  23. Myria says:

    [Do you have an app that is bumping into the 8TB limit? Or is this a purely theoretical problem with no practical impact? -Raymond]

    Nope, not at all! =)  I think most systems would run out of memory just trying to store the page table for 8 TB. =^_^=  This is just a theoretical question; reading Windows 8.1's system requirements this week reminded me of the significance of cmpxchg16b to Windows x64.

  24. Raphael says:

    > Did you /see/ the number of people installing Windows Server 2003 on their gaming PCs because it was benchmarking 20% higher framerate than XP on the same hardware?

    Really? People paid about a thousand dollars for a (dubious, I presume) 20% increase in frame rate?

    [I bet these people also buy a Formula 1 race car and then complain about the audio system. -Raymond]
  25. Joshua says:

    > Really? People paid about a thousand dollar for a (dubious, I presume) 20% increase in frame rate?

    It's about $300 (remember there's no terminal server and no CALs needed).

    Assuming of course they didn't pirate it or use a spare MSDN license from work (we've got 80 (yes 80!) licenses for 2003 and about 4 in use).

  26. > It's about $300 (remember there's no terminal server and no CALs needed).

    The whole argument is an empty rant. $300 will buy you a lot of performance (for example, a fast SSD or a liquid cooling system for overclocking) if you do your homework.

    Anyway, any user capable of configuring NT 5.2 for gaming (and working around the incompatibilities) should have little trouble troubleshooting driver issues. On the other hand, most Word-and-Facebook users running the vanilla flavor of NT 5.1 don't have a clue what to do when a blue screen appears. Windows already gets enough bad press from buggy drivers – no need for Microsoft to make it worse!

  27. Goran Mitrovic says:

    Don't forget that every thread takes another 256KB when running under WOW64.

  28. @Myria:

    The preview version of 8.1 requires a processor with the cmpxchg16b instruction. So even if the larger address space isn't supported in 8.1, they are paving the way for future releases.

  29. Pippin says:

    >>>On second thought, I'll just buy a Mac." -Raymond

    Plenty of people express that sentiment; somehow people don't express it the other way round. Perhaps embracing broken drivers, apps and such has contributed to the perception that Windows is such a clusterf****. OTOH, Microsoft could improve the I/O design too, like adding a scheduler that works at the DPC level to prevent errant driver code from locking up a CPU core. I've seen drivers (cough... StarForce... Microsoft Certified) do the DPC re-queue trick and ruin the user experience. Microsoft's argument seems to be that the user needs to be shielded from bad code ruining their Windows experience – except in the case of DRM; then everything is allowed.

    [I'm trying and failing to find a point to this rant. I hope you feel better for having written it. (I also hope that this satisfies your rant quota. Then we can go back to talking about address space exhaustion.) -Raymond]
  30. Myria says:

    @Goran: True, but you probably ought to be using /LARGEADDRESSAWARE in your application already.  The 256K WOW64 stack cost is far outweighed by the extra 2 GB available from running under Win64 as a 32-bit program.
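
    (Whether an image is so marked can be read straight out of its PE header. A minimal sketch in Win32 C, checking the running module:)

        #include <windows.h>
        #include <stdio.h>

        int main(void)
        {
            // Walk from the module base to the PE file header.
            BYTE *base = (BYTE *)GetModuleHandle(NULL);
            IMAGE_NT_HEADERS *nt = (IMAGE_NT_HEADERS *)
                (base + ((IMAGE_DOS_HEADER *)base)->e_lfanew);
            if (nt->FileHeader.Characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
                puts("Marked /LARGEADDRESSAWARE");
            else
                puts("Limited to 2GB of user address space");
            return 0;
        }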

  31. Pippin says:

    >I'm trying and failing to find a point to this rant. – Raymond

    I've been reading and enjoying your blog over the years, and I've seen you make the point that X needed to be done or Y would disrupt the user's experience, and the user would blame Microsoft, because when anything goes wrong they blame the OS. My point was that this argument is applied selectively by other people inside MS, as several 'Microsoft approved' products clearly disrupt the user's experience (the example I gave was a widely despised buggy DRM product). OK, back to address space exhaustion! What's up with IPv6?? :P

    >Really? People paid about a thousand dollar for a (dubious, I presume) 20% increase in frame rate?

    That happened a lot around the release of Vista. People thought, hey, Vista sucks, so let me install the same OS with a different name, use the same drivers, and see what happens. You saw some UI improvements because desktop rendering via the 'classic' shell interface was faster than offloading things to the GPU. But AFAIK there was no documented difference worth mentioning in frame rates in games.

    [Glass houses are great places to throw stones. You're ranting to the wrong person. -Raymond]
  32. Neil says:

    Because Novell couldn't even trust disk drivers for ISA cards, NetWare would limit you to 16MB unless you had an EISA or PCI machine or used some arcane steps to get your disk driver to recognise all of your memory.

    @Gizen Not to 30mph, but they do get limited to 159mph for some reason.

  33. j b says:

    Raymond> The linear address space is still 32 bits. Selector base addresses are 32-bit linear values.

    At least some (although I am not in a position to claim 'all') processors in the x86 family provide externally available signals telling whether a memory access is a code, stack or data reference. In principle, you could set up a machine with three different physical 4GB spaces. I guess that you would have to turn off all paging, though: even though both the virtual address level and the physical level are aware of the stack/data/code distinction, the paging hardware wouldn't know that (code) page X and (data) page X are different pages.

    Anyhow: the linear address is completely invisible to the software; it is part of the translation process down to a physical address. Whether those 12 GB are mapped down to your 512 MB of RAM in a single translation step, or first go to a 4 GB "linear space" and then further on to 512 MB, makes no difference at all to the software. Besides, just like an OS may map physical pages in and out of RAM, it can map virtual segments in and out of the linear address space; access to a non-present segment causes an exception very similar to a page fault. So even on standard PC hardware, with no physical address bank trickery, an OS could provide 12 GB of addressable *virtual* space to a process.

    There would be some minor limitations, like the segments present (in linear space) *simultaneously* not being able to exceed a combined size of 4 GB; the OS would have to "page out" ("seg out"??) some segments from linear space to make room for others, and might have to do compaction (which would be a fast operation – the number of segment descriptors is small). I would have to work hard to create a real-world application example where this would be any sort of limitation. Besides, you see lots of limitations of similar non-importance (e.g. you can't set up a 4GB DMA buffer).

    I am not claiming that segment swapping in and out of linear space would be any sort of ideal situation. I am mostly using this argument to make sure that people fully understand the x86 segmenting mechanism, before I go on to object-oriented systems and hardware handles. 386 memory management was to a large degree a spinoff of the 432 project, which implemented object-oriented concepts fully in hardware. Going from x86 to a 432-inspired generalization (i.e. to arbitrarily small objects, not large segments, a gate-like mechanism to control method activation, etc.) turns out to be a very fruitful path for making people understand how handle-based systems such as the JVM and .NET can be implemented. If you are still programming in flatland: sure, going to 64 bits is a far simpler and cleaner solution.

    [You can't provide 12GB (4GB each for CS, DS, SS) because for example a "push ds:[eax]" instruction requires all three segments to be present at the same time. (Actually, even if the instruction doesn't access the selector, it still needs to be present as long as it is loaded into a selector register.) There's really no point going on about these dark corners of the x86 because they are not practical in any way. And selectors are hardly intended for hardware object orientation, since you would be limited to at most 16382 objects! -Raymond]
  34. @j b:

    I'll tell you even more: you can access many, many gigabytes in a 32-bit process. There are at least two existing mechanisms in the Win32 API to do that. I'll leave it up to you to find them.

    It will also be more efficient than the hypothetical segment shuffling behind your back. By the way, FS: is reserved in Win32; you must not modify it.

  35. j b says:

    @alegr1:

    Sure, you've got various kinds of windowing mechanisms under application control. When you say "API", you most likely imply explicit actions to get >12 GB of stack+data+code.

    Re. "more efficient that the hypthetical segment shuffling": It was never my intention to suggest a "more efficient" method, but to understand how the x86 segment mechanisms relate to OO addressing.

  36. yuhong2 says:

    "Microsoft could have released XP built from the 5.2 branch completely uncapped and did not."

    Or even better, backport the fixes from the 5.2 branch related to PAE to the 5.1 branch.

  37. yuhong2 says:

    "It's about $300 (remember there's no terminal server and no CALs needed)."

    I think only the Enterprise Edition is licensed to support >4GB. Standard Edition is pretty much the same as client Windows.

  38. yuhong2 says:

    ["I put 4GB of memory in my computer, and I get massive data corruption. Obviously, this is a driver bug, and the clear course of action is to edit some file called "shoes.innie" and type some magic beans, and now my computer has only 0.5GB of memory, so I can't actually use it for anything, but gosh darn it I'm isolating a driver bug people! On second though, I'll just buy a Mac." -Raymond]

    Agreed, the right thing would have been to set /MAXMEM:4096 by default and allow /MAXMEM:NONE to be inserted in boot.ini to lift the limit.

  39. yuhong2 says:

    BTW, on the matter of XP based on the 5.2 branch, why does "Windows XP Professional x64 Edition" follow the normal XP support lifecycle rather than Server 2003's?

  40. Goran Mitrovic says:

    @Myria: Depends. In my specific case, a third-party component crashed when LAA was used. :(

  41. yuhong2 says:

    BTW, what is also frustrating is that there is no 32-bit server version of Win7 or later, so the only way to enable >4GB PAE is to binary-patch the kernel, especially as Win8 pretty much requires the PAE kernel.

  42. @jb:

    Now, how would it work if you need different physical addresses for the same offset in CS: and another segment? Assuming, as you say, FS and GS can cover all 4GB of separate space.

  43. Also, how about different threads requiring conflicting mappings to emulate the 12GB scheme?

  44. 640k says:

    [Do you have an app that is bumping into the 8TB limit? Or is this a purely theoretical problem with no practical impact? -Raymond]

    8TB is enough for everyone.™

  45. @Gizen: Many modern cars do have speed limiters that kick in at around 155MPH, regardless of how fast they could theoretically go. Customers don't tend to complain, since the equally artificial speed limits applied on roads are usually lower still.

    @Yuhong Bao: They can't "backport the fixes from 5.2", because there weren't any fixes there. It's merely an assumption that customers running a server OS would be doing so on certified hardware with certified drivers, thus avoiding the issue of drivers with truncation bugs – an assumption that rarely applied to home users.

  46. yuhong2 says:

    @AndyCadley: I am sure there probably are at least some fixes from the 5.2 branch related to PAE that were not applied to the 5.1 branch, since the latter doesn't use PAE for anything other than NX, but you are right that that is not the main problem.

  47. j b says:

    Raymond> You can't provide 12GB (4GB each for CS, DS, SS) […]

    Sure, if you limit yourself to flatland programming: setting up the segment registers once and for all, with one (or three) 4GB homogeneous segment(s), and after that forgetting all about selectors. You identify objects by their (starting) _location_, given by the offset value used by the application code; the object extends to somewhere out in the fog, and you must go to a different place to tell what access the code has to each object. Sure, that is the way we use the x86 MMS today.

    I happened to grow up with a machine with a not-too-different MMS. However, established software design style was to use as many segments as you had protection domains. Say, a memory-mapped read-only file was an RO _segment_, not a set of flatland RO pages. E.g. a DBMS would offer to the application, as one or more segments, those buffers and tables the application had successfully opened. Once the database was closed, the segments disappeared from the application address space. In those days, 4GB of physical RAM was a future dream, but having a set of data segments that combined fit into and filled 4GB of virtual space (and similarly for code and stack) was fully possible. (And it is on the 386 MMS as well.)

    This was not an OO machine, but it used (as 386 OSes could have) segment descriptors as "capabilities" (a.k.a. handles, object indexes,…) to actively provide access control through addressability: if you cannot identify a data structure because you do not have the capability for it, there is no way you can corrupt it. You don't NEED protection bits at the page level for data you have no way of addressing. In true capability-style OO addressing, the application carries the capability (/segment, object index, …) to represent the _object_ itself, not its location. The offset part is (/may be) used in specific methods/functions to address within the object; it in no way identifies the object, the way the offset does in flatland. In flatland addressing, the compiler generates code to load an object's location (i.e. offset) into an address register before access; in OO addressing, the compiler generates code to load an object's index into a segment register before access. Both are fully available on the x86.

    [You're just restating the x86 selector model. You claim this allows you to create 12GB of simultaneously addressable memory, which is the part I don't understand. (Selectors specify their location and extent as 32-bit values inside a 32-bit linear address space.) But at this point I don't care either. -Raymond]
  48. j b says:

    Raymond> you would be limited to at most 16382 objects!

    (That is of course 48 Ki objects, for data, stack and code.) In the days of capability-based architectures, like the 432, several studies of the size of the "object working set" were made, and none of them got even close to 16 Ki as typical, or even as a maximum. The results were surprisingly _low_: in very few contexts did a given software module actually address more than a few dozen nameable objects, and the number of actually visible nameable objects (regardless of whether code addressed them or not) rarely exceeded a few hundred.

    This obviously depends a lot on how application data is structured: FORTRAN-oriented programmers will declare two dozen single variables with a common name prefix to represent one object; Java and C# people do it differently. A compiler may treat an entire stack frame as one struct object, or it may declare each local variable a distinct object. Also: the 16 Ki (*3) limit applies to _each process_, both on x86 and on any other capability machine I have ever seen; it is certainly not a limit on the system as a whole. If process handling can be made so lightweight that it can be compared to thread handling, the risk of filling up the capability table is not very high. Obviously, if you were to create an OO machine today, you would design for more, but used properly, the 386 limit of 48 Ki per process would probably be about as sufficient today as a 32-bit flatland address space: most, although not all, problems can be made to fit.

    On good days, I am itching to suggest to some master's student a thesis of adapting either the Mono software or the JVM to a standalone (no Windows or other OS support/limitation) x86 processor, mapping object handles to segment selectors the way the hardware actually permits. Certainly, this is a task for a way-above-the-middle student, but I have been impressed before. And I would certainly like to see it demonstrated that it can be done. I know it can, I just don't have the time to do it myself :-(

  49. Harry Johnston says:

    @Yuhong Bao: I suspect that Windows Server 2003 got an extra year-and-a-bit of support because Windows Server 2003 R2 is based on it.  You'll notice the end-of-life is the same for 2003 and for 2003 R2.

  50. Myria says:

    Patching the XP32 kernel to support more than 4 GB of RAM is also legally a bad idea.  Enforcement of the 4 GB limit is actually controlled by the licensing code.  Thanks to bad laws, that's illegal, since you're disabling a copy protection feature by their definition.

  51. @j b:

    Trying to jump through hoops to simulate separate segment spaces doesn't make any sense, because there are simpler ways to use more memory: use memory-mapped files, or (God forbid) Address Windowing Extensions, or split your process into separate processes. Ultimately, nobody cares now, because we all have x64.

    Maybe, as a professor, you like to come up with impractical assignments. Unfortunately, such assignments don't make better programmers.

  52. ErikF says:

    @jb: You seem to forget that memory can be protected at the page level with PTEs, and most operating systems nowadays implement the NX bit at least in software. Other than making coding more difficult by having to ensure that you have the right segment (overlay) loaded, I don't see any advantage to segmentation in modern computers that have caching and paging available; I remember overlays quite well and have no wish to go back to that time. Besides, swapping huge segments in and out has to be a performance hit!

  53. @j b:

    Raymond> you would be limited to at most 16382 objects!

    >(That is of course 48 Ki objects, for data, stack and code.)

    Nope. 8192 in a single Local Descriptor Table, and 8191 in a single Global Descriptor Table. Still 16383 objects.

  54. j b says:

    Raymond> You claim this allows you to create 12GB of simultaneously addressable memory,

    Can a process specify addresses in three logically distinct spaces – stack, data, code? Yes.

    Is each of these three spaces 4 Gi in size, using 32-bit addresses? Yes.

    Do three distinct 4 Gi address spaces add up to 12 Gi of virtual addresses, at the application level? Yes.

    Can selected ranges of these 3*4 Gi addresses be mapped down to a linear address space of 4 Gi? Yes; that is what we have the segment descriptors for. If the "present" bit is set, the entire segment is mapped into linear space; if it is clear, the segment is (currently) not in linear space.

    The summed sizes of the mapped segments at any moment in time cannot exceed 4 Gi, just like the summed sizes of virtual pages mapped to physical RAM cannot exceed the size of physical RAM.

    No, you cannot set up three 4 Gi segments and have them all mapped simultaneously into linear space. But you can have a dozen segments of, say, 50 MiB, 200 MiB, 100 MiB mapped into linear space. A little later, the selection of segments mapped into linear space may be a different one. The sum of the segments currently present may never exceed 4 Gi (just like the sum of pages present cannot exceed the RAM size), but they may be taken from a virtual address space of 3*4 Gi.

    In a paged system, when a process addresses a non-present page, the interrupt activates an OS routine to identify the desired page and map it into physical RAM, possibly after having selected a victim to be removed to make space. In the segment system, when a process addresses a non-present segment (in its 3*4 Gi virtual address space), the interrupt activates an OS routine to identify the desired segment and map it into linear space, possibly after having selected a victim to be removed, possibly after compacting memory. ALL the segments, mapped and unmapped, are just as simultaneously addressable as all the virtual pages in a paged memory. Assuming, of course, that the OS provides mechanisms for handling segment-not-present interrupts from x86 hardware.

    In the 70s and 80s, several segment-based mainframes with no paging hardware did their memory administration this way, except that "linear space" was "physical memory space". The summed size of the segments of a process most definitely could exceed the size of physical linear memory space, but one single segment (or the sum of active segments) could not (just like the sum of active stack, data and code segments cannot exceed linear space).

    Oh well. Most programmers seem not to care about the segmenting hardware; they don't _want_ to understand it. Flatland programming is the PC approach (eeeh… not 'personal computer'…). A stray C pointer sort of has the right to corrupt any memory structure anywhere in (flat) virtual space. It is a tradeoff: maybe we save a millisecond of execution time – at the expense of a week of bug-hunting time. Hooray!

    [But your description does not explain how you can have CS, DS, and SS all refer to different 4GB segments and avoid mapping conflicts. For each of the 4GB segments, the base address must be 0 and the segment size must be 4GB. (There is no other way to fit a 4GB segment into a 4GB address space.) Suppose ESP=0xFFFFF000 and EAX=0xFFFFF000, and somebody performs a "push dword ptr [eax]" that happens to be at address EIP=0xFFFFF000. You have to decide which memory is mapped into linear address 0xFFFFF000, and you can have only one at a time. Just tell me what the base address and size is for each of the selectors, and which pages are mapped. (The only thing I can think of is you are pulling some sneaky base address wraparound trick.) That said, this memory model is what 16-bit Windows used, and everybody hated it. -Raymond]
  55. Myria says:

    <overly_pedantic> @alegr1: The null selector is reserved, and one of the GDT selectors needs to point to the LDT.  8190 + 8191 = 16381. </overly_pedantic>

  56. ender says:

    > [And then your sound card driver corrupts memory due to a truncated DMA address and you blame Windows. (This problem doesn't exist on Server because servers don't have sound cards.) -Raymond]

    I've had problems with a certain very popular (at the time) sound card on XP x64 when I enabled memory remapping in the BIOS so I could use all of my 4GB of RAM (sound would only come from the front speakers, and IIRC it was at low volume). Disable remapping (and thus only "see" 3200MB of RAM), and the card worked fine.

    > I think only the Enterprise Edition is licensed to support >4GB. Standard Edition is pretty much the same as client Windows.

    32-bit Server 2003 Standard (and 2008 Standard) allows you to address the full 4GB of RAM (and IIRC, XP did as well with /PAE until SP1). However, even that is enough to bring out bugs in drivers. I tried to use a spare 2003 license from work in /PAE mode when I upgraded my home machine to 4GB; the weirdest effect was that the screen would only update approximately once per minute. At first I thought the system was frozen, since it was showing the starting-up dialog but neither the swirl nor the mouse pointer was moving; then the screen went black, and after a few seconds the logon dialog appeared. I pressed Ctrl+Alt+Del and typed my username and password – and noticed that the C+A+D dialog was still shown – then the screen went black again, and after a few seconds a frozen desktop appeared.

  57. j b says:

    @alegr1,

    My goal is NOT a Win32 providing 12 Gi of virtual space to an application, but to make people understand address mapping without restricting their minds to how a specific OS uses the mechanisms. I want to point out the direct correspondence between an OO handle, a Win16 handle, and an x86 segment selector – that essentially the same operations are done in all cases.

    In the JVM, everything is software. Could overhead be reduced by performing some operations in hardware? Which operations? In Win16, the application manually called GlobalLock on data handles to map the object into the linear address space before use, and GlobalUnlock afterwards, maintaining the mapped address in a separate variable. If you forgot to Unlock, linear space would fill up. (For code objects, the runtime system trapped function calls/returns, doing similar Lock/Unlock mapping implicitly.) If you now add 386 segmenting hardware, GlobalLock calls are superfluous: saving code, avoiding programming errors. With hardware managing the translated address, the application is freed of the management task, and the address is protected against corruption. Hardware IS useful for this kind of mapping.

    In 8086 Win16, every application had dozens of data objects / handles, GlobalLocking and Unlocking all the time. 386 segmenting could have made GlobalLock an empty call – but it wasn't carried through fully. Rather, with Win32 the whole world was made one huge object, which, of course, made the C pointer-arithmetic fans jump with joy. We abandoned segments altogether. (A single 4 Gi segment is segmented addressing the way an election with a single party on the ballot is free.)

    386 segmenting is a spin-off of an OO machine architecture, most definitely providing for a high number of objects. It appeared when (Windows) applications handled numerous handles/objects, mapping them in and out of linear address space. Around 1990, students easily saw the parallels between handles/selectors and objects/segments, between GlobalLock and hardware segment mapping. Java was on the rise with its handles. Explaining the mechanisms to students was trivial. When Windows abandoned segmenting, returning to flatland, people seemed to abandon all understanding of how object mapping works. One consequence: it is difficult to explain to students how a Java or .NET program can fit into a 32-bit address space, yet handle far more than 4 Gi of objects: the program manages handles only. The "managed" data object itself is only visible through the handle, never directly. So the mapping from handle to object must be understood. The 386 MMS shows one way of doing it – once we manage to fence off those arguing "But the segment IS 4 Gi large, so it won't work!" etc. In the end, it turns out to be easier to make _students_ understand than to make seasoned C pointer-arithmetic fans understand… :-)

    (Oh, by the way: Thanks for pointing out that code, data and stack share GDT/LDT. I was a little too fast on that one!)
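
    (For readers who never wrote Win16 code: the lock/unlock pattern described above survives verbatim in the Win32 API for moveable memory. A minimal sketch, error handling omitted:)

        #include <windows.h>

        int main(void)
        {
            // A moveable block is identified by a handle, not an address.
            HGLOBAL h = GlobalAlloc(GMEM_MOVEABLE, 4096);
            // GlobalLock maps the handle to a usable linear address...
            char *p = (char *)GlobalLock(h);
            p[0] = 42;
            // ...and GlobalUnlock releases the mapping again.
            GlobalUnlock(h);
            GlobalFree(h);
            return 0;
        }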

  58. j b says:

    ErikF> Other than making coding more difficult by having to ensure that you have the right segment (overlay) loaded,

    Are you having similar problems in ensuring that the right page is loaded into RAM? That is exactly the same problem. And, surprise, surprise: The solution is the same for both problems!

  59. j b says:

    Raymond> Suppose ESP=0xFFFFF000 and EAX=0xFFFFF000, and somebody performs a "push dword ptr [eax]" that happens to be at address EIP=0xFFFFF000.

    It is sort of difficult to relate to a single, out-of-context machine instruction with no information about the desired effect of the instruction. If the compiler chooses to use selectors to identify objects, rather than offsets to identify locations, the generated code would obviously be very different. You can imagine lots of instructions that would be meaningless in that model. That is not specific to using selectors – even in flatland code you can create semantically meaningless instructions.

    If you tell me what you want to do, at a semantic level, I could try to act as a human compiler and suggest a possible instruction sequence to do it. I haven't written assembler for years, so I would probably generate a lot of syntactic errors; I would do much better with informal pseudocode indicating how segment registers and offsets would be used. But my explanation would NOT assume that there is a single 4 Gi data segment; it would – as I have stressed several times – assume a number of smaller segments: as many segments as the problem solution has objects.

    [You claimed to be able to address 4GB code, 4GB data and 4GB stack simultaneously, so I put it to the test. I created a 4GB stack, 4GB of data, and have 4GB of code. The stack pointer is 0xFFFFF000 (we've pushed around 4KB of data so far). The data is a giant array of 32-bit integers, and I'm about to pass the 0x3FFFFC00'th one to a helper function. The code that is doing this happens to be near the end of the 4GB code block, because the first 3.99GB of code consists of a tight loop that has been unrolled a billion times. (But now you're denying that this is a possibility. "Assume a number of small segments." Well, okay, but if I assume a number of small segments, then I am not actually addressing 12GB of data simultaneously, which was your original claim.) -Raymond]
  60. @jb:

    The x86 family of CPUs are not Harvard-architecture processors. That's where your whole assertion of separate memory spaces for DS, CS, and SS falls apart. If you look at the original Pentium datasheet (download.intel.com/…/24199710.pdf), you do notice a D/C# control signal, but it was not always available, notably in dual-CPU systems.

    In addition, I am not aware of any northbridge that supported multiple 4-GB memory spaces. Even if one did, the only distinction it could make is data vs. code, so your 4GB stack can never be separated out.

    Just because you can dream up a method of extending the 32-bit limit does not make it real, nor does it mean Windows should support it.

  61. @Myria:

    [<overly_pedantic> @alegr1: The null selector is reserved, and one of the GDT selectors needs to point to the LDT.  8190 + 8191 = 16381. </overly_pedantic>]

    <extra_pedantic> The null selector is valid in LDT</extra_pedantic>

  62. j b says:

    @Brian EE,

    I'll more or less repeat the same answer that I have given a few times already:

    The goal is NOT to build a 32-bit machine that realizes 12 Gi of virtual space, but to understand the MMS architecture so well that you understand how it could have been possible. Whether there was an 80386 with external signals to distinguish code, stack and data is certainly not an architectural property, but a pure implementation choice. If the signal most likely wouldn't be used, it would be a waste to assign it to a pin, even if it could have been.

    No, I am NOT asking for Windows support for 12 Gi of virtual space. I am just saying that it IS there, at the virtual level, in the architecture. As I have argued at length above, it could be realized with segmenting and an interrupt handler for 'segment not present'. Again: I am NOT asking the Windows developers to implement that. But if you have difficulties understanding how it COULD be done, given the segment-mapping hardware, then I am quite sure that you have similar problems understanding how an OO runtime environment that addresses objects by handles (such as the JVM) could make use of hardware to aid the memory-management task. There may be details of the 386 MMS that are not perfect for the JVM or some similar machine, but if you don't understand the 386 mechanisms fully, you are not in a position to tell what should be done differently. After all, the 386 MMS came out as a "poor man's OO management", a small subset (with some extensions) of the 432 OO CPU.

    I must admit that I am surprised how readily architectural concepts are pushed aside as being of no interest just because some signal line was omitted from an implementation, or some intermediate step in the mapping process imposes some maximum size on active segments, or whatever. It is as if people will use any excuse, no matter how small, to turn down the segment mechanisms, and therefore refuse to relate them to OO concepts. I'll just have to accept it…

    [If you can't create 12 GB of virtual space, then don't open by saying "You can create 12 GB of virtual space." Because everybody will start poking holes in your claim that you can create 12 GB of virtual space, and then when you come out and say "I'm not saying you can do it" then everybody will just give up and walk away. I don't see why you're bothering explaining the segmented memory model to everybody here. You can assume we know how it works. -Raymond]
  63. Let's sum it up. Windows used to use selectors/segments. It was extremely awkward. Even though the segments were required not to overlap in the linear addresses, there was a big overhead to manage them.

    Trying to implement a model where the different segments occupy overlapping linear addresses (while expecting their contents to differ), thus allowing 12 GB of effective address space, is an exercise in futility, especially in a multithreaded environment.

    We're engineers. We need to apply our best judgement to decide what can be done, and what should be done, and what should not be done, for that way lies madness.

    If you're not an engineer, it's understandable that you want to explore this path you think is possible. There are many other considerations that make it unfeasible, though.

  64. j b says:

    @alegr1,

    It seems quite obvious that you judge segmented address space as it appears WITHOUT supporting hardware.

    I judge segmented address space as it appears WITH supporting hardware (the MMS that came with the 386).

    Against that background, it is no big surprise that we come to different conclusions about the usability of segmenting.

  65. dave says:

    >(And again, you aren't really inventing anything new. Segmented memory has been around since the 80286.)

    And a good deal prior to that, too.

    Apart from all of the wrangling about whether or not 12GB is or is not simultaneously accessible (and the stack might well be the clincher on that argument), I assume "j b"'s real subject is the current lack of interest in capability systems, and the relative stasis of general-purpose OS design.

    Still, I'd blame Unix for all that, not Windows  ;-)

    Part of the design issue here seems to be the aspiration of the average modern OS to run on a wide range of hardware. Two modes and a flat address space seem to be the limit of the consensus.

  66. j b says:

    Raymond> If the segments are not 4GB each, then how do you get to 12 GB with three selectors?

    Eeeeh… Where should I start?

    To compile for a flat architecture, you assign locations to objects. The location is baked into the code. To access an object, instructions load the (flatland) address (baked into the code) into some address register, which is presented on the address bus. When another object is later accessed, the location of that object is loaded into an address register. Every access could potentially cause a page fault, causing other pages to be thrown out of RAM. If an object member is referenced, user-level instructions may have to calculate the sum of the object address and the offset.

    To compile for a segment architecture, you assign selectors to objects (one to each object). The selector is baked into the code. To access an object at runtime, instructions load the selector into a segment register. User code is not concerned with the (base) address of the object; that is managed by the OS segment administration. If a member is addressed, the offset within the object is specified in the instruction. If another object is later addressed, its selector is loaded into a segment register. Either access may cause a segment-not-present fault, and could cause other segments to be thrown out of linear space.

    If I make an array of objects,

    HalfGiObj[] BigArray = new HalfGiObj[7];  // each array element fills 500 Mi

    foreach (HalfGiObj elt in BigArray) { elt.membervalue = 42; }

    the loop addresses 3.5 Gi of data space (other values need the last half Gi!).

    The program contains many huge functions, adding up to 4 Gi of code, in forty segments of average size 100 Mi. The CALL instruction specifies selector and offset; this could cause a segment-not-present interrupt. The handler may have to remove (un-Present) one or more of the seven HalfGiObj segments to make room for the code segment. If the program calls

      FnInCodeSeg01();

      FnInCodeSeg02();

      …

      FnInCodeSeg40();

    it may have caused all 4 Gi of code to be mapped into linear space, and mostly unmapped again (in competition with other data and code segments).

    Each thread requires a stack segment. The thread-switching function sets SS to the stack segment associated with the thread. Once the thread starts executing, it may cause a segment-not-present fault, and the handler will map the requested stack segment into linear space – again un-Present'ing segments that must yield to make room, just like in paging. If you have sufficiently many threads, the sum of the stack sizes could approach 4 Gi.

    Are you saying that this example does NOT show that 12 Gi can be addressed directly by the application? What is missing? How would it look different if you really had 12 Gi addressable?

    I never said anything about "three selectors" – I said the exact opposite several times. But you are right: if you insist on three selectors only – three 4 Gi segments – then you cannot map 12 Gi onto a 32-bit linear space. In my example you can.

    [(Please stop using strange abbreviations like Gi and Mi. It makes your writing hard to read.) Yes, this is how segmented addressing works. But notice that you are not accessing 12GB of data simultaneously. You are accessing at most three objects simultaneously (code, data, stack). Each one is a half gigabyte in size, so you have only 1.5GB addressable at any particular moment in time. I don't know why you are denying that you ever mentioned "three selectors"; the number 3 is how you arrived at the meaningless 12 GB value in the first place. Without that magic number 3, your precious number 12 disappears. But the number 12 is also arbitrary. I can create 200 objects, each 500MB in size, and have a loop that goes through all 200 of them, and ta da, by your argument, I have accessed 100 GB of data. All you did was describe segment swapping, which we already know is a way to extend address space (by putting additional address information in the selector). At the end of the day, we're all saying the same thing. It's just that you claim to be making 12GB of address space available "simultaneously" when in fact it's happening sequentially. -Raymond]
  67. @j b:

    OK, another nail (or stake) into this monster of a design. This selector-shuffling approach (which only allows 4 GB of code+data to be accessible at any given moment anyway) is no better than using MapViewOfFile without bothering with segments, because you have to do that anyway (or the runtime will).

    When you finally set foot in the real world outside of your ivory tower and write your first really complex application, not a student assignment, you'll see that "doable" and "making sense" are not always the same thing.

  68. j b says:

    Raymond> You claimed to be able to address 4GB code, 4GB data and 4GB stack simultaneously, so I put it to the test.

    Hey, do you STILL insist on 4 Gi segments? OK, I'll concede once more: if you insist on 4 Gi segments, then you can't make it work.

    Of course you never issue twelve billion memory addresses to the mapping mechanism at the same moment. If you could issue even four billion page addresses to the paging system at exactly the same moment, that would give you some problems, too (especially if your machine has less than 4 Gi of RAM).

    Obviously: having a 12 Gi virtual address space means that you can reference *an arbitrary address* in this address space and the mapping mechanisms will ensure that the proper location is referenced, if necessary after bringing the object into the space where it is addressed.

    Sure, you have managed to construct one highly artificial situation where you don't use segmentation for the purposes of segmentation in any productive way, but only as a tool for rejecting the segmentation mechanisms. You're welcome.

    I could set up some REALISTIC example with a couple dozen segments of a total size up to 12 Gi, all directly addressable by the application, and describe how the accesses make the segments swap in and out of linear space just like virtual pages, just to show that it IS possible. Even if you have a hypothetical, unrealistic setup that won't work, mine would (obviously assuming that there was a handler for the segment-not-present interrupt, doing the necessary administration, which as far as I know Windows does not provide). No, I am NOT requesting it from Windows – I am talking about 386 segmenting hardware, not about any specific OS.

    Special cases that won't work within an architecture are nothing new. Another example: the VAX 780 had an instruction for calculating a polynomial up to degree 256. Each of the 256 coefficients could be located in another memory page, in addition to the instruction itself, which might cross page borders. The VAX could not handle page faults mid-instruction; the prefetch would have to page in all operands and the entire instruction before the execution unit started up. The worst case for this instruction required 260 pages to be in memory at the same time. At the time of the VAX, not every installation had enough RAM to leave 260 physical RAM pages to a user process, so the worst-case instruction would not be executable.

    You could deliberately construct code to prove that with a given amount of RAM, you cannot run an arbitrary polynomial instruction. Similarly, you can define giant segments to prove that some artificial setups will fail with a 32-bit linear address space. It gives you sort of a victory, but a rather insignificant one.

    [If the segments are not 4GB each, then how do you get to 12 GB with three selectors? It sounds like you're really saying, "I can get 12 GB of virtual address space, provided you don't accidentally place two things at the same location." That "provided that" prevents the design from being useful, because when you write code, you don't really know where your stack pointer is, so you have no way of preventing your stack pointer from accidentally being equal to your code pointer. "Everything works great, and then sometimes it just hangs unpredictably." Nobody would use a system like that. (And again, you aren't really inventing anything new. Segmented memory has been around since the 80286.) -Raymond]
  69. j b says:

    > strange abbreviations like Gi and Mi

    IEEE 1541 – an 11-year-old standard

    IEC 60027-2 – a 14-year-old standard

    > you are not accessing 12GB of data simultaneously.

    Correct, I am not. I am addressing a few bytes at a time. Whether those bytes are located in a 4Gi segment, a 100 Ki segment or a 4 Ki page makes little difference: Each instruction addresses a few bytes.

    Virtual address space does not have to do with referencing at the very same moment. If you have 4 Gi of virtual address space in a paged system, but only 512 Mi of RAM, then you still have 4 Gi of virtual address space. You cannot access all of those 4 Gi simultaneously, either.

    Virtual address space has to do with *addressability*, not addressing in one specific operation.

    Sure, I mentioned three address spaces. Not three selectors: three virtual, non-overlapping (in the virtual space) address spaces, each 4 Gi. By your logic, in a purely paged system (no segments at all) with 4 Ki pages, you can only address 12 Ki, because each of the three pieces of information that one instruction references is only 4 Ki, and the rest of the virtual address space or physical RAM is irrelevant, as the instruction doesn't reference them. Well, this is not the common way of referring to addressability and virtual address spaces.
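
    As an aside, the paging-side version of this distinction is easy to demonstrate on Windows: the sketch below reserves a large range of address space without committing any memory, so the range is addressable in the sense above while consuming essentially no RAM or swap. (This is an illustration only; the program and the 16 Gi figure are made up for the example.)

        #include <windows.h>
        #include <stdio.h>

        int main(void)
        {
            /* Reserve 16 Gi of address space (64-bit process assumed;
               a 32-bit process would simply get NULL back).
               MEM_RESERVE claims the addresses but commits no storage,
               so this costs almost no physical memory or commit charge. */
            void *p = VirtualAlloc(NULL, 16ull * 1024 * 1024 * 1024,
                                   MEM_RESERVE, PAGE_NOACCESS);
            printf("reserved at %p\n", p);

            /* Touching *p now would fault; individual pages must first
               be committed with VirtualAlloc(..., MEM_COMMIT, ...). */
            return 0;
        }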

    > I can create 200 objects, each 500MB in size, and have a loop that goes through all 100 of them, and ta da, by your argument, I have accessed 100 GB of data.

    If you have a virtual space that will hold those 200 half-gig objects, then you DO have virtual space of 100 Gi. But you can't fit that into the data segment of a 386. You can fit at most 4 Gi in there, as in my example. And you can fit 4 Gi of code. And stack.

    I understand that you have no desire whatsoever to make use of the segment facilities; you set up your 4 Gi segment and then pretend there are no segments, only flatland. OK, do so! I program in terms of objects, not in terms of offsets and linear space. I know how mechanisms similar to 386 segmenting could be used to efficiently realize object concepts; you refuse to relate to that. OK, then don't!

    When I initiated this part of the discussion, it was to air a little frustration that NO ONE seems to care about hardware support of the 386 segmenting kind. I can easily understand that OS people, working from the linear space level down to the physical RAM chips, don't care. But after all, lots of people claim to program "object oriented", Java or .NET style (here in the sense: with no "pointer" variables holding physical addresses). I sort of hoped to make them care about what goes on at runtime, and how close 386 segmenting is to that.

    My frustration over the missing interest is old and goes far beyond this blog; it is certainly not a frustration over your blog, which is one of the better ones on the net. I hope my attempt to shed some light on the 386 segmenting mechanisms has not frustrated your other readers. Keep up the good work.

    [Okay, then we agree that there is nothing new here and we're just quibbling over terminology. (I don't see why all 200 half-gig objects need to live inside the same data segment. They are by definition not in the same data segment, because you put each one in its own segment.) I think I've spent more time working with segments than you. I know how they work. I even wrote a system that uses 48-bit pointers for pretty much this exact purpose. But I also know that changing selectors is very expensive on the CPU, so code that changes selectors a lot will pay a significant performance penalty. Also, this entire thread is completely off-topic, and I'm going to delete it in a few days. -Raymond]
  70. Joshua says:

    I don't get it. Why does 0x1234:FFFFF000 have to be found anywhere in 0x1235:XXXXXXXX in 32-bit segmentation?

  71. yuhong2 says:

    @Harry Johnston: No, it is because Server 2008 was also delayed. They generally set the end of mainstream support to two years after the next version's release in these cases.

  72. OMG says:

    >>> strange abbreviations like Gi and Mi

    IEEE 1541 – an 11-year-old standard

    IEC 60027-2 – a 14-year-old standard

    <<<

    To be a "real" standard, it has to be (widely) accepted. For example, in Germany, the national standard DIN 5008 mandates that in letters and other formal writing, a purely numerical date should use the ISO 8601 date format (yyyy-mm-dd). According to Wikipedia (de.wikipedia.org/…/Datumsformat), this has also been regulated by a standard of the European Union since 1992.

    Nobody cares. So you had better stay with the real standard, which (in Germany) is dd.mm.yyyy.

    The Wikipedia article puts it this way (translated): "As in most other European countries, the standard was largely ignored in Germany and Austria, where the familiar format DD.MM.[YY]YY generally remained in use. In the 2001 edition of DIN 5008, a note was therefore added stating that the familiar format should be permitted again, 'provided no misunderstandings arise.'"

    In other words, the regulation was changed in 2001 to re-allow the old format. I think this is a good example of an irrelevant standard, in the same way as this "Gi" and "Mi" thing.

  73. Mike Dimmick says:

    @j b: Selectors, in x86 protected mode with paging enabled, point to *virtual* addresses as mapped through the page table. They do not point to *physical* addresses. The translation from segment-relative to virtual addresses happens *first*. Then the virtual address is translated to a physical address via the page tables. The segments can only point to addresses within the 4GB virtual address range; they do not allow access to more than this.

    The only possible value for segmentation in 32-bit x86 is to provide a degree of access protection, but the access bits on pages are far more fine-grained and don't require you to reserve chunks of the address space for code or data. Windows only sets the FS register to a per-thread selector, whose base address is the start of the Thread Information Block for that thread. All other selectors have a base of 0 and a limit of 4GB.

    AMD crippled the segmentation support in x86 long mode: only FS and GS really work anymore.
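
    To make that FS usage concrete, here is a small sketch (my own illustration, assuming MSVC on 32-bit x86 Windows; the intrinsic and offset below do not apply to 64-bit builds). Offset 0x18 of the Thread Information Block holds a flat pointer to the TIB itself, which is how FS-relative data gets converted back into ordinary linear pointers:

        #include <windows.h>
        #include <intrin.h>
        #include <stdio.h>

        int main(void)
        {
            /* FS:[0x18] is the TIB's self pointer: the linear address
               at which the TIB is also visible through DS. */
            NT_TIB *tib = (NT_TIB *)__readfsdword(0x18);

            /* NtCurrentTeb() is the supported way to get the same pointer. */
            printf("TIB via FS:[0x18]:    %p\n", (void *)tib);
            printf("TIB via NtCurrentTeb: %p\n", (void *)NtCurrentTeb());

            printf("stack base  = %p\n", tib->StackBase);
            printf("stack limit = %p\n", tib->StackLimit);
            return 0;
        }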

  74. Gabe says:

    "Do you have an app that is bumping into the 8TB limit? Or is this a purely theoretical problem with no practical impact? -Raymond"

    While I don't have an app that is bumping into the 8TB limit, it's not purely theoretical. Consider applications that compose a finished movie by taking all the clips that have been recorded and combining them according to some processing instructions (e.g. "take frames A to B of clip X, fade to frames C to D of clip Y" or "compose clip X onto clip Y using chroma key K with filter F starting at frame A") into a final output file.

    Such programs deal with uncompressed images (because compressed images aren't all the same size, so you couldn't access frames in random order) in a linear colorspace (16 bits per channel — what your digital camera might call RAW). The file compression and conversion to log colorspace (8 or 10 bits per channel) is done at the same time in another step.

    One way to write such a program is to map all the source files and the output file into memory simultaneously so that each frame can be referenced by simple pointer arithmetic (see the sketch below). Then you could perform all the processing without having to worry about managing I/O, caches, or buffering. Spend your time optimizing your image processing algorithms instead of I/O. Assign different threads to different ranges of output frames, and you never have to worry about locking.

    This all seems like a reasonable way to write a simple video post-processing system, right? It would certainly work well for a typical home movie maker. It could also work for professional movie studios, if not for the address space limit.

    A "4k" camera (like a Canon EOS C500) records at 4096×2160 resolution, yielding about 50MB per uncompressed 16bpc frame (4096 × 2160 pixels × 3 channels × 2 bytes ≈ 53MB). A movie is filmed at 24fps (with some being 48 or 60fps), yielding about 1.2GB per second. A good feature-length movie averages over 2 hours, so a file containing a raw, uncompressed movie would be at least 8TB in size.

    Just to have the complete output file memory-mapped, the program would need over 8TB of address space. To have all of the source files and the output file mapped into memory simultaneously, it could require hundreds of TB of address space.

    This might not be the best way to write such an application, but it's certainly not far-fetched in a world where 8TB of disks fit in my pants pockets.
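
    In code, the frame-addressing idea reduces to something like this sketch (the file name and frame geometry are hypothetical, error handling is elided, and a 64-bit build is assumed, since the view is far larger than a 32-bit address space):

        #include <windows.h>
        #include <stdint.h>

        /* Hypothetical 4k, 16bpc RGB geometry: 4096 x 2160 x 6 bytes. */
        #define FRAME_BYTES ((uint64_t)4096 * 2160 * 6)

        int main(void)
        {
            HANDLE file = CreateFileW(L"clip_x.raw", GENERIC_READ,
                                      FILE_SHARE_READ, NULL, OPEN_EXISTING,
                                      FILE_ATTRIBUTE_NORMAL, NULL);
            HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READONLY,
                                                0, 0, NULL);

            /* Map the entire clip; every frame is now just pointer math. */
            const uint8_t *base = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
            const uint8_t *frame100 = base + 100 * FRAME_BYTES;
            (void)frame100; /* ... image processing goes here ... */

            UnmapViewOfFile(base);
            CloseHandle(mapping);
            CloseHandle(file);
            return 0;
        }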

    ["I know of an app that could run into the 8TB limit" is an acceptable equivalent, since the question wasn't whether anybody personally had such an app but rather whether this is a practical or theoretical issue. -Raymond]
  75. @Gabe:

    Having the files mapped would be a big waste of OS resources, and unnecessary, too. Just the page tables would take 16GB (8TB / 4KB pages × 8 bytes per page table entry), and actually more than that once the higher-level tables are counted.

    Remember that access to memory-mapped files usually results in small (page-sized) in-page I/O. There is no guarantee that the OS will do read-ahead.

    It's much more efficient to read the files as necessary into frame-sized buffers using big I/O requests. The buffers would serve as a frame cache and be discarded according to an LRU policy.

    It makes no sense at all to memory-map the output file. For best throughput, you're better off writing the file out in big chunks with no OS buffering.
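
    A rough sketch of that approach (the helper and frame size are hypothetical; the handle is assumed to be opened with FILE_FLAG_NO_BUFFERING, which requires sector-aligned offsets, lengths, and buffer addresses, all of which a 50MB frame satisfies):

        #include <windows.h>
        #include <stdint.h>

        #define FRAME_BYTES (50ull * 1024 * 1024)  /* hypothetical frame size */

        /* Read one whole frame in a single large, unbuffered request. */
        static void *ReadFrame(HANDLE file, uint64_t frameIndex)
        {
            /* VirtualAlloc returns page-aligned memory, which satisfies
               the buffer alignment requirement of unbuffered I/O. */
            void *buffer = VirtualAlloc(NULL, FRAME_BYTES,
                                        MEM_COMMIT, PAGE_READWRITE);
            if (!buffer) return NULL;

            /* Position the synchronous read via the OVERLAPPED offset. */
            uint64_t offset = frameIndex * FRAME_BYTES;
            OVERLAPPED ov = {0};
            ov.Offset     = (DWORD)(offset & 0xFFFFFFFF);
            ov.OffsetHigh = (DWORD)(offset >> 32);

            DWORD bytesRead;
            if (!ReadFile(file, buffer, (DWORD)FRAME_BYTES, &bytesRead, &ov)) {
                VirtualFree(buffer, 0, MEM_RELEASE);
                return NULL;
            }
            return buffer;  /* caller keeps it in an LRU frame cache */
        }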

  76. yuhong2 says:

    @alegr1: I wonder if memory-mapped files based on 2MB pages are possible, especially given the speed of today's disks.

  77. @Youhong Bao:

    2MB pages are non-pageable, because finding a free contiguous 2MB region to page into, on a system otherwise managed in 4KB pages, is an exercise in futility.

    This is why you need SeLockMemoryPrivilege to allocate them.

    I think Raymond covered this issue once.
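
    For reference, a minimal sketch of a large-page allocation (privilege adjustment elided; the VirtualAlloc call fails unless SeLockMemoryPrivilege is enabled for the process token):

        #include <windows.h>
        #include <stdio.h>

        int main(void)
        {
            SIZE_T large = GetLargePageMinimum();  /* typically 2MB on x64 */
            if (large == 0) return 1;              /* large pages unsupported */

            /* Large pages must be reserved and committed in one call and
               are locked into physical memory (hence the privilege). */
            void *p = VirtualAlloc(NULL, large,
                                   MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                                   PAGE_READWRITE);
            printf("minimum large page = %Iu bytes, alloc = %p (error %lu)\n",
                   large, p, p ? 0 : GetLastError());
            return 0;
        }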

Comments are closed.