Aha, I have found a flaw in the logic to detect whether my program is running on 64-bit Windows


Some time ago, I described how to detect programmatically whether you are running on 64-bit Windows, and one of the steps of the algorithm was "If you are a 64-bit program, then you are running on 64-bit Windows, because 32-bit Windows cannot run 64-bit programs."
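
Concretely, the check might look something like this minimal sketch (not the exact code from the earlier article; the 32-bit branch assumes the usual IsWow64Process test, and error handling collapses to "assume 32-bit"):

    // A minimal sketch of the rule above: a 64-bit build is by definition
    // on 64-bit Windows; a 32-bit build asks whether it is a WOW64 process.
    #include <windows.h>

    BOOL Is64BitWindows()
    {
    #if defined(_WIN64)
        return TRUE;  // a 64-bit program can run only on 64-bit Windows
    #else
        // 32-bit program: we are on 64-bit Windows exactly if we are running
        // under WOW64. Look up IsWow64Process dynamically because very old
        // systems do not export it.
        typedef BOOL (WINAPI *IsWow64ProcessFn)(HANDLE, PBOOL);
        IsWow64ProcessFn pfn = (IsWow64ProcessFn)
            GetProcAddress(GetModuleHandleW(L"kernel32"), "IsWow64Process");
        BOOL isWow64 = FALSE;
        return pfn != NULL &&
               pfn(GetCurrentProcess(), &isWow64) &&
               isWow64;
    #endif
    }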

Every so often, somebody will claim that they found a flaw in this logic: "This algorithm may work today, but it assumes that the only version of Windows that can run 64-bit applications is 64-bit Windows. What if a future non-64-bit version of Windows runs 64-bit applications? Then your algorithm will incorrectly say that it is running on 64-bit Windows!"

Yeah, but so what?

Suppose you detect that the program is running on this hypothetical version of Windows that is not natively 64-bit but still runs 64-bit applications. What will your program do differently? How can you reason about the feature set and compatibility requirements of something that hasn't been invented yet?

This is another case of "If you don't know what you're going to do with the answer to a question, then there's not much point in asking it."

In this specific case, you should just continue about your normal business and let the emulation layer of the hypothetical future version of Windows do its job of giving you a 64-bit sky with 64-bit birds in the 64-bit trees.

Comments (60)
  1. 12BitSlab says:

    I agree that there is no point in asking the question.  However, I need to point out that there are 128 bit systems.  AS/400 (and S/38 before that) all deal with 128 bitness at the MI level.  It's been that way since 1981 — 1/3 of a century ago.  So, there could be a future version of Windows that is 128 bit and runs 64 bit programs for so-called legacy code.

  2. Joshua says:

    Actually there is one. The #ifdef ladder ends at 16 bit but 64 bit code might yet run. The only way to tell is to thunk to 32 bit code and then dynamically call IsWow64Process to check. This is far more likely to work than some hypothetical 32 bit Windows knowing how to run 64 bit code.

    Actually I lied. This isn't hypothetical at all. The 64+32+16 is implemented by Wine.

    [I don't know what you mean by "64 bit code might yet run". A 64-bit application is in the _WIN64 branch. -Raymond]
  3. Mordachai says:

    There is no point to asking for 99.9% of software.  For that .1% that really need to know – they'll have to figure that out when 128 bit windows comes out.

    For the rest of us – it is generally a really bad sign that your desktop software is worrying about such things – as you're fighting the OS rather than allowing it to do its job.

    64 bit trees either are emulated well enough that the difference doesn't matter, or poorly enough so it does – in which case the issue is the emulator, not the well-behaved software that relies on 64 bit trees.

  4. Mott555 says:

    My current company produces hardware along with an SDK so I've seen this stuff quite a bit. The 32-bit version of our SDK needs to detect the bit-ness of Windows so it knows which device driver to connect to. But for us, this "flaw" isn't an issue until 128-bit Windows becomes a thing in which case we'll simply update our code. And hopefully by then 32-bit will be long dead.

  5. SimonRev says:

    Interestingly we had a batch file yesterday that actually cared.  We started getting reports that an internal installer had suddenly become 64 bit only.  Considering we hadn't modified the build chain in years we had to go spelunking through the tools.  Turns out there was a batch file that calls IExpress.  When the build tools were run on a 64 bit machine, that invoked the 64 bit version of IExpress and produced a 64 bit only installer.  So that batch file did have to care, if only so it could invoke the 32 bit version of IExpress.

  6. Mc says:

    What's the next step going to be, 128-bit? Who needs that much address space?

    I suppose the wide registers might come in handy for some things.

  7. 12BitSlab says:

    @ Mc — single level store is tailor made for 128 bits.

  8. Antonio 'Grijan' says:

    If Moore's Law's growth rate (twice the memory every two years) keeps its pace, it will be about 50 years before we have to start worrying about switching to 128 bits. But memory growth has slowed down lately, so that time may be longer. And maybe in half a century computers will work very differently (compare today's PCs and tablets to the mainframes of the 60s!) and architectural details won't make such a difference to the end user.
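
    (For scale: a machine with 256 GB = 2^38 bytes of RAM is 26 doublings, or about 52 years at two years per doubling, away from the 2^64-byte limit.)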

  9. J. says:

    SimonRev: If I understood correctly, the batch file doesn't need to care, because it always has to run the 32-bit IExpress anyway (or always both 32-bit and 64-bit if you're releasing both versions).

    IanBoyd: (curious, not argumentative) But why does the program need to know? What decisions will the application take based on this? I suppose some specific runtime optimizations could be one case, but I'd like more concrete examples if you or anyone else can provide them.

  10. SimonRev says:

    @J:  The problem is that the batch file does need to care.  If the batch file is running as 32 bit, it would run <system 32 dir>\iexpress.exe.  If running as 64 bit it has to run <windows>\SysWOW64\iexpress.exe.

    The original batch file just ran iexpress.exe and relied on the path to pull it from system32.  However when running as 64 bit, that will run the 64 bit version.

    In the end this is just a special case of IanBoyd's problem — programs that don't know if they are running as 64 bit or not because the code is agnostic. (in his case IL, in mine batch)
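
    A minimal sketch of one way to pin the 32-bit copy down from native code (the real fix was in the batch file itself; the helper name here is made up, and the sketch assumes GetSystemWow64Directory, which fails on 32-bit Windows and so doubles as the fallback test):

        // Build the path to the 32-bit iexpress.exe regardless of the
        // caller's bitness. On 64-bit Windows the 32-bit tools live in
        // SysWOW64; on 32-bit Windows GetSystemWow64Directory fails, so
        // fall back to the normal system directory.
        #include <windows.h>
        #include <string>

        std::wstring PathToThirtyTwoBitIExpress()
        {
            wchar_t dir[MAX_PATH];
            UINT len = GetSystemWow64DirectoryW(dir, MAX_PATH);
            if (len == 0 || len >= MAX_PATH) {
                // No WOW64 directory: this is 32-bit Windows, and system32
                // already contains the 32-bit iexpress.exe.
                len = GetSystemDirectoryW(dir, MAX_PATH);
            }
            return std::wstring(dir, len) + L"\\iexpress.exe";
        }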

  11. dalek says:

    @mott555

    > And hopefully by then 32-bit will be long dead.

    Due to hardware getting cheaper and more prolific all the time I think there will be far more legacy/custom 32 bit applications than DOS and 16 bit Windows applications (some of which are still being used 22 years after Windows 3.11 was released). I would not be surprised if 32 bit applications would still have to be supported in 2050…

  12. Azarien says:

    @IanBoyd: in .NET for example you have Marshal.SizeOf(typeof(IntPtr)).

  13. The Marshal Plan says:

    @IanBoyd: Similar to Azarien's response, in .NET you may have to P/Invoke to a proprietary DLL from an IL program, and you need to know whether to load the 32-bit or 64-bit version.

  14. Danny says:

    "…and let the emulation layer of the hypothetical future version of Windows…"

    There's no hypothetical here. It's called a virtual machine. I run W7 64 bit inside a VMware machine on a W7 32 bit host all the time. On the same W7 32 bit host system I run 64 bit Mac OS X Mountain Lion, as well as W8 64 bit and Ubuntu 64 bit virtual machines. It all works just fine.

  15. Azarien says:

    There's no clear benefit in going 64-bit for most applications. Most probably, at some point VS will produce 64-bit code by default, and Windows will no longer ship in x86 version, but 32-bit applications will still be used for a long time.

  16. RangerFish says:

    That .1% of software that needs to actually care is likely to be installation-related. It could be that it needs to know specifics about what architecture of Windows is in use (regardless of the bitness of the program itself) in order to install the correct components. It might also need to drop the right files into the right places, again regardless of its own architecture.

    Of course, there are no current 128-bit Windows systems, but a 32-bit (or even 16-bit) process is often used to do this in our current 64-bit world because it can run on the widest number of environments. Even if your application doesn't support the environment, your install needs to give a sensible error, rather than just splurging a "bad image format" error onto the screen.

  17. Joshua says:

    [I don't know what you mean by "64 bit code might yet run". A 64-bit application is in the _WIN64 branch. -Raymond]

    I believe this to be obvious. If you drop a 64 bit .exe file, can you start it?

    [The fact that code in the 64-bit EXE file is running (in order to detect whether it is running on a 64-bit system) implies that the answer is "tautologically, yes." -Raymond]
  18. DWalker says:

    I want a 65-bit version of Windows.  Twice the addressing space of 64 bit Windows!

  19. Myria says:

    @DWalker: Windows 8.1 has 16 times the address space of Windows 8.0 =)

  20. ajanata says:

    SimonRev: %PROCESSOR_ARCHITECTURE% is AMD64 on my 64-bit Windows 8.1 box (and I'm pretty sure I've seen this before Windows 8 as well). I suspect it will be something else on 32-bit Windows.

  21. MNGoldenEagle says:

    @ajanata: Environment variables aren't always a good way of identifying characteristics of a machine, since they're trivial to modify or spoof.  You could detect it in code by measuring the size of the IntPtr type (or equivalent), or use WMI to get the system architecture type, or write a custom program that detects it for you (in the event you need bitness logic in a batch file).
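
    A sketch of such a hypothetical helper, assuming GetNativeSystemInfo is acceptable (it reports the machine's architecture even when called from a 32-bit process) and that returning the bitness as the exit code is good enough for the batch file:

        // Hypothetical helper for batch files: prints the native bitness
        // and also returns it as the process exit code. GetNativeSystemInfo
        // reports the machine, not the (possibly WOW64) process.
        #include <windows.h>
        #include <stdio.h>

        int main()
        {
            SYSTEM_INFO si;
            GetNativeSystemInfo(&si);
            bool is64 = si.wProcessorArchitecture == PROCESSOR_ARCHITECTURE_AMD64 ||
                        si.wProcessorArchitecture == PROCESSOR_ARCHITECTURE_IA64;
            printf(is64 ? "64\n" : "32\n");
            return is64 ? 64 : 32;
        }

    (A batch file can then branch with "if errorlevel 64", which tests for an exit code of 64 or higher.)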

  22. Adam Rosenfield says:

    @DWalker: Oh, you mean the IA64 build of Windows?  blogs.msdn.com/…/60162.aspx

  23. Joshua says:

    [The fact that code in the 64-bit EXE file is running (in order to detect whether it is running on a 64-bit system) implies that the answer is "tautologically, yes." -Raymond]

    Oh wow. I was referring to a case of 16 bit code trying to decide if the host is a 64 bit host or not.

  24. ChrisR says:

    @Joshua:  As usual, you are making almost no sense.  I am shocked that Raymond even bothers to respond.  How is anybody supposed to know you are talking about running 16-bit code when you say something like "If you drop a 64 bit .exe file, can you start it?"?  What the heck does that even mean?

  25. j b says:

    @DWalker,

    The hardware gives you three times the address space – if the OS would give you access to it. But it won't. Code, data and stack – even on the x84 the hardware can address 3 x 4GBytes of virtual address space. But for the OS to e.g. determine whether the 32 bit address causing a page fault was a code, data or stack access would make code more complex, and not the least: More bound to a specific architecture. So the OS designers found it easier to give you four billion times as much logical address space rather than just three times as much.

  26. Roman says:

    I suppose you could be a system information tool that wants to display the real bitness of the system.

  27. Derek says:

    Interesting. I worked on the versions of Mac OS X that supported both 32- and 64-bit x86 userspace applications with a 32-bit supervisor mode (kernel and drivers). The system call, interrupt, etc. trampolines performed mode switches. There were some odd edge cases, like having to transition to 64-bit mode within supervisor mode briefly (with interrupts masked) in order to save the full floating point/SIMD state, switch page directory bases, etc., but it worked fine and we shipped it. This approach was used to preserve supervisor mode compatibility (for drivers), for a gradual 64-bit transition.

    I'm not sure when Windows went 64-bit, but I've wondered if Microsoft internally considered this approach and rejected it for complexity/validation reasons or…? As far as I know, nothing besides Mac OS used this hybrid approach. Of course, for the last couple of releases, Mac OS X has used a 64-bit kernel.

  28. Antonio 'Grijan' says:

    @Brian: technically, you are right. But it applies directly to RAM size, because the number of transistors in a memory chip is directly proportional to its capacity (in dynamic RAM, one transistor per bit, plus some selection logic for each word, row and column). And RAM size determines the need for address bus width and, thus, architecture bitness. So yes, you can use Moore's Law to model bitness changes (and the proof is that it works pretty well for past changes).

  29. Will says:

    @Derek – I was also thinking of OS-X when I was reading this post about a hypothetical non-64 bit OS running a 64 bit app.  OTOH, there were very few apps that actually needed to care.  Basically, things like driver installer utilities.  If a driver was sanely made, it would provide the same sort of API for the 64 bit kernel or the 32 bit kernel, so apps that needed to talk to the device usually didn't have an issue one way or another.  (Though I vaguely recall some nonsense with some SDI output cards in Mac Pros.)

  30. Gordo says:

    If you drop a 64-bit .EXE file, it will probably say "ouch". Depending on how far the fall is, of course.

  31. IanBoyd says:

    > If you are a 64-bit program, then you are running on 64-bit Windows, because 32-bit Windows cannot run 64-bit programs

    The problem comes up when I don't know whether my **program** is 64-bit or 32-bit. My Windows program is neither Intel x64 nor x86; it is IL.

    It is the user's computer which will decide at runtime if my program will be running as 64-bit or 32-bit (or 128-bit, or ARM for that matter).

  32. ErikF says:

    @Joshua: Are you suggesting that you would write a 16-bit program that uses 64-bit processor features? That's the only use case I can see, and it doesn't make any sense to me; making a 16-bit program in this day and age seems odd as there's almost no support for that mode anymore. Any other 16-bit program would be legacy only and maybe would check for 386/Windows 95, tops.

  33. Brian_EE says:

    @Antonio: Moore's Law (which is really an assertion/observation and not a physical law) states that the number of *transistors* doubles every two years.

  34. IdahoJacket says:

    I think the more interesting question is should IsWow64Process return true or false for 32 bit code running on hypothetical 128 bit Windows.

  35. DebugErr says:

    @IdahoJacket There will probably be no 32-bit emulation layer on 128-bit Windows anyways.

  36. Anon says:

    @Joshua

    "Oh wow. I was referring to a case of 16 bit code trying to decide if the host is a 64 bit host or not."

    It's not: support.microsoft.com/…/896458

  37. DWalekr says:

    @j b:  huh?  I was comparing a hypothetical 65-bit version of Windows (or any OS) with a 64-bit version.  One extra bit gives you twice the addressing space.  And you mentioned "the x84", at which point I got lost.

    @Adam R:  Wow, that's interesting.  A 65th bit to signal "valid or not".  Cool.

  38. j b says:

    @DWalekr,

    "x84" was typo for "x86" – I guess my brain mixed in "64" when I was going to type the 6. :-)

    I referred to x86 because I don't know the 64-bit architecture well. I _guess_ it is similar, but on the x86 I KNOW that code, data and address spaces are in principle independent; they run from logical addresses 0 to 4G; code address X may have different contents from data address X and stack address X. There is never doubt about which of the three content values to use, even if X is the same. So the x86 architecture provides each process with 12 GB of address space (assuming ideal balance between code, data and stack requirements).

    I _guess_ that the x64 architecture is similar, giving a process three 64-bit address spaces, without adding a 65th address bit. I never looked it up.

    If each process had three independent address spaces, each filling the entire address range, the OS would e.g. for DMA see them as three different entities (compare it to three different processes!). It would have to take (slightly) different actions e.g. when paging depending on whether the page fault for logical page X occurred in the process' code, data or stack area. This would be specific to CPUs distinguishing code, data and stack accesses. Addressing-wise, most don't, and those that do, do it in different ways. Originally, NT ran on a significant number of CPUs, and it was much easier to lay all three address spaces on top of each other, assigning any given address to either code, data or stack – not all three of them.

    Splitting the 4G address range into a "system" part and a "user" part is partly a result of NT development having a fair share of people with VAX/VMS background apparently unfamiliar with the kind of mechanisms provided by the 386 MMS. The 386 allowed the application to occupy the entire 4G address space; any OS call (or other supervisor entry) might cause a jump into a completely independent 4G address space, fully controlled by the OS. Obviously, this would give the OS dramatically better protection against misbehaved application code, simply by its data not being addressable from application code. But the performance penalty was considered too high (a function call crossing address spaces WAS expensive), and also: Not every CPU architecture provided such a mechanism. So instead of giving the OS and application fully independent address spaces, OS code was squeezed into application space, and protection enforced not through addressability, but by restricting access to certain areas of the address space through page/segment protection. (From an architectural point of view, this is a tragic defeat, even though it can very well be understood on pragmatic grounds!)

    Sometimes, I am happy NOT to be an Intel 386 MMS architect. It must be somewhat depressing seeing all the great ideas being put into that memory management system, and seeing two thirds of it ignored, wasted…

    [No, x86 does not get a theoretical 12GB of address space because the code, data, and stack selectors all point into the same shared 4GB address space. Enough people make this mistake that I think I'll need to do a separate article on it. -Raymond]
  39. Falcon says:

    @j b:

    The way it works is:

    – linear address = segment descriptor base address + offset

    – if PG = 0 (paging disabled), physical address = linear address

    – if PG = 1, perform TLB/page table lookup and translate linear address to physical address
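
    For example, with a DS base of 10000000h and an offset of 1234h, the linear address is 10001234h; with paging enabled, it is that linear address (not the raw offset) that the page tables translate to a physical address.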

    Incidentally, the Motorola 68k family, which had a flat address space, could do something like what you are suggesting, using the processor's Function Code pins, although it would need assistance from external logic. I was reading some 68k documentation a little while ago, and IIRC, the 68010 introduced an instruction to move data between address spaces.

  40. j b says:

    "the code, data, and stack selectors all point into the same shared 4GB address space"

    Yes, that is how it is _used_, to make it simpler for the OS. But virtual address X (in either code, data or stack space) does NOT have to map to the same linear-space address X. There is an offset that might place the three virtual addresses X at three different linear-space addresses.

    It is used in such a way that code, data and stack address X all point to the same linear space address. If neither code, data nor stack size ever exceeded 1G, address X in the three spaces could map to 0+X, 1G+X and 2G+X, as perfectly independent spaces. If linear space was, say, 64G large, they could map into 0+X, 15G+X and 32G+X respectively, and could grow to 4G each without interfering with each other. In the original 386, and today's chips in 386 mode, linear addresses are 32 bits, but that has no influence on the principal independence of the three virtual address ranges – they still could map to distinct linear addresses, if the OS chose to. But as you say, it chooses NOT to. That's the OS' choice.

    The linear address is not visible to application code. We might argue whether a 32 bit linear address size is considered "architecture" or "implementation". Increasing it to some arbitrary larger value (say, a 36 bit address) would be invisible to any application code, and would not require changes in addressing structure, just a size increase. From that point of view it is just an implementation detail. Obviously, the OS would have to be aware of the larger linear address size, so from an OS point of view you might argue that it is more than an implementation difference. In my book, changing a dimension is certainly a minor architectural change. OSes have to adapt to more significant implementation choices than that, while still referring to it as within the same architectural framework.

    [The linear address space on x86 is 32-bit. The operating system does not have a choice here. "If linear space was, say, 64G large" is a counterfactual. -Raymond]
  41. j b says:

    First: If you really WANTED to, you could provide 12G of virtual space to a 32 bit process by duplicating the paging mechanisms at the linear address space level: just as an access to a physical page not in RAM causes another page to be thrown out to make room for the required one, an access to a virtual address segment not mapped to linear space could cause another segment to be unmapped to make room in virtual space. (I am talking about the mapping to linear space, managed by the OS – the application would be unaffected.) Well, that requires using segments less than 4G, which I have understood is difficult to comprehend. But the 386 MMS allows it, from its very first implementation.

    "If linear space was, say, 64G large" is a counterfactual. … No, it is not. It is not a claim of anything at all. Quite to the contrary. "If it was" implies that it isn't, and that certainly is a true fact, in this case. "If we add two more segment register to the next chip generation…", would you label that as "counterfactual", too? Is there a fundamental difference between extending the range of valid register numbers and extending the range of valid linear space locations, making one "if" counterfactual, and the other one not?

    Obviously, the 80386 was limited to 32 bits linear space. But should we consider this an _architectural_ limitation? If Intel made a new chip with the linear space extended to 36, 48 or 64 bits, and everything else identical, would it be considered a completely new and different architecture? Methinks not. Lots of extensions have been made to the original 80386, like new instructions, new registers, new physical addressing capabilities. Yet no one calls it a diversion from the x86 architecture. Nor would an extension of the linear address space be. At most you could call it an extension of the architecture, but bordering on being an architecture _implementation_ detail.

    What IS an architectural aspect is that address X is mapped to different linear addresses, depending on whether it is a stack, data or code address. That some OSes decide to set up the segment tables so that the mappings end up at the same linear address is a software matter; from the MMS point of view, they are still distinct, independent mappings.

    An _OS programmer_ works with _implementations_ of the architecture, and might be affected by the address length. Just like the carpenter who builds my house is concerned whether the wooden planks are six or eight inches wide: You won't have a new architect make a new house even if you decide to change the width of the planks – that is not "architecture".

    I guess I am confused from reading descriptions of the architecture of the x86 chip's MMS – my mind isn't geared to, or limited to, how Windows makes use of a selection of the facilities provided, as they appear in specific implementations. It is difficult for me to discuss this matter further, as I am not that familiar with all the limitations Windows developers put on the use of the hardware. So I won't make further contributions to the discussion.

    ["If we used two more segment register…" is a counterfactual. Two more segment registers don't exist, so you are assuming a falsehood. What's the point of writing an operating system that takes advantages of CPU features that don't exist? (It's not just an architecture implementation detail. The size of the linear address space and the number of registers is part of the CPU's public surface. It's a contractual change, not an implementation detail.) Besides, your math doesn't add up. "That requires using segments less than 4G" which means that your total virtual space is less than 12G, because adding together three numbers less than 4G will give you a total that is less than 12G. -Raymond]
  42. smf says:

    @Raymond

    [The linear address space on x86 is 32-bit. The operating system does not have a choice here. "If linear space was, say, 64G large" is a counterfactual. -Raymond]

    Pentium Pro onwards supports 36-bit (i.e. 64G).

    en.wikipedia.org/…/Physical_Address_Extension

    Some Microsoft operating systems support it as well.

  43. Simon Farnsworth says:

    @smf

    That's support for larger physical addresses, not larger linear addresses.

    The 386 and later do their MMU work in two conceptual stages when in 32 bit paged mode:

    1. Take segment base register, add offset, to get linear address (32 bit number).

    2. Take linear address (regardless of which segment it came from), and translate to a physical address (32 bit number if no PAE, 36 bit if PAE, up to 48 bits if  64 bit OS).

    Your virtual address space is constrained by the size of linear address space; even if you have three different segments in use, this is still 32 bits in x86 32 bit land. The benefit of PAE is twofold:

    1. Systems running large numbers of processes can map the linear address space for each process onto different 36 bit physical addresses, so you can give processes real RAM (not page file) even when the total memory demand on the system exceeds 4 GiB.

    2. Specially written processes can drive paging so that they use more than 4 GiB of RAM, by using a window into the physical memory in a fashion similar to DOS-era overlays and EMS; under Windows, you'd use Address Windowing Extensions to do this.

  44. Owen SHepherd says:

    Also note that lots of software uses the linear address wraparound for its own purposes. The Linux x86 TLS ABI, for example, uses "negative" offsets from GS.

  45. j b says:

    Raymond,

    I think your comment is a good enough explanation why I don't want to follow up this discussion.

  46. foo says:

    @j b. "I think your comment is a good enough explanation why I don't want to follow up this discussion."

    Thanks, because I can see this turning into a rehash of your blogs.msdn.com/…/10429807.aspx etc… postings.

  47. HagenP says:

    You need to know the platform variant (32/64-bit) if you want to install drivers. Not for the driver package itself – the INF file can take care of it – but for calling the correct "Driver Package Installer" introduced by MS "to provide a better user experience" (quote from memory – the WinHEC 2008 presentation CON-T532_WH08.pptx seems not to be available anymore).

    But how can you select the "correct" – 32bit or 64bit – DPinst.exe version? MS kindly provides a HowTo document:

    download.microsoft.com/…/32-64bit_install.docx

    This states "A platform-specific version of DPInst must be provided for each target platform that the INF supports.". Obviously it is not possible to make a single DPinst binary for driver installation on 32/64bit OS variants…

  48. HagenP says:

    @ j b

    > code, data and address spaces are in principle independent

    On some processors you can "see" the memory access type (e.g. supervisor / user) externally on some pins (e.g. FC0/1/2 on an 68000). With them you can make address types map to different address spaces, and you can make them  (partially) independent.

    But x86 processors do not provide external signals to tell apart the different address types. There is no "data access" pin or a "stack access" pin.

    -> Without this information you cannot make the "address spaces" independent. Everything maps to the same 32bit address space.

  49. Brian_EE says:

    @JB:

    It's obvious you don't understand how the physical hardware works, so let a hardware guy who designs microprocessor systems for a living tell you: you're wrong.

    The x86 architecture is a von Neumann architecture. What you describe would require a Harvard architecture. Go educate yourself on these terms.

  50. dave says:

    > Splitting the 4G address range into a "system" part and a "user" part is partly a result of NT development having a fair share of people with VAX/VMS background apparently unfamiliar with the kind of mechanisms provided by the 386 MMS.

    A few of them at least were experienced enough with the PDP11 that they would be well familiar with having the OS have its very own address space (on the PDP11, it's because 64KB is not enough to share).

    I too am familiar enough with the PDP11 and VAX to know that having the OS have its own address space is a pain in the rear, at least in directive/syscall processing, where you need access to both spaces.

  51. j b says:

    @HagenP and Brian EE (and Raymond as well):

    You all completely ignore the segmenting mechanisms of the 386 MMS. You define three 4GB segments, perfectly aligned from address zero, and then forget all about segmenting, pretending it isn't there. Yes, if you do it that way (and that seems to be the only way that is acceptable in this forum), then you have discarded the option to distinguish between the three address spaces. I accept that attitude, and understand that any statement in this forum about the 386 MMS capabilities must make the assumption that the segmenting part of the MMS is not used. (And also that there is no distinction between an architecture and a specific implementation.)

    I tend to look at ALL the available facilities, and I tend to distinguish architecture from implementation. I feel it silly continuing a discussion as if half of the processor was missing, and as if an architecture can never be extended. Therefore I feel it silly continuing this discussion.

    [Okay, so that's the point of confusion. You are talking about the x86 architecture that could exist in the future; I'm talking about the x86 architecture as it exists today. I think it's silly continuing a discussion that assumes the architecture can be extended, because that doesn't help you write an operating system today. In the future, please make it clear that you are making counterfactual statements (statements that are not true today). -Raymond]
  52. j b says:

    No, Raymond.

    The distinction between virtual data, stack and code addresses has existed since the very first 80386.

    The only reason why one would require a future processor is that you (and others) insist that both code, data and stack segments are 4 GB large, and that there is a single segment of each kind. THEN, because you will at any time need all three kinds of segments, you need a larger linear space. The 386 was made to handle multiple segments of each kind, according to need. You apparently refuse to recognize that mechanism. THEN you need more linear space.

    That's one point. Another point is that in talking about "an x86 _architecture_ that could exist in the future", you fail to distinguish between architecture and implementation. Extending an address field in a way that is invisible to the application program, keeping everything else as before, certainly is not a "new" architecture. It is the same architecture with a wider address field, which is about as architecturally significant as the number of physical address lines. Which is 99.5% an implementation detail, not an architectural aspect.

    What IS an architectural aspect is that the processor makes completely independent mappings of data, code and stack addresses. There is a single reason why address X ends up at the same linear address: Because your software has chosen to set up the segment descriptors that way (so that you can forget them!). The architecture has the capability of doing otherwise, even if you choose to do identical mappings in all three cases. That's your choice, and it is obvious that you insist not to ever consider any other alternative. Another OS could do differently, even if Windows guys do not want to. Even with 32 bits linear space.

  53. j b says:

    @Brian EE,

    This has nothing to do with physical hardware but with conceptual mapping from a program's virtual address to a linear address – even before it is mapped down to a physical address.

  54. j b says:

    @HagenP,

    The mapping we are talking about takes place within the CPU's MMS, before the address is delivered to the paging system for mapping. The CPU selects a segment, depending on the kind of access, which is mapped into linear space – _that's_ the mapping we are talking about. No physical pin is required for the chip-internal MMS to make the distinction. (But to understand that, you must look at the segment system, and not, as most people around here do, simply set up three 4 GB segments and put them on top of each other, so they can completely forget the segment system.)

  55. Evan says:

    @Brian_EE: "The x86 architecture is Von Nueman Architecture. What you describe would require Harvard Architecture. Go educate yourself on these terms."

    I think von Neumann vs Harvard is a red herring, and saying his (apparently-mistaken) view of how x86 works requires Harvard is wrong. Those models don't take into account virtual memory, and the process by which virtual addresses are mapped to physical ones is the linchpin to whether and how j b is right or wrong. If each selector got a separate page table, then j b would be right. But that wouldn't make x86 a Harvard architecture, because whether it looked like Harvard or von Neumann would depend on how the page tables were set up: if you pointed code and data at disjoint physical spaces, it would look like Harvard. If you pointed them at the same place, it would look like von Neumann. If they overlapped but not completely… well, it looks like some weird third category. So which is that hypothetical version of x86? Both? Neither?

    That's the hypothetical x86 that works differently from the real one. So what about the real one? Well, turns out that I *think* you can do something vaguely similar. On processors that didn't have support for the NX bit, PaX was able to emulate it by taking advantage of the separate instruction and data TLBs. By asking the CPU to trigger page faults when the page wasn't in the ITLB (even if it was in the DTLB) they could check to make sure that the faulting address wasn't NX. They could have similarly prevented reading data from the code segment, but there's not really a security reason to and probably that would break programs. I'm not sure about this, but I suspect you could hypothetically even let execution proceed in those cases in such a way that the TLBs would have different contents. Voila, stock x86 that acts like a Harvard machine.

    So… you're right, but I would argue it's not for the reason you say.

  56. Daniel Rose says:

    @ajanata, @MNGoldenEagle

    %processor_architecture% does not always return AMD64 on a 64bit machine. If you are running a 32-bit process, it will return x86. See stackoverflow.com/…/why-processor-architecture-always-returns-x86-instead-of-amd64

    Similarly, sizeof(IntPtr) will tell you the pointer size of the process you are running, not the machine's pointer size.

  57. ErikF says:

    @jb: I still don't understand how you can have three segments that address 12GB at the same time:

    CS: base=00000000h, limit=FFFFFFFFh

    DS: base=????????h, limit=FFFFFFFFh

    SS: base=????????h, limit=FFFFFFFFh

    What values should the DS and SS base addresses be?

  58. Gabe says:

    I thought that the difference with a Harvard architecture machine was that it has two physical data paths: one for data and one for instructions (the Harvard computer had a paper tape for the instructions and electromechanical counters to hold data).

    The problem with a von Neumann machine is the "von Neumann bottleneck" — the instructions and data are fighting for memory bandwidth. If your data and instructions are coming from the same RAM but just have different address spaces, you still have the von Neumann bottleneck, so it's still a von Neumann architecture.

  59. j b says:

    @EricF,

    As long as you insist on there being a single 4GB code segment, a single 4GB data segment, and a single 4GB stack segment, set them up when the process is started, and then forget everything about segments, you have a problem (with the current implementations of the architecture). But that's not the only way to use the 386 MMS. Note that the segment selector is a 14 bit number. That is for a reason! But to utilize it, you must recognize that the segment mechanism is there.

    I know very well that it is far from Politically Correct to point out that the segmentation mechanisms could have a very positive value; the Proper Attitude is to pretend that they aren't there, by setting up three maximum size segments, fully overlapping, and then forgetting about segments. Yes, that does simplify a few things (not the least for the OS programmer, and certainly if he wants the memory management design to be portable to several architectures). The simplistic, non-segmented approach IS simpler – no question about that. But if you would recognize the segmenting mechanism and its uses, you could utilize the processor architecture in ways that are not possible in the single-segment approach. I get a very clear impression that most people around here have no interest whatsoever in going beyond the simplistic approach. I'll accept that – but I will for my own part continue to recognize all the capabilities of the 386 memory management system.

    [Basically, you're suggesting using a segmented 32-bit model, which is the 32-bit version of 16-bit protected mode. If you had said that, then everybody would have understood. Instead, you talked about having a 64GB linear address space, and then you started losing people. Yes, you could use a segmented 32-bit model, where all pointers are 48-bit and segments are swapped in and out of the 4GB linear address space. There was an extension to Windows 3.1 that did this (called WINMEM32, as I recall). Nobody used it. -Raymond]
  60. Joshua says:

    [… segmented 32-bit model …]

    I always thought that should work. Same way the 32 bit kernel can have > 4GB disk cache. Why bother, now that most new processors are 64 bit? Topic finished.

Comments are closed.