Why can’t you thunk between 32-bit and 64-bit Windows?


It was possible to use generic thunks in 16-bit code to allow it to call into 32-bit code. Why can't we do the same thing to allow 32-bit code to call 64-bit code?

It's the address space.

Both 16-bit and 32-bit Windows lived in a 32-bit linear address space. The terms 16 and 32 refer to the size of the offset relative to the selector.

Okay, I suspect most people haven't had to deal with selectors (and that's probably a good thing). In 16-bit Windows, addresses were specified in the form of a selector (often mistakenly called a "segment") and an offset. For example, a typical address might be 0x0123:0x4567. This means "The byte at offset 0x4567 relative to the selector 0x0123." Each selector had a corresponding entry in one of the descriptor tables which describes things like what type of selector it is (can it be used to read data? write data? execute code?), but what's important here is that it also contained a base address and a limit. For example, the entry for selector 0x0123 might say "0x0123 is a read-only data selector which begins at linear address 0x00524200 and has a limit of 0x7FFF." This means that the address 0x0123:n refers to the byte whose linear address is 0x00524200 + n, provided that n ≤ 0x7FFF.
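To make the base-and-limit arithmetic concrete, here is a minimal sketch of the translation just described. The `Descriptor` class, the `DESCRIPTOR_TABLE` dictionary, and the `to_linear` function are names invented for illustration. A real selector also carries table-indicator and privilege bits, which this model ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    base: int       # linear address where the selector's memory begins
    limit: int      # highest valid offset
    writable: bool  # False models a read-only data selector


# Hypothetical descriptor table holding only the entry used in the text.
DESCRIPTOR_TABLE = {
    0x0123: Descriptor(base=0x00524200, limit=0x7FFF, writable=False),
}


def to_linear(selector: int, offset: int) -> int:
    """Translate selector:offset into a linear address, enforcing the limit."""
    descriptor = DESCRIPTOR_TABLE[selector]
    if not 0 <= offset <= descriptor.limit:
        raise ValueError(f"offset {offset:#06x} exceeds limit {descriptor.limit:#06x}")
    return descriptor.base + offset


print(hex(to_linear(0x0123, 0x4567)))  # 0x528767
```

Asking for 0x0123:0x8000 raises an error in this sketch, standing in for the fault the processor would raise for an offset past the limit.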

With the introduction of the 80386, the maximum limit for a selector was raised from 0xFFFF to 0xFFFFFFFF. (Accessing the bytes past 0xFFFF required a 32-bit offset, of course.) Now, if you were clever, you could say "Well, let me create a selector and set its base to 0x00000000 and its limit to 0xFFFFFFFF. With this selector, I can access the entire 32-bit linear address space. There's no need to chop it up into 64KB chunks like I had to back in the 16-bit days. And then I can just declare that all addresses will be in this form and nobody would have to bother specifying which selector to use since it is implied."

And if you said this, then you invented the Win32 addressing scheme. It's not that there are no selectors; it's just that there is effectively only one selector, so there's no need to say it all the time.
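As a rough illustration of why the selector can go unmentioned, the sketch below models the single flat selector. The constant and function names are made up for this example. With a base of zero, the translation is the identity, so the offset alone is the address.

```python
FLAT_BASE = 0x00000000   # the one implied selector starts at linear address 0
FLAT_LIMIT = 0xFFFFFFFF  # and covers the whole 32-bit linear address space


def flat_to_linear(offset: int) -> int:
    """Translate a 32-bit offset through the implied flat selector."""
    if not 0 <= offset <= FLAT_LIMIT:
        raise ValueError("offset does not fit in 32 bits")
    return FLAT_BASE + offset


# The offset and the linear address are the same number.
print(flat_to_linear(0x00528767) == 0x00528767)  # True
```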

Now let's look at the consequences of this for thunking.

First, notice that a full-sized 16-bit pointer and a 32-bit flat pointer are the same size. The value 0x0123:0x4567 requires 32 bits, and wow, so too does a 32-bit pointer. This means that data structures containing pointers do not change size between their 16-bit and 32-bit counterparts. A very handy coincidence.
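The size coincidence is easy to check. The sketch below uses Python's `struct` module to lay out a 16:16 pointer (offset word first, then selector word, as x86 stores far pointers) next to a 32-bit flat pointer. The helper names and the sample record, a 16-bit count followed by one pointer with no padding, are assumptions made for illustration.

```python
import struct


def pack_far_pointer(selector: int, offset: int) -> bytes:
    """Lay out a 16:16 pointer in memory: offset word, then selector word."""
    return struct.pack("<HH", offset, selector)


def pack_flat_pointer(address: int) -> bytes:
    """Lay out a 32-bit flat pointer in memory."""
    return struct.pack("<I", address)


far = pack_far_pointer(0x0123, 0x4567)
flat = pack_flat_pointer(0x00528767)
print(len(far), len(flat))  # 4 4

# Hypothetical record: a 16-bit count followed by one pointer, unpadded.
RECORD_16 = struct.calcsize("<HHH")  # count + offset + selector
RECORD_32 = struct.calcsize("<HI")   # count + flat pointer
print(RECORD_16, RECORD_32)  # 6 6
```

The pointers occupy the same number of bytes, but the bit patterns mean different things, so a thunk still has to translate each pointer value.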

Next, notice that the 16-bit address space is still fully capable of referring to every byte in the 32-bit address space, since they are both windows into the same underlying linear address space. It's just that the 16-bit address space can only see the underlying linear address space in windows of 64KB, whereas the 32-bit address space can see it all at once. This means that any memory that 32-bit code can access 16-bit code can also access. It's just more cumbersome from the 16-bit side since you have to build a temporary address window.
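Here is one way to picture the temporary address windows. This is a sketch only: `windows_for` is a hypothetical helper, and the real thunk layer allocated and programmed selectors rather than yielding tuples. It tiles an arbitrary block of linear memory with windows of at most 64KB each, which is what 16-bit code needs in order to reach all of it.

```python
WINDOW_SIZE = 0x10000  # a 16-bit offset spans at most 64KB


def windows_for(linear_start: int, length: int):
    """Yield (base, limit) pairs for 64KB-or-smaller windows tiling the block."""
    end = linear_start + length
    base = linear_start
    while base < end:
        size = min(WINDOW_SIZE, end - base)
        yield base, size - 1  # the limit is the highest valid offset
        base += size


# A 160KB block needs two full windows and one partial window.
for base, limit in windows_for(0x00524200, 0x28000):
    print(f"base={base:#010x} limit={limit:#06x}")
```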

Neither of these two observations holds true for 32-bit to 64-bit thunking. The size of the pointer has changed, which means that converting a 32-bit structure to a 64-bit structure and vice versa changes the size of the structure. And the 64-bit address space is four billion times larger than the 32-bit address space. If there is some memory in the 64-bit address space at offset 0x000006fb`01234567, 32-bit code will be unable to access it. It's not like you can build a temporary address window, because 32-bit flat code doesn't know about these temporary address windows; they abandoned selectors, remember?
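Both failures can be shown in a few lines. In this sketch, `widen` and `narrow` are hypothetical helpers and the record (a 32-bit length followed by one pointer, ignoring alignment padding) is made up for illustration. Going from 32 to 64 bits is plain zero-extension. Going the other way has no answer for the example address, and chopping off the upper bits just produces a different address.

```python
import struct

MAX_32 = 0xFFFFFFFF
ADDRESS_64 = 0x000006FB01234567  # the example address from the text


def widen(address32: int) -> int:
    """Zero-extend a 32-bit flat address; the numeric value is unchanged."""
    if not 0 <= address32 <= MAX_32:
        raise ValueError("not a 32-bit address")
    return address32


def narrow(address64: int) -> int:
    """Return the address as a 32-bit value, or fail if it cannot be expressed."""
    if not 0 <= address64 <= MAX_32:
        raise OverflowError(f"{address64:#018x} is not reachable from 32-bit code")
    return address64


# Hypothetical record: a 32-bit length followed by one pointer, unpadded.
SIZE_32 = struct.calcsize("<II")  # with a 32-bit pointer
SIZE_64 = struct.calcsize("<IQ")  # with a 64-bit pointer
print(SIZE_32, SIZE_64)  # 8 12

# Truncation does not fail loudly; it silently names some other byte.
print(hex(ADDRESS_64 & MAX_32))  # 0x1234567
```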

It's one thing when two people have two different words to describe the same thing. But if one party doesn't even have the capability of talking about that thing, translating between the two will be quite difficult indeed.

P.S., like most things I state as "fact", this is just informed speculation.

Comments (36)
  1. John says:

    I, for one, will continue to consider this information to be fact.

  2. ton says:

    "In 16-bit Windows, addresses were specified in the form of a selector (often mistakenly called a "segment") and an offset."

    Interesting stuff, Raymond. Every book or article I’ve ever seen on assembly language programming has referred to what you call selectors as segments, including this entry from Wikipedia.

    http://en.wikipedia.org/wiki/X86_assembly_language

    Could you explain more about why the term "segment" is mistaken and shouldn’t be used?

    Also, if

    "And if you said this, then you invented the Win32 addressing scheme. It’s not that there are no selectors; it’s just that there is effectively only one selector, so there’s no need to say it all the time."

    is true then how are read, execute, and write only properties tracked in Win 32 addressing?

  3. Laonianren says:

    It’s a long time since I’ve done any thunking, but IIRC:

    1. 16-bit pointers (i.e. 32-bit wide selector+offset pairs) were unintelligible to 32-bit code, and

    2. Despite having same-width pointers, 32-bit structures often differed from their 16-bit counterparts, so

    3. To call 32-bit code from 16-bit code you had to declare a 32-bit version of the structure and pass any pointers in the structure through GetVDMPointer32W().

    As long as the 32-bit API returns data in a caller supplied buffer (which APIs usually do) this technique works fine.

    Note that this never converts an arbitrary 32-bit pointer to 16-bit, so it would also work for 32-bit to 64-bit.  So there must be some other reason for the lack of 32-bit to 64-bit thunks.

    [Yes, you had to do all this work, but at least it was possible because 32-bit pointers and 16:16 pointers all pointed into the same underlying address space. (“…notice that the 16-bit address space is still fully capable of referring to every byte in the 32-bit address space…”) Not true for 32-bit pointers and 64-bit pointers. (“Neither of these two observations holds true for 32-bit to 64-bit thunking.”) That’s the point of the article, which I apparently failed to make. -Raymond]
  4. Laonianren says:

    "notice that the 16-bit address space is still fully capable of referring to every byte in the 32-bit address space"

    I completely understand your point.  My point is that it is not a necessary condition for thunking.  The condition for the old thunking API is "every 16-bit address can be converted to a 32-bit address".  And since every 32-bit address can be converted to a 64-bit address, there’s no reason we can’t have a 32-bit to 64-bit thunking API.

    The old thunking API also supported calling 16-bit code from 32-bit code but this is not possible from 64-bit to 32-bit for the reasons you state.  However, neither of us were talking about this case.

  5. Roastbeef says:

    "Interesting stuff Raymond every book or article I’ve ever seen on assembly language programming has referred to what you call selector’s as segments. Including this entry from Wikipedia."

    When an x86 processor is in "real mode" (the mode without memory protection where the CPU can only access 1 megabyte of memory), it uses SEGMENT registers to move a 64k window of memory over the underlying hardware’s 1 megabyte of address space.

    When in protected mode the CPU needs to store a lot of information about each potential memory region.  The information about each memory region is called a "DESCRIPTOR" and they are all stored in one of two tables… either the "LOCAL DESCRIPTOR TABLE" for per-process regions, or the "GLOBAL DESCRIPTOR TABLE" for system-wide regions.

    The CPU needs a register to indicate which entries of the "DESCRIPTOR TABLE" are active at any one time, and so it stores a SELECTOR which is an offset into the DESCRIPTOR TABLEs.

    So did Intel create new registers to store these SELECTORs?  Nope… In protected memory modes the old style SEGMENTs will never be used, so the SEGMENT registers can be overloaded to also store SELECTORs.

    The distinction of which they contain is based on CPU memory model at the instant you’re referring to.

  6. Alexandre Grigoriev says:

    ton:

    "segment" of memory is described with "selector" (16-bit tag). That’s the difference.

  7. ton says:

    @Laonianren

    The following statement from Raymond is the key reason why there isn’t a 32 bit to 64 bit thunking api:

    "It’s not that there are no selectors; it’s just that there is effectively only one selector, so there’s no need to say it all the time." and also

    "Well, let me create a selector and set its base to 0x00000000 and its limit to 0xFFFFFFFF. With this selector, I can access the entire 32-bit linear address space."

    Because of this scheme it is impossible for a 32 bit address to be converted to a 64 bit address with only a single selector and a base address starting at 0x00000000. There just aren’t enough digits.

    So your statement:

    "And since every 32-bit address can be converted to a 64-bit address, there’s no reason we can’t have a 32-bit to 64-bit thunking API."

    could only be true if memory was segmented into 4GB chunks  and there was a selector for each chunk but there isn’t…

  8. ton says:

    @Roastbeef, @Alexandre thanks for the info I’m admittedly pretty green when it comes to memory models and assembly but I’m always seeking enlightenment…:-)

  9. Nick Lamb says:

    Laonianren, it’s not just 64-bit code calling 32-bit code that requires the reverse trick, it’s anywhere (and Raymond did say generic thunking, obviously some specific APIs can be thunked) that needs the reverse trick.

    For example, suppose you try to thunk my reallocator, a function which takes a pointer and returns either the same pointer (but now re-allocated to point to a larger or smaller portion of memory) or else a different pointer (larger or smaller, and with the data from the previous allocation copied over)

    The input side is fine, you add some zeroes to make the 32-bit address into a 64-bit address, and the same with the 32-bit size parameter to turn it into a 64-bit size parameter.

    But on the output side you have trouble, the 64-bit code you’re calling can return any arbitrary 64-bit pointer. In reality it will currently return a pointer from somewhere in canonical 48-bit address space, because CPUs don’t have a 64-bit address bus, but even with that restriction there’s no reliable way to translate that into a 32-bit value to return to the 32-bit code.

  10. Yuhong Bao says:

    I think WOW64 is implemented internally by thunking; otherwise the 64-bit version of WinDbg would not be able to debug 32-bit code and .effmach would not exist. But the interfaces needed are not exposed to external code. Why?

  11. Dog says:

    "Well, let me create a selector and set its base to 0x00000000 and its limit to 0xFFFFFFFF. With this selector, I can access the entire 32-bit linear address space."

    What interests me is what happens if you create a "selector" with base 0xFFFFFFFF and limit 0xFFFFFFFF? Do you get access to the 4GB-8GB area?

  12. Mark says:

    Where’s Skywing when you need him?

  13. ton says:

    "What interests me is what happens if you create a "selector" with base 0xFFFFFFFF and limit 0xFFFFFFFF? Do you get access to the 4GB-8GB area?"

    No. You would probably just get a segmentation fault.

  14. Yuhong Bao says:

    BTW, the same restriction exists in Mac OS X. You cannot thunk between Rosetta emulated code and native code, nor can you thunk between 32-bit and 64-bit code.

  15. mikeb says:

    A "selector" can be thought of as the name of a "segment", so it’s not really a mistake to use the term "segment".

    Saying that a selector is mistakenly called a segment reminds me of this discussion between the Knight and Alice in "Through the Looking Glass" by Lewis Carroll:

       "… The name of the song is called ‘Haddocks’ Eyes.’"

       "Oh, that’s the name of the song, is it? "Alice said, trying to feel interested.

       "No, you don’t understand," the Knight said, looking a little vexed. "That’s what the name is called. The name really is ‘The Aged Aged Man.”’

       "Then I ought to have said, ‘That’s what the song is called?’" Alice corrected herself.

       "No, you oughtn’t: that’s quite another thing! The song is called ‘Ways and Means’: but that’s only what it’s called, you know!"

       "Well, what is the song, then?" said Alice, who was by this time completely bewildered.

       "I was coming to that," the Knight said. "The song really is ‘A-Sitting on a Gate’: and the tune’s my own invention."

  16. Alexandre Grigoriev says:

    My explanation why there’s no thunking is that there is no real demand for that. It’s so easy to make sure your 32-bit code compiles cleanly for 64 bits, much easier than for 16->32 transition. Any supposed benefits of thunking would not be worth the trouble.

    There are drawbacks, though, of lack of thunking. You cannot open a 64-bit process from 32-bit process.

  17. Anonymous says:

    I thought that real mode x86 address translation worked by multiplying the selector by 16 then adding that to the 16 bit pointer.  So the highest address you could get was 0xffff * 16 + 0xffff or just over a megabyte.

  18. Kaenneth says:

    I think the main issue between 32/64 bit programs on the same system is dynamic linking. A 32 bit process could ask the system to start a 64 bit process for it, and processes can exchange data, as the Clipboard works, and files are interchangeable.

    But DLLs, ActiveX, etc. are all toast, since they run in the same process, and an individual process can only have one memory model.

    That causes fun with Shell extensions, Windows Media Player plugins (I think WMP11 is 32 bit only?), browser plugins… I don’t know what effects it has on drag-and-drop, and object embedding, or if .NET helps any.

  19. Koro says:

    @Kaenneth: Even if .NET produces 32-bit PE files, the CLR has the ability to generate 64-bit code even then I think. So for .NET it matters a lot less except for the native code that loads it.

  20. Pax says:

    The mainframe world is still dealing with this. It was only recently that the last of the 24-bit code was removed from ISPF under z/OS.  Now it’s all 32-bit code running on a 64-bit OS.

    There are special memory allocation flags in GETMAIN to indicate whether you want memory addressable by 24-bit code (under the line), 32-bit code (under the bar) or 64-bit code (whole address space).

    Accessing all 32-bit addresses from 64-bit code is possible so you should be able to thunk ‘upwards’.  It’s downward thunking that requires special handling like this.

  21. Laonianren says:

    @ton: Every 32-bit address can be converted into a *64-bit wide* 64-bit address.  Which is what is required for thunking.

    @Nick Lamb: Yes, you can’t convert a returned 64-bit pointer to 32-bit.  But you can use the 64-bit pointer as a handle, and you can pass it to (say) RtlCopyMemory using thunking.  And anyway, most Windows APIs don’t return pointers so this is generally a non-issue.

    The point is that the old API had these same restrictions and they were a nuisance but they could be worked around.  As I said before, there’s no reason we couldn’t have a new API with the same restrictions.  I know it wouldn’t work in every possible case, but that’s not to say it wouldn’t be useful.

  22. Koro says:

    Interesting.

    Although it *could* be possible to get a 64-bit memory region visible from 32-bit, akin to how mapped file views or shared sections operate, with a little help from the kernel in the form of two APIs:

    void32* Map64bitMemory(void64* ptr64);

    void Unmap64bitMemory(void32* ptr32);

    The kernel would just have to map what’s from the higher address to one that’s guaranteed to be below 0xffffffff. No big problem there.

  23. Worf says:

    A selector and a segment are two different things.

    A segment, only valid in x86 real mode, is a 16-bit value describing a 64k region of memory. A segment:offset address is then mapped as segment*16+offset. This is hardcoded and fixed in hardware. (The CS, DS, ES, SS registers hold the segment part.) And yes, you can make the same memory address different ways as there are 12 overlapping bits.

    A selector is valid in protected mode (and possibly unreal mode). In this, the segment registers now hold a selector value. The selectors can be arbitrary (and described I believe in the global descriptor table), and the entries are arbitrary. Only kernel code can update the selector. But here, there is no fixed mapping between selector and the underlying memory address – it is what the OS programs it to be.

    Windows, Linux and most other OSes out there define one selector that maps the entire 32-bit range (and thus forget about the selector period). This is because the x86 acts sane for once and gives you the flat memory model, which practically all other CPU architectures feature, and makes programming far less complex.

    You use a selector to select which segment descriptor you want (that describes a region of memory). A segment is just a region of memory.

  24. Unreal! says:

    > I thought that real mode x86 address translation worked by multiplying the selector by 16 then adding that to the 16 bit pointer.  So the highest address you could get was 0xffff * 16 + 0xffff or just over a megabyte.

    In real mode all 386+ (and probably 286 but the thing is less interesting) still use predefined selectors to simplify address translation logic. It’s just that selectors are defined with all permissions and the 4 bit offset rule.

    This leads to the ability to enter "Unreal mode" that is real mode with modified selectors which was used by old games and is still used by some BIOS functions.

    http://en.wikipedia.org/wiki/Unreal_mode

  25. Leo Davidson says:

    Out-of-process COM lets you do some of the things you might want thunking for. It allows 32-bit and 64-bit processes to call each other’s interfaces without you needing to write any extra code.

    It isn’t perfect for everything, and you may have to write your own proxy process if you need to host in-process DLLs, but I found it let me do things very quickly and easily once I had worked out what to do and how to do it.

    This even lets you run 32-bit ActiveX GUI controls as child windows of 64-bit apps. For example, there isn’t a 64-bit Flash control but I was able to write a proxy process which hosts the 32-bit Flash ActiveX control and forwards the interfaces that the control and host need to talk to each other, giving the 64-bit host program the ability to display Flash within its windows.

    I think Explorer on Vista uses the same technique to display things like Word documents in the preview pane. (There are additional reasons for using out-of-process COM there but they’re off-topic.)

  26. Magnus Hiie says:

    Note that there still is use for selectors that don’t have base of 0x00000000 and limit 0xFFFFFFFF – the fs holds a selector value that describes the range for TEB. Try looking at the selector register values and selector contents in WinDbg (e.g. "dg ds" vs "dg fs"). "!teb" should give the same address as base in "dg fs".

    By the way, does anybody know if the gs is used on Win32? If so, what is it used for? For me the value of gs seems to be always 0.

  27. Mike says:

    Magnus:  GS may very well be used in kernel mode and be a privileged kernel-mode descriptor. It’s just that by the time you see it, the privilege transition code has zeroed out the segment register.

    dumpbin /disasm ntoskrnl.exe | grep gs anyone?

  28. Mike Diack says:

    As a (D)COM programmer, I’m totally aware of the issues that Leo mentions, but using (D)COM is hardly desirable these days….

    A work project of mine is actually removing DCOM from a legacy project, precisely because it’s such a god forsaken mess wrt security issues and reliability (the infamous 40 odd second hang if a DCOM connection can’t be made which you have to work around by doing your DCOM on background threads etc UGH!…)

  29. Duke of New York says:

    If it’s in Wikipedia, it must be the truth!

  30. Paul de Vrieze says:

    Just take a look at how a function call works as implemented in the i386 system (roughly, some parts are left out).

    1. The function parameters are pushed on the stack
    2. The instruction pointer is pushed on the stack (64 bits or 32 bits)

    3. The system calls into the address of the function (32bits or 64 bits)

    4. The called function uses its parameters relative to the current base pointer (and as such is dependent on pointer length of the calling code) (this means a 32 bit function can not have parameters when called by 64 bit code)

    5. The called function does its job

    6. The called function returns by getting the old instruction pointer from the stack and going to that address.

    ps. There is a lot of stuff with calling conventions that I ignore here, but you can already see that (besides the impossibility of a mode switch in a function call), even just calling a function has issues (as well as 64bit code being called by 32bit code, (the address of the code could very well be unaddressable in 32 bit))

  31. codingthewheel says:

    The one thing everybody here is forgetting is that by calibrating the kernel-mode succubus driver you can 32/64 thunk using force-based heuristics in a non-Euclidean addressing system whereby two 32-bit pointers are extrapolated using a numeric variant of Markov chains whose extrapolated segment register then matches the Rutger axiom for run-time logarithmic precalculation of directed acyclic graphs, an approach that’s been used (successfully) to address register latency issues in L2 cache.

  32. Mark says:

    codingthewheel: the alarming thing is that, apart from the succubus bit, that made sense.

  33. ton says:

    I must say that for the most part codingthewheel’s comment was pure gibberish. I mean nobody here is dumb but dude english please.

  34. mps says:

    By default, .net executables will run either 32 or 64 bit. At compile time, you can specify that an app has to be 32 or 64 bit. This is useful for apps like XNA games which have to be 32 bit, since they bind to the 32 bit DirectX libraries.

  35. @codingthewheel: There can only be one response to this…

    http://joeschwartz.net/Illusions/parastatic.htm

  36. Neil says:

    "…notice that the 16-bit address space is still fully capable of referring to every byte in the 32-bit address space…"

    But not all at once, as there are only 4096 local and 4096 global descriptors, which limits you to half a gigabyte of 16-bit address space.
