The Itanium’s so-called stack


Last year I alluded to the fact that the Itanium processor has two stacks. The one that is traditionally thought of as “the stack” (and the one that the sp register refers to) is a manually managed block of memory from which a function can carve out space to use during its execution. For example, if you declare a local variable like

TCHAR szBuffer[MAX_PATH];

then that buffer will go on “the stack”.

But not all local variables are on “the stack”.

Recall that the Itanium has a very large number of registers, most of which participate in function calls. Consequently, many local variables are placed into registers rather than “the stack”, and when a function is called, those registers are “squirreled away” by the processor and “unsquirreled” when the function returns. Where do they get squirreled? Well, the processor can often just squirrel them into other unused registers through a mechanism I won’t go into. (Those still interested can read Intel’s documents on the subject.) If the processor runs out of squirrel-space, it spills them into main memory, into a place known as the “register backing store”. This is another stack-like chunk of memory separate from “the stack”. (Here’s Slava Oks’s artistic impression of the layout of the ia64’s stacks.)

As already noted, one consequence of this dual-stack model is that a stack buffer overflow will not corrupt the return address, because the return address is not kept on “the stack”; rather, it is kept in the “squirrel space” or (in the case of spillage) in the register backing store.
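If it helps to see that in code, here is a toy model. It is only a sketch, not real Itanium behavior: the byte array pretends to be “the stack”, a plain variable pretends to be the register stack, and every name in it (overrun_buffer, single_stack_return_survives, dual_stack_return_survives, the sizes, the pretend return address) is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Toy sizes: an 8-byte local buffer followed by one 8-byte slot. */
enum { BUF_SIZE = 8, ADDR_SIZE = sizeof(uint64_t) };

/* A pretend return address; the value is arbitrary. */
static const uint64_t RETURN_ADDRESS = 0x1122334455667788ULL;

/* Simulate a careless copy: write BUF_SIZE + ADDR_SIZE bytes into a
   buffer that was only meant to hold BUF_SIZE bytes. */
static void overrun_buffer(unsigned char *memory_stack)
{
    unsigned char attack[BUF_SIZE + ADDR_SIZE];
    memset(attack, 'A', sizeof attack);
    memcpy(memory_stack, attack, sizeof attack);
}

/* Single-stack layout (an assumption of this model): the return
   address sits on the memory stack directly after the buffer. */
int single_stack_return_survives(void)
{
    unsigned char memory_stack[BUF_SIZE + ADDR_SIZE] = {0};
    uint64_t after;

    memcpy(memory_stack + BUF_SIZE, &RETURN_ADDRESS, ADDR_SIZE);
    overrun_buffer(memory_stack);
    memcpy(&after, memory_stack + BUF_SIZE, ADDR_SIZE);
    return after == RETURN_ADDRESS;
}

/* Dual-stack layout: the return address lives somewhere else entirely,
   so the same overrun tramples only whatever follows the buffer on the
   memory stack (other locals, in this model). */
int dual_stack_return_survives(void)
{
    unsigned char memory_stack[BUF_SIZE + ADDR_SIZE] = {0};
    uint64_t register_stack_slot = RETURN_ADDRESS;

    overrun_buffer(memory_stack);
    return register_stack_slot == RETURN_ADDRESS;
}
```

In this model the overrun destroys the return address in the single-stack layout and leaves it alone in the dual-stack layout. Note that the overrun still corrupts whatever else was sitting on “the stack” next to the buffer.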

Another consequence of this dual-stack model is that various tricks to locate the start of the stack will find only one of the stacks. Missing out on the other stack will cause problems if you think grovelling “the” stack will find all accessible object references.
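Here is a rough sketch of how that goes wrong for a conservative scanner. Everything in it is simulated and hypothetical: the two arrays merely stand in for “the stack” and the register backing store, the pretend heap has four objects, and the names (mark_locations, scan_demo, and so on) are my own choices, not any real collector’s API.

```c
#include <stddef.h>
#include <stdint.h>

enum { HEAP_OBJECTS = 4, ROOT_WORDS = 4 };

static int heap[HEAP_OBJECTS];   /* pretend heap objects */
static int marked[HEAP_OBJECTS]; /* one mark flag per object */

static void clear_marks(void)
{
    for (size_t i = 0; i < HEAP_OBJECTS; ++i)
        marked[i] = 0;
}

/* Conservative scan: treat every word in [start, end) as a possible
   pointer and mark any heap object whose address it matches. */
void mark_locations(const uintptr_t *start, const uintptr_t *end)
{
    for (const uintptr_t *p = start; p < end; ++p)
        for (size_t i = 0; i < HEAP_OBJECTS; ++i)
            if (*p == (uintptr_t)&heap[i])
                marked[i] = 1;
}

static size_t count_marked(void)
{
    size_t n = 0;
    for (size_t i = 0; i < HEAP_OBJECTS; ++i)
        n += (size_t)marked[i];
    return n;
}

/* One reference lives on the simulated memory stack and another in the
   simulated backing store. Returns how many objects the scan found. */
size_t scan_demo(int scan_backing_store)
{
    uintptr_t memory_stack[ROOT_WORDS] = {0};
    uintptr_t backing_store[ROOT_WORDS] = {0};

    memory_stack[1] = (uintptr_t)&heap[0];
    backing_store[2] = (uintptr_t)&heap[3];

    clear_marks();
    mark_locations(memory_stack, memory_stack + ROOT_WORDS);
    if (scan_backing_store)
        mark_locations(backing_store, backing_store + ROOT_WORDS);
    return count_marked();
}
```

Calling scan_demo(0) finds only one of the two live objects, so a collector built this way would free the other one out from under you. Calling scan_demo(1) scans both ranges and finds both.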

The Itanium architecture challenges many assumptions and is much less forgiving of various technically-illegal-but-nobody-really-enforced-it-before shenanigans, some of which I have discussed in earlier entries. To this list, add the “second stack”.

Comments (6)
  1. mschaef says:

    Heh… I’m ‘honored’, I guess… :-)

    FWIW, The SIOD interpreter’s stack walk dates back a long time, and even George Carrette (the original author) spoke to portability problems in his documentation:

    "The stack and register marking code used in the mark-and-sweep GC is unlikely to work on machines that do not keep the procedure call stack in main memory at all times. It is assumed that setjmp saves all registers into the jmp_buff data structure. If your target machine architecture is radically different, such as using linked procedure call frames of some kind, not organized as a stack, then it would be best if you could find vendor-supported routines for walking these frames, such as would be utilized by a debugger. The mark_locations procedure can then be invoked multiple times with the proper start and end addresses.

    If the stack is not always aligned (in LISP-PTR sense) then the gc_mark_and_sweep procedure will not work properly unless steps are taken to work around the problem."

    I don’t think any of this is necessarily a bad thing, but it is one of the costs a developer has to pay if they’re interested in writing or maintaining a garbage collector. You’d be hard pressed to make such a thing insensitive to platform changes, and still retain decent performance.

    Ideally it’d be possible to use the GC in the CLR, but you have to give up portability by going that route, not to mention worrying about CLR runtime versions, etc.

  2. Reuben Harris says:

    "Squirrel-space"… heh! Cutest tech term I’ve heard for a while… it sounds like something Scott Adams would coin.

    And how does one ‘grovel’ a stack? :-)

  3. James Curran says:

    Years ago, I programmed (in Assembler) on the Z-8000 chip. Like the 8088, which would power the original IBM PC, the Z-8000 was an evolutionary growth from the 8080, but Zilog took it in a very different direction than Intel.

    It didn’t have an SP register at all, partly because registers were numbered, but mostly because you could PUSH/POP off of any register. CALL automatically PUSHed the return address onto R15, but if you wanted to save a register to the stack, you had to specify which register you wanted to use as the SP:

    PUSH @R15, R1

  4. Marco says:

    That’s pretty cool! The register stack thingy reminds me of the imaginary MMIX architecture that Donald Knuth invented for educational purposes. Any Knuth fans here?

  5. Dave says:

    Just about every architecture is more regular and sensible than what Intel has wrought. After hearing some of these features of the Itanium, it’s no wonder that Microsoft pushed Intel to develop an x86-64 architecture chip. Or at least that’s my interpretation of things…

  6. You’ve run out of address space.
