This processor has no stack (insert spooky laughter)

In the early days of computing, most processors did not have what we today would consider a "stack". Everything was done with static data.

Subroutine linkage on these systems was typically done by explicitly passing the return address to the called subroutine. The first thing the subroutine did was save this return address in a global variable somewhere, so it knew where to go when the subroutine finished.

One convention was to use self-modifying code and store the return address directly in the branch target of a jump instruction at the end of the function.

Another convention was to store the return address in a global variable somewhere, and then perform an indirect jump at the end of the function. A common place for this global variable was immediately before the first instruction of the subroutine. Indeed, some processors elevated this convention into hardware: The "Jump to subroutine" instruction automatically stored the return address at the specified target subroutine address, and then resumed execution at the word immediately following the return address. In other words every subroutine looked like this:

    .WORD 0             ; return address goes here
    blah blah           ; first actual instruction
    blah blah           ; do stuff
    JMP @MYSUBROUTINE   ; indirect jump back to caller

This subroutine linkage convention precluded recursion, because a recursive call would destroy the previous activation frame's return address. (It also precluded multithreading, but that's way beyond where we are at this point in history.)

The lack of a stack also means that nothing has automatic storage duration. Function local variables actually have static storage duration.

This is the "fast one" that I pulled last week when I discussed the COMMON statement in FORTRAN. I put local variables into a COMMON block and noted that they were being shared across functions. This implied that the local variables were static, because if they were automatic, then there would be nothing left to share once the function returned.

Comments (30)
  1. Brian_EE says:

    I’ve been using micros for over 30 years, and probably the oldest I’ve used were 8085-based single-board-computers. So, I’ve never seen a stack-less processor.

    The first thing that jumps out at me (haha, get it?) is that this technique wouldn’t work on systems that boot out of ROM, or at least you couldn’t have subroutines in your ROM code.

    1. sd says:

      If I remember correctly PowerPC doesn’t really have a stack. The register used as the stack pointer is convention and there are no push or pop instructions.

      1. Many RISC processors do not have an architectural stack, but they usually come with recommended convention for which register to treat “as if” it were a stack pointer.

    2. Brian says:

      Thanks for making me feel young; that doesn’t happen very often lately.
      I don’t think I’ve ever seen a stack-less processor either. 40 years ago I was an undergraduate taking first-year”Fortran for Engineers” and had no idea what a stack was. But, the next year I took a mini-computer course and programmed four different mini-computers in assembly language. I think one required that you do a push and jump on a subroutine call and a pop and jump on a return, but I’m pretty sure they all had stacks. After that, I started programming 8-bit microcomputers (Motorola, Intel, Zylog) and they all had stacks.

    3. Joshua says:

      The Parallax spin processor is stackless, uses self-modifying-code in the JMPRET instruction to edit the last instruction of a function to splice in the return address, and boots from ROM, and has functions in the ROM image.

      1. Brian_EE says:

        How does it “self-modify” the code in ROM? By definition, ROM contents are unalterable. Are the ROM functions copied into RAM at boot and those are what get modified?

      2. Brian_EE says:

        After some quick research, the Parallax copies it’s code from ROM to RAM as part of boot. So my initial comment stands – that is, you can’t execute subroutines out of ROM.

        1. Dave Bacher says:

          The strategy for doing it on ROM is to use a double jump.

          For example, if I put a jmp at 0x0FF, I can have the app poke the return address it wants at 0x0100. My ROM then jumps to 0x0FF consistently. This sounds terrible — but its already how you’re communicating to ROM / OS on these architectures anyway.

  2. CarlD says:

    Shades of the IBM 360 BALR instruction come to mind – Branch and Link Register. Good times :)

    1. DWalker07 says:

      And BR R14. Ah yes, the IBM 360.

    2. 12BitSlab says:

      BALR actually worked very well. There were downsides, though. With base-displacement addressing, if one wanted to use a referenced link, you were limited to a 4K space.

      For one project, I had to build — from scratch — an interpreter for S/360 all in BAL* — no privileged instructions. I ended up building my own subroutine calling package that worked well and could handle recursion. The crux of it was that I had to build my own stack routines.

      Youngins today have no idea of the low level stuff we had to do in the olden days just to get things to work.

      *BAL is Basic Assembler Language on S/360 and its follow-ons. One could call OS SVCS but there was no access to anything that typical system software would need to do.

      1. 12BitSlab says:

        Ooops – typo — 12K for referenced links. I truly am getting too old.

        1. DWalker07 says:

          Operating system modules (under 360 and 370 VM) had to be 4K or less in size, I believe, especially if they were pageable.

          1. 12BitSlab says:

            Yes, you are correct. The page size was 4K and certain routines in the OS had to fit in 4k as to minimize paging.

            One of my professors at UC wrote the debugger portion of OS/360. Originally OS/360 was going to have a user configurable page size. IBM had to drop that because of the requirement that they didn’t want certain things utilizing more than 1 page. If I remember correctly, the selectable page sizes were going to be 512B, 1K, 2K and 4K.

  3. Medinoc says:

    Oh. I thought it only meant COMMON blocks were static.

  4. Joe D says:

    Back in the long ago time, I programmed on Datapoint systems.

    They had a stack, but it was implemented in the CPU itself. There was a separate memory store of sixteen 16-bit words for the stack. This meant that you couldn’t call more than 16 levels deep. It would wrap back around and overwrite the first address if you did.

  5. Erkin Alp Güney says:

    > Another convention was to store the return address in a global variable somewhere
    Then that is your stack pointer.

    1. Joshua says:

      Nope. It’s one variable per function with predictable consequences if recursion is attempted.

  6. cheong00 says:

    Sounds like how TSR works as described in here:

  7. John Styles says:

    Hence the FORTRAN assigned GOTO.

    If you like you can look on the way this was used as a way of implementing the ‘subroutine’ ‘design pattern’.

  8. Neil says:

    At school we weren’t taught using a real microprocessor instead we had to learn a fictitious processor for which we had two similar (but subtly different) emulations.

    I always wondered why this processor had both types of subroutine instruction; I suppose it meant you could write an entire program stacklessly if you so wished, although you would obviously be unable to make use of the library ROM.

  9. John Elliott says:

    I’ve seen BIOSes where the self-test doesn’t use any RAM until it’s verified that there’s RAM there to use, so to call a subroutine they put the return address in one of the processor registers (usually BP; sometimes SI, if a SI-is-return-address subroutine in turn wants to call a BP-is-return-address subroutine).

    1. poizan42 says:

      The coreboot (or LinuxBIOS back then) project has actually made a C compiler for this – ROMCC

    2. Joshua says:

      Oh man. Talk about a constrained environment.

      I heard the newer processors boot up in a mode where the L2 or L3 cache is RAM and remains so until BIOS initializes southbridge.

    3. Patrick Star says:

      In the one I’ve worked with (Phoenix BIOS), the convention is to store the return address in BX. Also, to preserve the value of registers during a stackless function call, it uses the top 16 bits of the 32 bit registers. Remember, this all takes place very early, so it’s all 16 bit realmode – no 32 bits of address space yet and thus not much other use for the top 16 bits anyways.

      1. DWalker07 says:

        Why can’t modern processors use 32-bit addressing early on, even during booting? Or is there just no reason to?

        1. Patrick Star says:

          DWalker07: Modern x86 CPUs still start in 16 bit real mode. However, EFI (as opposed to classic BIOS) enable protected mode and a flat 32 bit address space very early – one of the very first thing they do, in fact, long before even RAM is available.

          Classic BIOSes basically stay in real mode for most of the boot – however, a bit later in the process they usually enable 32 bit addressing (using a trick called Flat Real Mode or Unreal Mode) so they can address more than 1MB of memory. After all – even just the BIOS itself is significantly larger than the original 8086 memory map assumes, and lots of more modern hardware is mapped way above 1MB, so you need that to get any sort of work done. But you still need compatibility with all the legacy code and things like VGA BIOS and option ROMs, so therefore they stay in real mode.

    4. Patrick Star says:

      It’s not just that it decides to not use RAM – it simply can’t use it until it’s been initialized. Nowadays you solve this with cache-as-RAM (though last time I looked it’s not enabled on boot – you have to enable it yourself). Back in the days before this you had to stay stackless until at least some RAM was usable, typically the bottom 1MB or so. You can actually get a surprising amount of work done in that environment!

  10. Rick C says:

    Store the return address somewhere? Looks like you missed an opportunity, Raymond: “Nothing to push here, MOV along”.

  11. Wombat says:

    It was possible to implement recursive code on an IBM 370 mainframe, despite not having a stack. What you did was start by calling a dynamic memory allocate for what was effectively a stack frame, then save all registers in that dynamic block (the space after that was for local variables. You got the effect of the stack but it cost a memory allocation call.

Comments are closed.

Skip to main content