[The twelfth in a series of posts on the evolution of TransForth]
It’s been quite fun playing with this Universal Machine from the Cult of the Bound Variable. In this post we’re going to continue the journey toward building a full Forth for this machine by assembling a Forth inner interpreter; turning this register machine into a stack machine.
Last time we finished up a single-pass macro assembler for the UM-32 (in 15 lines of F# and 40 lines of Forth!) but did nothing more than a simple “Hello, world” with it. If you remember when we built an inner interpreter for TransForth, you know that this is the heart of Forth. If you’re unclear on the mechanics of a direct threaded inner interpreter, you’d probably enjoy the video from that earlier post:
Register Machines (what a pain)
Being so used to dealing with a stack machine and essentially zero-operand instructions, it can be a bit annoying to deal with a register machine. But that’s what we have with the UM-32. Let’s assign purposes to the remaining registers (remember, z is already the zero constant and registers 1 and 2 are temp registers): we’ll allocate one more temp along with the standard inner interpreter and stack pointers.
In a way you can think of what we’re doing as implementing stack machine mechanics on top of a register machine. There are, of course, many real stack machines implemented in hardware. Koopman’s book is an excellent resource (free online) by the way. For the UM-32, we’ll need to handle the stacks ourselves along with the mechanics of walking nested sequences of stack operations.
Given the data or return stack pointer and a value register, these words will handle the stacks in a few instructions:
These don’t emit any UM-32 code at this point. Instead they are assembler “macros” used to inline these instructions as needed later.
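In spirit, here’s what the inlined instruction sequences do (a Python sketch of the effect, not the assembler macros themselves; it assumes, as is typical, that the stacks grow downward from high memory):

```python
# mem stands in for UM-32 memory, sp for a stack pointer register.
# The stack grows downward: push moves the pointer down, pop moves it up.

def push(mem, sp, value):
    sp -= 1          # make room: move the pointer down one cell
    mem[sp] = value  # store the value at the new top of stack
    return sp

def pop(mem, sp):
    value = mem[sp]  # fetch the value at the top of stack
    sp += 1          # discard it: move the pointer back up
    return value, sp
```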
Now we’re going to emit the inner interpreter and dictionary structure. Following that, we’ll initialize and kick things off. We’ll start by jumping to the initialization code (yet to be defined):
It happens to be patched to address 60 later and the following ends up in the image:
The triumvirate of an inner interpreter is enter, next and exit (in the past I’ve referred to them by their more classic names docol, next and dosemi).
The job of enter is to go into word definitions. It pushes the interpreter (i) register onto the return stack so we can get back out upon exit. It then points i at the body of the word and does a next.
It emits the following UM-32 code to the image (falling through to next):
To make it easy to jump to this, here’s a convenience word:
Next comes next, whose job is to jump to the next word in a sequence and advance the interpreter to the one following.
And the same kind of convenience macro:
Remember that all primitive words are plain machine code followed by a next. All secondary words begin with enter (that is, with machine code to jump to &enter) and are terminated with exit. The job of exit is to leave and rejoin a parent sequence; simply recovering the return address from the return stack and doing next from there.
Emitting an inlined pop and a jump to next:
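Putting the three together, the walk can be modeled in a few lines of Python (an illustration of the mechanics only, not the emitted UM-32 code; run, exit_ and halt are names invented for the sketch). Cells in memory are either primitives (Python functions standing in for machine code) or the address of a secondary’s body:

```python
def run(mem, start):
    ip, rstack = start, []
    while True:
        w = mem[ip]               # next: fetch the word at ip...
        ip += 1                   # ...and advance ip past it
        if callable(w):           # primitive: execute, then fall into next
            ip = w(rstack, ip)
            if ip is None:        # our stand-in for halt
                return
        else:                     # enter: an address cell means a secondary
            rstack.append(ip)     # save the resume point on the return stack
            ip = w                # point ip at the secondary's body

def exit_(rstack, ip):
    return rstack.pop()           # exit: recover the parent sequence

def halt(rstack, ip):
    return None                   # halt: stop the walk
```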
That’s it for the inner interpreter! Just fifteen instructions (plus two for the jump at the start).
Now we want to pack a dictionary with primitive and secondary words to exercise things a bit. Using the same example as in the direct threading demo video, we’ll need literal (lit), pick and add primitives; then we can define a secondary dup (in terms of lit and pick) and double (in terms of dup and add).
Starting with lit, we push the literal which is packed alongside word addresses and advance the interpreter to skip ahead:
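The effect is easy to see in a Python sketch (an illustration only): the literal lives in the thread itself, right after lit’s own cell, so we push it and bump the interpreter pointer past it:

```python
def lit(thread, ip, ds):
    ds.append(thread[ip])  # push the inline literal onto the data stack
    return ip + 1          # advance the interpreter past the literal cell
```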
The pick word is used to pull a copy of the nth element on the stack to the top:
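In stack terms pick does the following (a Python sketch of the effect on a downward-growing stack, not the emitted instructions):

```python
def pick(mem, sp):
    n = mem[sp]          # pop n from the top of the stack
    sp += 1
    value = mem[sp + n]  # the nth element, counting 0 as the (new) top
    sp -= 1              # push the copy onto the top
    mem[sp] = value
    return sp
```

So `42 0 pick` leaves `42 42` on the stack, which is exactly how dup will use it below.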
The add word essentially pops two values and pushes back their sum. It could be implemented as:
But a more efficient implementation just fetches the top of the stack for the second operand (without popping) and pokes back the result (without pushing):
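The difference between the two shapes is easy to see side by side (Python stand-ins for the UM-32 instruction sequences, again assuming a downward-growing stack):

```python
def add_naive(mem, sp):
    a = mem[sp]; sp += 1            # pop the top operand
    b = mem[sp]; sp += 1            # pop the second operand
    sp -= 1
    mem[sp] = (a + b) & 0xFFFFFFFF  # push the 32-bit sum
    return sp

def add(mem, sp):
    a = mem[sp]; sp += 1                   # pop only the top operand
    mem[sp] = (mem[sp] + a) & 0xFFFFFFFF   # fetch the second in place,
    return sp                              # poke the sum back over it
```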
For our final primitive in this exercise, we just need a halt. It’s just a single instruction terminating execution (so it doesn’t need to be followed by a next). Later we’ll seed the process by pointing the interpreter here so that once all the nested sequences are complete the machine halts:
Hand-packed Secondary Words
Now on to the secondary words, which are quite different. They’re a branch to enter followed by a sequence of addresses rather than of machine code. It’s starting to gain a Forth-like feel; a little more comfortable:
Notice how the literal zero is packed in there but otherwise it’s just a list of addresses. This is the essence of threaded code. If you think of the word addresses as op codes for a zero-operand stack machine, you’re not far off!
The double word is another secondary, this time defined in terms of another secondary (the dup from above) and primitives:
That’s it for the dictionary.
Now we want to prime the mechanics to run a sample program. We’ll push a 42 to the stack and execute double. This will dup the 42 by doing a 0 pick and add them together. With success we’ll end up with an 84 on the stack and halt.
To kick off the interpreter, we’ll point it at a location in memory containing the address of the halt word in the dictionary. This will be pushed to the return stack upon initially entering double and will be popped and executed upon completion just as we want:
Remember the jump at the very start of the image? That is patched to the current address and so here finally begins execution of our sample:
We’ll partition memory much as we did before with return and data stacks in high memory and the dictionary (already packed) in low memory:
And like we said, seed the interpreter pointer with a cell containing the halt address:
Emitted so far:
To set up our little scenario, we manually push a 42 to the stack, point the interpreter (word register) at double and branch into it. The inner interpreter takes it from there; walking the nested primitive and secondary words to completion and halting.
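The whole scenario can be simulated in Python (a model of the threaded walk, not the 71-cell UM-32 image; in the real image each secondary begins with a branch to &enter, whereas here the run loop plays enter’s role whenever it meets an address cell):

```python
def lit(mem, ds, rs, ip):
    ds.append(mem[ip])   # push the literal packed inline in the thread
    return ip + 1        # and skip the interpreter past it

def pick(mem, ds, rs, ip):
    n = ds.pop()                 # copy the nth element (0 = top) to the top
    ds.append(ds[-1 - n])
    return ip

def add(mem, ds, rs, ip):
    a = ds.pop()                 # pop two, push the 32-bit sum
    ds.append((ds.pop() + a) & 0xFFFFFFFF)
    return ip

def exit_(mem, ds, rs, ip):
    return rs.pop()              # resume the parent sequence

def halt(mem, ds, rs, ip):
    return None                  # terminate the walk

def run(mem, ds, start):
    ip, rs = start, []
    while True:
        w = mem[ip]              # next: fetch the word at ip...
        ip += 1                  # ...and advance ip past it
        if callable(w):          # primitive: execute, fall into next
            ip = w(mem, ds, rs, ip)
            if ip is None:       # halt
                return
        else:                    # enter: an address cell means a secondary
            rs.append(ip)
            ip = w

mem = [lit, 0, pick, exit_,      # 0: body of dup    = lit 0 pick exit
       0, add, exit_,            # 4: body of double = dup add exit
       4, halt]                  # 7: top-level thread: double, then halt

ds = [42]          # manually push 42
run(mem, ds, 7)    # point the interpreter at the thread and go
print(ds)          # prints [84]
```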
We can’t quite just load this 71-cell image into the UM-32 and go, however, because we’re using memory at much higher addresses for the stacks. We could add code to allocate and copy over the image, but I think I’ll just take the easy route and pad the rest of the image with zeros; making a nice little 64KB (16K 32-bit cells) image file:
I would never have been able to get this all working without some visibility into the UM-32 in action. To get some tracing I added the following to Luke’s UM-32 implementation (inside his cycle function):
```fsharp
//(* Debugging
let name = function 0 -> "z" | 1 -> "t" | 2 -> "y" | 3 -> "x" | 4 -> "w" | 5 -> "i" | 6 -> "s" | 7 -> "r"
let an, bn, cn, a2n = (name a), (name b), (name c), (name a2)
printf "%05i " finger
match code with
| 0u  -> printfn " %s %s %s cmove %s = %s:%i if %s:%i" an bn cn an bn reg.[b] cn reg.[c]
| 1u  -> printfn " %s %s %s fetch %s = M[%s:%i][%s:%i]" an bn cn an bn reg.[b] cn reg.[c]
| 2u  -> printfn " %s %s %s store M[%s:%i][%s:%i] = %s:%i" an bn cn an reg.[a] bn reg.[b] cn reg.[c]
| 3u  -> printfn " %s %s %s add %s = %s:%i + %s:%i" an bn cn an bn reg.[b] cn reg.[c]
| 4u  -> printfn " %s %s %s mult %s = %s:%i * %s:%i" an bn cn an bn reg.[b] cn reg.[c]
| 5u  -> printfn " %s %s %s div %s = %s:%i / %s:%i" an bn cn an bn reg.[b] cn reg.[c]
| 6u  -> printfn " %s %s %s nand %s = %s:%i ~& %s:%i" an bn cn an bn reg.[b] cn reg.[c]
| 7u  -> printfn " halt"
| 8u  -> printfn " %s %s alloc new(%s:%i) -> %s" bn cn cn reg.[c] bn
| 9u  -> printfn " %s free %s:%i" cn cn reg.[c]
| 10u -> printfn " %s echo %s:%i" cn cn reg.[c]
| 11u -> printfn " %s key %s:%i" cn cn reg.[c]
| 12u -> printfn " %s %s loadjump load(%s:%i), jump(%s:%i)" bn cn bn reg.[b] cn reg.[c]
| 13u -> printfn "%5i %s literal %s = %i" value a2n a2n value
//Console.ReadLine() |> ignore
//*)
```
Now we can run our image and see all the gloriously gory details!
And voilà! We indeed end up with 84 on the stack. Not a particularly impressive feat, but it shows that all seems to be working as expected.
To be continued…
Next, we’ll work on building an outer interpreter to process Forth source code and get out of this assembly language business. However, we’ll always keep the assembler handy and we’ll always be free to dip down to defining Forth words at this lowest level; for performance and to take advantage of machine-level functionality as needed. This spanning from bare metal all the way up to very high levels of abstraction is a unique beauty of Forth.