Quick note about x64 Tail call

A coworker on the CLR dropped by recently, having hit a brick wall about some problem he was having with some ASM code for x64 that was trying to tail call.  He has already grep’ed through the ABI docs, and had actually hit this blog prior to knocking on my door.  The problem?  How to emit a tail call through a register such that the OS unwinder doesn’t puke.

The quick answer is to use a REX prefix extension on the JMP instruction.  For those of you that know what you’re doing, you can probably stop here.  For more details, read on…

For folks that don’t know what a tail call is, you probably haven’t ever done any functional programming before.  The Intro to CS class I took my freshman year of college was taught by a guy who really liked programming languages, and forced us (against the will of many of the guys in the class) to write a whole bunch of Scheme.  But since none of us knew Scheme, we were learning that language as we went along, and he only taught us the functional subset of scheme.  The biggest problem folks coming from the procedural programming world hit is the lack of a for loop.  Since my Scheme is really rusty, I’ll continue this exposition writing in the “functional subset of C++” :-)

The easiest way to understand the issue is to try to solve a problem.  Let’s write Factorial without using any side-effects:

BigInt Factorial(BigInt v) {
return (v <= 1) ? 1 : v * Factorial(v - 1);
}

That looks great, right?  Now go try running Factorial(15) and watch the stack explode (I’m not kidding – BigInt is a large object).  The reason the stack explodes is because Factorial doesn’t just return the value of a recursive function call.  It gets the value from the recursive function call, then it does something with it prior to the return, so the stack frame of the caller must be left alone.  You need to be able to eliminate the need to do something with the value of the function you’re recursing on, if you want to allow the stack frame to be tossed (actually, just reused):

BigInt Factorial_TailRecurse(BigInt v, BigInt current) {
return (v <= 1) ? current : Factorial_TailRecurse(v - 1, v * current);
}

BigInt Factorial(BigInt v) {
return Factorial_TailRecurse(v, 1);
}

Voila – we no longer need to do anything with the return result of Factorial_TailRecurse other than just return it.  What does that code wind up looking like, once it’s pumped through a relatively intelligent compiler back end?  Almost exactly like the more traditional C++ for loop, actually.

So, now you have a perfect understanding of what tail recursion is.  In my contrived example, however, there’s no need to do anything fancy to express that the end of your function is a tail-recursive call.  Imagine, however, that the function you’re calling from has a very large stack frame, and you’re calling doing something goofy, like writing a recursive parser.  Since you don’t want your parser to be limited to the depth of the stack, you’re making sure that you’re always writing tail-recursive code.  But what if you’re trying to call a function pointer recursively?  This is where you get stuck without REX JMP.

The OS unwinder can statically look at the Factorial_TailRecurse(v-1, v*current) function call, and recognized that the offset being jumped to is the head of this function, and therefore knows that it’s actually part of a function epilog.  If, however, instead of jmp [IP-42], it saw jmp [RAX], it wouldn’t know where this thing was actually headed.  In fact, it might be a switch statement.  So, to properly flag the tail call as being part of an epilog, you throw a REX prefix on first, and now the unwinder groks the epilog, and you’re done.

-Kev