Tail calls in F#

Article
07/08/2011

Introduction
Several people have asked for more information on how tail calls behave in F#. This blog post will answer many frequently asked questions about tail calls, including why tail calls are important, how to identify them, and how the compiler generates code for them.

Why are tail calls important?

In F#, it’s sometimes natural to express an algorithm as a recursive function. However, in many programming languages, calls to recursive functions are treated like any other function call, and require the allocation of an additional stack frame for each call. This means that functions which make many recursive calls may cause a stack overflow. As a functional programming language, F# encourages using recursion for some tasks, so it’s important for the language to provide a mechanism to avoid overflowing the stack when using recursive functions. Tail calls are important because they can be invoked without extending the call stack, and therefore recursive algorithms which use tail calls can be called without worrying about causing a stack overflow at runtime. In many cases, it is possible for the compiler to turn tail recursion into loops, resulting in a performance profile which is equivalent to imperative code.

As a simple example, consider the List.nth function from the standard library. This function recursively walks down the list until the desired element is found (or the end of the list is reached). If this function were naively compiled using normal recursive calls, then calling List.nth to get the 100000^th element of a large list might cause a stack overflow. Instead, the compiler is able to optimize the function into the equivalent of a loop, so that no matter the length of the list, calling List.nth will not cause a stack overflow.

What, exactly, is a tail call?

A tail call is a call within a function whose result is immediately used as the output of the function (although this is phrased in terms of functions, in F# the same rules apply to .NET methods, too). That is, a tail call is a call which is in tail position, which is defined recursively as follows:

The body of a function or method
The body of an action in a match expression, where the match expression is in tail position
The body of an if, elif, or else branch, where the conditional expression is in tail position
The last expression in a sequence, where the sequence is in tail position
The body of a let or let rec binding, where the binding expression is in tail position

Here are some corresponding examples:

Expression e is in tail position in
fun() -> e
Expressions e and f are in tail position in
function
| 1 -> e
| _ -> f
Likewise, e and f are also in tail position in
fun() ->
if true then e
else f
Expression e is in tail position in
fun() ->
printfn "First statement in a sequence"
printfn "Second statement in a sequence"
e
Expression e is in tail position in
fun() ->
let x = 1
e

One example of a common mistake which results in a non-tail call is the following:

let rec sum = function
| [] -> 0
| x::xs -> x + sum xs

In this code the recursive call to sum is not a tail call. Although this call appears at the end of the function in the text of the source code, it is not in tail position since the result of the call is used by the (+) operator.

How can I verify that tail calls are being used?

The easiest way is to ensure that tail calls are being used is to understand and apply the rules from the previous section. While the F# compiler itself doesn’t currently provide any way to verify that tail calls have been used at a particular call site, you can be sure by looking at the compiled version of the code using the MSIL Dissasembler (ildasm.exe). We’ll see several examples below.

How are tail calls compiled?

When the compiler comes across a tail call, there are a few different forms that the resulting code may take. The next few sections will go into all of the gory details, but here’s a quick summary. F# will use a .NET tail call when compiling a call which is in tail position unless any of the following is true:

You turn off the --tailcalls compiler option (this is the default behavior in Visual Studio for Debug builds, to enable a better debugging experience)
The compiler can determine that no recursive paths through the call site exist, in which case it may emit a normal call for efficiency reasons
The compiler can use gotos instead of recursive calls
The call is to a first class function value returning unit
The call takes place in a try-catch or try-finally block (note that these calls aren’t really in tail position, by definition)

Compiled forms

Non-tail calls

The compiler will emit a normal (non-tail) call instruction if it can determine that no recursive call will be made directly or indirectly as a result of the call. The purpose of using tail calls is to prevent unbounded stack growth during recursive calls. Using .NET tail calls can have a performance penalty on some versions of .NET on certain platforms, so using normal calls can be more efficient, and doesn’t affect correctness since only a single additional stack frame will be used. For instance, consider this F# code:

let writeString(s:string) =
System.Console.WriteLine(s)

Here, the call to System.Console.WriteLine is in tail position. However, since this call can’t result in a recursive call to writeString, a normal call instruction is used. Here is the IL that the compiler generates:

.method public static void writeString(string s) cil managed
{
// Code size 8 (0x8)
.maxstack 8
IL_0000: nop
IL_0001: ldarg.0
IL_0002: call void [mscorlib]System.Console::WriteLine(string)
IL_0007: ret
} // end of method Program::writeString

Branch instructions

In simple recursive functions, the compiler will often optimize tail calls into loops rather than actual method calls. This allows F# code which is written using recursion to run as quickly as imperative code which uses explicit loops. For instance, consider the following code to sum a list:

let rec sum acc = function
| [] -> acc
| x::xs -> sum (acc + x) xs

The F# compiler generates the following compiled code:

.method public static int32 sum(int32 acc,
class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<int32> _arg) cil managed
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = ( 01 00 02 00 00 00 01 00 00 00 01 00 00 00 00 00 )
// Code size 37 (0x25)
.maxstack 4
.locals init ([0] class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<int32> V_0,
[1] class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<int32> ns,
[2] int32 n)
IL_0000: ldarg.1
IL_0001: call instance class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<int32>::get_TailOrNull()
IL_0006: brfalse.s IL_0023
IL_0008: ldarg.1
IL_0009: stloc.0
IL_000a: ldloc.0
IL_000b: call instance class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<int32>::get_TailOrNull()
IL_0010: stloc.1
IL_0011: ldloc.0
IL_0012: call instance !0 class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<int32>::get_HeadOrDefault()
IL_0017: stloc.2
IL_0018: nop
IL_0019: ldarg.0
IL_001a: ldloc.2
IL_001b: add
IL_001c: ldloc.1
IL_001d: starg.s _arg
IL_001f: starg.s acc
IL_0021: br.s IL_0000
IL_0023: ldarg.0
IL_0024: ret
} // end of method Program::sum

The important thing to note about this code is that there is no recursive call to the sum function at all. Instead, there is an unconditional jump (using the br.s instruction, highlighted above), which makes the compiled form of this code similar to a loop. This is more efficient than using a true tail call, and maintains the benefit that no extra stack frame is created for recursive calls.

True tail calls

When potentially recursive code can’t be compiled into loops, true .NET tail calls will be used. For instance, consider the following simple example:

let apply f x = f x

This function simply applies a function to an argument. The call to f is in tail position, and it’s possible that a recursive function will be defined which passes itself as the argument f, so the F# compiler will use a tail call here:

.method public static !!b app<a,b>(class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!a,!!b> f,
!!a x) cil managed
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = ( 01 00 02 00 00 00 01 00 00 00 01 00 00 00 00 00 )
// Code size 11 (0xb)
.maxstack 8
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldarg.1
IL_0003: tail.
IL_0005: callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!a,!!b>::Invoke(!0)
IL_000a: ret
} // end of method Program::app

The presence of the IL .tail prefix confirms that this is a tail call.

Limitations

Calling function values returning unit

Generally speaking, F# functions returning unit are translated into .NET methods returning void. However, when functions are treated as values, they are stored in objects of type (’a->’b) (which is just another name for Microsoft.FSharp.Core.FSharpFunc<’a,’b>), and they are called by calling the instance method Invoke which takes a single parameter of type ’a and has a return value of type ’b. However, this means that calling Invoke on a value of type ‘a -> unit actually returns a value of type unit (Microsoft.FSharp.Core.Unit), not void. The compiler is responsible for managing these implementation details to give users a seamless experience. Unfortunately, this mismatch between void and unit can prevent tail calls from being used in some cases.

Consider the following two functions:

let getString (f : unit -> string) = f()

let getUnit (f : unit -> unit) = f()

These very similar functions each take a function as an argument and call it. In both getString and getUnit, the call to f is in tail position.

The getString function will be compiled to a .NET method taking a single parameter of type unit->string and returning a string. The getUnit function will be compiled to a .NET method taking a single parameter of type unit->unit and returning void (not unit). Let’s look at the IL that the compiler generates for each method. First, getString:

.method public static string getString(class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.Unit,string> f) cil managed
{
// Code size 11 (0xb)
.maxstack 8
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldnull
IL_0003: tail.
IL_0005: callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.Unit,string>::Invoke(!0)
IL_000a: ret
} // end of method Program::getString

We load the function f and a null value onto the stack (to pass to f as the input value of type unit). Then, we perform the call using the Invoke method on f. As expected, this is done as a tail call.

Now, let’s look at getUnit:

.method public static void getUnit(class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.Unit,class [FSharp.Core]Microsoft.FSharp.Core.Unit> f) cil managed
{
// Code size 10 (0xa)
.maxstack 8
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldnull
IL_0003: callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.Unit,class [FSharp.Core]Microsoft.FSharp.Core.Unit>::Invoke(!0)
IL_0008: pop
IL_0009: ret
} // end of method Program::getUnit

Here, compiler cannot use a tail call to Invoke because the call isn’t immediately followed by a ret. Instead, we need to pop the dummy unit value off of the stack before returning because getUnit returns void.

Note that this issue only arises when using first-class functions as values. Calling normal .NET methods which return void doesn’t present this problem, because there is no return value to pop off of the stack.

Try-catch and try-finally

On the .NET platform, there are limitations on where tail calls may occur. One restriction is that tail calls cannot be performed in try-catch or try-finally blocks (neither in the body of the try nor in the catch or finally handlers). For instance, consider the following code which retries an operation until it succeeds (doesn’t throw an exception):

let rec retry f =
try
f()
with
_ -> retry f

Although the calls to f and retry both appear to be in tail position, the compiler can’t turn either of them into a tail call because they occur in a try-catch.

In this case, it is clear on inspection that these calls appear within a try-catch, but sometimes the issue can be more subtle. For instance, use bindings implicitly generate a try-finally around the code that follows them to ensure that the Dispose method is called on the bound value. This means that no calls following a use binding will be tail calls.

Inherent Limitations on .NET

One other thing to keep in mind is that even when the F# compiler emits a tail call, there are certain conditions where the runtime will not honor the .tail prefix:

When calling from untrusted code to trusted code
On .NET 2.0, there are several additional restrictions, particularly when targeting 64-bit runtimes (follow the link for more details, but these include limitations on calls which return value types, calls to generic methods, and cross-assembly calls, among others). Most of these restrictions have been lifted in .NET 4.0.

I hope that this information has been helpful. My next post will cover ways to work around some of the limitations covered in this post.

Regards,

Keith Battocchi

Further information on using F# across various platforms can be found at fsharp.org.