Lightweight syntax option in F# 1.1.12.3

We're glad to announce that F# 1.1.12.3 supports an optional lightweight syntax in which whitespace makes indentation significant. At the time of this release the feature is experimental, though we expect its use to become widespread.

The F# indentation-aware syntax option is a conservative extension of the explicit language syntax, in the sense that it simply lets you leave out certain tokens such as 'in' and ';;' by having the parser take indentation into account. This can make a surprising difference to the readability of code.

[ Note: This feature is similar in spirit to the use of indentation by Python and Haskell, and we thank Simon Marlow (of Haskell fame) for his help in designing this feature and sketching the implementation technique. We also thank all the F# users at MSR Cambridge who've been helping us iron out the details of this feature. ]

Compiling your code with the indentation-aware syntax option is useful even if you continue to use explicit tokens, as it reports many indentation problems with your code and ensures a regular, clear formatting style. The F# library is written in this way.

In this article we call the indentation-aware syntax option the "light" syntax option. It is also occasionally called the "hardwhite" or "white" option (because whitespace is "hard", i.e. significant as far as the lexer and the parser are concerned).

The light syntax option is enabled using the #light directive in a source file. This directive scopes over all of the subsequent text of a file.

When the light syntax option is enabled, comments are considered pure whitespace. This means the indentation position of comments is irrelevant and ignored. Comments act entirely as if they were replaced by whitespace characters.
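As a small illustration, assuming current F# (where the light syntax is the default, so no #light directive is needed), the following function compiles even though one comment starts at column 0, to the left of the block it sits in. The function name commentSample is hypothetical.

```fsharp
// A comment placed to the left of the enclosing block does not end that
// block, because comments act purely as whitespace for the offside rule.
let commentSample () =
    let x = 1
// This line starts at column 0, but as a comment it is ignored.
    let y = x + 1
    y
```

Calling commentSample () returns 2.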

TAB characters may not be used when the light syntax option is enabled. You should ensure your editor is configured to replace TAB characters with spaces, e.g. in Visual Studio 2005 go to "Tools\Options\Text Editor\F#\Tabs" and select "Insert spaces".

Using the light syntax option makes code clearer by doing three things:

  • Fewer tokens. Nearly all end-of-line separator tokens become optional in well-indented code. In particular, the ';;', 'in' and ';' tokens can generally be omitted.

  • Clearer disambiguation. It uses indentation to disambiguate the parsing of certain constructs, e.g. nested if/then/else blocks and nested match blocks. This greatly reduces the number of parentheses in code with nested branching constructs.

  • Sanity checks. It applies additional sanity checks on formatting, reporting places where "undentation" has been used. Undentation is where a language construct appears at a column position that is "undented" from (i.e. to the left of) an enclosing construct, which breaks the important principle that nested constructs appear at increasing column positions. Some forms of undentation are permitted in certain positions in the language syntax.
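To illustrate the second point, here is a sketch in current F# of two nested match blocks written without parentheses; indentation alone tells the parser that the final '| None' case belongs to the outer match. The function name addOptions is hypothetical. In the explicit syntax the inner match would typically need to be wrapped in parentheses or 'begin'/'end', since otherwise the final case would be read as part of the inner match.

```fsharp
// Sum two optional ints: a missing second value is skipped,
// and a missing first value gives 0.
let addOptions (x: int option) (y: int option) =
    match x with
    | Some a ->
        match y with
        | Some b -> a + b
        | None -> a
    | None -> 0   // aligned with the outer '|', so it closes the inner match
```

For example, addOptions (Some 1) (Some 2) returns 3 and addOptions None (Some 5) returns 0.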

The basic rules applied when the light syntax option is activated are shown below, illustrated by example.

```fsharp
// When the light syntax option is
// enabled top level expressions do not
// need to be delimited by ';;' since every construct
// starting at first column is implicitly a new
// declaration. NOTE: you still need to enter ';;' to
// terminate interactive entries to fsi.exe, though
// this is added automatically when using F#
// Interactive from Visual Studio.
#light
printf "Hello"
printf "World"
```

```fsharp
// Without the light syntax option the
// source code must contain ';;' to separate top-level
// expressions.
printf "Hello";;
printf "World";;
```
```fsharp
// When the light syntax option is
// enabled 'in' is optional. The token after the '='
// of a 'let' definition begins a new block, where
// the pre-parser inserts an implicit separating 'in'
// token between each 'let' binding that begins at
// the same column as that token.
#light
let SimpleSample() =
    let x = 10 + 12 - 3
    let y = x * 2 + 1
    let r1,r2 = x/3, x%3
    (x,y,r1,r2)
```

```fsharp
// Without the light syntax option 'in'
// is very often required. The 'in' is optional when
// the light syntax option is used.
let SimpleSample() =
    let x = 10 + 12 - 3 in
    let y = x * 2 + 1 in
    let r1,r2 = x/3, x%3 in
    (x,y,r1,r2)
```
```fsharp
// When the light syntax option is
// enabled 'done' is optional and the scope of
// structured constructs such as match, for, while
// and if/then/else is determined by indentation.
#light
let FunctionSample() =
    let tick x = printf "tick %d\n" x
    let tock x = printf "tock %d\n" x
    let choose f g h x =
        if f x then g x else h x
    for i = 0 to 10 do
        choose (fun n -> n%2 = 0) tick tock i
    printf "done!\n"
```

```fsharp
// Without the light syntax option
// 'done' is required
let FunctionSample() =
    let tick x = printf "tick %d\n" x in
    let tock x = printf "tock %d\n" x in
    let choose f g h x =
        if f x then g x else h x in
    for i = 0 to 10 do
        choose (fun n -> n%2 = 0) tick tock i
    done;
    printf "done!\n"
```
```fsharp
// When the light syntax option is
// enabled the scope of if/then/else is implicit from
// indentation.
#light
let ArraySample() =
    let numLetters = 26
    let results = Array.create numLetters 0
    let data = "The quick brown fox"
    for i = 0 to data.Length - 1 do
        let c = data.Chars(i)
        let c = Char.ToUpper(c)
        if c >= 'A' && c <= 'Z' then
            let i = Char.code c - Char.code 'A'
            results.[i] <- results.[i] + 1
    printf "done!\n"
```

```fsharp
// Without the light syntax option
// 'begin'/'end' or parentheses are often needed
// to delimit structured language constructs
let ArraySample() =
    let numLetters = 26 in
    let results = Array.create numLetters 0 in
    let data = "The quick brown fox" in
    for i = 0 to data.Length - 1 do
        let c = data.Chars(i) in
        let c = Char.ToUpper(c) in
        if c >= 'A' && c <= 'Z' then begin
            let i = Char.code c - Char.code 'A' in
            results.[i] <- results.[i] + 1
        end
    done;
    printf "done!\n"
```

Undentation. In general, nested expressions must occur at increasing column positions in indentation-aware code; this is called the "incremental indentation" rule. Warnings or syntax errors are given where this is not the case. However, "undentation" is permitted for certain constructs, in particular in the following situations:

```fsharp
// The bodies of functions may be undented
// from the 'fun' or 'function' symbol. This means the 'fun' is
// ignored when determining whether the body of the function
// satisfies the incremental indentation rule. The block
// may not undent further than the next significant construct.
#light
let HashSample(tab: Collections.HashTable<_,_>) =
    tab.Iterate (fun c v ->
        printf "Entry (%O,%O)\n" c v)
```

```fsharp
// The bodies of a '(' ... ')' or
// 'begin' ... 'end' may be undented when the expressions
// follow a 'then' or 'else'. They may not undent further
// than the 'if'.
#light
let IfSample(day: System.DayOfWeek) =
    if day = System.DayOfWeek.Monday then (
        printf "I don't like Mondays"
    )
```

```fsharp
// Likewise the bodies of modules and module types
// delimited by 'sig' ... 'end', 'struct' ... 'end' and
// 'begin' ... 'end' may be undented, e.g.
#light
module MyNestedModule = begin
   let one = 1
   let two = 2
end
```
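HashSample uses library names from this release. A rough equivalent in current F#, assuming a plain list of key/value pairs in place of the hash table (describeEntries is a hypothetical name), shows the same permitted undentation: the lambda body starts to the left of 'fun' but is still indented relative to 'pairs', the start of the enclosing expression.

```fsharp
// The lambda body is undented from 'fun' (it starts at a smaller column),
// which is permitted for function bodies.
let describeEntries (pairs: (string * int) list) =
    pairs |> List.map (fun (k, v) ->
        sprintf "Entry (%s,%d)" k v)
```

For example, describeEntries [("a", 1); ("b", 2)] returns ["Entry (a,1)"; "Entry (b,2)"].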

More details: offside lines and contexts. Indentation-aware syntax is sometimes called the "offside rule". This pleasant terminology comes from a 1965 paper where Peter Landin introduced the idea, and derives from football (soccer), where the second-to-last defending player causes an imaginary line to be drawn across the pitch, and if an attacker is beyond this line the referee will blow the whistle and call "offside!". In F# code, offside lines occur at column positions. For example, an '=' token associated with a 'let' introduces an offside line at the column of the first token after the '=' token.
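For instance, in the following sketch (current F#; sumSample is a hypothetical name) the body of the definition starts on the same line as '=', so the offside line sits at the column of that first inner 'let', and the following lines must start exactly there.

```fsharp
// The offside line for this body is the column of 'let a',
// not the column of the enclosing 'let sumSample'.
let sumSample () = let a = 1
                   let b = 2
                   a + b
```

Calling sumSample () returns 3.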

When a token occurs prior to an offside line, one of three things happens:

  • (1) enclosing constructs are terminated. This may result in a syntax error, e.g. when there are unclosed parentheses.

  • (2) extra delimiting tokens are inserted. In particular, when the offside line associated with the token after a 'do' in a while...do construct is violated, a 'done' token is inserted.

  • (3) an "undentation" warning or error is given, indicating that the construct is badly formatted. This is usually simple to remove by adding extra indentation and applying standard structured formatting to your code.
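Case (2) can be seen in the following sketch (current F#; countDown is a hypothetical name): the final 'steps' line is offside of the block that follows 'do', so the loop is closed as if a 'done' had been written.

```fsharp
// Count how many times the loop body runs; the loop ends where the
// indentation returns to the column of 'while'.
let countDown n =
    let mutable i = n
    let mutable steps = 0
    while i > 0 do
        i <- i - 1
        steps <- steps + 1
    steps
```

For example, countDown 5 returns 5 and countDown 0 returns 0.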

When a token occurs directly on an offside line, an extra delimiting token may be inserted. For example, when a token occurs directly on the offside line of a context introduced by a 'let', an appropriate delimiting separator token, i.e. an 'in' token, is inserted.

Offside lines are also introduced by other structured constructs, in particular at the column of the first token after the 'then' in an if/then/else construct, and likewise after 'try', 'else', '->' and 'with' (in a match/with or try/with) and 'with' (in a type augmentation). The "opening" bracketing tokens '(', '{' and 'begin' also introduce an offside line. In all these cases the offside line is determined by the column number of the first token following the significant token. Offside lines are also introduced by 'let', 'if' and 'module'; in these cases the offside line occurs at the column where the keyword itself starts.
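The sketch below (current F#; the function names describe and parseOrDefault are hypothetical) relies on the offside lines introduced after 'then', 'else', 'try' and '->': each multi-line branch is delimited purely by the column of its first token.

```fsharp
// Each branch is a block whose offside line is set by its first token.
let describe n =
    if n < 0 then
        let magnitude = -n
        sprintf "negative (%d)" magnitude
    else
        "non-negative"

// The 'try' body and the handler after '->' are likewise indentation-delimited.
let parseOrDefault (s: string) =
    try
        int s
    with _ ->
        -1
```

For example, describe (-3) returns "negative (3)", and parseOrDefault "abc" returns -1 because the conversion throws and the handler runs.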

The "light" syntax option is implemented as a pre-parse of the token stream produced by lexical analysis of the input text (according to the F# lexical rules), and uses a stack of contexts. When a column position becomes an offside line, a "context" is pushed. "Closing" bracketing tokens (')', '}' and 'end') automatically terminate offside contexts up to and including the context introduced by the corresponding "opening" token.
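The context stack can be illustrated with a deliberately tiny model. The F# sketch below is not the compiler's implementation: the names Token, Context, lex, popOffside, popToParen, preParse and show are all hypothetical, and the model only understands 'let', its '=', and '(' ... ')'. It treats its whole input as a single expression (module-level declarations are not modelled) and reports no warnings. It pushes a context at each 'let', at the first token after a 'let ... =', and at each '(', and inserts an explicit 'in' whenever a token lands exactly on a 'let' offside line.

```fsharp
// A toy model of the light-syntax pre-parser (hypothetical, much simplified).

/// A token and the 0-based column where it starts.
type Token = { Text: string; Col: int }

/// Offside contexts; the head of the stack is the innermost one.
type Context =
    | CtxLet of col: int     // pushed at the column of a 'let' keyword
    | CtxBlock of col: int   // pushed at the first token after 'let ... ='
    | CtxParen               // pushed by '(' and removed by the matching ')'

/// Split each line into space-separated tokens, recording their columns.
let lex (lines: string list) : Token list =
    let tokens = ResizeArray<Token>()
    for line in lines do
        let mutable i = 0
        while i < line.Length do
            if line.[i] = ' ' then
                i <- i + 1
            else
                let start = i
                while i < line.Length && line.[i] <> ' ' do
                    i <- i + 1
                tokens.Add({ Text = line.Substring(start, i - start); Col = start })
    List.ofSeq tokens

/// Pop every context that a token at column `col` is offside of. A token lying
/// exactly on a 'let' offside line ends that binding, so an 'in' is emitted.
let rec popOffside (col: int) (emit: string -> unit) (ctxs: Context list) : Context list =
    match ctxs with
    | CtxBlock c :: rest when col < c -> popOffside col emit rest
    | CtxLet c :: rest when col < c -> popOffside col emit rest
    | CtxLet c :: rest when col = c ->
        emit "in"
        rest
    | _ -> ctxs

/// Pop every context up to and including the innermost '(' context.
let rec popToParen (ctxs: Context list) : Context list =
    match ctxs with
    | CtxParen :: rest -> rest
    | _ :: rest -> popToParen rest
    | [] -> failwith "unmatched ')'"

/// Rewrite a token stream into explicit syntax by inserting 'in' tokens.
let preParse (tokens: Token list) : string list =
    let out = ResizeArray<string>()
    let mutable stack : Context list = []
    let mutable inLetHeader = false   // between a 'let' and its '='
    let mutable openBlock = false     // the next token starts a new block
    for tok in tokens do
        if tok.Text = ")" then
            // A closing bracket terminates contexts up to and including its '('.
            stack <- popToParen stack
        else
            stack <- popOffside tok.Col (fun s -> out.Add s) stack
            if openBlock then
                // The first token after 'let ... =' sets the block's offside line.
                stack <- CtxBlock tok.Col :: stack
                openBlock <- false
            match tok.Text with
            | "let" ->
                stack <- CtxLet tok.Col :: stack
                inLetHeader <- true
            | "=" when inLetHeader ->
                inLetHeader <- false
                openBlock <- true
            | "(" -> stack <- CtxParen :: stack
            | _ -> ()
        out.Add tok.Text
    List.ofSeq out

/// Convenience wrapper: lex, pre-parse and join the tokens with spaces.
let show (lines: string list) = lex lines |> preParse |> String.concat " "
```

For example, show ["let total ="; "    let a = 1"; "    let b = 2"; "    a + b"; "total"] yields "let total = let a = 1 in let b = 2 in a + b in total", the explicit form of the same code.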

Here are some examples of the offside rule being applied to F# code:

```fsharp
// 'let' and 'type' declarations in
// modules must be precisely aligned.
#light
let x = 1
 let y = 2  <-- unmatched 'let'
let z = 3   <-- warning FS0058: possible
                incorrect indentation: this token is offside of
                context at position (2:1)
```
```fsharp
// The '|' markers in patterns must align.
// The first '|' should always be inserted. Note: a future revision
// may also permit the optional complete omission of the '|' markers.
#light
let f () =
    match 1+1 with
    | 2 -> printf "ok"
  | _ -> failwith "no!"   <-- syntax error
```
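For comparison, here is a sketch of both examples with the offending lines aligned (current F#, where no #light directive is needed): every top-level 'let' starts in column 0, and both '|' markers share the column of the first one.

```fsharp
// Top-level declarations all start at the same column.
let x = 1
let y = 2
let z = 3

// Both '|' markers sit at the same column, under 'match'.
let f () =
    match 1 + 1 with
    | 2 -> printf "ok"
    | _ -> failwith "no!"
```

Calling f () prints "ok", since 1 + 1 matches the first case.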

Enjoy!

Don and James for the F# team