Lightweight syntax option in F# 1.1.12.3
We're glad to announce that F# 1.1.12.3 supports the optional use of a lightweight syntax in which indentation is significant. At the time of this release this is an experimental feature, though we expect its use to become widespread.
The F# indentation-aware syntax option is a conservative extension of the explicit language syntax, in the sense that it simply lets you leave out certain tokens, such as the in and ;; tokens, by having the parser take indentation into account. This can make a surprising difference to the readability of code.
[ Note: This feature is similar in spirit to the use of indentation by Python and Haskell, and we thank Simon Marlow (of Haskell fame) for his help in designing this feature and sketching the implementation technique. We also thank all the F# users at MSR Cambridge who've been helping us iron out the details of this feature. ]
Compiling your code with the indentation-aware syntax option is useful even if you continue to use explicit tokens, as it reports many indentation problems with your code and ensures a regular, clear formatting style. The F# library is written in this way.
In this article we call the indentation-aware syntax option the "light" syntax option. It is also occasionally called the "hardwhite" or "white" option (because whitespace is "hard", i.e. significant as far as the lexer and the parser are concerned).
The light syntax option is enabled using the #light directive in a source file. This directive scopes over all of the subsequent text of a file.
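As a minimal sketch of what this looks like in practice, the file below turns the option on at the top and then relies on indentation alone. The function name greet is hypothetical and chosen only for illustration.

```fsharp
#light

// Everything after the directive is parsed with the light syntax option on.
// Neither 'in' nor ';;' appears below; indentation delimits the definition.
let greet name =
    let message = "Hello, " + name
    message
```

Here the body of greet ends where the indentation ends, so no closing token is written.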
When the light syntax option is enabled, comments are considered pure whitespace. This means the indentation position of comments is irrelevant and ignored. Comments act entirely as if they were replaced by whitespace characters.
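A small sketch of this behaviour, assuming the light syntax option is enabled for the file: the comments below sit at columns that would be offside for ordinary tokens, yet the definition is unaffected. The function name area is hypothetical.

```fsharp
let area width height =
    let product = width * height
(* This block comment starts in column 0, to the left of the function body,
   but it is treated as whitespace and does not end the definition. *)
            // This line comment is indented further than the surrounding code.
    product
```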
TAB characters may not be used when the light syntax option is enabled. You should ensure your editor is configured to replace TAB characters with spaces, e.g. in Visual Studio 2005 go to "Tools\Options\Text Editor\F#\Tabs" and select "Insert spaces".
Using the light syntax option makes code clearer by doing three things:
Fewer tokens. Nearly all end-of-line separator tokens become optional in well-indented code. In particular, the ;;, in and ; tokens can generally be omitted.
Clearer disambiguation. It uses indentation to disambiguate the parsing of certain constructs, e.g. nested if/then/else blocks and nested match blocks. This greatly reduces the number of parentheses in code with nested branching constructs.
Sanity checks. It applies additional sanity checks on formatting, reporting places where "undentation" has been used. Undentation is where a language construct has been used at a column position that is "undented" from an enclosing construct, which breaks the important principle that nested constructs appear at increasing column positions. Some manifestations of undentation are permitted in certain positions in the language syntax.
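To illustrate the disambiguation point, here is a minimal sketch of a nested match written with the light syntax option on. The function name addOptions is hypothetical. In the explicit syntax the inner match would need parentheses or begin/end so that the final rule is not taken as belonging to it.

```fsharp
let addOptions (x: int option) (y: int option) =
    match x with
    | Some a ->
        // The inner match is indented, so its rules are the two lines below it.
        match y with
        | Some b -> a + b
        | None -> a
    // This rule is back in the outer column, so it belongs to the outer match.
    | None -> 0
```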
The basic rules applied when the light syntax option is activated are shown below, illustrated by example.
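One rule of this kind, sketched here under the assumption that it behaves as the "fewer tokens" point describes: expressions that start in the same column are sequenced without a ; between them, and the top-level definition needs no closing ;;. The function names are hypothetical, and the first definition keeps the explicit separators on one line for comparison.

```fsharp
// Explicit style: sequenced expressions are separated with ';'.
let logAndIncrementExplicit x = (printfn "x = %d" x; printfn "finished"; x + 1)

// Light style: one expression per line, each starting in the same column.
let logAndIncrementLight x =
    printfn "x = %d" x
    printfn "finished"
    x + 1
```

Both definitions print the same two lines and return the same value.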
Undentation. In general, nested expressions must occur at increasing column positions in indentation-aware code; this is called the "incremental indentation" rule. Warnings or syntax errors will be given where this is not the case. However, for certain constructs "undentation" is permitted. In particular, undentation is permitted in the following situations:
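As one hedged illustration, later F# compilers accept the layout below, where the body of an anonymous function in parentheses starts to the left of the fun keyword. Whether this exact case is among those permitted in F# 1.1.12.3 is an assumption, and the name doubleAll is hypothetical.

```fsharp
let doubleAll items =
    items |> List.map (fun x ->
        // The body is indented relative to the enclosing line,
        // but it sits to the left of the 'fun' token above.
        let twice = x * 2
        twice)
```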
More details: offside lines and contexts. Indentation-aware syntax is sometimes called the "offside rule". This pleasant terminology comes from a 1965 paper where Peter Landin introduced the idea, and derives from football (soccer), where the last defending player causes an imaginary line to be drawn across the pitch, and if an attacker is beyond this line the referee will blow the whistle and call "offside!". In F# code offside lines occur at column positions. For example, the = token associated with a let introduces an offside line at the column of the first token after the = token.
When a token occurs prior to an offside line, one of three things happens:
(1) enclosing constructs are terminated. This may result in a syntax error, e.g. when there are unclosed parentheses.
(2) extra delimiting tokens are inserted. In particular, when the offside line associated with the token after a do in a while...do construct is violated, a done token is inserted.
(3) an "undentation" warning or error is given, indicating that the construct is badly formatted. This is usually simple to remove by adding extra indentation and applying standard structured formatting to your code.
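Case (2) can be sketched with a simple loop, assuming the light syntax option is on. The function name sumTo is hypothetical. The loop body is whatever stays on or to the right of the line set by the first token after do; the first token to the left of that line ends the loop.

```fsharp
let sumTo n =
    let mutable total = 0
    let mutable i = 1
    while i <= n do
        total <- total + i
        i <- i + 1
    // 'total' below starts to the left of the loop body, so the loop ends
    // here; this is the point where 'done' would appear in the explicit syntax.
    total
```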
When a token occurs directly on an offside line, an extra delimiting token may be inserted. For example, when a token occurs directly on the offside line of a context introduced by a let, an appropriate delimiting separator token is inserted, i.e. an in token.
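A minimal sketch of the let case, with hypothetical function names: the first definition relies on tokens landing on the offside line of each let, and the second writes out the in tokens that the first leaves implicit.

```fsharp
let sumOfSquares a b =
    let a2 = a * a
    // This 'let' starts in the same column as the one above,
    // so the previous binding is closed as if 'in' had been written.
    let b2 = b * b
    // The same applies to the final expression.
    a2 + b2

// The equivalent explicit form, with each 'in' spelled out on one line.
let sumOfSquaresExplicit a b = let a2 = a * a in let b2 = b * b in a2 + b2
```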
Offside lines are also introduced by other structured constructs, in particular at the column of the first token after the then in an if/then/else construct, and likewise after try, else, -> and with (in a match/with or try/with) and with (in a type augmentation). "Opening" bracketing tokens ("(", "{" and "begin") also introduce an offside line. In all these cases the offside line introduced is determined by the column number of the first token following the significant token. Offside lines are also introduced by let, if and module. In these cases the offside line occurs at the start of the identifier.
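The sketch below shows several of these constructs together, assuming the light syntax option is on. The function name safeDivide is hypothetical, and the comments mark which token sets each offside line.

```fsharp
let safeDivide a b =
    try
        // 'if' is the first token after 'try', so it sets the line for the protected block.
        if b = 0 then
            // 'failwith' is the first token after 'then' and sets the line for that branch.
            failwith "division by zero"
        else
            // 'a' is the first token after 'else' and sets the line for this branch.
            a / b
    with
    // The handler rules follow 'with'.
    | Failure _ -> 0
```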
The "light" syntax option is implemented as a pre-parse of the token stream coming from a lexical analysis of the input text (according to the lexical rules above), and uses a stack of contexts. When a column position becomes an offside line a "context" is pushed. "Closing" bracketing tokens (")", "}" and "end") automatically terminate offside contexts up to and including the context introduced by the corresponding "opening" token.
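The idea of a stack of contexts can be sketched with a deliberately simplified toy model. It is not the compiler's implementation: it looks only at the column of the first token on each line, whereas the real pre-parse works token by token and pushes contexts for specific constructs. All names here (Action, step, trace) are hypothetical.

```fsharp
// What the toy model decides for each line.
type Action =
    | Push of int        // token is to the right of the current line: open a nested context
    | Separator of int   // token is exactly on the current line: insert a delimiter
    | Pop of int         // token is to the left of the current line: close that context

// Process one line, given the stack of open offside columns (innermost first).
let step (stack: int list) (col: int) : int list * Action list =
    let rec loop remaining actions =
        match remaining with
        | top :: rest when col < top -> loop rest (Pop top :: actions)
        | top :: _ when col = top -> remaining, List.rev (Separator top :: actions)
        | _ -> col :: remaining, List.rev (Push col :: actions)
    loop stack []

// Run the model over the starting columns of a sequence of lines.
let trace (cols: int list) : Action list =
    let folder (stack, log) col =
        let nextStack, actions = step stack col
        nextStack, log @ actions
    cols |> List.fold folder ([], []) |> snd
```

For lines starting at columns 0, 4, 4 and 0, the model opens a context at 0, opens a nested one at 4, inserts a separator at 4, then closes the context at 4 and inserts a separator at 0.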
Here are some examples of the offside rule being applied to F# code:
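One further sketch, with a hypothetical function describe, shows a well-formed layout followed (in comments only) by a layout of the kind the rule rejects. It assumes the light syntax option is on.

```fsharp
let describe n =
    let parity =
        if n % 2 = 0 then "even"
        else "odd"
    // 'sprintf' starts in the column of 'let parity', so the binding is
    // closed and the body of 'describe' continues here.
    sprintf "%d is %s" n parity

// By contrast, the following layout would draw an undentation warning or
// error, because "odd" starts to the left of the offside line set by the
// first token after '=':
//
//     let parity = if n % 2 = 0 then "even" else
//       "odd"
```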
Enjoy!
Don and James for the F# team