[Part 7 of the FScheme series]
The Soul of Scheme
We’re now getting into the language-oriented features of Scheme. This is why Scheme is one of my very favorite languages. Scheme is of course a multi-paradigm language; functional, imperative (next post), object oriented (soon), but most wonderfully and uniquely language-oriented. In object-oriented programming the approach is to extend the type system to your problem domain and then solve the domain-specific problem at hand using these new types. In language-oriented development you first extend the language itself (not just the type system) with domain specific constructs and then solve the problem. All of the recent hype around DSLs is old hat for Lisp guys.
Lisp has been called a “programmable programming language" and this language-oriented approach is what Alan Kay meant when he said, “Lisp isn't a language, it's a building material." I love how Paul Graham phrased the whole creative experience in his book On Lisp: “Language and program evolve together. Like the borders between two warring states, the boundary between language and program is drawn and redrawn, until eventually it comes to rest along the mountains and rivers, the natural frontiers of your problem.”
F# can also be used in a language-oriented style by way of quotations. There's a chapter on the subject in Don Syme's Expert F#. Thomas Petricek has done some amazing work with Web Tools using F# quotations. We will be adding quotations to our Scheme implementation in this post. The difference between F# and Scheme quotations though is that in F# a quotation becomes an expression tree; an AST representation normally used internally as output from the parser and input to the compiler. It expresses all of the same basic concepts but in an entirely different structure to that of the surface syntax of the language in which you’re used to working.
Scheme on the other hand can be thought of as having no surface syntax at all! It’s as if the syntax is just a straight serialization of the AST. This makes manipulating program structure just as natural as manipulating data structure. The structure is one and the same. In Scheme, program fragments and data are both just lists and can be tinkered with equally using ‘cons’, ‘car’ and ‘cdr’. This one-to-one correspondence between internal program representation and syntactic representation is termed homoiconicity and is a key distinguishing feature of the Lisp family. McCarthy originally planned to add M-expression syntax on top of S-expressions but never did and honestly people who really use Lisp for serious work don’t want any syntax. People poke fun at all of the parenthesis but they’re there for good reason.
To enable working with program fragments as data, we’ll first introduce ‘quote’ which will be a special form that returns its argument as plain data without evaluating. Additionally, we’ll allow for marking pieces within a quoted structure with ‘unquote’ which will cause just those pieces to be evaluated within.
and Quote env =
let rec unquote = function
| List([Symbol("unquote"); e]) -> eval env e
| List(Symbol("unquote") :: _) -> failwith "Malformed 'unquote'." // too many args
| List(h :: t) -> List(h :: (List.map unquote t)) // recurse
| e –> e
function [e] -> unquote e | _ -> failwith "Malformed 'quote'."
This lets us do things like (quote (* 2 (- 5 2))) and instead of getting 6, we get nested list structure containing symbols and numbers; literally: (* 2 (- 5 2)). If we want to evaluation just part of it we could say (quote (* 2 (unquote (- 5 2)))) and get (* 2 3). Notice the ‘unquote’ is more like a keyword that only has meaning within a quoted expression.
Using ‘quote’ and ‘unquote’ is extremely common and could use a more optimized syntax. Yes, I did just say “syntax.” This is one place where we will consider breaking the homoiconicity and introduce special syntax. Internally (in the AST), we will still have ‘quote’ and ‘unquote’ symbols in function application form (and you’re free to stick with that), but the syntax will be simplified to an apostrophe or comma preceding expressions to be quoted or unquoted respectively. This can be handled at lexing/parsing time. First add to the tokenizer:
type Token =
| Open | Close
| Quote | Unquote
| Number of string
| String of string
| Symbol of string
let rec tokenize' acc = function
| w :: t when Char.IsWhiteSpace(w) -> tokenize' acc t // skip whitespace
| '(' :: t -> tokenize' (Open :: acc) t
| ')' :: t -> tokenize' (Close :: acc) t
| '\'' :: t -> tokenize' (Quote :: acc) t
| ',' :: t -> tokenize' (Unquote :: acc) t
| '"' :: t -> // start of string
let s, t' = string "" t
tokenize' (Token.String(s) :: acc) t'
| '-' :: d :: t when Char.IsDigit(d) -> // start of negative number
let n, t' = token ("-" + d.ToString()) t
tokenize' (Token.Number(n) :: acc) t'
| '+' :: d :: t | d :: t when Char.IsDigit(d) -> // start of positive number
let n, t' = token (d.ToString()) t
tokenize' (Token.Number(n) :: acc) t'
| s :: t -> // otherwise start of symbol
let s, t' = token (s.ToString()) t
tokenize' (Token.Symbol(s) :: acc) t'
|  -> List.rev acc // end of list terminates
Then the parser. We factor our the Open match a bit and reuse for Quote :: Open and Unquote :: Open:
let rec list f t acc =
let e, t' = parse'  t
parse' (List(f e) :: acc) t'
and parse' acc = function
| Open :: t -> list id t acc
| Close :: t -> (List.rev acc), t
| Quote :: Open :: t -> list (fun e -> [Symbol("quote"); List(e)]) t acc
| Quote :: h :: t -> parse' (List([Symbol("quote"); map h]) :: acc) t
| Unquote :: Open :: t -> list (fun e -> [Symbol("unquote"); List(e)]) t acc
| Unquote :: h :: t -> parse' (List([Symbol("unquote"); map h]) :: acc) t
| h :: t -> parse' ((map h) :: acc) t
|  -> (List.rev acc), 
Now instead of the heavy looking (quote (* 2 (unquote (- 5 2)))), we can say ‘(* 2 ,(- 5 2)) and yield the same (* 2 3). Or how about (let ((x 'rain) (y 'spain) (z 'plain)) '(the ,x in ,y falls mainly on the ,z)). What do you think that resolves to? It’s much nicer looking syntax indeed.
Quotations are useful for manipulating program structure as lists but then we need a way of explicitly executing our manipulations. This is where ‘eval’ comes in. The semantics are a little tricky. First it evaluates a quotation against the passed in environment (the call-time environment). The result of that will be the literal expression (with unquoted pieces resolved). Then that is evaluated again but in the global environment:
and Eval env = function [args] -> args |> eval env |> eval environment | _ -> failwith "Malformed 'eval'."
So now we can say (eval ‘(* 2 ,(- 5 2))) and get 6.
Now for the grand finale. We want to be able to introduce our own special forms into the language; to program the programming language. It is true that ‘lambda’ allows us to add things that look like first class primitives; nearly indistinguishable from built-in functions. But there are limitations.
For example, let’s try to build ‘and’ which will take two arguments and return true (1) if both are true. The catch is that it should have the same semantics of the real ‘and’ which would be a special form that doesn’t evaluate its arguments up front but instead “short circuits” in that if the first argument evaluates to false it shouldn’t bother to evaluate the second. This would be partially for efficiency but more importantly to maintain workable semantics. What if, for example, the second argument were a recursive call and the ‘and’ expression was the base case? Eagerly evaluating could cause an infinite loop!
Using only ‘lambda’, we could try (let ((and (lambda (a b) (if a (if b 1 0) 0)))) (and 0 BOOM)). Oops, we see an error (No binding for 'BOOM'.) even though it should not have touched that, once it realized that the first argument in (and 0 BOOM) is false (0). The ‘if’ is indeed a special form and short-circuits, but the issue is that all of the arguments to ‘and’ are evaluated before calling the lambda. That’s the nature of applicative order languages (lazy languages like Haskell need no “special form” distinction).
Let’s try again. This time we’ll quote the arguments and explicitly eval them within the lambda: (let ((and (lambda (a b) (if (eval a) (if (eval b) 1 0) 0)))) (and '0 'BOOM)). Well that works (we get 0 and no complaint about missing bindings) but how very annoying to expose the implementation details to the callers. We’d much rather the quoting at the call site to be implicit. Enter ‘macro’:
and Macro env = function
| [List(parameters); body] –>
let closure env' args =
// bind parameters to actual arguments (but unevaluated, unlike lambda)
let bindings = List.zip parameters args
let bind = function Symbol(p), a -> p, ref a | _ -> failwith "Malformed 'macro' parameter."
let env'' = List.map bind bindings |> extend env // extend the captured definition-time environment
eval env'' body |> eval env' // eval against bound args, then again in the caller's environment
| _ -> failwith "Malformed 'macro'."
This is much the same as Lambda but it binds the arguments in a new environment frame without evaluating them in the calling environment. It then evaluates the body in that extended environment as usual. Then finally evaluates the result against the captured environment. In other words, a macro’s arguments are treated as implicitly quoted and are bound as literals (without evaluating). The result of evaluating the body is expected to be a modified program structure that’s then evaluated again to get a result.
Now we can rewrite our ‘and’ as (let ((and (macro (a b) '(if ,a (if ,b 1 0) 0)))) (and 0 BOOM)). Notice that the macro uses quoting and unquoting internally but callers can treat ‘and’ exactly as if it were a primitive special form. By the way, we can define ‘or’ as well with similar short-circuit semantics: (let ((or (macro (a b) '(if ,a 1 (if ,b 1 0))))) (or 1 BOOM)). Pretty neat stuff.
One of the super-grand finales of the whole series will be to come back and replace a bunch of our primitive special forms (written in F#) with a library written in FScheme itself!
case "(quote (* 2 3))" "(* 2 3)" // quote primitive
case "'(* 2 3)" "(* 2 3)" // quote primitive with sugar
case "(eval '(* 2 3))" "6" // eval quoted expression
case "(quote (* 2 (- 5 2)))" "(* 2 (- 5 2))" // quote nested
case "(quote (* 2 (unquote (- 5 2))))" "(* 2 3)" // quote nested unquote
case "'(* 2 ,(- 5 2))" "(* 2 3)" // quote nested unquote with sugar
case "(quote (quote 1 2 3))" "(quote 1 2 3)" // quote special form
case "(let ((x 'rain) (y 'spain) (z 'plain)) '(the ,x in ,y falls mainly on the ,z))"
"(the rain in spain falls mainly on the plain)" // quote/unquote
case "(let ((* -)) (eval '(* 3 3)))" "9" // eval uses top-level environment
case "(let ((or (macro (a b) '(if ,a 1 (if ,b 1 0))))) (or 1 BOOM))" "1" // macro as special form
case "(let ((and (macro (a b) '(if ,a (if ,b 1 0) 0)))) (and 0 BOOM))" "0" // macro as special form
By the way, all of this is up on CodePlex if you want to play around.