Heads up: this will be a bit of a long post compared to previous posts. We’ve got a lot of ground to cover.
An Abstract Syntax Tree (AST) is a way of representing a program in a way that is easy for another program to manipulate. It gets rid of all the text nonsense and instead replaces it with datatypes from the programming language. For example:
(f 1 2 3)
might be represented by a tree that looks like:
Call(Var "f", [Fixnum 1; Fixnum 2; Fixnum 3])
which of course is longer, but it allows us to more easily
In this post, we’re going to
Pair(Symbol "if", Pair(cond, Pair(iftrue, Pair(iffalse, Nil))))) and turns it into an AST (like
If(cond, iftrue, iffalse))
Creating a new type is not hard. We’ve already got
lobject, and we can reuse
that. I propose we add all the special forms as constructors for our new type:
type lobject = | Fixnum of int | Boolean of bool | Symbol of string | Nil | Pair of lobject * lobject | Primitive of string * (lobject list -> lobject) and value = lobject and name = string and exp = | Literal of value | Var of name | If of exp * exp * exp | And of exp * exp | Or of exp * exp | Apply of exp * exp | Call of exp * exp list | Defexp of def and def = | Val of name * exp | Exp of exp
The different types are recursively defined with
and, which means they can
reference one another (as
value as an alias for
lobject because really we won’t be
lobjects anymore. Instead we’ll use those as our
self-evaluating values. That is, they can be evaluated repeatedly and they
name as an alias for
string purely for the aesthetics — I
think it makes the code easier to read. It doesn’t change the type system at
all, since OCaml will accept
string in place of
exp is a rather large type that has a bunch of constructors. I’ll explain all
of them briefly:
Literalholds any self-evaluating type. It’s a container for
Varholds a variable name.
Ifholds an if-expression — condition, expression to be evaluated if true, expression to be evaluated if false.
Orhold two conditions (expressions).
Callare synonymous, for the most part. They call either primitive procedures or closures (coming soon) with the specified arguments (also called actual parameters).
Defexps are kind of weird and more of a design choice than anything else. We have two classes of expression: those that modify the environment, and those that do not. Previously, this was implicit — you had to check if the expression returned the original environment or a modified one in
eval. Now, only
Defexps (currently just
val— but we’ll add
definesoon) modify the environment. We’ll find later that having
Expas a sub-case of
Defexpmakes our implementation much cleaner, and also allows us to have a “last result” variable in the REPL at no additional cost.
Okay, excellent. Now that we’ve modified an old type and added a couple of new ones, it’s time to put them to use. Our first step will be to write a function that takes an S-expression from the reader and transforms it into an AST.
Currently our REPL looks like this:
let rec repl stm env = print_string "> "; flush stdout; let sexp = read_sexp stm in let (result, env') = eval_sexp sexp env in print_sexp result; print_newline (); repl stm env';;
Let’s add in the AST transform:
let rec repl stm env = print_string "> "; flush stdout; let ast = build_ast (read_sexp stm) in let (result, env') = eval ast env in print_val result; print_newline (); repl stm env';;
Note that I’ve also changed the name of
eval_sexp to just
Now let’s go ahead and write
build_ast! We know that we need to generate all
of the different types of
exp, so that will provide reasonable scaffolding
for our function. In general: numbers, bools, and empty lists should all
self-evaluate. Primitives and non-list pairs shouldn’t happen at this stage in
the process. The others are special.
For example, a
Symbol is probably a name, so should be a
Var. Or there are
several types of list (like
(and #t #f)) that are special.
Val is even more
special in that it has to be wrapped in a
Defexp. So there you have it:
exception ParseError of string let rec build_ast sexp = match sexp with | Primitive _ -> raise ThisCan'tHappenError | Fixnum _ | Boolean _ | Nil -> Literal sexp | Symbol s -> Var s | Pair _ when is_list sexp -> (match pair_to_list sexp with | [Symbol "if"; cond; iftrue; iffalse] -> If (build_ast cond, build_ast iftrue, build_ast iffalse) | [Symbol "and"; c1; c2] -> And (build_ast c1, build_ast c2) | [Symbol "or"; c1; c2] -> Or (build_ast c1, build_ast c2) | [Symbol "val"; Symbol n; e] -> Defexp (Val (n, build_ast e)) | [Symbol "apply"; fnexp; args] when is_list args -> Apply (build_ast fnexp, build_ast args) | fnexp::args -> Call (build_ast fnexp, List.map build_ast args) |  -> raise (ParseError "poorly formed expression")) | Pair _ -> Literal sexp
I’ve also taken the liberty of introducing
or as special forms.
Also, introducing a special
apply form so it’s easy to build lists of
arguments to functions programmatically, as in:
(apply + '(1 2)). For now,
Call are separate forms that do roughly the same thing because I
can’t think of an elegant way to make them work together with two different
types of list (the Lisp kind and OCaml kind).
This should all be reasonably self-explanatory. If it’s not, take it apart. Poke at it. Run it on some user input. See what it produces. Better yet, pause here and try and write an AST printer (it’ll be good for debugging anyway).
If you’ve either done that and come back, or decided that you understand that enough, or decided that it’ll probably make more sense as we continue — great! Let’s keep on trucking.
So we’ve got these two classes of expression. One class changes the environment and the other does not. I propose the following structure to manage this:
(* inside the repl function *) [...] let ast = build_ast (read_sexp stm) in let (res, env') = eval ast env in let () = print_endline (string_of_value res) in repl stm env' [...]
Notice that we just have one
eval can decide what kind of
expression the AST represents, then delegate to the right kind of function,
possibly mutating the environment in the process. Also notice I’ve swapped out
string_of_value; it’s cleaner to transform to a string and
print_endline than to write your own print function.
Let’s handle the delegating
let eval ast env = match ast with | Defexp d -> evaldef d env | e -> (evalexp e env, env)
Which is pretty clear. If the environment should change something, let it
evaldef will return an environment). Else, send it to
evalexp, which just
returns a value, and return the original environment unmodified.
Speaking of, let’s take a look at
let evaldef def env = match def with | Val (n, e) -> let v = evalexp e env in (v, bind (n, v, env)) | Exp e -> (evalexp e env, env)
Which looks pretty reasonable, right? If someone writes
(val x 5) they should
x to be bound in the new environment.
And this brings us to
evalexp. We should have a case for each constructor in
Defexp — but that case should just raise
ThisCan'tHappenError, since our program should never pass a
let rec evalexp exp env = let evalapply f es = match f with | Primitive (_, f) -> f es | _ -> raise (TypeError "(apply prim '(args)) or (prim args)") in let rec ev = function | Literal l -> l | Var n -> lookup (n, env) | If (c, t, f) when ev c = Boolean true -> ev t | If (c, t, f) when ev c = Boolean false -> ev f | If _ -> raise (TypeError "(if bool e1 e2)") | And (c1, c2) -> begin match (ev c1, ev c2) with | (Boolean v1, Boolean v2) -> Boolean (v1 && v2) | _ -> raise (TypeError "(and bool bool)") end | Or (c1, c2) -> begin match (ev c1, ev c2) with | (Boolean v1, Boolean v2) -> Boolean (v1 || v2) | _ -> raise (TypeError "(or bool bool)") end | Apply (fn, e) -> evalapply (ev fn) (pair_to_list (ev e)) | Call (Var "env", ) -> env | Call (e, es) -> evalapply (ev e) (List.map ev es) | Defexp d -> raise ThisCan'tHappenError in ev exp
There’s some fun with guards when evaluating
If. Guards allow us to reduce
the amount of
match expressions. We could just as easily write this instead:
[...] | If (c, t, f) -> begin match ev c with | Boolean true -> ev t | Boolean false -> ev f | _ -> raise (TypeError "(if bool e1 e2)") end [...]
And that is perhaps even faster than the alternative, unless OCaml has some
notion of function purity and automatically caches or does not recompute the
ev c. That would be neat.
Also note that
or are not short-circuiting here, but rather
evaluate both expressions and then
&& the resulting booleans. That’s because
I’m lazy and don’t want to write more code. In this language with no side
effects, the only difference is in performance.
Also note that I added in a case for
env, which would have otherwise been
homeless in this new AST representation. Since
env is more for debugging (at
least for now), I didn’t think it was important enough to give it a
Last, note that
Call use the same function,
evaluate, so they are at least unified in the evaluation layer. Just not the
In any case… that’s that! That’s our evaluator. Let’s give it a go:
$ ocaml 08_asts.ml > (env) ((pair . #<primitive:pair>) (+ . #<primitive:+>) (list . #<primitive:list>)) > (+ 3 4) 7 > (and #t #f) #f > (and #t #t) #t > (or #f #f) #f > (or #f #t) #t > (if (and #t #f) 3 4) 4 > (if (or #t #f) 3 4) 3 > (val x 3) 3 > (env) ((x . 3) (pair . #<primitive:pair>) (+ . #<primitive:+>) (list . #<primitive:list>)) > (+ x 7) 10 > (apply pair (list 3 4)) (3 . 4) > (pair 3 4) (3 . 4) > Exception: End_of_file. $
Wait, what? What’s that
list function? Oh, I took the liberty of adding it
into the environment. Its primary purpose is to build lists easily:
let basis = let rec prim_list = function |  -> Nil | car::cdr -> Pair(car, prim_list cdr) in [...] List.fold_left newprim Nil [ ("list", prim_list); ("+", prim_plus); ("pair", prim_pair) ]
Download the code here if you want to mess with it.
Next up, quote.