Isn’t it about time we were able to define our own functions? I think so. There’s just one problem: some of the plumbing we’ve built so far doesn’t suit what we’re making in this post. Our environments, for example, won’t support recursive functions — so we’ll need to fix the implementation. Also, fair warning… this is the longest post yet!
Say we’re defining a factorial function. It might look like this:
(define fact (x) (if (< x 2) 1 (* x (fact (- x 1)))))
So in order to evaluate the definition and bind fact, we’d have to do the
following:
define in a closure (don’t worry about the exact
mechanics just yet)fact in an environment, just like we do when evaluating
valThat’s all fine and dandy until we try and execute fact, at which point the
interpreter will halt with a NotFound exception. After all, how can fact
possibly know what it itself is? It can’t, because our environments store
environments as Pairs of Symbol x value, so we’d have to infinitely
recursively store the contents of the function in nested environments all the
way down. And that takes up too much space.
Or would we have to? We could try fetching the formal parameters from the closure environment and the recursive function calls from the call-time environment… but that won’t work, because that function might have been redefined in the meantime, like so:
; Assume for a moment we've defined <, *, - primitives
(define fact (x) (if (< x 2) 1 (* x (fact (- x 1)))))
(val x fact)
(define fact (x) 5)
(x 3) ; == (* 3 (fact (- 3 1)))
; == (* 3 5)
; == 15
So we’ve got to fix them.
In C, we solve problems of recursive data storage with pointers. If, for example, you need to represent a binary search tree, you might do so like this:
struct TreeNode {
void *data;
struct TreeNode *left;
struct TreeNode *right;
};
Even if we wanted to, we couldn’t store actual TreeNodes for the left and
right subtrees of any given node, because (as above) this would take infinite
space.
In OCaml, we’ll solve this problem with refs, which are sort of like OCaml’s
well-typechecked version of pointers. 1
I propose that we change our environment to have the following type:
type 'a env = (string * 'a option ref) list
The change to the built-in list type from our use of Pairs is largely due
to lobject’s inability to hold option or ref. It comes with benefits,
most notably the decrease in code complexity throughout the interpreter.
That’s a lot of gymnastics for a lousy environment, but here’s the rationale, in brief: we need to fix the steps from above (reproduced here with modifications in bold):
define in a closure (don’t worry about the exact
mechanics just yet), binding fact to a ref of unspecified valuefact to the closure from above, just like we do when
evaluating valref from step 1 to point to the resulting closureThese updates allow us to “close the loop” and make fact self-aware. In our
case, “unspecified value” means using the option type
2, and we’ll have a ref pointing to that.
Look for these modifications and additions to the environment functions below.
Earlier I said not to worry about the exact mechanics of closures just yet. Well, time to worry about the exact mechanics. First, the how.
A lambda is an anonymous function, defined like so:
(lambda (x) (+ x 1))
; ^ the expression body
; ^ the formal parameters
; ^ the word lambda
This anonymous function takes in a value (called x internally) and adds one
(1) to it. Evaluating a lambda-expression results in a closure. A closure is
composed of three things:
x)(+ x 1))The captured environment is key, and is what allows us to do things like
recursive functions, curried functions, and a bunch of other fancy stuff. In
order to make a closure, we can simply capture the environment at the point
where the lambda is being evaluated.
A closure is a value, and therefore self-evaluating. They are only called into action when applied to values, at which point:
And that’s closures in a nutshell. So let’s hop to it.
We’ll need to have lambda expressions and closure values, I suppose, so let’s
add those:
type lobject =
| Fixnum of int
| Boolean of bool
| Symbol of string
| Nil
| Pair of lobject * lobject
| Primitive of string * (lobject list -> lobject)
| Quote of value
| Closure of name list * exp * value env (* NEW! *)
and value = lobject
and name = string
and exp =
| Literal of value
| Var of name
| If of exp * exp * exp
| And of exp * exp
| Or of exp * exp
| Apply of exp * exp
| Call of exp * exp list
| Lambda of name list * exp (* NEW! *)
| Defexp of def
Note how the only difference between Lambda and Closure is the attached
value env.
But so far we don’t actually have a means to construct Lambda expressions, so
let’s go ahead and add that into build_ast:
let rec build_ast sexp =
match sexp with
| Primitive _ | Closure _ -> raise ThisCan'tHappenError
| Fixnum _ | Boolean _ | Nil | Quote _ -> Literal sexp
| Symbol s -> Var s
| Pair _ when is_list sexp ->
(match pair_to_list sexp with
[...]
| [Symbol "val"; Symbol n; e] -> Defexp (Val (n, build_ast e))
| [Symbol "lambda"; ns; e] when is_list ns ->
let err () = raise (TypeError "(lambda (formals) body)") in
let names = List.map (function Symbol s -> s | _ -> err ())
(pair_to_list ns)
in Lambda (names, build_ast e)
[...]
We’ve got to make sure of a couple things:
Symbol names for formal parametersI’ve included a handy-dandy error function, tersely dubbed err, that will
raise an exception if any of the supposed “formals” are not, in fact, formals.
And that’s all well and good if we can build it, but right now evalexp
doesn’t know how to handle it. So let’s make some Closures!
let rec evalexp exp env =
[...]
let rec ev = function
[...]
| Call (Var "env", []) -> env_to_val env
| Call (e, es) -> evalapply (ev e) (List.map ev es)
| Lambda (ns, e) -> Closure (ns, e, env)
| Defexp d -> raise ThisCan'tHappenError
in ev exp
Creating closures is surprisingly easy to do! Now before we try and evaluate them, let’s get cracking with some environment code.
bind remains largely the same:
let bind (n, v, e) = (n, ref (Some v))::e
Only now we’re sprinkling in refs and options. We’ll also introduce two
other functions, mkloc and bindloc, which we’ll use later on:
let mkloc () = ref None
let bindloc (n, vor, e) = (n, vor)::e
(* Signature of bindloc: *)
val bindloc : name * 'a option ref * a env -> 'a env
Next comes lookup, which is also modified to deal with refs and options:
exception UnspecifiedValue of string
let rec lookup = function
| (n, []) -> raise (NotFound n)
| (n, (n', v)::_) when n=n' ->
begin
match !v with
| Some v' -> v'
| None -> raise (UnspecifiedValue n)
end
| (n, (n', _)::bs) -> lookup (n, bs)
We’ve got one more case to handle now — when the value at a location is
None. In that case, we should raise an UnspecifiedValue exception.
Otherwise, we’ll just do as we did before, albeit now with some more
complexity.
Last, we’ve got a bindlist function whose sole purpose is to bind a bunch of
names to a bunch of values (putting them in option refs along the way).
let bindlist ns vs env =
List.fold_left2 (fun acc n v -> bind (n, v, acc)) env ns vs
That’s all for environments… for now.
Okay but we still don’t know how to evaluate Closures… currently Call and
Apply both use evalapply, but that only has a case for Primitives. It’ll
need to be able to handle Closures as well:
let rec evalexp exp env =
let evalapply f vs =
match f with
| Primitive (_, f) -> f vs
| Closure (ns, e, clenv) -> evalexp e (bindlist ns vs clenv)
| _ -> raise (TypeError "(apply prim '(args)) or (prim args)")
in
[...]
Ah, yes. This is where bindlist comes in handy. Bind the names of the formals
to the values of the actuals, add them to the captured Closure environment…
and then just call evalexp. That’s it! How crazy is that?? Now we can build
functions quite easily:
$ ocaml 10_closures.ml
> (val addone (lambda (x) (+ x 1)))
#<closure>
> (addone 4)
5
> Exception: End_of_file.
$
I’ve glossed over the printing of closures, since (as you can see) all it
requires is printing #<closure>. You could make it fancier and actually print
out the lambda and captured environment, but I think you’ll find most of the
time that’s just too much output.
Astute readers will at this point notice that we haven’t yet supported recursion, or really even done anything notably different with the new implementation of environments. Yes, well. That’s in the next section.
defineWe have no way to make the values bound with val self-aware, so we’re going
to do that now with define. Let’s add a new type of def:
[...]
and def =
| Val of name * exp
| Def of name * name list * exp (* NEW! *)
| Exp of exp
Note that Def that very much mirrors the syntax we used before:
(define f (x) (+ x 1))
; Def ("f", ["x"], Call(Literal (Primitive "+"), [Var "x"; Literal (Fixnum 1)))
But we don’t yet have a way to build Defs, so let’s go ahead and do that.
We’re going to need to add yet another case in build_ast that looks very
similar for the case that builds Lambdas:
let rec build_ast sexp =
[...]
(match pair_to_list sexp with
[...]
| [Symbol "lambda"; ns; e] when is_list ns ->
let err () = raise (TypeError "(lambda (formals) body)") in
let names = List.map (function Symbol s -> s | _ -> err ())
(pair_to_list ns)
in Lambda (names, build_ast e)
| [Symbol "define"; Symbol n; ns; e] ->
let err () = raise (TypeError "(define name (formals) body)") in
let names = List.map (function Symbol s -> s | _ -> err ())
(pair_to_list ns)
in Defexp (Def (n, names, build_ast e))
[...]
We’ve got nearly the same structure — the only differences now are that we
have to capture the name of the function we’re defining inside the Def and
then wrap the Def in a Defexp.
Now that we can build Defs we should also be able to evaluate them. So let’s
head on over to evaldef, where we’re add a new case. I’m going to reproduce
the English steps I outlined above and then give the code so you can see how
well they translate over.
define in a closure binding the function name to
a ref of unspecified valuevalref from step 1 to point to the resulting closurelet evaldef def env =
match def with
| Val (n, e) -> let v = evalexp e env in (v, bind (n, v, env))
| Def (n, ns, e) ->
let (formals, body, cl_env) =
(match evalexp (Lambda (ns, e)) env with
| Closure (fs, bod, env) -> (fs, bod, env)
| _ -> raise (TypeError "Expecting closure."))
in
let loc = mkloc () in
let clo = Closure (formals, body, bindloc (n, loc, cl_env)) in
let () = loc := Some clo in
(clo, bindloc (n, loc, env))
| Exp e -> (evalexp e env, env)
The only tricky thing is the match, which should be unnecessary — we know
that there’s only one way a Lambda can evaluate: to a Closure! But OCaml’s
type system doesn’t know that. In order to make it understand, we’d need to
convert our types to polymorphic variants. 3 Only then
could we write something like:
[...]
let `Closure (formals, body, cl_env) = evalexp (`Lambda (ns, e)) env in
[...]
And avoid that whole headache. Perhaps we’ll convert later.
In any case, that’s it. That’s lambdas, closures, define — all taken care of.
If you’re left wondering how we’re supposed to have the env special form now
that we’ve completely changed how our environments work, swell. All the more
reason to download the code here and mess with it.
Next up, primitives III.
refs are probably not implemented as you’d expect. Since fields of OCaml
records are the only things in the language that can be declared mutable with
the mut keyword, a ref is a record with just one field: contents.
Declared mutable, of course. ↩
Here is the API for OCaml’s
option type, and also some reading on
Haskell’s Maybe monad. ↩
Polymorphic variants are fun because they solve a bunch of nasty little type problems we have. They’re also not fun because they’re probably unfamiliar to OCaml newcomers, and require typing a bunch of backticks everywhere. Here’s an interesting StackOverflow question about them if you want to read some more. ↩