Writing a Lisp, Part 15: I/O

March 18, 2017

While it’s great that our language is Turing-complete and all, it’s still pretty useless if it can’t interact with the outside world. So we’re going to add four primitive functions to our language: getchar, print, itoc, and cat. The first is probably pretty familiar from C, the second is probably familiar from Python, the third one is a short name for “int to char”, and the last one concatenates two strings. itoc allows the programmer to work with arbitrary ASCII values and make characters such as newline (10), space (32), etc, and cat allows the programmer to create symbols character-by-character.

Why have print instead of putchar? Well, we don’t have a way to manipulate symbols on a character level right now, so we might as well be able to print whole objects.

We’re also going to add a built-in value to the basis, empty-symbol. Right now there is no way that I know of to build an empty symbol. That is, there is no way of reading something whose value is (Symbol ""). In order to write a some functions easily, that will be a necessity.

let basis =
  [...]
  let prim_getchar = function
    | [] ->
        (try Fixnum (int_of_char @@ input_char stdin)
        with End_of_file -> Fixnum (-1))
    | _ -> raise (TypeError "(getchar)")
  in
  let prim_print = function
    | [v] -> let () = print_string @@ string_val v in Symbol "ok"
    | _ -> raise (TypeError "(print val)")
  in
  let prim_itoc = function
    | [Fixnum i] -> Symbol (stringOfChar @@ char_of_int i)
    | _ -> raise (TypeError "(itoc int)")
  in
  let prim_cat = function
    | [Symbol a; Symbol b] -> Symbol (a ^ b)
    | _ -> raise (TypeError "(cat sym sym)")
  in
  let newprim acc (name, func) =
    bind (name, Primitive(name, func), acc)
  in
  List.fold_left newprim ["empty-symbol", ref (Some (Symbol ""))] [
    [...]
    ("getchar", prim_getchar);
    ("print", prim_print);
    ("itoc", prim_itoc);
    ("cat", prim_cat);
  ]

getchar expects no arguments and returns a character from standard input. Since we have no way of detecting exceptions raised, we’ll return -1 if we encounter EOF. This is useful because -1 is not in the ASCII table and therefore corresponds to no valid character. Notice that getchar returns a Fixnum and therefore has to be converted with itoc by the programmer.

print prints the string_val representation of a value and returns the symbol ok. Nothing spcial.

itoc converts the given number into a character, makes a string of length 1, and then returns a symbol. Nothing fancy.

You’ll notice that getchar and itoc use this weird operator @@. Turns out it’s actually a pretty handy operator. Because of its precedence, it allows for chaining of functions, and they are applied in sequence. So it makes the following two lines equivalent:

int_of_char @@ input_char stdin
int_of_char (input_char stdin)

Yeah, it’s one character longer, but it removes parentheses.

empty-symbol is defined as expected.

There’s one really annoying problem with these functions, though. If you stopped reading and went ahead and implemented them, you’d probably see behavior like:

$ ocaml 15_io.ml
> (getchar)
10
> (itoc (getchar))


>
$

What gives? It’s not even waiting for me to type something — it just takes the newline!

That’s for two reasons:

  1. Our REPL does not try and read one line at a time. It reads in units of s-expression, and then stops.
  2. We’re entering the program on the same channel as the program is expecting to read user input.

So when I go to hit enter, the reader goes all the way until the last parenthesis, then stops. And then the next character is the newline, 10.

The easy fix, as it turns out, is to add the ability to read from a file instead of only supporting a REPL. We’ll make a better reader in the next post, so hold off on line reading for a second.

Let’s go back to main:

let get_ic () =
  try  open_in Sys.argv.(1)
  with Invalid_argument s -> stdin

let main =
  let ic = get_ic () in
  let stm = { chr=[]; line_num=1; chan=ic } in
  try  repl stm basis
  with End_of_file -> if ic <> stdin then close_in ic

This is some pretty new stuff. get_ic means “get input channel”. It tries to get the command-line argument to the interpeter and open it as an input channel. open_in is analogous to fopen from C.

Note that it gets index 1 instead of index 0, since 15_io.ml is in index 0:

index:  0        1
$ ocaml 15_io.ml mylispprogram.lsp

Once it has the channel, it makes our nice little stream and calls repl. If that has an End_of_file exception, which happens if we hit Control-D in the REPL or the program reaches the end of our input file, it cleans up after itself and exits.

Note that we only want to close the input file if it’s not stdin (the <> operator), otherwise we can’t read anything else after.

We should probably modify repl so that it doesn’t print a prompt if it’s reading from a file that the user passed in:

let rec repl stm env =
  if stm.chan=stdin then ( print_string "> "; flush stdout; );
  let ast = build_ast (read_sexp stm) in
  let (result, env') = eval ast env in
  if stm.chan=stdin then print_endline (string_val result);
  repl stm env';;

We also probably don’t want to print the result of every intermediate expression in the user’s program, so we limit that to REPL-only as well.

Let’s take it for a spin:

$ cat > myprogram.lsp
(print (itoc (getchar)))
(print (itoc 10))
(print (cat 'hello (cat (itoc 32) 'world)))
$ ocaml 15_io.ml myprogram.lsp
a
a
hello world
$

Nice.

Download the code here if you want to mess with it.

There are some I/O functions that would be really useful to have in user-define programs, but that we don’t want to bake right into the OCaml code. This is where having a standard library would come in super handy, so that’s up next!