In my last series, I wrote about building a Lisp interpreter. This time, we’re going to write a Lisp compiler.
This series is an adaptation of Abdulaziz Ghuloum’s excellent paper An Incremental Approach to Compiler Construction, with several key differences:
See my implementation for reference, but note that it may be incomplete and also may look a little bit different than the compiler detailed in these posts.
You probably have a couple of questions, like why Lisp? and why compile directly to x86-64? and why C? and come on, another Lisp series?, and those are all very reasonable questions that will be answered in due time.
I want to implement this compiler in a language other than Scheme because it will prevent me from copying too much of the code from the paper. Even though the paper doesn’t actually contain the source for the whole compiler (most of it is, after all, left as exercises for the reader), I think I will learn a lot more when I have to write all of the code myself. I get to make my own mistakes and you get to watch me make and fix them in “real” time.
I also don’t want to generate text assembly, but those reasons are a little different from my reason for choosing another implementation language:
First, I think that would be harder to test: I want to have an in-process unit
test suite that compiles Lisp programs and executes them on-the-fly. Shelling
out to a system assembler like
nasm would be somewhat error prone.
What if the person building this doesn’t have the assembler I need? Sure, I
could also write a small assembler as part of this compiler, but that’s a lot
of work. Harder than just generating x86-64 directly, perhaps.
Second, I want to learn more about machine architecture. While
add a, b seems
like one machine instruction, it could probably be encoded in 50 different ways
depending on whether a and b are registers, stack locations, other memory
addresses, immediates, which registers they are, etc. Shelling out to an
assembler abstracts a lot of that detail away. I want to get my hands dirty.
Hopefully you do, too.
I chose Lisp because that’s what the Ghuloum paper uses, and because Lisp can be implemented as a small, compact, dynamically typed language. Many interpreter implementations are under 200 lines. I don’t think this compiler will be that short, though.
In order to get the most out of this series, I recommend having at least passing familiarity with the following:
Having this background helps you focus more on the bigger picture than on the minutiae, but it is by no means required. I expect most of this series to be fairly readable. If it’s not, that’s a bug and you should report it to me.
I plan on writing this series in installments where each installment adds a feature of some kind. Maybe that feature is a new bit of Lisp functionality, or maybe it’s a refactoring of the compiler, or maybe it’s a compiler optimization.
For this reason, each post will tend to depend on the code and understanding from previous posts. As such, I recommend reading the series in order. I’ll still try to keep the big ideas understandable for those who don’t.
At each stage of the compiler, we should have a battery of tests that ensure that the features we have already built continue to work as expected.
I plan on adhering to this rough plan:
Compile heap allocation (cons, strings, symbols, etc)
With optional add-ons also described in the initial paper:
And optional add-ons not described in the original paper:
You may have noticed that this is a lot of steps, and there are some steps that I intend to take but have completely omitted because I want to roll them into other posts. Things like:
So it’s actually more work than I listed above. This series may take a long time. It may take some twisty turns. It may take some shortcuts. But there is good news: I’ve already written the compiler up to compiling heap allocation (still working on procedure calls), and even if I don’t finish this series there is still Ghuloum’s excellent paper to learn from.
Next up, the smallest program.