I have a lot of thoughts about the design of compiler intermediate representations (IRs). In this post I’m going to try and communicate some of those ideas and why I think they are important.
The overarching idea is being able to make decisions with only local information.
That comes in a couple of different flavors. We’ll assume that we’re compiling a method at a time, instead of something more trace-like (tracing, tracelets, basic block versioning, etc).
A function will normally have some control-flow: if, while, for, any amount of jumping around within a function. Let’s look at an example function in a language with advanced control-flow constructs:
int sumto(int n) {
  int result = 0;
  for (result = 0; n >= 0; n -= 1) {
    result += n;
  }
  return result;
}
Most compilers will deconstruct this for loop, with its many nested expressions, into simple comparisons and jumps. In order to resolve jump targets in your compiler, you may have some notion of labels (in this case, words ending with a colon):
int sumto(int n) {
entry:
  int result = 0
header:
  if n < 0, jumpto end
  result = result + n
  n = n - 1
  jumpto header
end:
  return result
}
This looks kind of like a pseudo-assembly language. It has its high-level language features decomposed into many smaller instructions. It also has implicit fallthrough between labeled sections (for example, entry into header).
I mention these things because they, like the rest of the ideas in this post,
are points in an IR design space. Representing code this way is an explicit
choice, not a given. For example, one could make the jumps explicit by adding a
jumpto header at the end of entry.
As soon as we add that instruction, the code becomes position-independent: as long as we start with entry, the chunks of code between labels could be ordered any which way: they are addressed by name and have no implicit ordering requirements.
This may seem arbitrary, but it gives the optimizer more flexibility. If some optimization rule decides, for example, that a branch to block4 may rarely be taken, it can freely re-order it toward the end of the function (or even on a different page!) so that more hot code can be on a single cache line.
Explicit jumps and labels turn the code from a strictly linear assembly into a control-flow graph (CFG). Each sequence of code without internal control-flow is called a basic block and is a vertex in this graph. The directed edges represent jumps between blocks. See for example this crude GraphViz representation:
We’re actually kind of looking at extended basic blocks (EBBs), which allow for multiple control exits per block but only one control entry. A strict basic block representation of the above code would look, in text form, something like this:
int sumto(int n) {
entry:
  int result = 0
  condbranch n < 0, iftrue: end, iffalse: header
header:
  result = result + n
  n = n - 1
  condbranch n < 0, iftrue: end, iffalse: header
end:
  return result
}
Notice how each block has exactly one terminator (control-flow instruction), with (in this case) 0 or 2 targets.
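To make that concrete, here is a minimal sketch in Python (hypothetical names, not any particular compiler's API) of blocks whose successor edges fall straight out of their single terminator:

# A hypothetical encoding of the strict basic block form above: each block
# is a list of simple instructions plus exactly one terminator at the end.
blocks = {
    "entry":  {"body": ["int result = 0"],
               "term": ("condbranch", "n < 0", "end", "header")},
    "header": {"body": ["result = result + n", "n = n - 1"],
               "term": ("condbranch", "n < 0", "end", "header")},
    "end":    {"body": [],
               "term": ("return", "result")},
}

def successors(block):
    # The CFG edges come straight from the terminator: zero targets for a
    # return, two for a conditional branch.
    kind, *rest = block["term"]
    return [] if kind == "return" else rest[1:]

for label, block in blocks.items():
    print(label, "->", successors(block))
# entry -> ['end', 'header']
# header -> ['end', 'header']
# end -> []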
Opinions differ about the merits and issues of extended vs normal basic blocks. Most compilers I see use normal basic blocks. In either case, bringing the IR into a graph form gives us an advantage: thanks to Cousot and Cousot, our favorite power couple, we know how to do abstract interpretation on graphs and we can use this to build an advanced optimizer. See, for example, my intro to abstract interpretation post.
Some IRs are stack based. For concatenative languages or some newer JIT compilers, IRs are formatted in such a way that each opcode reads its operands from a stack and writes its outputs to a stack. This is reminiscent of a point-free coding style in languages such as Haskell or OCaml.
inc = (+ 1)
In this style, there is an implicit shared state: the stack. Dataflow is explicit (pushes and pops) and instructions can only be rearranged if the stack structure is preserved. This requires some non-local reasoning: to move an instruction, one must also rearrange the stack.
PUSH 1
PUSH 2
ADD
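To make the shared state explicit, here is a tiny hypothetical Python evaluator for this kind of stack IR (the opcode names are made up for illustration):

def eval_stack_ir(code):
    # Every opcode communicates through the single shared operand stack.
    stack = []
    for op, *args in code:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
    return stack

print(eval_stack_ir([("PUSH", 1), ("PUSH", 2), ("ADD",)]))  # [3]

Reordering the two pushes here happens to be harmless, but in general moving an instruction means reshuffling the stack around it.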
By contrast, in a register-based IR, things are more explicit. Instructions take named inputs (v0, v12, etc) and produce named outputs. Instructions can be slightly more easily moved around (modulo effects) as long as inputs remain defined. Local variables do not exist. The stack does not exist. Everything is IR “variables”.
v1 = CONST 1
v2 = CONST 2
v3 = ADD v1 v2
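As a rough illustration of the “inputs remain defined” constraint, here is a hypothetical Python check that a proposed instruction order never uses a name before its definition (effects are ignored entirely):

# Hypothetical register-style instructions: (dest, opcode, *operands).
instrs = [
    ("v1", "CONST", 1),
    ("v2", "CONST", 2),
    ("v3", "ADD", "v1", "v2"),
]

def defs_precede_uses(instrs):
    # True if every operand that names another instruction is defined
    # earlier in the list.
    defined = set()
    for dest, _op, *operands in instrs:
        if any(isinstance(o, str) and o not in defined for o in operands):
            return False
        defined.add(dest)
    return True

print(defs_precede_uses(instrs))                  # True
print(defs_precede_uses(list(reversed(instrs))))  # False: v3's inputs not yet defined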
The constraints (names being defined) are part of the IR. This gets a little bit tricky if it’s possible to define a name multiple times.
v1 = CONST 1
v2 = CONST 2
x = ADD v1 v2
x = ADD x v2
v3 = MUL x 12
What does x mean in the instruction for v3? Which definition does it refer to? In order to reason about the instruction v3 = MUL x 12, we have to keep around some context. This is non-trivial: requiring compiler writers to constantly truck around side-tables and update them as they do analysis is slow and error-prone. Fortunately, if we enforce some interesting rules, we can push that analysis work into one pass up-front…
Static single assignment (SSA) was introduced by a bunch of folks at IBM (see my blog
post about the different implementations). In SSA-based IRs, each
variable can only be defined once. Put another way, a variable is its
defining instruction; alternately, a variable and its defining instruction are
addressed by the same name. The previous example is not valid SSA; x has two definitions.
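A sketch of that single-definition rule as a quick structural check (hypothetical Python, reusing the (dest, opcode, operands) tuple shape from the earlier sketch):

def is_ssa(instrs):
    # Valid SSA: no destination name is defined more than once.
    seen = set()
    for dest, *_rest in instrs:
        if dest in seen:
            return False
        seen.add(dest)
    return True

print(is_ssa([
    ("v1", "CONST", 1),
    ("v2", "CONST", 2),
    ("x", "ADD", "v1", "v2"),
    ("x", "ADD", "x", "v2"),
]))  # False: x is defined twice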
If we turn the previous example into SSA, we can now use a different name for each instruction. This is related to the unique name assumption or the global names property: names do not depend on context.
v1 = CONST 1
v2 = CONST 2
x0 = ADD v1 v2
x1 = ADD x0 v2
Now we can identify each different ADD instruction by the variable it defines. This is useful in analysis…
I took a class with Olin Shivers and he described abstract interpretation as “automated theorem finding”. Unlike theorem provers such as Lean and Rocq, where you have to manually prove the properties you want, static analysis finds interesting facts that already exist in your program (and optimizers use them to make your program faster).
Your static analysis pass(es) can annotate your IR nodes with little bits of information such as:
- the type int
- the constant value 5
- a range between 2 and 5
- equivalence to another expression such as y*z
If your static analysis is over SSA, then generally the analysis is easier and (potentially) storing facts is easier. This is due to a property called sparseness. Where a static analysis over non-SSA programs has to store facts about all variables at all program points, an analysis over SSA need only store facts about all variables, independent of context.
I sometimes describe this as “pushing time through the IR” but I don’t know that that makes a ton of sense.
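A hedged sketch of the difference in Python (the program points and names are hypothetical): a dense analysis keys its facts by program point and variable, while a sparse analysis over SSA keys them by variable alone:

# Dense: one entry per (program point, variable); the table grows with
# the number of program points.
dense_facts = {
    ("@1", "x0"): "int",
    ("@2", "x0"): "int",
    ("@2", "x1"): "int",
}

# Sparse: each SSA name has exactly one definition, so a single fact per
# name is valid everywhere that name is in scope.
sparse_facts = {
    "x0": "int",
    "x1": "int",
}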
Potentially more subtle here is that we could represent the above IR snippet as a list of tuples, where instructions are related via some other table (say, a “variable environment”):
code = [
    ("v1", "CONST", 1),
    ("v2", "CONST", 2),
    ("x0", "ADD", "v1", "v2"),
    ("x1", "ADD", "x0", "v2"),
]
Instead, though, we could allocate an object for each instruction and let them refer to one another by pointer (or index, if using Rust or something). Then they directly refer to one another (no need for a side-table), which might be faster and more compact. We can re-create nice names as needed for printing. Then, when optimizing, we look up the type information of an operand by directly reading a field (operands[N]->type or similar).
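Here is a rough sketch of that object-per-instruction style in Python (a C or Rust compiler would use pointers or indices instead; the field names are made up):

from dataclasses import dataclass

@dataclass
class Instr:
    opcode: str
    operands: list   # direct references to other Instr objects
    type: str = "any"

v1 = Instr("CONST", [], type="int")
v2 = Instr("CONST", [], type="int")
v3 = Instr("ADD", [v1, v2])

# No side table needed: an optimization reads the operand's type directly.
print(v3.operands[0].type)  # int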
Another thing to note: when you start adding type information to your IR, you’re going to start asking type information questions in your analysis. Questions such as “what type is this instruction?”, where “type” could span a semilattice, and even refer to a specific run-time object by its pointer. In that case, it’s important to ask the right questions. For example:
don’t ask “is this a Const instruction?”; instead ask “does this instruction have a constant type?” Const instructions are likely not the only opcodes that could produce specific objects; if you have an instruction like GuardIsObject, for example, that burns a specific expected pointer into the generated code, the type (and therefore the pointer) will come from the GuardIsObject instruction.
The big idea is that types represent a different slice of your IR than the opcodes and should be treated as such.
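As a sketch of asking the type question rather than the opcode question (hypothetical Python, continuing the Instr sketch above; GuardIsObject and the “const:” type encoding are made up for illustration):

def is_const_opcode(instr):
    # The narrow question: matches only the CONST opcode.
    return instr.opcode == "CONST"

def has_constant_type(instr):
    # The better question: any instruction whose type pins down a single
    # value counts, regardless of its opcode.
    return instr.type.startswith("const:")

# A guard that burns a specific expected pointer into the generated code:
guard = Instr("GuardIsObject", [v3], type="const:0x7f3a9c012340")
print(is_const_opcode(guard), has_constant_type(guard))  # False True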
Anyway, SSA only stores type information about instructions and does not encode information that we might later learn in the IR. With basic SSA, there’s not a good way to encode refinements…
Static single information (SSI) form gives us new ways to encode metadata about instructions (variables). It was introduced by C. Scott Ananian in 1999 in his MS thesis (PDF). (I also discussed it briefly in the Scrapscript IR post.) Consider the following SSA program (represented as pseudo-Python):
# @0
x = int(...)
# @1
if x >= 0:
    # @2
    return x
else:
    # @4
    result = -x
    # @5
    return result
x is undefined at @0. x is defined and an integer at @1. But then we do something interesting: we split control flow based on the run-time value of x. We can use this split to attach new and interesting information to x. For a non-sparse analysis, we can record some fact on the side. That’s fine.
# @0
x = int(...)
# @1
if x >= 0:
    # @2
    LearnFact(x, nonnegative)
    # @3
    return x
else:
    # @4
    LearnFact(x, negative)
    # @5
    result = -x
    # @6
    return result
When doing a dataflow analysis, we can keep track of the fact that at @3, x is nonnegative, and at @5, x is negative. This is neat: we can then determine that all paths through this function return a nonnegative integer.
Importantly, LearnFact(x, nonnegative) does not override the existing known type of x. Instead, it is a refinement: a set intersection. A lattice meet. (Picture a Venn diagram with two overlapping circles, Integer and Nonnegative; the refined type is the middle bit where they overlap.)
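A small sketch of that refinement in Python, modelling each type as a (toy, finite) set of allowed values so that the meet is literally set intersection:

# Toy stand-ins for abstract types as sets of allowed values.
INT = frozenset(range(-3, 4))
NONNEGATIVE = frozenset(v for v in INT if v >= 0)
NEGATIVE = frozenset(v for v in INT if v < 0)

def meet(a, b):
    # Lattice meet on this toy domain: keep only values both types allow.
    return a & b

# LearnFact(x, nonnegative) refines x's existing type instead of replacing it:
print(sorted(meet(INT, NONNEGATIVE)))  # [0, 1, 2, 3]
print(sorted(meet(INT, NEGATIVE)))     # [-3, -2, -1]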
If we want to keep our information sparse, though, we have to add a new definition to the IR.
# @0
x = int(...)
# @1
if x >= 0:
    # @2
    newx0 = LearnFact(x, nonnegative)
    # @3
    return newx0
else:
    # @4
    newx1 = LearnFact(x, negative)
    # @5
    result = -newx1
    # @6
    return result
This is complicated (choosing which variables to split, replacing all uses to maintain SSA, etc) but it gives us new places to store information inside the IR. It means that every time we refer to newx0, we know that it is nonnegative, and every time we refer to newx1, we know that it is negative. This information is independent of context!
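A very simplified sketch of those mechanics in Python (hypothetical; a real pass would consult dominator information instead of only renaming within a single successor block):

def split_at_branch(block, var, fact, new_name):
    # Insert a LearnFact definition at the top of a successor block and
    # rewrite that block's uses of `var` to the refined name.
    learn = (new_name, "LearnFact", var, fact)
    renamed = [
        tuple(new_name if part == var else part for part in instr)
        for instr in block
    ]
    return [learn] + renamed

else_block = [("result", "NEG", "x"), (None, "RETURN", "result")]
print(split_at_branch(else_block, "x", "negative", "newx1"))
# [('newx1', 'LearnFact', 'x', 'negative'),
#  ('result', 'NEG', 'newx1'),
#  (None, 'RETURN', 'result')]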
I should note that you can get a lot of the benefits of SSI without going “full SSI”. There is no need to split every variable at every branch, nor add a special new merge instruction.
Okay, so we can encode a lot of information very sparsely in the IR. That’s neat. It’s powerful. But we should also be mindful that even in this very sparse representation, we are encoding information implicitly that we may not want to: execution order.
In a traditional CFG representation, the instructions are already scheduled, or ordered. Normally this comes from the programmer in the original source form and is faithfully maintained. We get data use edges in an IR like SSA, but the control information is left implicit. Some forms of IR, however, seek to reify both data and control dependencies into the IR itself. One such IR design is sea of nodes (SoN), which was originally designed by Cliff Click during his PhD.
In sea of nodes, every instruction gets its own vertex in the graph. Instructions have use edges to their operands, which can be either data or some other ordering property (control, effects, etc). The main idea is that IR nodes are by default unordered and are only ordered later, after effect analysis has removed a bunch of use edges.
Per Vognsen also notes that there is another motivating example of sea of nodes: in the previous SSI example, the LearnFact(x, nonnegative) cannot be validly hoisted above the x >= 0 check. In a “normal” IR, this is implicit in the ordering. In a sea of nodes world, this is explicitly marked with an edge from the LearnFact to the if x >= 0. I think Graal, for example, calls these LearnFact nodes “Pi nodes”.
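A rough sketch of that shape in Python (hypothetical node and edge names): each node records the data, control, and effect edges it depends on, and nothing else pins down its position:

from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    data: list = field(default_factory=list)     # value inputs
    control: list = field(default_factory=list)  # control dependence
    effects: list = field(default_factory=list)  # memory/effect ordering

start = Node("Start")
x = Node("Param")
zero = Node("Const")  # the constant 0
cmp = Node("GreaterEq", data=[x, zero])
branch = Node("If", data=[cmp], control=[start])
# The control edge is the only thing that keeps this refinement from being
# scheduled above the x >= 0 check.
learn = Node("LearnFact", data=[x], control=[branch])
print(learn.control[0].op)  # If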
I think I need to re-read the original paper, read a modern implementation (I get the feeling it’s not done exactly the same way anymore), and then go write more about it later. For now, see Simple, by Cliff Click and friends. It is an implementation in Java and a little book to go with it.
Design neighbors include value dependence graphs (VDG), value state dependence graphs (VSDG), region value state dependence graphs (RVSDG), and program dependence graphs (PDG).
There’s probably more in this vein to be explored right now and probably more to invent in the future, too. Some other potentially interesting concepts to think about include:
Thank you to Chris Fallin, Hunter Goldstein, and Per Vognsen for valuable feedback on drafts of this post.