A catalog of ways to generate SSA

February 11, 2025

Static Single Assignment is a program representation where each “variable” (though this term can be misleading) is assigned exactly once. Mostly the variables aren’t variables at all, but instead names for values—for expressions. SSA is used a lot in compilers. I’m making this page to catalog the papers I have found interesting and leave a couple of comments on them.

A brief bit of background

It comes from the 1980s. Wikipedia claims the first paper about it (though not by the name SSA) was Code Motion of Control Structures in High Level Languages (PDF), which I had not seen before today. There was also Global Value Numbers and Redundant Computations (PDF), which I had also not seen before today.

If you’re confused about what a phi function is: they are annotations that join dataflow edges from predecessor blocks into new variables. It’s not a real function, it doesn’t (directly) generate any code, and it needs to be handled differently than other instructions in your IR.

Let’s generate some variables

Finally, in 1991, the same group produced the first SSA paper I was already familiar with: The Cytron Paper (PDF). This approach requires computing dominance frontiers for an existing control-flow graph, which may or may not be useful or applicable in your situation. If you have bytecode, this might be workable, but you’ll need to “discover” the basic blocks hiding within. It’s also a notable paper because it produces the minimal amount of phi instructions.

In 1994, Brandis and Mössenböck write Single-Pass Generation of Static Single-Assignment Form for Structured Languages (PDF). This is a neat approach because it shows that you don’t need to do Fancy Algorithms or Fancy Data Structures to get SSA—you can build it as soon as during parsing, something I have taken to doing recently. It turns out that if you don’t have goto, things get easier for the compiler developer.

In 1995, Richard Kelsey writes about how CPS and SSA are similar in A Correspondence between Continuation Passing Style and Static Single Assignment Form (PDF). Part of the paper involves converting CPS to SSA (and the reverse, too).

Also in 1995, Cliff Click and Michael Paleczny publish A Simple Graph-Based Intermediate Representation (PDF), which has become known as the Sea of Nodes IR. It’s like “normal” SSA except that instead of just having blocks and data edges between instructions, all instructions are “unscheduled” and have control edges between them. It means your graph looks all funky but doesn’t implicitly have (kind of arbitrary) code linearization so code motion is easier. Cliff and co are building a working demo compiler. See also Global Code Motion / Global Value Numbering (PDF) by Cliff Click.

In 1998, Appel (of functional language fame) describes briefly a correspondence between functional languages and SSA in SSA is Functional Programming (PDF). I think this is the first paper that introduced a notion of “basic block arguments” (instead of phi functions). He also suggests a “really crude” algorithm for generating SSA that places phi nodes all over the place for every variable.

In 2000, Aycock and Horspool publish Simple Generation of Static Single-Assignment Form (PDF). It starts from Appel’s approach and then iteratively deletes phi nodes that don’t need to exist. They find that for reducible control-flow graphs (the common case for most compilers, I think), their approach also produces minimal SSA.

In 2009, Michael Bebenita writes Constructing SSA the Easy Way (PDF) which is “essentially a rehashing of Aycock’s SSA construction algorithm but using forwarding pointers instead”. I only found it the other day. It’s fantastic and I wish more people knew about it. It uses one of my favorite data structures, union-find, though for some reason it does not mention it by name.

Also in 2009, Paul Biggar and David Gregg sketch an algorithm (PDF) for converting into SSA while doing sparse conditional constant propagation (SCCP). Like Cytron’s algorithm, it requires having dominance information available. Per the paper, the implementation appears to work but there are no proofs about it.

In some year between 2009 and 2018, Fil Pizlo creates and does not write much about “Pizlo special” SSA¹. It is used in DFG-SSA (high level in FTL JIT) and B3 (low level in FTL JIT) in JavaScriptCore. After light prodding on Hacker News, he writes this gist (mirrored in footnote).

In 2013, my former coworker Matthias Braun and his labmates write Simple and Efficient Construction of Static Single Assignment Form (PDF), which describes converting to SSA from an AST and building the CFG on the fly. It’s fairly popular because it is simpler than the Cytron paper. I find the phase transitions in the blocks (filled/sealed/…) a little tricky to keep straight though.

In 2023, Matthieu Lemerre writes SSA Translation Is an Abstract Interpretation (PDF), which I would love to understand one day.

Other papers

I will eventually add some papers on extensions to SSA, analyses on SSA, converting out of SSA (including register allocation). Just not this evening.

I am also probably missing or forgetting some big hit papers for converting into SSA, so please drop me a line if you have favorites.

Other resources

The SSA book (PDF; draft) is a huge tour de force of SSA-based compiler design. That is even the new name of the book, which has since been published in print. It’s on my shelf.

A keyword dump

Here are some keywords which you may search for but mostly serve as notes to self to write more about one day:

windmill, lost copy
SCCP
liveness
DCE
chordal
https://arxiv.org/abs/2011.05608
SSI and e-SSA and sparseness
GVN and CSE and hash consing
load/store forwarding
mem2reg
SROA and memory ssa
eager constant folding / smart constructors
points-to and CFA
flattening tuples
webs (see Muchnick’s 1997 Advanced compiler design and implementation)
RVSDG and co
e-graphs

This describes how I do SSA form, which avoids the need to have any coupling between CFG data structures and SSA data structures.

Let’s first define a syntax for SSA and some terminology. Here’s an example SSA node:
```
A = Add(B, C)
```
In reality, this will be a single object in your in-memory representation, and the names are really addresses of those objects. So, this node has an “implicit variable” called A; it’s the variable that is implicitly assigned to when you execute the node. If you then do:
```
X = Sub(A, 1)
```
Then “A” is just a pointer to the Add node, and we’re using the implicit variable “A”.

Here’s an example function:
```
int foo(int a, int b)
{
    int x;
    if (a)
        x = b + 1
    else
        x = b * 2
    return x + 42;
}
```
Here’s an SSA program with a Phi in Pizlo form:
```
root:
    A = GetArgument(0)
    B = GetArgument(1)
    Branch(A, then, else)
then:
    X1 = Add(B, 1)
    Upsilon(X1, ^X)
    Jump(return)
else:
    X2 = Mul(B, 2)
    Upsilon(X2, ^X)
    Jump(return)
return:
    X = Phi()
    R = Add(X, 42)
    Return(R)
```
In Pizlo form:
- Every SSA node has an implicit variable, as mentioned above.
- Every Phi node has a shadow variable in addition to the implicit variable.
Let’s say that given a Phi like “X = Phi()”, the implicit variable is called “X”, and the shadow variable is called “^X”.

Therefore, the semantics of an upsilon like “Upsilon(X1, ^X)” is just “set ^X = X1”. And the semantics of a Phi like “X = Phi()” is just “set X = ^X”.

In other words, you can think of Upsilon as being a side effect (a store to a shadow variable). And you can think of Phi as being a side effect (a load from a shadow variable). You can model them that way in your effect analysis to block reordering Upsilons and Phis.

But also, the shadow variables of Phis in Pizlo form are “Static Single Use” (SSU) variables. This falls out naturally from the fact that the only syntax for loading a shadow variable is the Phi itself. So you can think of Pizlo form as “SSA-SSU form”.

The main benefit of this form is that basic blocks - and all CFG data structures - have zero knowledge about SSA. There are no basic block arguments. There’s no requirement that Phis appear at the tops of blocks. In fact, this is a valid program in Pizlo form (albeit suboptimal):
```
M = Stuff(...)
Upsilon(M, ^N)
N = Phi()
MoreStuff(N)
```
Here, there’s a Phi in them middle of a basic block, and there’s an Upsilon just before it. That’s fine. This is important, because it means that you can do CFG transforms that blow away control flow edges without worrying about fixing your Phis.

In any Pizlo-form compiler, you’ll want to have a Phi simplification pass, which you can implement either by running Cytron or by running any other SSA converter. The simplest is just to just fixpoint the rule that if you have a Phi that has Upsilons that only use the Phi or exactly one other value, then replace the Phi with that other value. ↩