Another entry in the Toy Optimizer series.
Last time, we did load-store forwarding in the context of our Toy Optimizer. We managed to cache the results of both reads from and writes to the heap—at compile-time!
We were careful to mind object aliasing: we separated our heap information into
alias classes based on what offset the reads/writes referenced. This way, even if we
didn’t know whether objects a and b aliased, we could at least know that
different offsets would never alias (assuming our objects don’t overlap and
memory accesses are on word-sized slots). This is a coarse-grained heuristic.
Fortunately, we often have much more information available at compile-time than just the offset, so we should use it. I mentioned in a footnote that we could use type information, for example, to improve our alias analysis. We’ll add a lightweight form of type-based alias analysis (TBAA) (PDF) in this post.
We return once again to Fil Pizlo land, specifically How I implement SSA form. We’re going to be using the hierarchical heap effect representation from the post in our implementation, but you can use your own type representation if you have one already.
This representation divides the heap into disjoint regions by type. Consider,
for example, that Array objects and String objects do not overlap. A
LinkedList pointer is never going to alias an Integer pointer. They can
therefore be reasoned about separately.
But sometimes you don’t have perfect type information available. If your
language has an Object base class that all objects inherit from, then the
Object heap overlaps with, say, the Array heap. So you need some way to
represent that too; a flat enum of types doesn’t work cleanly.
Here is an example simplified type hierarchy:
Any
  Object
    Array
    String
  Other
Where Other might represent different parts of the runtime’s data structures,
and could be further segmented into GC, Thread, etc.
Fil’s idea is that we can represent each node in that hierarchy with a tuple of
integers [start, end) (inclusive, exclusive) that represent the pre- and
post-order traversals of the tree. Or, if tree traversals are not engraved into
your bones, they represent the range of all the nested objects within them.
Any      [0, 3)
  Object [0, 2)
    Array  [0, 1)
    String [1, 2)
  Other  [2, 3)
Then the “does this write interfere with this read” check—the aliasing check—is a range overlap query.
Here’s a perhaps over-engineered Python implementation of the range and heap hierarchy based on the Ruby generator and C++ runtime code from JavaScriptCore:
class HeapRange:
    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def __repr__(self) -> str:
        return f"[{self.start}, {self.end})"

    def is_empty(self) -> bool:
        return self.start == self.end

    def overlaps(self, other: "HeapRange") -> bool:
        # Empty ranges interfere with nothing
        if self.is_empty() or other.is_empty():
            return False
        return self.end > other.start and other.end > self.start


class AbstractHeap:
    def __init__(self, name: str) -> None:
        self.name = name
        self.parent = None
        self.children = []
        self.range = None

    def add_child(self, name: str) -> "AbstractHeap":
        result = AbstractHeap(name)
        result.parent = self
        self.children.append(result)
        return result

    def compute(self, start: int) -> None:
        current = start
        if not self.children:
            # A leaf occupies exactly one slot in the numbering.
            self.range = HeapRange(start, current + 1)
            return
        for child in self.children:
            child.compute(current)
            current = child.range.end
        # An interior node spans all of its children.
        self.range = HeapRange(start, current)


Any = AbstractHeap("Any")
Object = Any.add_child("Object")
Array = Object.add_child("Array")
String = Object.add_child("String")
Other = Any.add_child("Other")
Any.compute(0)
Where Any.compute(0) kicks off the tree-numbering scheme.
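To make the overlap check concrete, here is a quick sanity check against the ranges computed above; each assertion follows directly from the interval arithmetic:

# Array [0, 1) and String [1, 2) are disjoint leaves: they never alias.
assert not Array.range.overlaps(String.range)
# Object [0, 2) contains Array [0, 1): they may alias.
assert Object.range.overlaps(Array.range)
# Any [0, 3) overlaps every non-empty heap.
assert Any.range.overlaps(Other.range)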
Fil’s implementation also covers a bunch of additional abstract heaps such as SSAState and Control because his implementation is also used for code motion and whatnot. That can be added later, but we will not do so in this post.
So there you have it: a type representation. Now we need to use it in our load-store forwarding.
Recall that our load-store optimization pass looks like this:
def optimize_load_store(bb: Block):
    opt_bb = Block()
    # Stores things we know about the heap at... compile-time.
    # Key: an object and an offset pair acting as a heap address
    # Value: a previous SSA value we know exists at that address
    compile_time_heap: Dict[Tuple[Value, int], Value] = {}
    for op in bb:
        if op.name == "store":
            obj = op.arg(0)
            offset = get_num(op, 1)
            store_info = (obj, offset)
            current_value = compile_time_heap.get(store_info)
            new_value = op.arg(2)
            if eq_value(current_value, new_value):
                # Redundant store of a value we know is already there.
                continue
            # Invalidate every cached fact at the same offset; the
            # store might clobber any of them.
            compile_time_heap = {
                load_info: value
                for load_info, value in compile_time_heap.items()
                if load_info[1] != offset
            }
            compile_time_heap[store_info] = new_value
        elif op.name == "load":
            load_info = (op.arg(0), get_num(op, 1))
            if load_info in compile_time_heap:
                # Forward the cached value and drop the load.
                op.make_equal_to(compile_time_heap[load_info])
                continue
            compile_time_heap[load_info] = op
        opt_bb.append(op)
    return opt_bb
At its core, it iterates over the instructions, keeping a representation of the heap at compile-time. Reads get cached, writes get cached, and writes also invalidate the state of compile-time information about fields that may alias.
In this case, our may-alias check asks only whether the offsets are equal. This means that the following unit test will fail:
def test_store_to_same_offset_different_heaps_does_not_invalidate_load():
    bb = Block()
    var0 = bb.getarg(0)
    var0.info = Array
    var1 = bb.getarg(1)
    var1.info = String
    var2 = bb.store(var0, 0, 3)
    var3 = bb.store(var1, 0, 4)
    var4 = bb.load(var0, 0)
    bb.escape(var4)
    opt_bb = optimize_load_store(bb)
    assert (
        bb_to_str(opt_bb)
        == """\
var0 = getarg(0)
var1 = getarg(1)
var2 = store(var0, 0, 3)
var3 = store(var1, 0, 4)
var4 = escape(3)"""
    )
This test is expecting the write to var0 to still remain cached even though
we wrote to the same offset in var1—because we have annotated var0 as
being an Array and var1 as being a String. If we account for type
information in our alias analysis, we can get this test to pass.
After doing a bunch of fussing around with the load-store forwarding (many rewrites), I eventually got it down to a very short diff:
+def may_alias(left: Value, right: Value) -> bool:
+    return (left.info or Any).range.overlaps((right.info or Any).range)
+
+
 def optimize_load_store(bb: Block):
     opt_bb = Block()
     # Stores things we know about the heap at... compile-time.
@@ -138,6 +210,10 @@ def optimize_load_store(bb: Block):
                 load_info: value
                 for load_info, value in compile_time_heap.items()
                 if load_info[1] != offset
+                or not may_alias(load_info[0], obj)
             }
             compile_time_heap[store_info] = new_value
If we don’t have any type/alias information, we default to “I know nothing”
(Any) for each object. Then we check range overlap.
The boolean logic in optimize_load_store looks a little weird, maybe. But we
can also rewrite it (via De Morgan’s law) as:
{
    ... for ...
    if not (load_info[1] == offset
            and may_alias(load_info[0], obj))
}
So we keep cached state for every field that is known, either by offset or by type, not to alias the store. Maybe that is clearer (but not as nice a diff).
Note that the type representation is not so important here! You could use a bitset version of the type information if you want. The important things are that you can cheaply construct types and check overlap between them.
Nice, now our test passes! We can differentiate between memory accesses on objects of different types.
But what if we knew more?
Sometimes we know where an object came from. For example, we may have seen it get allocated in the trace. If we saw an object’s allocation, we know that it does not alias (for example) any object that was passed in via a parameter. We can use this kind of information to our advantage.
For example, in the following made-up IR snippet:
trace(arg0):
  v0 = malloc(8)
  v1 = malloc(16)
  ...
We know that (among other facts) v0 doesn’t alias arg0 or v1 because we
have seen its allocation site.
I saw this in the old V8 IR Hydrogen’s lightweight alias analysis¹:
enum HAliasing {
  kMustAlias,
  kMayAlias,
  kNoAlias
};

HAliasing Query(HValue* a, HValue* b) {
  // The same SSA value always references the same object.
  if (a == b) return kMustAlias;
  if (a->IsAllocate() || a->IsInnerAllocatedObject()) {
    // Two non-identical allocations can never be aliases.
    if (b->IsAllocate()) return kNoAlias;
    if (b->IsInnerAllocatedObject()) return kNoAlias;
    // An allocation can never alias a parameter or a constant.
    if (b->IsParameter()) return kNoAlias;
    if (b->IsConstant()) return kNoAlias;
  }
  if (b->IsAllocate() || b->IsInnerAllocatedObject()) {
    // An allocation can never alias a parameter or a constant.
    if (a->IsParameter()) return kNoAlias;
    if (a->IsConstant()) return kNoAlias;
  }
  // Constant objects can be distinguished statically.
  if (a->IsConstant() && b->IsConstant()) {
    return a->Equals(b) ? kMustAlias : kNoAlias;
  }
  return kMayAlias;
}
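We could fold the same kind of provenance reasoning into our may_alias, in front of the type check. Here is a sketch, assuming the Operation and Constant classes from earlier posts in the series and that allocations show up as malloc instructions; none of this is in the toy optimizer as written:

def is_allocation(v: Value) -> bool:
    # In a trace, a fresh allocation is one we saw happen: a malloc.
    return isinstance(v, Operation) and v.name == "malloc"

def from_outside(v: Value) -> bool:
    # Parameters and constants existed before the trace started.
    return isinstance(v, Constant) or (
        isinstance(v, Operation) and v.name == "getarg"
    )

def must_not_alias(a: Value, b: Value) -> bool:
    # Two distinct allocations never alias, and an allocation never
    # aliases anything that flowed in from outside the trace.
    if is_allocation(a):
        return (a is not b and is_allocation(b)) or from_outside(b)
    return False

def may_alias(left: Value, right: Value) -> bool:
    if must_not_alias(left, right) or must_not_alias(right, left):
        return False
    return (left.info or Any).range.overlaps((right.info or Any).range)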
There is plenty of other useful information, such as branching on pointer equality:

if (a == b) { ... } else { ... }

In the true branch, a and b are known to be the same object; in the false branch, they are known to be different objects. If you have other fun ones, please write in.
We only handle loads and stores in our optimizer. Unfortunately, this means we may accidentally cache stale information. Consider: what happens if a function call (or any other opaque instruction) writes into an object we are tracking?
The conservative approach is to invalidate all cached information on a function call. This is definitely correct, but it’s a bummer for the optimizer. Can we do anything?
Well, perhaps we are calling a well-known function or a specific IR instruction. In that case, we can annotate it with effects in the same abstract heap model: if the instruction does not write, or only writes to some heaps, we can at least only partially invalidate our heap.
known_builtin_functions = {
    "Array_length": Effects(reads=Array, writes=Empty()),
    "Object_setShape": Effects(reads=Empty(), writes=Object),
    "String_setEncoding": Effects(reads=Empty(), writes=String),
}
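Effects and Empty() don’t exist yet, so here is a minimal sketch of what they might look like on top of our AbstractHeap, plus the partial invalidation a call with known effects would trigger. Effects, Empty, and invalidate_for_call are names made up for this post:

from dataclasses import dataclass

def Empty() -> AbstractHeap:
    # A heap with an empty range overlaps nothing: no reads, no writes.
    heap = AbstractHeap("Empty")
    heap.range = HeapRange(0, 0)
    return heap

@dataclass
class Effects:
    reads: AbstractHeap
    writes: AbstractHeap

def invalidate_for_call(compile_time_heap, effects):
    # Keep only cached facts about objects whose heap the callee
    # cannot write to.
    return {
        (obj, offset): value
        for (obj, offset), value in compile_time_heap.items()
        if not effects.writes.range.overlaps((obj.info or Any).range)
    }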
However, if the function is unknown or otherwise opaque, we need at least more advanced alias information and perhaps even (partial) escape analysis.
Consider: even if an instruction takes no operands, we have no idea what state it has access to. If it writes to any object A, we cannot safely keep cached information about any other object B unless we know for sure that A and B do not alias. And we don’t know what the instruction writes to. So the only way we might know it is safe to keep information about B is if B was allocated locally and has not escaped.
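Here is a minimal sketch of that idea in our single-pass setting. It assumes we keep a set of objects allocated in the trace that have not yet escaped, and that Operation exposes its argument list; neither exists in the toy optimizer as written:

def handle_unknown_call(op, compile_time_heap, non_escaped):
    # Every operand of an opaque call escapes: the callee may stash a
    # pointer to it anywhere.
    for arg in op.args:
        non_escaped.discard(arg)
    # Keep cached facts only about objects the callee provably cannot
    # reach: the ones we allocated that have not escaped.
    return {
        (obj, offset): value
        for (obj, offset), value in compile_time_heap.items()
        if obj in non_escaped
    }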
Some runtimes such as ART pre-compute all of their alias information in a bit matrix. This makes more sense if you are using alias information in a full control-flow graph, where you might need to iterate over the graph a few times. In a trace context, you can do a lot in one single pass—no need to make a matrix.
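For comparison, a precomputed version of our may_alias might look like the following dense pairwise table. This is a sketch of the idea only, not ART’s actual data structure:

def precompute_may_alias(values):
    # matrix[i][j] is True when values[i] and values[j] may alias.
    # A production version would pack each row into a bitset.
    return [[may_alias(a, b) for b in values] for a in values]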
As usual, this is a toy IR and a toy optimizer, so it’s hard to say how much faster it makes its toy programs.
In general, though, there is a dial for analysis and optimization that trades precision against speed. This is a happy point on that dial: only a tiny incremental analysis cost over offset-only invalidation, in exchange for higher precision. I like that tradeoff.
It is also very useful in JIT compilers, where the managed language is generally a little better-behaved than a C-like language. Somewhere in your IR there will be a lot of duplicate loads and stores left over from a strength reduction pass, and this can clean up the mess.
Thanks for joining me as I worked through a small use of type-based alias analysis. I hope you enjoyed it.
Thank you to Chris Gregory for helpful feedback.
¹ I made a fork of V8 to go spelunking around the Hydrogen IR. I reset the V8 repo to the last commit before they deleted Hydrogen in favor of their new sea-of-nodes IR, TurboFan.