<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/assets/rss.xsl"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Max Bernstein&apos;s Blog</title>
        <description></description>
        <link>https://bernsteinbear.com</link>
        <atom:link href="https://bernsteinbear.com/feed.xml" rel="self" type="application/rss+xml" />
        <item shouldShow="false">
            <title>Sorry for marking all the posts as unread</title>
            <description>
              I noticed that the URLs were all a little off (had two slashes
              instead of one) and went in and fixed it. I did not think
              everyone's RSS software was going to freak out the way it did.

              PS: this is a special RSS-only post that is not visible on the
              site. Enjoy.
            </description>
            <pubDate>Wed, 31 Jan 2024 00:00:00 +0000</pubDate>
            <guid isPermaLink="false">rss-only-post-1</guid>
        </item>
        
        <item>
            <title>A survey of inlining heuristics</title>
            <description>&lt;p&gt;Compilers, especially method just-in-time compilers, operate on one function at
a time. It is a natural code unit size, especially for a dynamic language JIT:
at a given point in time, what more information can you gather about other
parts of a running, changing system?&lt;/p&gt;

&lt;p&gt;I don’t have any data to back this up—maybe I should go gather some—but on
average, methods are small. Especially in languages such as Ruby that use
method dispatch for everything, even instance variable (attribute, field, …)
lookups, they are &lt;em&gt;small&lt;/em&gt;. And everywhere.&lt;/p&gt;

&lt;p&gt;This makes the compiler sad. If we are to continue to anthropomorphize them,
compilers like having more context so they can optimize better. Consider the
following silly-looking example that is actually representative of a surprising
amount of real-world code:&lt;/p&gt;

&lt;div class=&quot;language-ruby highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Point&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;attr_reader&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:y&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;initialize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;vi&quot;&gt;@x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
    &lt;span class=&quot;vi&quot;&gt;@y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;other&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;no&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;vi&quot;&gt;@x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;vi&quot;&gt;@y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;distance_from_origin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;no&quot;&gt;Point&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Point&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Right now, in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;distance_from_origin&lt;/code&gt; method, I count 8 method calls:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point.new&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point#initialize&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point.new&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point#initialize&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point#distance&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Float#**&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Float#**&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Math.sqrt&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(Technically more, but the ivar lookups (including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;attr_reader&lt;/code&gt;!), addition,
and subtraction are generally specialized and don’t push a frame, even in the
interpreter.)&lt;/p&gt;

&lt;p&gt;Furthermore, there are at least two heap allocations: one for each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point&lt;/code&gt;
instance.&lt;/p&gt;

&lt;p&gt;Last, there is a bunch of memory traffic to and from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point&lt;/code&gt; instances.&lt;/p&gt;

&lt;p&gt;This is all a huge bummer! What should be a simple math operation is now
overwhelmed with a bunch of other stuff. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point&lt;/code&gt; is certainly not a zero-cost
abstraction.&lt;/p&gt;

&lt;p&gt;Even if we had a bunch of other optimizations such as load-store elimination or
escape analysis, they would not be able to do much: pretty much everything
escapes and is effectful. That is, unless we &lt;em&gt;inline&lt;/em&gt;. Inlining is the lever
that enables a bunch of other optimization passes to kick in.&lt;/p&gt;
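
&lt;p&gt;As a hand-written sketch (not the output of any real compiler), here is roughly
what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;distance_from_origin&lt;/code&gt; might reduce to if every call were inlined and the
two &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point&lt;/code&gt; allocations were then removed. The name
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;distance_from_origin_inlined&lt;/code&gt; is made up for illustration:&lt;/p&gt;

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def distance(other)
    Math.sqrt((@x - other.x)**2 + (@y - other.y)**2)
  end
end

def distance_from_origin(x, y)
  Point.new(x, y).distance(Point.new(0, 0))
end

# Hypothetical result of inlining Point.new, Point#initialize, the attr_readers,
# and Point#distance, then removing the allocations that no longer escape.
def distance_from_origin_inlined(x, y)
  Math.sqrt((x - 0)**2 + (y - 0)**2)
end
```

&lt;p&gt;A real JIT would still need guards (for example, that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Point#distance&lt;/code&gt; has not
been redefined), which this sketch leaves out.&lt;/p&gt;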

&lt;h2 id=&quot;inlining-the-easy-part&quot;&gt;Inlining: the “easy” part&lt;/h2&gt;

&lt;p&gt;I wrote about the design and implementation of Cinder’s inliner (&lt;a href=&quot;https://engineering.fb.com/2022/05/02/open-source/cinder-jits-instagram/&quot;&gt;FB
link&lt;/a&gt;,
&lt;a href=&quot;/blog/cinder-jit-inliner/&quot;&gt;personal blog link&lt;/a&gt;) a couple of years ago. I wrote
about arguably the simplest part, which is copying the callee body into the
caller. It took me at least a week to get working. Probably closer to months if
you consider all the plumbing through the rest of the JIT. In February during a
small hackathon, I watched my colleague &lt;a href=&quot;https://github.com/k0kubun&quot;&gt;k0kubun&lt;/a&gt;
prototype that bit of the inliner inside ZJIT in about 30 minutes.&lt;/p&gt;

&lt;p&gt;There is more to do when pretty much every part of the VM is observable from
the guest language: both Python and Ruby allow inspecting the state of the
locals, the call stack, etc. from user code. Sampling profilers also expect some
amount of breadcrumbs to work with to inspect the stack. So there’s some more
machinery still required to pretend like the callee function was not inlined. I
talk about this a little bit in the Cinder blog post.&lt;/p&gt;

&lt;p&gt;Even so, all of that can probably be designed and wired together in a couple
of months. Then you will find yourself tuning the inliner for the next 10
years. This is much harder.&lt;/p&gt;

&lt;h2 id=&quot;when-the-harder-part&quot;&gt;When: the harder part&lt;/h2&gt;

&lt;p&gt;The thing that makes inlining difficult, especially in a method JIT, is that
you are trying to make an entire (dynamic!) system faster but you are only
looking through a microscope and only capable of local reasoning&lt;sup id=&quot;fnref:aot-split&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:aot-split&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.
Whereas other optimizations such as strength reduction, inline caches, and
value numbering are an unalloyed good for the generated code, inlining can
have &lt;em&gt;negative effects&lt;/em&gt;. It is also perhaps the first optimization people add
that has non-local impact.&lt;/p&gt;

&lt;p&gt;If you inline wrong, your code size might blow up. This might thrash your CPU’s
caches. Bummer, but happens to the best of us.&lt;/p&gt;

&lt;p&gt;But also, if you inline wrong, you might get in the way of other helpful
optimizations: if you hit some size limit after inlining method A, you might
never get to inline B, which is the key to unlocking the performance of the
method you are trying to optimize.&lt;/p&gt;

&lt;p&gt;Last, inlining might hurt compile time. In situations where latency is
paramount (think: interactive client JavaScript), adding tons more code into
the fray might add noticeable hiccups, even if the long-term throughput
improves. As always, in-band compilation is a trade-off because any time you
spend compiling, you are &lt;em&gt;not executing code&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;You have to write your compiler to reason about all of this stuff. So you have
heuristics. For example, here is Michael Pollan’s inliner heuristic:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;Inline methods. Mostly small. Not too many.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I did a survey of a bunch of compilers, mostly JIT compilers, to see what their
inlining heuristics look like. I also read (skimmed) some papers to see what
those folks had to say. I wonder if they agree.&lt;/p&gt;

&lt;p&gt;This post was a long time coming. I started working on it about five years ago
but then when I quit working at Facebook I accidentally left behind all of the
inliner research I did for Cinder’s inliner. So then I kind of just thought
about it aimlessly for a while before redoing it this year. Anyway, here’s
wonderwall.&lt;/p&gt;

&lt;h2 id=&quot;the-heuristics&quot;&gt;The heuristics&lt;/h2&gt;

&lt;p&gt;Spoiler alert: all in all, people tend to look at:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Profiles of call target&lt;/li&gt;
  &lt;li&gt;Cumulative caller size (increasing as callees get inlined)&lt;/li&gt;
  &lt;li&gt;Callee size&lt;/li&gt;
  &lt;li&gt;Inline depth&lt;/li&gt;
  &lt;li&gt;Number of inlined calls at a certain depth&lt;/li&gt;
  &lt;li&gt;If recursion is present&lt;/li&gt;
  &lt;li&gt;Callee/caller call count ratio (if the callee is invoked on fewer than K% of
calls to the caller, don’t inline the callee)&lt;/li&gt;
  &lt;li&gt;Callee stack usage&lt;/li&gt;
  &lt;li&gt;Polymorphism in callee&lt;/li&gt;
  &lt;li&gt;What mode the compiler is in (baseline vs more aggressive)&lt;/li&gt;
  &lt;li&gt;If the callee looks like it always raises/throws&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compilers also have different, interesting ways to pipe in profile information.&lt;/p&gt;
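
&lt;p&gt;To make a few of those signals concrete, here is a minimal sketch of how they
might combine into a single yes/no decision. Everything in it is an assumption
for illustration: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Candidate&lt;/code&gt; struct, the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;should_inline?&lt;/code&gt; name, and the thresholds are not taken from any of the
compilers surveyed here.&lt;/p&gt;

```ruby
# All names and thresholds are hypothetical, chosen only for illustration.
Candidate = Struct.new(:callee_size, :depth, :recursive, :always_raises,
                       keyword_init: true)

MAX_CALLEE_SIZE = 30  # in made-up "IR instruction" units
MAX_CALLER_SIZE = 200 # cumulative caller size after inlining
MAX_DEPTH = 3         # how many levels of nested inlining to allow

def should_inline?(candidate, caller_size)
  return false if candidate.recursive
  return false if candidate.always_raises
  return false if candidate.depth > MAX_DEPTH
  return false if candidate.callee_size > MAX_CALLEE_SIZE
  return false if caller_size + candidate.callee_size > MAX_CALLER_SIZE

  true
end
```

&lt;p&gt;Real inliners tend to weigh these signals against each other (and against
profile data) rather than treating each one as a hard cutoff.&lt;/p&gt;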

&lt;p&gt;Last, some newer papers do some wild stuff:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Train neural networks to make inlining decisions&lt;/li&gt;
  &lt;li&gt;Let inlining drive the entire optimization pipeline, treating it as a search
heuristic over a BFS walk of the call graph&lt;/li&gt;
  &lt;li&gt;Use AOT-gathered information to aid in JIT heuristics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another thing to consider in inlining is how you gather and interpret profiles.&lt;/p&gt;

&lt;h2 id=&quot;call-context-and-profiles-the-other-harder-part&quot;&gt;Call context and profiles: the other harder part&lt;/h2&gt;

&lt;p&gt;When you compile a function, you tend to specialize it based on the input it
has historically been given. For a monomorphic input, maybe you guard that the
type is still the same and otherwise jump into the interpreter. For a
polymorphic input, maybe you check the top K (~4) common cases and otherwise
jump into the interpreter. Fine.&lt;/p&gt;
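
&lt;p&gt;Written out as plain Ruby rather than machine code, the two shapes might look
something like the sketch below. The method names are hypothetical,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;interpret_add_one&lt;/code&gt; is only a stand-in for leaving compiled code, and the
polymorphic version checks two classes instead of four to keep it short.&lt;/p&gt;

```ruby
# Hypothetical stand-in for bailing out of compiled code into the interpreter.
def interpret_add_one(value)
  value + 1
end

# Monomorphic: the profile only ever saw Integer, so guard on that one class.
def compiled_add_one_mono(value)
  return interpret_add_one(value) unless value.instance_of?(Integer)

  value + 1 # fast path: a real JIT might emit a raw integer add here
end

# Polymorphic: check the few classes seen most often, then fall back.
def compiled_add_one_poly(value)
  case value
  when Integer then value + 1
  when Float then value + 1.0
  else interpret_add_one(value)
  end
end
```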

&lt;p&gt;But sometimes you can be compiling a polymorphic method &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bar&lt;/code&gt; that is actually
monomorphic in its caller &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foo&lt;/code&gt;. That is, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foo&lt;/code&gt; might only ever pass one kind
of input to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bar&lt;/code&gt;, but other callers pass all kinds of stuff. Here is a bit of
a silly example to show what I mean:&lt;/p&gt;

&lt;div class=&quot;language-ruby highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;HashWithIndifferentAccess&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;initialize&lt;/span&gt;
    &lt;span class=&quot;vi&quot;&gt;@hash&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{}&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# Allow reading from the Hash with either a String or a Symbol&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;[]&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;vi&quot;&gt;@hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;to_sym&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# ...&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# some method...&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;some_hash&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;HashWithIndifferentAccess&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;new&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# ...&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;some_hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;abc&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# some other method...&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;another_hash&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;HashWithIndifferentAccess&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;new&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# ...&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;another_hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;:xyz&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Just kidding, not so silly at all. It’s a super common pattern &lt;a href=&quot;https://github.com/rails/rails/blob/6c75e6d5663afa4278ee593c2d6c20c1ee396e32/activesupport/lib/active_support/hash_with_indifferent_access.rb#L55&quot;&gt;in
Rails&lt;/a&gt;. It makes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;key&lt;/code&gt; polymorphic in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HashWithIndifferentAccess#[]&lt;/code&gt; even
though for many of its callers, it may well be monomorphic (or even a
constant).&lt;/p&gt;

&lt;p&gt;In order to plumb this information through to the compiler, you have to figure
out this call context relationship. There are a couple of common ways to do it.&lt;/p&gt;

&lt;h3 id=&quot;splitting&quot;&gt;Splitting&lt;/h3&gt;

&lt;p&gt;YJIT, for example, though it does not inline, splits methods based on the types
of the arguments going in. This means that it clones the compiled code,
generating a new version for each context. This does not give &lt;em&gt;call&lt;/em&gt; context
(“A calls B”) but gives type context (“B is called with integers, B’ is called
with strings”).&lt;/p&gt;

&lt;p&gt;A compiler could do type-based splitting in the interpreter or a baseline tier.&lt;/p&gt;
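
&lt;p&gt;A toy model of type-based splitting, assuming a cache of versions keyed by
argument classes, might look like this. It is not how YJIT is implemented;
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VersionedMethod&lt;/code&gt; is a made-up name and “compiling” a version is faked by
copying a lambda.&lt;/p&gt;

```ruby
# Hypothetical cache of specialized versions, keyed by the argument classes.
class VersionedMethod
  attr_reader :versions

  def initialize(generic)
    @generic = generic
    @versions = {}
  end

  def call(*args)
    key = args.map { |arg| arg.class }
    # "Compile" a new version the first time a type combination shows up.
    # A real compiler would specialize the code for these types here.
    @versions[key] ||= @generic.dup
    @versions[key].call(*args)
  end
end

add = VersionedMethod.new(lambda { |a, b| a + b })
add.call(1, 2)     # creates the [Integer, Integer] version
add.call('a', 'b') # creates the [String, String] version
add.call(3, 4)     # reuses the [Integer, Integer] version
```

&lt;p&gt;Note that the key says nothing about &lt;em&gt;who&lt;/em&gt; called: two different callers passing
integers share one version.&lt;/p&gt;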

&lt;h3 id=&quot;profile-splitting&quot;&gt;Profile splitting&lt;/h3&gt;

&lt;p&gt;If you don’t fancy duplicating the code, you can instead duplicate the
profiles. You could either do this using type context (as above) or using call
context. SpiderMonkey, for example, does “trial inlining” that allows callers
to pass down a bit of memory for potential inline candidate callees to record
their inline caches. Instead of each function holding its own ICScript, the
caller allocates a unique ICScript for that potential-inline call-site. This
gives each callee function (at least?) one level of call context.&lt;/p&gt;

&lt;p&gt;Later, when inlining the callee into the caller, we don’t have other callers’
type information polluting the IR builder (or whatever reads the profiles).&lt;/p&gt;
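
&lt;p&gt;As a rough illustration of why per-call-site profiles help, here is a toy
profile store that records argument classes both per method and per call site.
This is not SpiderMonkey’s ICScript mechanism; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Profiles&lt;/code&gt; class and the
call-site labels are invented for the example.&lt;/p&gt;

```ruby
# Hypothetical profile store: one shared profile per method, plus one profile
# per (call site, method) pair.
class Profiles
  def initialize
    @per_method = Hash.new { |hash, key| hash[key] = Hash.new(0) }
    @per_call_site = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  end

  def record(method_name, call_site, arg_class)
    @per_method[method_name][arg_class] += 1
    @per_call_site[[call_site, method_name]][arg_class] += 1
  end

  # Classes seen across all callers: this is what looks polymorphic.
  def classes_seen(method_name)
    @per_method[method_name].keys
  end

  # Classes seen from one specific call site: often monomorphic.
  def classes_seen_at(call_site, method_name)
    @per_call_site[[call_site, method_name]].keys
  end
end

profiles = Profiles.new
profiles.record(:[], 'some_method', String)
profiles.record(:[], 'some_other_method', Symbol)
```

&lt;p&gt;With the shared profile, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[]&lt;/code&gt; has seen both &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;String&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Symbol&lt;/code&gt;; with the
per-call-site profile, each caller has seen exactly one class.&lt;/p&gt;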

&lt;h3 id=&quot;bytecode-inlining&quot;&gt;Bytecode inlining&lt;/h3&gt;

&lt;p&gt;JavaScriptCore handles this by inlining bytecode into other bytecode. This is a
gnarly transformation but gives even the interpreter (!) access to call
context. On tier-up to the compiler, all the inlining decisions have been made
already.&lt;/p&gt;

&lt;h3 id=&quot;early-tier-with-counters&quot;&gt;Early tier with counters&lt;/h3&gt;

&lt;p&gt;HotSpot handles this with multiple tiers. The interpreter tiers up to the
client compiler, C1. C1 profiles branch and call targets in compiled code. C1
may eventually recompile based on this new information. C1 may eventually tier
up to C2, which copies C1 inlining decisions. This way, we get call context in
profiles via inlining.&lt;/p&gt;

&lt;h3 id=&quot;inline-and-analyze-and-hope&quot;&gt;Inline and analyze and hope&lt;/h3&gt;

&lt;p&gt;One last thing you could do is just trust your type inference and branch
folding in the optimizer. You could inline and do polymorphic specialization in
the callee when building the IR, then hope that your branch pruning
monomorphizes the inlined callee. It’s a little wasteful because the
polymorphic code is built “for nothing”, but it might work fine?&lt;/p&gt;


&lt;p&gt;Okay, onto the collected notes and half-baked commentary. Here’s a survey of a
bunch of JIT compilers and how they reason about inlining heuristics.&lt;/p&gt;

&lt;h3 id=&quot;thanks&quot;&gt;Thanks&lt;/h3&gt;

&lt;p&gt;But before we get into that, thanks to Iain Ireland, CF Bolz-Tereick, and Ian
Rogers for feedback on this blog post!&lt;/p&gt;

&lt;h2 id=&quot;the-survey-bits-and-bobbles&quot;&gt;The survey: bits and bobbles&lt;/h2&gt;

&lt;p&gt;What follows is mostly a “bits and bobbles” section a la &lt;a href=&quot;https://www.philipzucker.com/&quot;&gt;Phil
Zucker&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We’ll start with &lt;a href=&quot;https://github.com/facebookincubator/cinderx&quot;&gt;Cinder&lt;/a&gt;, because when I wrote Cinder’s inliner I
added only the simplest heuristics, mostly “don’t inline” signals. Over time,
after I left, people tuned it a bit more.&lt;/p&gt;

&lt;h3 id=&quot;cinder&quot;&gt;Cinder&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/facebookincubator/cinderx/blob/88189ebf4bfd196ac7578c5076efa39bfa11f211/cinderx/Jit/hir/inliner.cpp#L341&quot;&gt;inliner&lt;/a&gt; starts from the caller CFG, walking it to find
suitable inlining candidates. Inlining candidates are only for call targets
that are known—in Cinder’s case, only for monomorphic call targets—and pass
some checks. The callee is only known by its function object, which includes
its bytecode. There is no IR available for the callee until we decide to inline.&lt;/p&gt;

&lt;p&gt;Most of the “can’t handle this” checks are related to argument handling. Python
has a pretty complex calling convention, so if the caller/callee have not
agreed on how the arguments should be passed through, the inliner doesn’t care
to try and figure it out on its own. That is the responsibility of &lt;a href=&quot;https://github.com/facebookincubator/cinderx/blob/88189ebf4bfd196ac7578c5076efa39bfa11f211/cinderx/Jit/hir/simplify.cpp#L1765&quot;&gt;other parts
of the compiler&lt;/a&gt;. Things in this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;canInline&lt;/code&gt;
function could be considered “TODO”.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;canInline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;caller&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AbstractCall&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;call_instr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BorrowedRef&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PyFunctionObject&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;call_instr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fail&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;](&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InlineFailureType&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;failure_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;dlogAndCollectFailureStats&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;caller&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;call_instr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;failure_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func_kwdefaults&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nullptr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fail&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InlineFailureType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kHasKwdefaults&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;BorrowedRef&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PyCodeObject&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func_code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;JIT_CHECK&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PyCode_Check&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Expected PyCodeObject&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;code&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;co_kwonlyargcount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fail&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InlineFailureType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kHasKwOnlyArgs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Failures are logged so they can be analyzed. If some unhandled case turns out
to be very frequent, the Cinder team will find out from the logs and can decide
whether to handle it.&lt;/p&gt;

&lt;p&gt;The inliner collects all candidate call instructions in one pass over the CFG.
It loads the configurable “cost limit” from the options struct. Then it does
one pass over the inlining candidates vector, inlining until it (maybe) hits
the cost limit.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cost_limit&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getConfig&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inliner_cost_limit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;codeCost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;irfunc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Inline as many calls as possible, starting from the top of the function and&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// working down.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;call&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_inline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;BorrowedRef&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PyCodeObject&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;call_code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func_code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;new_cost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;codeCost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call_code&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;new_cost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cost_limit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;LOG_INLINER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;&quot;Inliner reached cost limit of {} when trying to inline {} into {}, &quot;&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;&quot;inlining stopping early&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;new_cost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;funcFullname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;irfunc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fullname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;new_cost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;inlineFunctionCall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;irfunc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;// We need to reflow types after every inline to propagate new type&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// information from the callee.&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;reflowTypes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;irfunc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It does some graph maintenance work after inlining these calls, but that’s it.&lt;/p&gt;

&lt;p&gt;This approach gets a surprising amount of utility for being so simple: it
inlines constants (quite a few methods look like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;def foo(): return 5&lt;/code&gt;), small
methods, and (at least, as far as I can remember) shrinks the compiled code
size. All for very little compile time overhead.&lt;/p&gt;
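
&lt;p&gt;To make the shape of that loop concrete, here is a minimal Python sketch of a
greedy, budget-limited selection. It is not Cinder’s code: the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CallSite&lt;/code&gt; class, the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pick_calls_to_inline&lt;/code&gt; helper, and the cost numbers are all
made up for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class CallSite:
    callee: str
    callee_cost: int  # hypothetical stand-in for codeCost(call_code)


def pick_calls_to_inline(call_sites, cost_limit, starting_cost=0):
    """Return the callees that fit in the budget, in candidate order."""
    cost = starting_cost
    chosen = []
    for site in call_sites:
        new_cost = cost + site.callee_cost
        if new_cost > cost_limit:
            # Like the Cinder excerpt, stop at the first callee that would
            # exceed the budget instead of skipping it and carrying on.
            break
        cost = new_cost
        chosen.append(site.callee)
    return chosen


sites = [
    CallSite("five", 2),
    CallSite("get_x", 4),
    CallSite("big", 50),
    CallSite("tiny", 1),
]
print(pick_calls_to_inline(sites, cost_limit=10))  # ['five', 'get_x']
```

&lt;p&gt;Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; is never considered in this sketch,
because the loop breaks as soon as one candidate is over budget.&lt;/p&gt;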

&lt;p&gt;There’s one other “standalone” Python JIT out there, PyPy. So we should look at
that too.&lt;/p&gt;

&lt;h3 id=&quot;pypy&quot;&gt;PyPy&lt;/h3&gt;

&lt;p&gt;There are two inliners in PyPy. One is inside the RPython to C translation
pipeline, which acts more like an ahead-of-time compiler&lt;sup id=&quot;fnref:rpython-inliner&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:rpython-inliner&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.
Then there is the tracing JIT bit, which has its own optimizer and heuristics.
We’re going to look at the latter.&lt;/p&gt;

&lt;p&gt;I talked to &lt;a href=&quot;https://cfbolz.de/&quot;&gt;CF Bolz-Tereick&lt;/a&gt; about the inliner and their
comment was that PyPy’s inlining heuristic is “yes”. There are a couple of
exceptions, such as not inlining recursive functions or functions with loops.
But the basic idea of tracing includes tracing through call instructions, which
naturally means that you are “inlining”.&lt;/p&gt;
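
&lt;p&gt;A toy Python sketch of why tracing through calls amounts to inlining. This is
not how PyPy is implemented: a real tracer records operations as the program
runs, whereas this one walks made-up straight-line op lists. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FUNCTIONS&lt;/code&gt; table, the op encoding, and the helper names are
all assumptions.&lt;/p&gt;

```python
# Each hypothetical function is a straight-line list of (op, arg) pairs.
FUNCTIONS = {
    "main": [
        ("load", "a"),
        ("call", "double"),
        ("call", "fact"),
        ("call", "sum_to"),
        ("ret", None),
    ],
    "double": [("push", 2), ("mul", None), ("ret", None)],
    "fact": [("call", "fact"), ("ret", None)],  # directly recursive
    "sum_to": [("loop", None), ("ret", None)],  # contains a loop
}


def has_loop(name):
    return any(op == "loop" for op, _ in FUNCTIONS[name])


def is_directly_recursive(name):
    # Only direct self-calls are detected in this toy.
    return ("call", name) in FUNCTIONS[name]


def trace(name):
    ops = []
    for op, arg in FUNCTIONS[name]:
        if op == "call" and not (has_loop(arg) or is_directly_recursive(arg)):
            # Keep recording inside the callee and drop its "ret": the
            # callee's operations simply become part of this trace.
            ops.extend(o for o in trace(arg) if o[0] != "ret")
        else:
            # Everything else, including calls we refuse to trace through,
            # is recorded as-is.
            ops.append((op, arg))
    return ops


print(trace("main"))
```

&lt;p&gt;The call to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; disappears from the trace and is
replaced by its body, while the recursive and looping callees stay as call
operations.&lt;/p&gt;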

&lt;p&gt;PyPy also does this neat thing where they treat frame pushes like normal
allocation. Frame pushes, frame reads, and frame writes get written to the
trace like normal object memory traffic and can get optimized away like other
field reads and writes. This means that they can “just” use DCE to eliminate
frame pushes and pops, whereas Cinder has some complicated mechanism to do it
(which is my fault).&lt;/p&gt;
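
&lt;p&gt;Here is a minimal Python sketch of that idea, assuming a made-up trace format
and a frame that never escapes the trace. Field writes to the freshly
allocated frame are remembered, field reads are forwarded to the remembered
value, and the allocation itself is never emitted. The op names and the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimize&lt;/code&gt; helper are hypothetical; PyPy’s optimizer is much
more general.&lt;/p&gt;

```python
def optimize(trace):
    fields = {}      # (obj, field) mapped to the value last stored there
    alias = {}       # load result mapped to the value it was forwarded to
    virtual = set()  # objects allocated in this trace
    out = []
    for op in trace:
        kind = op[0]
        if kind == "new":
            virtual.add(op[1])  # remember the allocation, emit nothing
        elif kind == "setfield" and op[1] in virtual:
            fields[(op[1], op[2])] = alias.get(op[3], op[3])
        elif kind == "getfield" and op[1] in virtual:
            alias[op[3]] = fields[(op[1], op[2])]
        else:
            # Simplifying assumption: the frame never escapes, so no other
            # operation may take it as an argument.
            assert not any(arg in virtual for arg in op[1:])
            out.append((kind,) + tuple(alias.get(a, a) for a in op[1:]))
    return out


frame_trace = [
    ("new", "frame"),                   # frame push
    ("setfield", "frame", "x", "a"),    # frame write
    ("getfield", "frame", "x", "t0"),   # frame read
    ("add", "t0", "t0", "t1"),
    ("return", "t1"),
]
print(optimize(frame_trace))  # [('add', 'a', 'a', 't1'), ('return', 't1')]
```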

&lt;p&gt;TODO get more details here&lt;/p&gt;

&lt;h3 id=&quot;v8&quot;&gt;V8&lt;/h3&gt;

&lt;p&gt;V8 is a JS engine that has gone through many execution approaches over the
years. We’ll look at three of them, since each has or had its place in that
history:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Hydrogen was the first real SSA IR and it looks very familiar to me, having
worked on Cinder and now ZJIT. It is now defunct.&lt;/li&gt;
  &lt;li&gt;TurboFan was the replacement, going full Sea of Nodes. In the grand scheme of
things it is a pretty fast compiler, but it does not hold back from doing some
expensive rewrites. It was recently rewritten from Sea of Nodes to a more
traditional CFG and nicknamed Turboshaft.&lt;/li&gt;
  &lt;li&gt;Maglev is meant to coexist alongside TurboFan, preferring to speculate a little
more eagerly and do fewer incremental rewrites in the name of compile
time.&lt;sup id=&quot;fnref:turbolev&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:turbolev&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They also each inline at different times in the pipeline, which made for a fun
time trying to understand the different codebases.&lt;/p&gt;

&lt;h4 id=&quot;v8-hydrogen&quot;&gt;V8 Hydrogen&lt;/h4&gt;

&lt;p&gt;Inlining happens during Hydrogen graph building:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/tekknolagi/v8/blob/a969ab67f8e1e7475d9b26468225c3a772890c64/src/crankshaft/hydrogen.cc#L9236&quot;&gt;https://github.com/tekknolagi/v8/blob/a969ab67f8e1e7475d9b26468225c3a772890c64/src/crankshaft/hydrogen.cc#L9236&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;V8 does not store function bytecode for all functions here, so the inliner
needs to re-parse the callee’s &lt;em&gt;text source&lt;/em&gt; in order to inline it.&lt;/p&gt;

&lt;p&gt;Heuristics &lt;a href=&quot;https://github.com/tekknolagi/v8/blob/a969ab67f8e1e7475d9b26468225c3a772890c64/src/crankshaft/hydrogen.cc#L7807&quot;&gt;https://github.com/tekknolagi/v8/blob/a969ab67f8e1e7475d9b26468225c3a772890c64/src/crankshaft/hydrogen.cc#L7807&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;something about native context&lt;/li&gt;
  &lt;li&gt;check callee AST size against configurable limit&lt;/li&gt;
  &lt;li&gt;check inlining depth against configurable limit&lt;/li&gt;
  &lt;li&gt;don’t inline recursive functions&lt;/li&gt;
  &lt;li&gt;check current cumulative method size (as tracked by AST node count) against
configurable limit&lt;/li&gt;
&lt;/ul&gt;
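
&lt;p&gt;Checks like the ones above can be sketched as a chain of early rejections.
This Python sketch is an illustration, not a port of Hydrogen: the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Limits&lt;/code&gt; fields and their values are placeholders (not V8’s
defaults), the native context check is left out, and the order of the checks
is an assumption.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Limits:
    # Placeholder values, chosen only to make the example readable.
    max_callee_ast_size: int = 100
    max_depth: int = 3
    max_cumulative_ast_size: int = 500


def why_not_inline(callee, callee_ast_size, inline_stack, cumulative_ast_size, limits):
    """Return a rejection reason, or None if every check passes."""
    if callee_ast_size > limits.max_callee_ast_size:
        return "callee too big"
    if len(inline_stack) >= limits.max_depth:
        return "inlining depth exceeded"
    if callee in inline_stack:
        return "recursive"
    if cumulative_ast_size > limits.max_cumulative_ast_size:
        return "cumulative size exceeded"
    return None


limits = Limits()
print(why_not_inline("get_x", 10, ["main"], 0, limits))        # None
print(why_not_inline("main", 10, ["main", "helper"], 0, limits))  # recursive
```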

&lt;h4 id=&quot;v8-turbofan&quot;&gt;V8 TurboFan&lt;/h4&gt;

&lt;p&gt;&lt;a href=&quot;https://docs.google.com/document/d/1VoYBhpDhJC4VlqMXCKvae-8IGuheBGxy32EOgC2LnT8/edit&quot;&gt;https://docs.google.com/document/d/1VoYBhpDhJC4VlqMXCKvae-8IGuheBGxy32EOgC2LnT8/edit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/compiler/js-inlining-heuristic.h#L14&quot;&gt;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/compiler/js-inlining-heuristic.h#L14&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Find candidates &lt;a href=&quot;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/compiler/js-inlining-heuristic.cc#L134&quot;&gt;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/compiler/js-inlining-heuristic.cc#L134&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Can inline &lt;a href=&quot;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/compiler/js-inlining-heuristic.cc#L75&quot;&gt;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/compiler/js-inlining-heuristic.cc#L75&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Force inline small functions &lt;a href=&quot;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/compiler/js-inlining-heuristic.cc#L309&quot;&gt;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/compiler/js-inlining-heuristic.cc#L309&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Loop over sorted (by comparator) list &lt;a href=&quot;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/compiler/js-inlining-heuristic.cc#L847&quot;&gt;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/compiler/js-inlining-heuristic.cc#L847&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;v8-maglev&quot;&gt;V8 Maglev&lt;/h4&gt;

&lt;p&gt;When optimizing, add call instructions to the inline candidates list: &lt;a href=&quot;https://github.com/v8/v8/blob/1a391f98cc7a9196369f2d6cab7df35ffbe92c08/src/maglev/maglev-graph-optimizer.cc#L1271&quot;&gt;https://github.com/v8/v8/blob/1a391f98cc7a9196369f2d6cab7df35ffbe92c08/src/maglev/maglev-graph-optimizer.cc#L1271&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;ProcessResult&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MaglevGraphOptimizer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;VisitCall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Call&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                              &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ProcessingState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bytecode_length&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;shared&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetBytecodeArray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;broker&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call_frequency&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bytecode_length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loop_depth_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;is_small_function&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;bytecode_length&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;reducer_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compilation_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flags&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_eager_inlined_bytecode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;MaglevCallSiteInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;call_site&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reducer_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;zone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;New&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MaglevCallSiteInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;MaglevCallerDetails&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;is_small_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;call_frequency&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
          &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bytecode_length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;reducer_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PushInlineCandidate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call_site&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
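
&lt;p&gt;To get a feel for the score arithmetic in that excerpt, here is the same
formula in Python applied to three invented candidates. The candidate names and
numbers are made up, and sorting with the highest score first is an assumption
on my part; the excerpt only shows the score being computed and stored.&lt;/p&gt;

```python
def inline_score(call_frequency, bytecode_length, loop_depth):
    # Same arithmetic as the excerpt: call frequency per unit of bytecode,
    # boosted by 1.5x when the call site sits inside a loop.
    loop_bonus = 1.5 if loop_depth > 0 else 1.0
    return (call_frequency / bytecode_length) * loop_bonus


# (name, call_frequency, bytecode_length, loop_depth); all values invented.
candidates = [
    ("large_hot", 10.0, 200, 0),
    ("small_hot", 10.0, 20, 0),
    ("small_hot_in_loop", 10.0, 20, 1),
]

# Assumption: candidates with a higher score are considered first.
ranked = sorted(candidates, key=lambda c: inline_score(*c[1:]), reverse=True)
print([name for name, *_ in ranked])
# ['small_hot_in_loop', 'small_hot', 'large_hot']
```

&lt;p&gt;With equal call frequency, the smaller callee wins, and being inside a loop
breaks the tie between two callees of the same size.&lt;/p&gt;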

&lt;p&gt;&lt;a href=&quot;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/maglev/maglev-inlining.h#L36&quot;&gt;https://github.com/v8/v8/blob/036842f4841326130a40adfcff38f85a9b4cd30a/src/maglev/maglev-inlining.h#L36&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unlike, for example, Cinder, Maglev does not appear to have many restrictions
on what can get inlined into what, so its “can inline” signal is about budget.
Actually, two budgets: a small budget and a normal budget.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MaglevInliner&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CanInlineCall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// We stop inlining entirely if the small budget is exhausted.&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// Inlining decisions after that become bad if we stop inlining small&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// functions, but keep inlining large ones.&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inlineable_calls&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
         &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_inlined_bytecode_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;max_inlined_bytecode_size_cumulative&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_inlined_bytecode_size_small&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;max_inlined_bytecode_size_small_total&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then its inlining loop is a greedy walk of the to-inline queue checking
candidate sizes.&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MaglevInliner&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InlineCallSites&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;DCHECK&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CanInlineCall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inlineable_calls&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// pop from inlineable_calls&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;MaglevCallSiteInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;call_site&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ChooseNextCallSite&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;

    &lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;is_small_with_heapnum_input_outputs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;IsSmallWithHeapNumberInputsOutputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call_site&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_inlined_bytecode_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;max_inlined_bytecode_size_cumulative&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;c1&quot;&gt;// We ran out of budget. Checking if this is a small-ish function that we&lt;/span&gt;
      &lt;span class=&quot;c1&quot;&gt;// can still inline.&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_inlined_bytecode_size_small&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;max_inlined_bytecode_size_small_total&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compilation_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_could_not_inline_all_candidates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;is_small_with_heapnum_input_outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compilation_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_could_not_inline_all_candidates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// Not that we don&apos;t break just rather just continue: next candidates&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// might be inlineable.&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;InliningResult&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;BuildInlineFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call_site&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;is_small_with_heapnum_input_outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It runs this loop (which drains the queue) interleaved with the optimizer
(which populates the queue).&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MaglevInliner&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inlineable_calls&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CanInlineCall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InlineCallSites&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;RunOptimizer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Confusingly, though, the optimizer also calls another function named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CanInlineCall&lt;/code&gt;, which checks whether it can legally inline:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;skip recursion&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/v8/v8/blob/1a391f98cc7a9196369f2d6cab7df35ffbe92c08/src/objects/shared-function-info-inl.h#L421&quot;&gt;https://github.com/v8/v8/blob/1a391f98cc7a9196369f2d6cab7df35ffbe92c08/src/objects/shared-function-info-inl.h#L421&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;not called enough (min call frequency)&lt;/li&gt;
  &lt;li&gt;bytecode too big&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bool MaglevGraphBuilder::ShouldEagerInlineCall(&lt;/code&gt; &lt;del&gt;appears unused? / dead
declaration?&lt;/del&gt; maybe src/maglev/maglev-graph-builder.cc is just not working on
github search&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MaybeReduceResult MaglevGraphBuilder::TryBuildCallKnownJSFunction(&lt;/code&gt; &lt;del&gt;also
unused / dead declaration&lt;/del&gt; same&lt;/p&gt;

&lt;h3 id=&quot;javascriptcore&quot;&gt;JavaScriptCore&lt;/h3&gt;

&lt;p&gt;JavaScriptCore is funky! Unlike these other compilers that do inlining in their
neat little SSA IRs, JSC inlines &lt;em&gt;at the bytecode level&lt;/em&gt;&lt;sup id=&quot;fnref:fil&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:fil&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. This is their way of
making sure that they get at least one level of call context into their
interpreter inline caches, which will eventually give better information to the
compiler.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Bytecode inlining
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/WebKit/WebKit/blob/709c3895afd71e0836f8c8be7393e44d41fab7e1/Source/JavaScriptCore/bytecode/CodeBlock.cpp#L2453&quot;&gt;https://github.com/WebKit/WebKit/blob/709c3895afd71e0836f8c8be7393e44d41fab7e1/Source/JavaScriptCore/bytecode/CodeBlock.cpp#L2453&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;DFG
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/WebKit/WebKit/blob/709c3895afd71e0836f8c8be7393e44d41fab7e1/Source/JavaScriptCore/dfg/DFGCapabilities.cpp#L76&quot;&gt;https://github.com/WebKit/WebKit/blob/709c3895afd71e0836f8c8be7393e44d41fab7e1/Source/JavaScriptCore/dfg/DFGCapabilities.cpp#L76&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/WebKit/WebKit/blob/917854a9c245b87b333e23ed4b195505d574a333/Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp#L1703&quot;&gt;https://github.com/WebKit/WebKit/blob/917854a9c245b87b333e23ed4b195505d574a333/Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp#L1703&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/WebKit/WebKit/blob/917854a9c245b87b333e23ed4b195505d574a333/Source/JavaScriptCore/bytecode/CallLinkStatus.cpp#L294&quot;&gt;https://github.com/WebKit/WebKit/blob/917854a9c245b87b333e23ed4b195505d574a333/Source/JavaScriptCore/bytecode/CallLinkStatus.cpp#L294&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/WebKit/WebKit/blob/d919344236c47b610930636d3310f00380624d43/Source/JavaScriptCore/bytecode/InlineCallFrame.h&quot;&gt;https://github.com/WebKit/WebKit/blob/d919344236c47b610930636d3310f00380624d43/Source/JavaScriptCore/bytecode/InlineCallFrame.h&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;JSC only inlines based on bytecode profile information, and only inlines
bytecode??&lt;/p&gt;

&lt;p&gt;TODO find better sources for bytecode inlining&lt;/p&gt;

&lt;!--
Compile plan
https://github.com/WebKit/WebKit/blob/709c3895afd71e0836f8c8be7393e44d41fab7e1/Source/JavaScriptCore/dfg/DFGPlan.cpp#L186
--&gt;

&lt;h3 id=&quot;spidermonkey&quot;&gt;SpiderMonkey&lt;/h3&gt;

&lt;p&gt;SpiderMonkey has another way of getting that call context without doing bytecode
inlining: they add call context to their inline caches. Methods can pass down
an &lt;em&gt;ICScript&lt;/em&gt; to their callees where the callee writes its inline cache
information. Then, when compiling, the callee is more likely to be
monomorphized.&lt;/p&gt;
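
&lt;p&gt;A small Python sketch of why per-caller feedback helps. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ICScript&lt;/code&gt; class below only borrows the name; its fields and
methods are invented for illustration and do not reflect SpiderMonkey’s data
structures.&lt;/p&gt;

```python
from collections import defaultdict


class ICScript:
    """Hypothetical type feedback for one function under one calling context."""

    def __init__(self):
        self.seen_types = defaultdict(set)  # IC site name mapped to type names

    def record(self, ic_site, value):
        self.seen_types[ic_site].add(type(value).__name__)

    def is_monomorphic(self, ic_site):
        return len(self.seen_types[ic_site]) == 1


def describe(value, ic_script):
    # The callee writes its feedback into whichever ICScript it was handed.
    ic_script.record("describe.arg", value)
    return str(value)


# One shared ICScript: feedback from every caller gets mixed together.
shared = ICScript()
describe(1, shared)
describe("one", shared)

# One ICScript per call site: each caller sees only its own types.
from_int_caller = ICScript()
from_str_caller = ICScript()
describe(1, from_int_caller)
describe(2, from_int_caller)
describe("one", from_str_caller)

print(shared.is_monomorphic("describe.arg"))           # False
print(from_int_caller.is_monomorphic("describe.arg"))  # True
print(from_str_caller.is_monomorphic("describe.arg"))  # True
```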

&lt;p&gt;Wasm&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/mozilla-firefox/firefox/blob/438a3ce10eb77fb50d968463b7741117aec5bb4a/js/src/wasm/WasmHeuristics.h#L213&quot;&gt;https://github.com/mozilla-firefox/firefox/blob/438a3ce10eb77fb50d968463b7741117aec5bb4a/js/src/wasm/WasmHeuristics.h#L213&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SpiderMonkey ICScript&lt;/p&gt;

&lt;h3 id=&quot;wasmtime-and-cranelift&quot;&gt;Wasmtime and Cranelift&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://fitzgen.com/2025/11/19/inliner.html&quot;&gt;https://fitzgen.com/2025/11/19/inliner.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;hotspot&quot;&gt;HotSpot&lt;/h3&gt;

&lt;p&gt;Plan: run in interpreter; tier up to C1; profile call targets; inline in C1;
profile branch counts; tier up to C2, which copies C1 inlining decisions in
bytecode parser&lt;/p&gt;

&lt;p&gt;HotSpot C2&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/openjdk/jdk/blob/a05d5d2514c835f2bfeaf7a8c7df0ac241f0177f/src/hotspot/share/opto/bytecodeInfo.cpp#L116&quot;&gt;https://github.com/openjdk/jdk/blob/a05d5d2514c835f2bfeaf7a8c7df0ac241f0177f/src/hotspot/share/opto/bytecodeInfo.cpp#L116&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/openjdk/jdk/blob/497dca2549a9829530670576115bf4b8fab386b3/src/hotspot/share/opto/bytecodeInfo.cpp#L197&quot;&gt;https://github.com/openjdk/jdk/blob/497dca2549a9829530670576115bf4b8fab386b3/src/hotspot/share/opto/bytecodeInfo.cpp#L197&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/openjdk/jdk/blob/497dca2549a9829530670576115bf4b8fab386b3/src/hotspot/share/opto/parse.hpp#L42&quot;&gt;https://github.com/openjdk/jdk/blob/497dca2549a9829530670576115bf4b8fab386b3/src/hotspot/share/opto/parse.hpp#L42&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/openjdk/jdk/blob/497dca2549a9829530670576115bf4b8fab386b3/src/hotspot/share/opto/doCall.cpp#L185&quot;&gt;https://github.com/openjdk/jdk/blob/497dca2549a9829530670576115bf4b8fab386b3/src/hotspot/share/opto/doCall.cpp#L185&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not too small&lt;/p&gt;

&lt;p&gt;Walk up the call stack to figure out what to compile&lt;/p&gt;

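&lt;p&gt;Handling the right thing to inline: given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;def foo(a) = a.each {|x| x }&lt;/code&gt;,
we want to compile &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foo&lt;/code&gt;, inline &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;each&lt;/code&gt;, inline the block, and
(probably) not compile the block separately.&lt;/p&gt;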

&lt;p&gt;HotSpot C1&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://bernsteinbear.com/assets/img/design-hotspot-client-compiler.pdf&quot;&gt;https://bernsteinbear.com/assets/img/design-hotspot-client-compiler.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/openjdk/jdk/blob/d854a04231a437a6af36ae65780961f40f336343/src/hotspot/share/c1/c1_GraphBuilder.cpp#L755&quot;&gt;https://github.com/openjdk/jdk/blob/d854a04231a437a6af36ae65780961f40f336343/src/hotspot/share/c1/c1_GraphBuilder.cpp#L755&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/openjdk/jdk/blob/d854a04231a437a6af36ae65780961f40f336343/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3854&quot;&gt;https://github.com/openjdk/jdk/blob/d854a04231a437a6af36ae65780961f40f336343/src/hotspot/share/c1/c1_GraphBuilder.cpp#L3854&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;skip callees with exception handlers (unless explicitly allowed with a CLI flag)&lt;/li&gt;
  &lt;li&gt;skip synchronized callees (unless explicitly allowed with a CLI flag)&lt;/li&gt;
  &lt;li&gt;skip callees whose class is not yet linked&lt;/li&gt;
  &lt;li&gt;skip callees whose class is not yet initialized&lt;/li&gt;
  &lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Heuristics:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;max inline level (default 9)&lt;/li&gt;
  &lt;li&gt;max recursive inline level (default 1)&lt;/li&gt;
  &lt;li&gt;callee bytecode size (max for top level is 35 bytecodes, but falls off by 10% per inline level)&lt;/li&gt;
  &lt;li&gt;callee stack usage (max of 10 slots)
    &lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;      &lt;span class=&quot;c1&quot;&gt;// Additional condition to limit stack usage for non-recursive calls.&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;callee_recursive_level&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
          &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;callee&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_stack&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;callee&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_locals&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;callee&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size_of_parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;C1InlineStackLimit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;INLINE_BAILOUT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;callee uses too much stack&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;max total method size (default 8000 bytecodes)&lt;/li&gt;
&lt;/ul&gt;
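
&lt;p&gt;The limits above can be sketched as one decision function. This is a rough
model, not C1 code: the constant names, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Callee&lt;/code&gt; struct, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inline_bailout&lt;/code&gt; are made up for illustration, only the default values come
from the list, and the 10% falloff is assumed to compound per level with
integer rounding.&lt;/p&gt;

```ruby
# Hypothetical model of the C1 limits listed above.
MAX_INLINE_LEVEL = 9
MAX_RECURSIVE_INLINE_LEVEL = 1
MAX_INLINE_SIZE = 35        # bytecode size limit at the top level
NESTED_SIZE_PERCENT = 90    # assumed: each level keeps 90% of its parent's limit
INLINE_STACK_LIMIT = 10
MAX_TOTAL_SIZE = 8000

Callee = Struct.new(:code_size, :max_stack, :max_locals, :param_slots)

# Size limit at a given inline level (0 is the top-level method).
def size_limit(level)
  limit = MAX_INLINE_SIZE
  level.times { limit = limit * NESTED_SIZE_PERCENT / 100 }
  limit
end

# Returns nil to allow inlining, or a string naming the check that failed.
# The exact comparison operators are assumptions.
def inline_bailout(callee, level:, recursive_level:, total_size:)
  return "inlining too deep" if level > MAX_INLINE_LEVEL
  return "recursive inlining too deep" if recursive_level > MAX_RECURSIVE_INLINE_LEVEL
  return "callee is too large" if callee.code_size > size_limit(level)
  # Mirrors the quoted snippet: the stack check only applies to non-recursive calls.
  stack_slots = callee.max_stack + callee.max_locals - callee.param_slots
  if recursive_level == 0 and stack_slots > INLINE_STACK_LIMIT
    return "callee uses too much stack"
  end
  return "total inlined size too large" if total_size > MAX_TOTAL_SIZE
  nil
end
```

&lt;p&gt;Under these assumptions a 34-bytecode callee is fine at the top level but
too large one level down, where the limit has dropped to 31.&lt;/p&gt;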

&lt;h3 id=&quot;truffleruby&quot;&gt;TruffleRuby&lt;/h3&gt;

&lt;p&gt;TruffleRuby uses a weighted compile queue.&lt;/p&gt;

&lt;p&gt;Graal
&lt;a href=&quot;https://ieeexplore.ieee.org/document/8661171&quot;&gt;https://ieeexplore.ieee.org/document/8661171&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;net&quot;&gt;.NET&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/dotnet/runtime/blob/2d638dc1179164a08d9387cbe6354fe2b7e4d823/docs/design/coreclr/jit/inlining-plans.md&quot;&gt;https://github.com/dotnet/runtime/blob/2d638dc1179164a08d9387cbe6354fe2b7e4d823/docs/design/coreclr/jit/inlining-plans.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/dotnet/runtime/blob/0b3f3ab1ecf4de06459e5f0e2b7cb3baf70ef981/src/coreclr/jit/inline.def#L94&quot;&gt;https://github.com/dotnet/runtime/blob/0b3f3ab1ecf4de06459e5f0e2b7cb3baf70ef981/src/coreclr/jit/inline.def#L94&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/dotnet/runtime/blob/0b3f3ab1ecf4de06459e5f0e2b7cb3baf70ef981/src/coreclr/jit/inlinepolicy.cpp&quot;&gt;https://github.com/dotnet/runtime/blob/0b3f3ab1ecf4de06459e5f0e2b7cb3baf70ef981/src/coreclr/jit/inlinepolicy.cpp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/dotnet/runtime/blob/0b3f3ab1ecf4de06459e5f0e2b7cb3baf70ef981/docs/design/coreclr/jit/inline-size-estimates.md?plain=1#L5&quot;&gt;https://github.com/dotnet/runtime/blob/0b3f3ab1ecf4de06459e5f0e2b7cb3baf70ef981/docs/design/coreclr/jit/inline-size-estimates.md?plain=1#L5&lt;/a&gt;
&lt;a href=&quot;https://github.com/dotnet/runtime/blob/0b3f3ab1ecf4de06459e5f0e2b7cb3baf70ef981/src/coreclr/jit/fginline.cpp&quot;&gt;https://github.com/dotnet/runtime/blob/0b3f3ab1ecf4de06459e5f0e2b7cb3baf70ef981/src/coreclr/jit/fginline.cpp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/dotnet/runtime/issues/10303&quot;&gt;https://github.com/dotnet/runtime/issues/10303&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/AndyAyersMS/PerformanceExplorer/blob/master/notes/notes-aug-2016.md&quot;&gt;https://github.com/AndyAyersMS/PerformanceExplorer/blob/master/notes/notes-aug-2016.md&lt;/a&gt;
&lt;!--
LSRA heuristics
https://github.com/dotnet/runtime/blob/2d638dc1179164a08d9387cbe6354fe2b7e4d823/docs/design/coreclr/jit/lsra-heuristic-tuning.md
--&gt;&lt;/p&gt;

&lt;h3 id=&quot;dart&quot;&gt;Dart&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/dart-lang/sdk/blob/391212f3da8cc0790fc532d367549042216bd5ca/runtime/vm/compiler/backend/inliner.cc#L49&quot;&gt;https://github.com/dart-lang/sdk/blob/391212f3da8cc0790fc532d367549042216bd5ca/runtime/vm/compiler/backend/inliner.cc#L49&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/dart-lang/sdk/blob/391212f3da8cc0790fc532d367549042216bd5ca/runtime/vm/compiler/backend/inliner.cc#L1023&quot;&gt;https://github.com/dart-lang/sdk/blob/391212f3da8cc0790fc532d367549042216bd5ca/runtime/vm/compiler/backend/inliner.cc#L1023&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://web.archive.org/web/20170830093403id_/https://link.springer.com/content/pdf/10.1007/978-3-540-78791-4_5.pdf&quot;&gt;https://web.archive.org/web/20170830093403id_/https://link.springer.com/content/pdf/10.1007/978-3-540-78791-4_5.pdf&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;deoptimization_counter_inlining_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;How many times we allow deoptimization before we stop inlining.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trace_inlining&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Trace inlining&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;charp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inlining_filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;nullptr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Inline only in named function&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Flags for inlining heuristics.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inline_getters_setters_smaller_than&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;Always inline getters and setters that have fewer instructions&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inlining_depth_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;Inline function calls up to threshold nesting depth&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;inlining_size_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;mi&quot;&gt;25&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;Always inline functions that have threshold or fewer instructions&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inlining_callee_call_sites_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;Always inline functions containing threshold or fewer calls.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inlining_callee_size_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;mi&quot;&gt;160&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;Do not inline callees larger than threshold&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inlining_small_leaf_size_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;Do not inline leaf callees larger than threshold&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inlining_caller_size_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;mi&quot;&gt;50000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;Stop inlining once caller reaches the threshold.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inlining_hotness&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;Inline only hotter calls, in percents (0 .. 100); &quot;&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;default 10%: calls above-equal 10% of max-count are inlined.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;inlining_recursion_depth_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;Inline recursive function calls up to threshold recursion depth.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DEFINE_FLAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;max_inlined_per_depth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;s&quot;&gt;&quot;Max. number of inlined calls per depth&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
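
&lt;p&gt;The hotness flag is easiest to read as a filter over call sites. A minimal
sketch, assuming max-count means the largest call count among the call sites
being considered; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hot_call_sites&lt;/code&gt; and the hash of counts are hypothetical
and not part of the Dart VM:&lt;/p&gt;

```ruby
INLINING_HOTNESS = 10  # percent; the default of the inlining_hotness flag

# Hypothetical helper: call_counts maps a call site name to its execution
# count. Keeps the sites whose count is at least hotness_percent of the
# hottest site's count ("above-equal" in the flag description).
def hot_call_sites(call_counts, hotness_percent = INLINING_HOTNESS)
  max_count = call_counts.values.max || 0
  hot = call_counts.select do |_site, count|
    count * 100 >= max_count * hotness_percent
  end
  hot.keys
end
```

&lt;p&gt;For example, with counts of 1000, 100, and 99, the first two sites pass the
default 10% cutoff and the third does not.&lt;/p&gt;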

&lt;p&gt;&lt;a href=&quot;/assets/img/adaptive-inline.pdf&quot;&gt;An adaptive strategy for inline substitution&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;c1&quot;&gt;// Inlining heuristics based on Cooper et al. 2008.&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;InliningDecision&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;ShouldWeInline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;callee&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                  &lt;span class=&quot;kt&quot;&gt;intptr_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;instr_count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                  &lt;span class=&quot;kt&quot;&gt;intptr_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;call_site_count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Pragma or size heuristics.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inliner_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AlwaysInline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;callee&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InliningDecision&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Yes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;AlwaysInline&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inlined_size_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLAG_inlining_caller_size_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;c1&quot;&gt;// Prevent caller methods becoming humongous and thus slow to compile.&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InliningDecision&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;No&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;--inlining-caller-size-threshold&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;instr_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLAG_inlining_callee_size_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;c1&quot;&gt;// Prevent inlining of callee methods that exceed certain size.&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InliningDecision&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;No&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;--inlining-callee-size-threshold&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Inlining depth.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;callee_inlining_depth&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;callee&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inlining_depth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;callee_inlining_depth&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;callee_inlining_depth&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inlining_depth_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;FLAG_inlining_depth_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InliningDecision&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;No&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;--inlining-depth-threshold&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Situation instr_count == 0 denotes no counts have been computed yet.&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// In that case, we say ok to the early heuristic and come back with the&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// late heuristic.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;instr_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InliningDecision&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Yes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;need to count first&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;instr_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLAG_inlining_size_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InliningDecision&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Yes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;--inlining-size-threshold&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call_site_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLAG_inlining_callee_call_sites_threshold&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InliningDecision&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Yes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;--inlining-callee-call-sites-threshold&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InliningDecision&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;No&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;default&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;!--
CompileType
https://github.com/dart-lang/sdk/blob/d3c0a3768bd4be4a92886e136811b5f748b63ddd/runtime/vm/compiler/backend/compile_type.h#L43
--&gt;

&lt;!--
intrinsics
https://github.com/dart-lang/sdk/blob/d3c0a3768bd4be4a92886e136811b5f748b63ddd/runtime/vm/compiler/call_specializer.cc#L3229
--&gt;

&lt;h3 id=&quot;hhvm&quot;&gt;HHVM&lt;/h3&gt;

&lt;p&gt;Tracelet-based.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/facebook/hhvm/blob/eeba7ad1ffa372a9b8cc9d1ec7f5295d45627009/hphp/runtime/vm/jit/inlining-decider.h#L89&quot;&gt;https://github.com/facebook/hhvm/blob/eeba7ad1ffa372a9b8cc9d1ec7f5295d45627009/hphp/runtime/vm/jit/inlining-decider.h#L89&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;c1&quot;&gt;// Refuse if the cost exceeds our thresholds.&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// We measure the cost of inlining each callstack and stop when it exceeds a&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// certain threshold.  (Note that we do not measure the total cost of all the&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// inlined calls for a given caller---just the cost of each nested stack.)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;costOfInlining&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;callerSk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;callee&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regionAndUnit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;annotationsPtr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Cfg&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HHIR&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AlwaysInlineVasmCostLimit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accept&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;folly&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;cost={} within always-inline limit&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;instrSize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;irgs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;budgetBCInstrs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;refuse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;folly&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;s&quot;&gt;&quot;exhausted bytecode budget: budgetBCInstrs={}, regionSize={}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;irgs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;budgetBCInstrs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;instrSize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()));&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxTotalCost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adjustedMaxVasmCost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;irgs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inlineDepth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;irgs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxCost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxTotalCost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Cfg&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HHIR&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InliningUseStackedCost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;maxCost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;irgs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inlineState&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;baseProfCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_baseProfCount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;callerProfCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;irgen&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;curProfCount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;irgs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;calleeProfCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;irgen&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;calleeProfCount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;irgs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxCost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;depth&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inlineDepth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;irgs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;refuse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;folly&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;s&quot;&gt;&quot;too expensive: cost={} : maxCost={} : &quot;&lt;/span&gt;
      &lt;span class=&quot;s&quot;&gt;&quot;baseProfCount={} : callerProfCount={} : calleeProfCount={} : depth={}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxCost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;baseProfCount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;callerProfCount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;calleeProfCount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;depth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accept&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;folly&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sformat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;small region with return: cost={} : &quot;&lt;/span&gt;
                               &lt;span class=&quot;s&quot;&gt;&quot;maxTotalCost={} : maxCost={} : baseProfCount={}&quot;&lt;/span&gt;
                               &lt;span class=&quot;s&quot;&gt;&quot; : callerProfCount={} : calleeProfCount={}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxTotalCost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxCost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;baseProfCount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;callerProfCount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;calleeProfCount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;art&quot;&gt;ART&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/LineageOS/android_art/blob/8ce603e0c68899bdfbc9cd4c50dcc65bbf777982/compiler/optimizing/inliner.h&quot;&gt;https://github.com/LineageOS/android_art/blob/8ce603e0c68899bdfbc9cd4c50dcc65bbf777982/compiler/optimizing/inliner.h&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// Instruction limit to control memory.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMaximumNumberOfTotalInstructions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Maximum number of instructions for considering a method small,&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// which we will always try to inline if the other non-instruction limits&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// are not reached.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMaximumNumberOfInstructionsForSmallMethod&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Limit the number of dex registers that we accumulate while inlining&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// to avoid creating large amount of nested environments.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMaximumNumberOfCumulatedDexRegisters&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Limit recursive call inlining, which do not benefit from too&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// much inlining compared to code locality.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMaximumNumberOfRecursiveCalls&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Limit recursive polymorphic call inlining to prevent code bloat, since it can quickly get out of&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// hand in the presence of multiple Wrapper classes. We set this to 0 to disallow polymorphic&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// recursive calls at all.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMaximumNumberOfPolymorphicRecursiveCalls&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Controls the use of inline caches in AOT mode.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kUseAOTInlineCaches&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Controls the use of inlining try catches.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kInlineTryCatches&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HInliner&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UpdateInliningBudget&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_number_of_instructions_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMaximumNumberOfTotalInstructions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Always try to inline small methods.&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;inlining_budget_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMaximumNumberOfInstructionsForSmallMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;inlining_budget_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;kMaximumNumberOfInstructionsForSmallMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;kMaximumNumberOfTotalInstructions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_number_of_instructions_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HInliner&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsInliningEncouraged&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HInvoke&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;invoke_instruction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                    &lt;span class=&quot;n&quot;&gt;ArtMethod&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CodeItemDataAccessor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accessor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CountRecursiveCallsOf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMaximumNumberOfRecursiveCalls&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;LOG_FAIL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stats_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MethodCompilationStat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kNotInlinedRecursiveBudget&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Method &quot;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PrettyMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; is not inlined because it has reached its recursive call budget.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inline_max_code_units&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;codegen_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetCompilerOptions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetInlineMaxCodeUnits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accessor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InsnsSizeInCodeUnits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inline_max_code_units&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;LOG_FAIL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stats_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MethodCompilationStat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kNotInlinedCodeItem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Method &quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PrettyMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; is not inlined because its code item is too big: &quot;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accessor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InsnsSizeInCodeUnits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; &amp;gt; &quot;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inline_max_code_units&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsCompilingBaseline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;accessor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InsnsSizeInCodeUnits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CompilerOptions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kBaselineInlineMaxCodeUnits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;LOG_FAIL_NO_STAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Reached baseline maximum code unit for inlining  &quot;&lt;/span&gt;
                       &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PrettyMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;outermost_graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SetUsefulOptimizing&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;invoke_instruction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetBlock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetLastInstruction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsThrow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;LOG_FAIL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stats_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MethodCompilationStat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kNotInlinedEndsWithThrow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Method &quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PrettyMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; is not inlined because its block ends with a throw&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outermost_graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsCompilingBaseline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsInvokeVirtual&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsInvokeInterface&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ProfilingInfoBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsInlineCacheUseful&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AsInvoke&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;codegen_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maximum_inlining_depth_for_baseline&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;InlineCache&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MaxDexPcEncodingDepth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;outermost_graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetArtMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;codegen_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetCompilerOptions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetInlineMaxCodeUnits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;depth_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maximum_inlining_depth_for_baseline&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;LOG_FAIL_NO_STAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Reached maximum depth for inlining in baseline compilation: &quot;&lt;/span&gt;
                       &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;depth_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; for &quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;callee_graph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetArtMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PrettyMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;outermost_graph_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SetUsefulOptimizing&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;jikesrvm&quot;&gt;JikesRVM&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/JikesRVM/JikesRVM/blob/5072f19761115d987b6ee162f49a03522d36c697/rvm/src/org/jikesrvm/compilers/opt/inlining/DefaultInlineOracle.java#L55&quot;&gt;https://github.com/JikesRVM/JikesRVM/blob/5072f19761115d987b6ee162f49a03522d36c697/rvm/src/org/jikesrvm/compilers/opt/inlining/DefaultInlineOracle.java#L55&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;otherresearch&quot;&gt;Other/research&lt;/h3&gt;

&lt;p&gt;Partial inlining&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://ethz.ch/content/dam/ethz/special-interest/infk/ast-dam/documents/Theodoridis-ASPLOS22-Inlining-Paper.pdf&quot;&gt;Understanding and Exploiting Optimal Function Inlining&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;p&gt;Machine learning&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://ieeexplore.ieee.org/document/6495004&quot;&gt;Automatic construction of inlining heuristics using machine learning&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://ssw.jku.at/Teaching/PhDTheses/Mosaner/Dissertation%20Mosaner.pdf&quot;&gt;Machine-Learning-Based Optimization Heuristics in Dynamic Compilers&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://webdocs.cs.ualberta.ca/~amaral/thesis/ErickOchoaMSc.pdf&quot;&gt;Guiding Inlining Decisions Using Post-Inlining Transformations&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://karimali.ca/resources/papers/ourinliner.pdf&quot;&gt;U Can’t Inline This!&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://dl.acm.org/doi/10.1145/182409.182489&quot;&gt;Towards better inlining decisions using inlining trials&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/chrisseaton/rhizome/blob/main/doc/inlining.md&quot;&gt;RhizomeRuby inlining&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://aleksandar-prokopec.com/resources/docs/prio-inliner-final.pdf&quot;&gt;An Optimization-Driven Incremental Inline Substitution Algorithm for Just-in-Time Compilers&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.cresco.enea.it/SC05/schedule/pdf/pap274.pdf&quot;&gt;Automatic Tuning of Inlining Heuristics&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/3563838.3567677&quot;&gt;Inlining-Benefit Prediction with Interprocedural Partial Escape Analysis&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/img/virtual-inlining.pdf&quot;&gt;Inlining of Virtual Methods&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/img/sable-inlining.pdf&quot;&gt;A Study of Type Analysis for Speculative Method Inlining in a JIT Environment&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://dl.acm.org/doi/epdf/10.1145/351403.351416&quot;&gt;A Comparative Study of Static and Profile-Based Heuristics for Inlining&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;p&gt;Clusters from &lt;a href=&quot;https://llvm.org/devmtg/2022-05/slides/2022EuroLLVM-CustomBenefitDrivenInliner-in-FalconJIT.pdf&quot;&gt;Custom benefit-driven inliner in Falcon JIT&lt;/a&gt; (PDF)&lt;/p&gt;

&lt;h3 id=&quot;graal&quot;&gt;Graal&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/oracle/graal/blob/5dde777cba22a99ebe3f19745d03ddfbc35c563c/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/phases/common/inlining/policy/GreedyInliningPolicy.java&quot;&gt;https://github.com/oracle/graal/blob/5dde777cba22a99ebe3f19745d03ddfbc35c563c/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/phases/common/inlining/policy/GreedyInliningPolicy.java&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/oracle/graal/blob/5dde777cba22a99ebe3f19745d03ddfbc35c563c/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/phases/common/inlining/InliningPhase.java&quot;&gt;https://github.com/oracle/graal/blob/5dde777cba22a99ebe3f19745d03ddfbc35c563c/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/phases/common/inlining/InliningPhase.java&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/oracle/graal/blob/5dde777cba22a99ebe3f19745d03ddfbc35c563c/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/phases/common/inlining/info/elem/InlineableGraph.java#L148&quot;&gt;https://github.com/oracle/graal/blob/5dde777cba22a99ebe3f19745d03ddfbc35c563c/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/phases/common/inlining/info/elem/InlineableGraph.java#L148&lt;/a&gt;
&lt;!--
GVN
https://github.com/oracle/graal/blob/5dde777cba22a99ebe3f19745d03ddfbc35c563c/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/phases/common/DominatorBasedGlobalValueNumberingPhase.java#L132
--&gt;&lt;/p&gt;
&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:aot-split&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;There are some newer papers, especially in Java land, that try to
do a lot of analysis ahead of time and bundle the resulting information in
.class files. The JIT can then read that information and see more than local context.&lt;/p&gt;

      &lt;p&gt;Or, if you are an AOT compiler, you can probably do a lot more
whole-system reasoning, both because you have a larger time budget and because
you can see more functions at once. &lt;a href=&quot;#fnref:aot-split&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:rpython-inliner&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://github.com/pypy/pypy/blob/bab69dca82606f9e4feaf5507f8dd8dfb3e968b2/rpython/translator/backendopt/inline.py#L144&quot;&gt;Check it
out&lt;/a&gt;
if you like. I stumbled across it by accident. &lt;a href=&quot;#fnref:rpython-inliner&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:turbolev&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See also “Turbolev”, which seems to merge Maglev (CFG) with
TurboFan (Sea of Nodes)… somehow. &lt;a href=&quot;#fnref:turbolev&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:fil&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Potentially a misunderstanding based on a private conversation. I’m
working on tracking down the implementation… &lt;a href=&quot;#fnref:fil&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
            <pubDate>Wed, 03 Jun 2026 00:00:00 +0000</pubDate>
            <niceDate>June 3, 2026</niceDate>
            <link>https://bernsteinbear.com/blog/inlining-heuristics/?utm_source=rss</link>
            <guid isPermaLink="true">https://bernsteinbear.com/blog/inlining-heuristics/</guid>
        </item>
        
        <item>
            <title>Checking assembly with Z3</title>
            <description>&lt;p&gt;Short post today. New ZJIT contributor dak2 &lt;a href=&quot;https://github.com/ruby/ruby/pull/17165&quot;&gt;submitted a
PR&lt;/a&gt; to fix an overflow bug in fixnum
division in ZJIT. We did the division fine, but lied about the type of the
result in the case of dividing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FIXNUM_MIN&lt;/code&gt; by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1&lt;/code&gt;. You can see how this is
special-cased in CRuby:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;inline&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;rb_fix_divmod_fix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;VALUE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VALUE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VALUE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;divp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VALUE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;modp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FIXNUM_MIN&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;divp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;divp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LONG2NUM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FIXNUM_MIN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;modp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;modp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LONG2FIX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-FIXNUM_MIN&lt;/code&gt; (note the negative) does not fit in a fixnum, it gets
promoted to a bignum. It’s one of two special cases in fixnum division that
do not produce a fixnum, the other being dividing by zero (which produces an
error).&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ irb
irb(main):001&amp;gt; LONG_MAX = 2**63 - 1
=&amp;gt; 9223372036854775807
irb(main):002&amp;gt; FIXNUM_MAX = LONG_MAX / 2
=&amp;gt; 4611686018427387903
irb(main):003&amp;gt; LONG_MIN = -LONG_MAX - 1
=&amp;gt; -9223372036854775808
irb(main):004&amp;gt; FIXNUM_MIN = LONG_MIN / 2
=&amp;gt; -4611686018427387904
irb(main):005&amp;gt; (-FIXNUM_MIN) &amp;lt;= FIXNUM_MAX
=&amp;gt; false
irb(main):006&amp;gt;
$
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is due to the numbers being &lt;a href=&quot;https://en.wikipedia.org/wiki/Two%27s_complement&quot;&gt;two’s
complement&lt;/a&gt; and therefore
having one more negative number than positive numbers (because zero takes up one of the non-negative slots).&lt;/p&gt;

&lt;p&gt;dak2’s proposed patch included a branchless test for this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;left == FIXNUM_MIN&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;right == -1&lt;/code&gt; case, making us leave JIT code and enter the interpreter rather
than handle it inline. The patch encodes this branchless test as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xor&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xor&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;or&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;test&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;je&lt;/code&gt; in our platform-independent low-level IR (LIR):&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// Side exit on FIXNUM_MIN / -1, which overflows to a Bignum, not a Fixnum.&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Branchless (left == FIXNUM_MIN &amp;amp;&amp;amp; right == -1): (left ^ MIN) | (right ^ -1) == 0.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;left_diff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asm&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.xor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;left&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;Opnd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;from&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;VALUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fixnum_from_isize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RUBY_FIXNUM_MIN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)));&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;right_diff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asm&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.xor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;right&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;Opnd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;from&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;VALUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fixnum_from_isize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)));&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;combined&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asm&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.or&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;left_diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;right_diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;asm&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;combined&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;combined&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;asm&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.je&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;side_exit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FixnumDivOverflow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I didn’t understand why those were equivalent. Rather than try and bang my head
against it, I thought I’d let Z3 try. After all, I’ve been watching &lt;a href=&quot;https://pypy.org/posts/2024/08/toy-knownbits.html&quot;&gt;CF have
fun with it&lt;/a&gt; &lt;a href=&quot;https://pypy.org/posts/2024/07/finding-simple-rewrite-rules-jit-z3.html&quot;&gt;for a couple
years now&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Z3 is an SMT solver. The core trick for using Z3 as a “proof engine” that I
learned from CF&lt;sup id=&quot;fnref:standard&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:standard&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; is to make Z3 search for counter-examples by negating the
condition:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;#!/usr/bin/env python3
# /// script
# requires-python = &quot;&amp;gt;=3.13&quot;
# dependencies = [
#     &quot;z3-solver&quot;,
# ]
# ///
&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;z3&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;WORD_SIZE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;LONG_MAX&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;z3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BitVecVal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;63&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WORD_SIZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;LONG_MIN&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LONG_MAX&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;FIXNUM_MIN&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LONG_MIN&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;prove&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cond&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;solver&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;z3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Solver&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;solver&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;check&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;z3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Not&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cond&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;z3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unsat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;solver&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;left&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;z3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BitVec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;left&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WORD_SIZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;right&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;z3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BitVec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;right&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WORD_SIZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# The original C condition
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lhs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;z3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;And&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;left&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FIXNUM_MIN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;right&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# The new branchless LIR
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rhs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;left&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;^&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FIXNUM_MIN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;right&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;^&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prove&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lhs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rhs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

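&lt;p&gt;This negation is required because the core model of Z3 is a machine that finds
example values. This means that there is an implicit “exists” in front of
whatever condition you hand it. What I want to prove is a “for all”: for all
values of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;left&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;right&lt;/code&gt;, the two conditions agree. Negating a “for
all” turns it into an “exists”: there exists some input where the conditions
disagree. That is a question Z3 can answer directly. If it finds such an input,
that is a counterexample and the claim is false. If it reports &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unsat&lt;/code&gt;, no
counterexample exists and the claim holds for every input.&lt;/p&gt;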
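
&lt;p&gt;To make the counterexample search concrete, here is a minimal sketch that
replaces Z3 with brute force over 4-bit inputs. The helper
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;find_counterexample&lt;/code&gt; and both example claims are made up for illustration
and are not part of the script above. The only point is that proving a claim
for all inputs means failing to find a single input where it breaks.&lt;/p&gt;

```python
from itertools import product

WIDTH = 4  # assumption: small enough to enumerate every input pair
MASK = 2**WIDTH - 1


def find_counterexample(cond):
    """Return the first (left, right) where cond is false, or None."""
    for left, right in product(range(MASK + 1), repeat=2):
        if not cond(left, right):
            return (left, right)
    return None


def always_true(left, right):
    # Holds for all inputs: xor-ing a value with itself gives zero.
    return (left ^ left) == 0


def sometimes_false(left, right):
    # Fails for some inputs: xor and or differ when the inputs share a set bit.
    return (left ^ right) == (left | right)


print(find_counterexample(always_true))      # None
print(find_counterexample(sometimes_false))  # (1, 1)
```

&lt;p&gt;The first claim has no counterexample, which is the brute-force analog of Z3
reporting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unsat&lt;/code&gt;. The second one fails as soon as both inputs share a set
bit.&lt;/p&gt;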

&lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;uv run prove_fixnum_min.py
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Z3 did not complain, so the new code must be fine. Just as a quick check, in
case &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;assert&lt;/code&gt; is turned off or something, I like to see tests fail. So after
modifying one of the constants from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;uv run prove_fixnum_min.py
&lt;span class=&quot;go&quot;&gt;Traceback (most recent call last):
&lt;/span&gt;&lt;span class=&quot;gp&quot;&gt;  File &quot;/path/prove_fixnum_min.py&quot;, line 26, in &amp;lt;module&amp;gt;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;go&quot;&gt;    prove(lhs == rhs)
    ~~~~~^^^^^^^^^^^^
  File &quot;/path/prove_fixnum_min.py&quot;, line 18, in prove
    assert solver.check(z3.Not(cond)) == z3.unsat, solver.model()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: [right = 18446744073709551615, left = 13835058055282163712]
&lt;/span&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Neat.&lt;/p&gt;
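
&lt;p&gt;As a sanity check that does not depend on a solver, the same equivalence can
be enumerated at a narrower width. This is a sketch under assumptions: the
8-bit width and the constant names are mine, with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FIXNUM_MIN&lt;/code&gt; stand-in
built the same way as in the script (the most negative value, halved). The
intuition it exercises is that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x ^ c&lt;/code&gt; is zero exactly when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x == c&lt;/code&gt;, and
that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;or&lt;/code&gt; of two values is zero only when both are zero.&lt;/p&gt;

```python
WIDTH = 8  # assumption: a narrow word so every input pair can be enumerated
MASK = 2**WIDTH - 1
# Bit pattern of (most negative value / 2), mirroring FIXNUM_MIN = LONG_MIN / 2.
FIXNUM_MIN_BITS = (-(2**(WIDTH - 2))) % (MASK + 1)
# Bit pattern of -1 in two's complement: all ones.
MINUS_ONE = MASK


def branchy(left, right):
    # The original condition, written with two comparisons.
    return left == FIXNUM_MIN_BITS and right == MINUS_ONE


def branchless(left, right):
    # Each xor is zero only on an exact match; the or is zero only if both are.
    return ((left ^ FIXNUM_MIN_BITS) | (right ^ MINUS_ONE)) == 0


# Collect every input pair on which the two forms disagree.
mismatches = [
    (left, right)
    for left in range(MASK + 1)
    for right in range(MASK + 1)
    if branchy(left, right) != branchless(left, right)
]
print(mismatches)  # []
```

&lt;p&gt;Enumeration only covers the width it runs at, which is why the Z3 proof over
64-bit values is the one that counts.&lt;/p&gt;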

&lt;h3 id=&quot;thanks&quot;&gt;Thanks&lt;/h3&gt;

&lt;p&gt;Thanks to &lt;a href=&quot;https://cfbolz.de/&quot;&gt;CF Bolz-Tereick&lt;/a&gt; for reading and giving feedback
on this post.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:standard&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;They note that this is a “standard technique” and they did not
come up with it. &lt;a href=&quot;#fnref:standard&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
            <pubDate>Mon, 01 Jun 2026 00:00:00 +0000</pubDate>
            <niceDate>June 1, 2026</niceDate>
            <link>https://bernsteinbear.com/blog/asm-z3/?utm_source=rss</link>
            <guid isPermaLink="true">https://bernsteinbear.com/blog/asm-z3/</guid>
        </item>
        
        <item>
            <title>Travel notes: RubyKaigi Hakodate</title>
            <description>&lt;p&gt;I just got back from a three and a half week trip to Japan. It was the longest
trip I have ever been on (aside from studying abroad in Germany, which felt
different). I made the following wild circuit with only a backpack and a
duffel:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Tokyo&lt;/li&gt;
  &lt;li&gt;Toyama&lt;/li&gt;
  &lt;li&gt;Kanazawa&lt;/li&gt;
  &lt;li&gt;Nara-ish&lt;/li&gt;
  &lt;li&gt;Ito&lt;/li&gt;
  &lt;li&gt;Hakodate&lt;/li&gt;
  &lt;li&gt;Nikko&lt;/li&gt;
  &lt;li&gt;Mashiko&lt;/li&gt;
  &lt;li&gt;Karuizawa&lt;/li&gt;
  &lt;li&gt;Tokyo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This trip was split into three parts: time with my immediate family, going to a
conference, and then time with my partner. They were all great and also I am
glad to be home.&lt;/p&gt;

&lt;p&gt;I’ll post my abbreviated travel notes here, including activity and food
recommendations.&lt;/p&gt;

&lt;h2 id=&quot;part-one&quot;&gt;Part one&lt;/h2&gt;

&lt;p&gt;We started in Tokyo but we were only there for about 40 hours. We focused our
time mostly on arts and crafts: we did a kintsugi workshop, spent time at an
artists’ cooperative, and then did a lot of walking around. This was a good
intro to the trip, because everyone kept waking up at 4am and crashing at 7pm
due to the jet lag. 4am wakeup makes for nice morning walks to 7-Eleven.&lt;/p&gt;

&lt;p&gt;I brought my family to &lt;a href=&quot;https://plus.codes/MQH8+WQ&quot;&gt;T’s Tantan&lt;/a&gt; in Tokyo
Station because I’m vegetarian and it’s otherwise hard to find ramen that
approaches kosher in Japan. It continues to be great and I really appreciate
having a steady vegetarian option available. Many years ago when I visited
Tokyo there was a place that served a delicious tomato-based vegetarian ramen,
but I hear it has since permanently closed. Bummer.&lt;/p&gt;

&lt;p&gt;We took the shinkansen to Kanazawa. I love the train. It’s fast. It’s quiet. You
can eat your snacks on board and gaze out the window as the world whizzes by.
It’s nice.&lt;/p&gt;

&lt;p&gt;We toured a soy sauce factory (meh; they don’t let you in the room where the
magic happens) and the old town (pretty!) before eventually ending up
at our small hotel in Toyama: Satoyama Auberge Maki No Oto. I highly recommend
this hotel. It is beautiful, the staff is lovely, the food was excellent, and
they were very accommodating of me being vegetarian.&lt;/p&gt;

&lt;p&gt;We continued on to Toyama, which is a port town. We got to talking with an
older local guy who told us all about his favorite local spots. We learned
after leaving that this guy has extraordinarily fancy taste and they were all
either Michelin starred or at least Michelin rated and with a lead time of
months. We opted to instead go to a local brewery, which had a ghost pepper
beer (!) and pizza.&lt;/p&gt;

&lt;p&gt;We then moved on via train to Osaka, where we transferred to a car to head
(eventually) to our hotel in the hills near Nara. We toured the Daimon sake
brewery. They explained every little thing about the process, which was
especially interesting to me, as I’ve done some small amount of homebrewing and
I bake. They sounded similar. We had a tasting and even got to talk to
Daimon-san. I recommend going.&lt;/p&gt;

&lt;p&gt;I also recommend the Akame 48 waterfalls walk/hike, which has some exquisite
falls, and Murou Art Forest. They had some really wonderful installations.&lt;/p&gt;

&lt;p&gt;My brother and I parted ways from the rest of my family in Osaka: they headed
further west and we headed north to Itō on the Izu peninsula. We got a surprise
perfectly clear view of Fuji along the way.&lt;/p&gt;

&lt;p&gt;It’s beautiful there. They don’t seem to welcome foreigners in a lot of their
restaurants (we were turned away several times) but one place had a guy who
enthusiastically welcomed us in. We ended that evening enjoying some food and
a beer while also being stared at by a 300lb completely tattooed guy. It was a
little unsettling but we left without incident.&lt;/p&gt;

&lt;p&gt;My brother and I made our way to Tokyo for the day before his flight and before
my train north to Hakodate for RubyKaigi. I once again did that thing where I
walked around in humid 80F heat with a large backpack and pants and was
extraordinarily warm toward the end of the day. After about a liter of Aquarius
on the train north I felt better.&lt;/p&gt;

&lt;h2 id=&quot;part-two&quot;&gt;Part two&lt;/h2&gt;

&lt;p&gt;I stayed at &lt;a href=&quot;https://maps.app.goo.gl/fQaP6XfCQMUGA7Zx8&quot;&gt;Yunokawa Prince Hotel
Nagisatei&lt;/a&gt; which I would like to
especially call out for having an enormous, diverse, and very vegetarian
friendly breakfast. Every morning I got to try new and tasty things and even
feel full after. It was great.&lt;/p&gt;

&lt;p&gt;Hakodate is &lt;em&gt;beautiful&lt;/em&gt; in the spring. I arrived at peak cherry blossom season
and Goryokaku, their star-shaped fort, is absolutely decked out in cherry
blossoms. It is also moderately swarmed by tourists (in this case, three cruise
ships). It didn’t feel over-crowded though. I enjoyed eating at &lt;em&gt;The Bear King&lt;/em&gt;
which had a vegetarian friendly option.&lt;/p&gt;

&lt;p&gt;The next day was the committer meeting. I don’t remember a ton from it other
than people talking at length about the semantics of deep freezing an object
(do you freeze its class? its class’s superclass? …?). I picked up my badge
and also got to check out my colleague Chris Salzberg’s bar
&lt;a href=&quot;https://maps.app.goo.gl/AE48wZopJB16VBta9&quot;&gt;SOLENOID&lt;/a&gt;! It’s a neat spot. I
headed out to go find some dinner.&lt;/p&gt;

&lt;p&gt;This is about when I got a message on my phone that there was going to be an
earthquake, so I walked back into the bar and said “hey, did you get this?”
just before everything started shaking. It was the biggest earthquake I’ve
experienced, but I was metaphorically not too shaken up. Then we got the
tsunami warning.&lt;/p&gt;

&lt;p&gt;Chris’s bar is already something like 8 meters above sea level and at the foot
of Mt Hakodate. With the city sirens going off and the police directing traffic
with batons, though, I decided my best bet was just to march directly up the
mountain to get more elevation. Since the tsunami wasn’t scheduled to arrive
for about 20 or 30 minutes and my hotel was across the sea-level part of town,
I parked myself on a little concrete post. Chris found me eventually.
Someone told us that there was a middle school offering refuge, so we went and
hung out on the side of the gymnasium. They were really nice about it.&lt;/p&gt;

&lt;p&gt;On Wednesday, the conference started. It was really well signed and organized.
My usual complaint with conferences is that there’s nothing to eat for
vegetarians (or that we get mashed with the gluten-free people and each group
only gets a salad and bad bread) but that did not happen! They had really
stellar vegetarian bento. They had a lot of leftovers toward the end of lunch
so I even went and got a second. This was about when I started freaking out
because my speaking slot was approaching and I wasn’t yet feeling my talk.&lt;/p&gt;

&lt;p&gt;Normally when I give a talk, I get up in front of people and I pace and
gesticulate and productively complain and throw in some fun anecdotes and the
audience, one way or another, ends up learning about JITs at scale, or Scheme
semantics, or something. It’s what I’d done for my little lunch talk at Brown
two weeks prior. I even titled that talk &lt;em&gt;One must imagine compiler engineers
happy&lt;/em&gt; so there was plenty of room for educational complaining. But this
RubyKaigi talk was in front of an enormous crowd and toward a more general
audience than I was used to addressing. The slides did not feel like they were
flowing until about twenty minutes before my talk.&lt;/p&gt;

&lt;p&gt;In the end it went alright. I realized about 40 seconds in that I had way
too much content so I ended up speaking rapidly for 30 minutes straight,
completely unaware of the audience (which you can’t see anyway because of the
lights). I only really noticed people when I made a dumb six-seven joke and
Aaron laughed.&lt;/p&gt;

&lt;p&gt;The rest of the conference I was able to relax and enjoy other people’s talks.
I got some good hallway track in, too. I think there’s a good group of people
who are interested in Ruby tracing (for example, &lt;a href=&quot;https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/&quot;&gt;Perfetto in
ZJIT&lt;/a&gt;) so maybe
we will make something happen.&lt;/p&gt;

&lt;p&gt;We had a nice small dinner at &lt;em&gt;Yasai Bar Miruya&lt;/em&gt;, which was vegan (!) and had
some nice sake. The host was very friendly, too.&lt;/p&gt;

&lt;p&gt;I nerd-sniped John and J into implementing a VM for the &lt;a href=&quot;https://www.boundvariable.org/task.shtml&quot;&gt;Universal
Machine&lt;/a&gt;. This was a daunting
homework assignment back in undergrad but it was a fun project later in life.&lt;/p&gt;

&lt;p&gt;S joined toward the end of the conference. She’s also vegetarian so we got some
really excellent vegetarian ramen at &lt;a href=&quot;https://maps.app.goo.gl/JNqiB6aH9sStnZ5UA&quot;&gt;MAIDO
Ramen&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Finally, S and I headed south on the shinkansen for Nikko.&lt;/p&gt;

&lt;h2 id=&quot;part-three&quot;&gt;Part three&lt;/h2&gt;

&lt;p&gt;Nikko is small, beautiful, and a tourist day-trip town. Dinner closes early.
Shops close earlier. Since we were staying there we had to make sure to track
down and visit the one or two vegetarian places before they shuttered.&lt;/p&gt;

&lt;p&gt;S and I, along with J and J, took the bus up from Nikko, up the windiest
switchbacks, to the Kegon Falls. We were going to take a boat across the lake,
but the water level was too low for the dock on the other side, so we ended up
half hiking and half taking a bus. Then we continued our hike through the
Senjōgahara Marshland (beautiful), to the Yudaki Cascades (lovely), which also
had a surprise restaurant and ice cream shop at the base! It’s called &lt;a href=&quot;https://maps.app.goo.gl/G5tNK46WamixEUJp8&quot;&gt;Yutaki
Rest House&lt;/a&gt;. After some great
(vegetarian friendly!! wow!!) udon, we marched up the waterfall and around Yuno
Lake at the top to Yumoto Onsen. In order to make the last reasonable bus back
to town, we just enjoyed putting our feet in the foot bath.&lt;/p&gt;

&lt;p&gt;One day was rainy. In the evening, J and I thought it would be fun to continue
our Universal Machine implementations. As Norman Ramsey would say, “my
implementation is 90 lines long and runs sandmark in under six seconds.”&lt;/p&gt;

&lt;p&gt;We also enjoyed doing a tour of the shrines right above Nikko. The shrines are
resplendent against the backdrop of forest.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Pro bus tip: you can either pay by IC card or credit card. No need to grab a
ticket if you do that.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;S and I shipped our bags (thanks, Yamato) before continuing on to the small
town of Moka, the staging area for our big pottery festival day. Unfortunately,
there was no good way to get there: there was no reasonable series of trains
and no taxi would take us. Ultimately we ended up taking the train to
Utsunomiya and catching the long local bus to Moka. About twenty minutes into
this ride, in the middle of nowhere, bus nearly empty, the bus driver pulled
over and ran over to us looking kind of panicked. He asked where we were going
and was visibly relieved when we said Moka. I suppose we are not the usual
riders. Very nice of him.&lt;/p&gt;

&lt;p&gt;Upon arrival, S introduced me to CoCo ICHIBANYA, which is also super vegetarian
friendly. I loved it. We ate really well before walking to our tiny hotel.&lt;/p&gt;

&lt;p&gt;We did not really know what to expect from the Mashiko pottery festival. The
internet said it would be crowded and to arrive early, so we got up at 6:30am
for an estimated 7am departure on the tiny train from Moka to Mashiko. On most
trains you can pay with an IC card but we were out in the sticks so we asked
the only other guy on the platform how to pay for the train. He said he had no
idea and that this was his first time here. When the train showed up completely
packed to the gills and we had to (politely) push onto it, we started to
realize that this was The Event and it was going to be mayhem.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Also, fun fact: the way the Moka train payment works is that you grab a
little ticket from the train, and, upon arrival, wait in line to present your
ticket to two very overwhelmed looking people at a table, who charge you, and
you pay in cash.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Onto Mashiko: the festival was &lt;em&gt;packed&lt;/em&gt;. There’s pottery everywhere the eye can
see. There are tents and there are full buildings. It varies in quality and
artistry from fine to jaw-droppingly spectacular. You could completely stock
your kitchen from this fair alone and it would even be cost-effective. The main
bummer for us is that we had to get pottery safely back home. We limited
ourselves to a reasonable assortment but we really wanted to buy a beautiful
painted 20 inch plate with a bird on a branch.&lt;/p&gt;

&lt;p&gt;After a ton of walking around, we took another long long bus back to Utsunomiya
and continued onto Karuizawa. We didn’t know what to expect from Karuizawa but,
having been, I could probably concisely describe it as “Aspen for people from
Tokyo”. It was… fine. We loved our hotel, Tsuruya Ryokan. The manager was
very excited when we borrowed a Studio Ghibli DVD from their collection.&lt;/p&gt;

&lt;p&gt;We continued on to Tokyo, our final stop. We did our usual tour of stationery
stores and bakeries—the bread was something to write home about (har har). We
enjoyed a (vegetarian!! friendly!!) kaiseki meal at &lt;a href=&quot;https://maps.app.goo.gl/NCwQk2ReXdgKndRJ7&quot;&gt;Hyoki Shabu-shabu
Ginza&lt;/a&gt; before enjoying some live
music at &lt;a href=&quot;https://maps.app.goo.gl/Cw3vvo2vpYFbWpAHA&quot;&gt;Rocky Top&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We also recommend &lt;a href=&quot;https://maps.app.goo.gl/bNNgFjMXytKacC5U9&quot;&gt;Jikasei MENSHO&lt;/a&gt;
for vegetarian ramen.&lt;/p&gt;

&lt;p&gt;Bakery checklist:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;BOUL’ANGE NIHONBASHI (check! good croissants)&lt;/li&gt;
  &lt;li&gt;Bricolage bread &amp;amp; co (check! good everything)&lt;/li&gt;
  &lt;li&gt;Brasserie Viron Marunouchi&lt;/li&gt;
  &lt;li&gt;Beaver Bread&lt;/li&gt;
  &lt;li&gt;Bartizan Bread Factory&lt;/li&gt;
  &lt;li&gt;Gontran Cherrier Tokyo Aoyama Shop&lt;/li&gt;
  &lt;li&gt;Comme’N Tokyo&lt;/li&gt;
  &lt;li&gt;Shiomi Bakery&lt;/li&gt;
  &lt;li&gt;BRØD&lt;/li&gt;
  &lt;li&gt;The Little BAKERY&lt;/li&gt;
&lt;!-- https://www.jocjapantravel.com/kanto-tokyo-bakeries/ --&gt;
&lt;/ul&gt;

&lt;p&gt;We had an uneventful and reasonably easy trip home. Whew. Long post for a long
trip. See you next year in Miyazaki!&lt;/p&gt;
</description>
            <pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate>
            <niceDate>May 18, 2026</niceDate>
            <link>https://bernsteinbear.com/blog/travel-notes-rubykaigi-hakodate/?utm_source=rss</link>
            <guid isPermaLink="true">https://bernsteinbear.com/blog/travel-notes-rubykaigi-hakodate/</guid>
        </item>
        
        <item>
            <title>Partial static single information form</title>
            <description>&lt;p&gt;In compilers, static single information form (SSI) is a common extension to
static single assignment form (SSA). It was introduced by C. Scott Ananian in
1999 in his &lt;a href=&quot;/assets/img/ananian-thesis.pdf&quot;&gt;MS thesis&lt;/a&gt; (PDF) &lt;sup id=&quot;fnref:et-al&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:et-al&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;SSI extends your existing SSA intermediate representation by discovering facts
from your existing program and reifying them as path-dependent/flow-sensitive
IR nodes. That might sound complicated, but at least the basic idea is pretty
natural. I talk a little bit about it in &lt;a href=&quot;/blog/irs/&quot;&gt;What I talk about when I talk about
IRs&lt;/a&gt; and I’ll rehash here in more depth, starting with some
motivating examples. Consider this admittedly contrived example:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# ...
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PositiveInteger&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AbsoluteValue&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We should be able to learn from the comparison that in some branches in the IR,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v0&lt;/code&gt; is positive. In that region, we can add a new IR instruction &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v2&lt;/code&gt; that
attaches that knowledge right in the instruction’s type field (yay,
sparseness!) and then rewrite uses of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v0&lt;/code&gt; to now use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v2&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PositiveInteger&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RefineType&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Positive&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# ...
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PositiveInteger&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AbsoluteValue&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Because we’ve done that, our (imaginary) optimization rule that gets rid of
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AbsoluteValue&lt;/code&gt; on known-positive integers can kick in, and we can delete the
invocation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AbsoluteValue&lt;/code&gt;. Yay, optimization!&lt;/p&gt;

&lt;p&gt;But a couple of questions remain, at least for me:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Where/when in the compiler pipeline do we insert and remove these type
refinements?&lt;/li&gt;
  &lt;li&gt;Do we need to refine after &lt;em&gt;every&lt;/em&gt; conditional?&lt;/li&gt;
  &lt;li&gt;Do we need to implement the whole into-SSI and out-of-SSI algorithms from
all the complicated-looking papers?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ll go through them, starting with the compiler pipeline.&lt;/p&gt;

&lt;h2 id=&quot;when-do-we-insert-type-refinements&quot;&gt;When do we insert type refinements?&lt;/h2&gt;

&lt;p&gt;The original SSI paper starts with (I think?) SSA form and places some number
of new refinement nodes based on conditionals. I have admittedly not tried very
hard, but the into-SSI algorithms look complicated and kind of heavyweight. As
a reward, you get “linear” into-SSI time complexity.&lt;/p&gt;

&lt;p&gt;But I am a humble compiler engineer, and I don’t have the time to go through
and load all of this into my head. Instead what I have seen done and have been
doing is to take a shortcut: build &lt;em&gt;partial SSI&lt;/em&gt; during SSA
construction&lt;sup id=&quot;fnref:llvm-partial-ssi&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:llvm-partial-ssi&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Most of the time this is from bytecode, but it could also be from some other
non-SSA IR. In any case, this is an excellent shortcut for two reasons:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;It lets me cleanly separate adding the type refinements (pretty
straightforward) from the hard part of doing all of the operand rewriting
and phi placement and marking and all manner of other nonsense.&lt;/li&gt;
  &lt;li&gt;In addition to separating the concerns, the hard part is &lt;em&gt;already done&lt;/em&gt; by
SSA construction. We can actually just skip it! SSA construction handles phi
placement, operand rewriting, all of it. It probably fits neatly into a
naive or a &lt;a href=&quot;/assets/img/braun13cc.pdf&quot;&gt;Braun-style&lt;/a&gt; (PDF) construction.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is pretty compelling. We can learn from the bytecode with a very small
amount of marginal new complexity. See &lt;a href=&quot;https://github.com/ruby/ruby/pull/15915/changes#diff-a3cbeb79bf318b2aa8cc979260ba03b0204b436f745dd199a0e0c8ea5c871058&quot;&gt;my implementation in
ZJIT&lt;/a&gt;, for example. All it really does is modify the abstract
interpreter state when building SSA out of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;branchnil&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;branchif&lt;/code&gt;, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;branchunless&lt;/code&gt; bytecode instructions to take into account the new refined
values.&lt;/p&gt;
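
&lt;p&gt;To make that concrete, here is a minimal Python sketch of the idea. It is not ZJIT’s actual code: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Insn&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; classes, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IfNil&lt;/code&gt; opcode, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NotNil&lt;/code&gt; type name, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build_branchnil&lt;/code&gt; helper are all made up for illustration. The only point is that each successor gets its own copy of the abstract interpreter state with the local rebound to a refined value.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from dataclasses import dataclass, field

# Hypothetical toy IR; none of these names come from ZJIT.
@dataclass(eq=False)
class Insn:
    op: str
    args: tuple = ()
    ty: str = &quot;BasicObject&quot;

@dataclass(eq=False)
class Block:
    name: str
    insns: list = field(default_factory=list)

    def emit(self, insn):
        self.insns.append(insn)
        return insn

def build_branchnil(block, local_vars, name):
    # Branch on whether the local is nil, then give each successor its own
    # copy of the abstract state with the local rebound to a refined value.
    value = local_vars[name]
    block.emit(Insn(&quot;IfNil&quot;, (value,)))
    nil_block, other_block = Block(&quot;is_nil&quot;), Block(&quot;not_nil&quot;)
    nil_vars = dict(local_vars)
    nil_vars[name] = nil_block.emit(Insn(&quot;RefineType&quot;, (value,), &quot;NilClass&quot;))
    other_vars = dict(local_vars)
    other_vars[name] = other_block.emit(Insn(&quot;RefineType&quot;, (value,), &quot;NotNil&quot;))
    return (nil_block, nil_vars), (other_block, other_vars)

entry = Block(&quot;entry&quot;)
x = entry.emit(Insn(&quot;Param&quot;))
(nil_block, nil_vars), (other_block, other_vars) = build_branchnil(entry, {&quot;x&quot;: x}, &quot;x&quot;)
# Code built for the not-nil successor reads the local and gets the refined value.
send = other_block.emit(Insn(&quot;Send&quot;, (other_vars[&quot;x&quot;],)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Nothing here rewrites operands after the fact. Later instructions pick up the refined value simply because they read the local out of the updated state, which is the work SSA construction was going to do anyway.&lt;/p&gt;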

&lt;p&gt;This is fine for branches that are already in the user’s source program but
sometimes optimization, especially of dynamic languages, adds new branches that
were not there before. And sometimes these branches get added much later, long
after SSA construction. What then? Can we do something similar and rely on
existing infrastructure?&lt;/p&gt;

&lt;h3 id=&quot;during-ssa-optimization&quot;&gt;During SSA optimization&lt;/h3&gt;

&lt;p&gt;Implicit in this “can we do it” is the assumption that your IR tracks data
dependencies from use to corresponding def, but &lt;em&gt;not&lt;/em&gt; from def to uses. Sea of
Nodes (at least the &lt;a href=&quot;https://github.com/SeaOfNodes/Simple&quot;&gt;Simple&lt;/a&gt;
implementation) is an IR that tracks both directions all the time for easier
rewriting. Many IRs do not do this, so we will continue assuming that there’s
no “easy way out”.&lt;/p&gt;

&lt;p&gt;JIT compilers for dynamic languages often add synthetic &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Guard&lt;/code&gt;
instructions to the IR that enforce pre-conditions. These guards allow
optimizing happy/fast path cases in JIT code while leaving the interpreter as a
fallback. For example, we might be able to optimize two back-to-back
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;setinstancevariable&lt;/code&gt; instructions (a very dynamic operation in the world of
ideas, but fast when concretely implemented using object shapes) from:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;x = ...
setinstancevariable x, :@a, 1
setinstancevariable x, :@b, 2
# ... use x somewhere ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which is very generic and involves calling into C code that might raise an
exception, to something more like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;x = ...
v0 = GuardHeapObject x
v1 = GuardShape v0, 0xcafe
v2 = Const 1
StoreField v1, 0x8, v2
v3 = GuardHeapObject x
v4 = GuardShape v3, 0xcafe
v5 = Const 2
StoreField v4, 0x10, v5
# ... use x somewhere ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which is &lt;em&gt;much faster&lt;/em&gt; (assuming shape stability at run-time). There’s an
irritating problem, though, which is that we have a bunch of duplicate
instructions littered around the IR now because our optimizer worked on each
instruction individually. Kind of a “template optimizer” situation. Now we need
some pass to clean up the detritus.&lt;/p&gt;

&lt;p&gt;Global value numbering (GVN) will do a good job of de-duplicating instructions.
It should notice that we already have an instruction that looks like
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GuardHeapObject x&lt;/code&gt; called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v0&lt;/code&gt; and rewrite &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v3&lt;/code&gt; into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v3 = v0&lt;/code&gt;. That’s great
because we have de-duplicated the guard. GVN may not get everything, though; if
some instructions later use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;, they will not get rewritten to instead use the
output of these new guard instructions. To do that, we need to add some kind of
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;canonicalize&lt;/code&gt; pass or augment GVN with some canonicalization feature. That
canonicalization would handle rewriting operands to use the “latest version” of
some value, so to speak. See the canonicalization section of Chris Fallin’s
&lt;a href=&quot;https://cfallin.org/blog/2026/04/09/aegraph/&quot;&gt;excellent aegraphs blog post&lt;/a&gt;
for more (and of course the (currently block-local) &lt;a href=&quot;https://github.com/ruby/ruby/commit/ece14b61f505eea1ebefb3b8295df0fcf4d22567&quot;&gt;implementation in
ZJIT&lt;/a&gt;).&lt;/p&gt;

&lt;div class=&quot;language-ruby highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;canonicalize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;rewrite_map&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{}&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;map_operands!&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;rewrite_map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;opcode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:guardtype&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;rewrite_map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;operands&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Where I’m going with all of this, though, is that you may already have some
dominance-based instruction rewriting mechanism in your compiler, either as
part of GVN or separately! And you can use this to do a very low-code
into-partial-SSI conversion in the middle of your optimizer.&lt;/p&gt;

&lt;p&gt;This means you could very well get away with inserting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RefineType&lt;/code&gt;
instructions in successor blocks of conditionals and get the into-SSI “for
free”.&lt;/p&gt;
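
&lt;p&gt;Here is a rough Python sketch of what a dominance-scoped version of that rewriting might look like. It assumes a toy IR where each block already knows its children in the dominator tree; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Insn&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; classes, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dom_children&lt;/code&gt; field, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;resolve&lt;/code&gt; helper are hypothetical names, not taken from any of the compilers mentioned in this post.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from dataclasses import dataclass, field

# Hypothetical toy IR; assumes the dominator tree is already computed.
@dataclass(eq=False)
class Insn:
    op: str
    args: list = field(default_factory=list)

@dataclass(eq=False)
class Block:
    insns: list = field(default_factory=list)
    dom_children: list = field(default_factory=list)

def resolve(rewrite_map, value):
    # Follow the chain to the most refined version of the value.
    while value in rewrite_map:
        value = rewrite_map[value]
    return value

def canonicalize(block, rewrite_map=None):
    # Copy the map so that refinements learned in this block are only
    # visible in this block and the blocks it dominates.
    rewrite_map = dict(rewrite_map or {})
    for insn in block.insns:
        insn.args = [resolve(rewrite_map, arg) for arg in insn.args]
        if insn.op == &quot;RefineType&quot;:
            rewrite_map[insn.args[0]] = insn
    for child in block.dom_children:
        canonicalize(child, rewrite_map)

x = Insn(&quot;Param&quot;)
refined = Insn(&quot;RefineType&quot;, [x])
absolute = Insn(&quot;AbsoluteValue&quot;, [x])
negate = Insn(&quot;Negate&quot;, [x])
then_block = Block([refined, absolute])
else_block = Block([negate])
entry = Block([x], [then_block, else_block])
canonicalize(entry)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After this runs, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AbsoluteValue&lt;/code&gt; in the then-block uses the refined value, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Negate&lt;/code&gt; in the else-block still uses the original. Note that this is only sound if the block holding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RefineType&lt;/code&gt; can be reached only through the matching branch edge, so you may need to split critical edges first.&lt;/p&gt;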

&lt;!--
  * Why not &quot;just&quot; use union-find?
--&gt;

&lt;h2 id=&quot;after-which-conditionals-do-we-refine&quot;&gt;After which conditionals do we refine?&lt;/h2&gt;

&lt;p&gt;That’s up to you. There’s a trade-off between compile-time and run-time,
especially in JITs. Inserting more instructions and rewriting more times may
slow down your compiler. It’s a cheap lunch, not a free one.&lt;/p&gt;

&lt;h2 id=&quot;how-does-this-compare-to-the-complicated-looking-papers&quot;&gt;How does this compare to the complicated-looking papers?&lt;/h2&gt;

&lt;p&gt;I don’t know. I don’t have a good grasp of how this “partial SSI” compares to
the “full SSI”. I don’t plan on implementing full SSI in the near future.&lt;/p&gt;

&lt;p&gt;I will note that this partial SSI approach doesn’t do two things:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;It doesn’t split variables with a new sigma node, and it generally inserts
the refine node within the target block rather than above the branch&lt;/li&gt;
  &lt;li&gt;(For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;canonicalize&lt;/code&gt; only) It doesn’t insert new phi nodes; it just leaves
both IR nodes available and, instead of re-merging, drops them&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I can’t tell what impact this has.&lt;/p&gt;

&lt;h2 id=&quot;in-other-compilers&quot;&gt;In other compilers&lt;/h2&gt;

&lt;p&gt;Like Simple, &lt;a href=&quot;https://truffleruby.dev/&quot;&gt;TruffleRuby&lt;/a&gt; is built on a Sea of Nodes
IR (Graal). Chris Seaton has an &lt;a href=&quot;https://chrisseaton.com/truffleruby/stamping-out-overflow-checks/&quot;&gt;excellent blog
post&lt;/a&gt; about
TruffleRuby’s use of “stamp nodes” (“Pi nodes”&lt;sup id=&quot;fnref:pi-nodes&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:pi-nodes&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;). The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;replaceAtUsagesAndDelete&lt;/code&gt; function does a lot of heavy lifting, I think
because Graal tracks uses.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/facebookincubator/cinderx&quot;&gt;Cinder&lt;/a&gt; mostly inserts
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RefineType&lt;/code&gt; instructions in the HIR builder, before into-SSA, and then lets
the SSA construction take care of things. That’s where I learned this trick,
actually. Here is &lt;a href=&quot;https://github.com/facebookincubator/cinderx/blob/38c0a17d71df4fddf39ca10d9fdf48d7bcafc1d9/cinderx/Jit/hir/builder.cpp#L4745&quot;&gt;one
example&lt;/a&gt;
of refining the type of the matched operand when building IR for pattern
matching.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/luau-lang/luau&quot;&gt;Luau&lt;/a&gt; is working on something like this,
but for their type checker. Chatting with someone on their team is actually
part of the reason I got motivated to write this post.&lt;/p&gt;

&lt;!-- LLVM PredicateInfo --&gt;
&lt;!-- HHVM AssertType --&gt;

&lt;p&gt;Android ART looks like it has
&lt;a href=&quot;https://github.com/LineageOS/android_art/blob/8ce603e0c68899bdfbc9cd4c50dcc65bbf777982/compiler/optimizing/nodes.h#L7759&quot;&gt;HBoundType&lt;/a&gt;
and inserts them &lt;a href=&quot;https://github.com/LineageOS/android_art/blob/8ce603e0c68899bdfbc9cd4c50dcc65bbf777982/compiler/optimizing/reference_type_propagation.cc#L194&quot;&gt;in reference type
propagation&lt;/a&gt;.
This handles class checks, null checks, and instanceof checks.&lt;/p&gt;

&lt;!-- Dart
https://github.com/dart-lang/sdk/blob/1c947dd88acd6e4b282e445b56e62eb631e87bd8/runtime/vm/compiler/backend/flow_graph.cc#L2012
RedefinitionInstr
--&gt;

&lt;h2 id=&quot;aside-logic-for-eg-heapobject-upgrade&quot;&gt;Aside: logic for e.g. HeapObject upgrade&lt;/h2&gt;

&lt;p&gt;Last, I want to talk a little bit about some interesting reasoning you can do
when you have two implementations of something that you can switch between. For
example, JIT (+ interpreter), or aliasing and non-aliasing cases in C code, or
the weirdo NULL-UB reasoning LLVM can do to C code, things like that.&lt;/p&gt;

&lt;p&gt;In ZJIT, we currently insert &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RefineType&lt;/code&gt;s opportunistically in “easy” cases
when building our HIR from the interpreter bytecode.&lt;/p&gt;

&lt;p&gt;For example, if in the bytecode there is a branch that compares some value &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nil&lt;/code&gt;, it will have two outgoing control-flow edges: one block where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;
is definitely &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nil&lt;/code&gt;, and one block where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is definitely &lt;em&gt;not&lt;/em&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nil&lt;/code&gt;. In each
of these control-flow edges, we can insert corresponding type refinement hints.
That’s pretty standard. But we can also do weirder stuff.&lt;/p&gt;

&lt;p&gt;CRuby has a notion of heap objects vs immediate objects. Many (most?) objects
are heap objects. However, the integer &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;5&lt;/code&gt;, for example, is not allocated on the
heap but instead represented by a &lt;a href=&quot;/blog/small-objects/&quot;&gt;tagged bit pattern&lt;/a&gt;
that pretends to be an address: the whole value is encoded in the pointer
itself.&lt;/p&gt;

&lt;p&gt;We encode this knowledge in the HIR’s type system: “heapness” and
“immediateness” each get a bit in the &lt;a href=&quot;/blog/lattice-bitset/&quot;&gt;type lattice&lt;/a&gt;. We
use this in the optimizer to reason about &lt;a href=&quot;/blog/compiler-effects/&quot;&gt;effects&lt;/a&gt;,
among other things.&lt;/p&gt;

&lt;p&gt;We can’t know a lot of the time what type a thing is, so we pessimistically
type most objects flowing through bytecode as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BasicObject&lt;/code&gt;. This type
encapsulates the entire world of possible values that could go on the stack or
in a local variable.&lt;/p&gt;

&lt;p&gt;On most &lt;em&gt;heap&lt;/em&gt; objects, with only a few exceptions, you can write instance
variables (fields, attributes, whatever you want to call them). You can &lt;em&gt;never&lt;/em&gt;
write an instance variable to an immediate. This means that if we observe the
following pattern in the bytecode:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;x: BasicObject = ...
setinstancevariable x, :@abc, 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then after building and emitting HIR for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;setinstancevariable&lt;/code&gt; opcode, we
can upgrade the type of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; from a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BasicObject&lt;/code&gt; to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HeapBasicObject&lt;/code&gt;. We can
do this because if it &lt;em&gt;weren’t&lt;/em&gt; a heap-allocated object, we would have left the
compiled code and entered the interpreter.&lt;/p&gt;
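
&lt;p&gt;As a rough illustration, here is a Python sketch with a made-up two-bit lattice. The bit values, the tuple-based instructions, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build_setinstancevariable&lt;/code&gt; helper are assumptions for the sake of the example; ZJIT’s real lattice has many more bits and its builder looks different.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical two-bit lattice; a real one has many more bits.
IMMEDIATE = 0b01
HEAP = 0b10
BASIC_OBJECT = IMMEDIATE | HEAP
HEAP_BASIC_OBJECT = BASIC_OBJECT &amp;amp; HEAP

def build_setinstancevariable(insns, types, local_vars, name, ivar, value):
    recv = local_vars[name]
    insns.append((&quot;SetIvar&quot;, recv, ivar, value))
    # If control reaches this point, the write did not bail out to the
    # interpreter, so the receiver cannot have been an immediate.
    refined = (&quot;RefineType&quot;, recv)
    insns.append(refined)
    types[refined] = types[recv] &amp;amp; HEAP
    local_vars[name] = refined

param = (&quot;Param&quot;, &quot;x&quot;)
insns = []
types = {param: BASIC_OBJECT}
local_vars = {&quot;x&quot;: param}
build_setinstancevariable(insns, types, local_vars, &quot;x&quot;, &quot;@abc&quot;, 1)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Intersecting with the heap bit means the refinement can only narrow the type, never widen it. Any bytecode built afterward that reads the local sees the refined value.&lt;/p&gt;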

&lt;p&gt;This is another SSI-type thing you can do in your compiler.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Uhh I guess the conclusion is that you don’t have to do full SSI and partial
SSI is available and not too scary? Does your compiler do this? Reader, please
write in.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:et-al&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;…and &lt;a href=&quot;/assets/img/singer-ssi.pdf&quot;&gt;optimized in 2002&lt;/a&gt; (PDF),
&lt;a href=&quot;/assets/img/ssi-revisited.pdf&quot;&gt;revisited in 2009&lt;/a&gt; (PDF), &lt;a href=&quot;/assets/img/efficient-ssi.pdf&quot;&gt;implemented in
LLVM in 2010&lt;/a&gt; (PDF), &lt;a href=&quot;/assets/img/ssi-abstract-compilation.pdf&quot;&gt;investigated in
2017 for abstract compilation&lt;/a&gt;
(PDF), and probably more. The 2009 paper by Boissinot, Brisk, Darte, and
Rastello even shows that both Ananian and Singer’s papers have bugs, while
perhaps unintentionally also making an &lt;em&gt;excellent&lt;/em&gt; pun about the literature
being “sparse”. &lt;a href=&quot;#fnref:et-al&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:llvm-partial-ssi&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The partial SSI in this blog post is different from what the &lt;a href=&quot;/assets/img/efficient-ssi.pdf&quot;&gt;LLVM
paper&lt;/a&gt; (PDF) calls partial SSI. Partial for
different reasons. Maybe it’s not even single information anymore. &lt;a href=&quot;#fnref:llvm-partial-ssi&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:pi-nodes&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Today I learned that this terminology comes from the &lt;a href=&quot;/assets/img/abcd.pdf&quot;&gt;ABCD
paper&lt;/a&gt; (PDF). &lt;a href=&quot;#fnref:pi-nodes&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
            <pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate>
            <niceDate>May 12, 2026</niceDate>
            <link>https://bernsteinbear.com/blog/partial-ssi/?utm_source=rss</link>
            <guid isPermaLink="true">https://bernsteinbear.com/blog/partial-ssi/</guid>
        </item>
        
        <item>
            <title>Value numbering</title>
            <description>&lt;p&gt;Welcome back to compiler land. Today we’re going to talk about &lt;em&gt;value
numbering&lt;/em&gt;, which is like SSA, but more.&lt;/p&gt;

&lt;p&gt;Static single assignment (SSA) gives names to values: every expression has a
name, and each name corresponds to exactly one expression. It transforms
programs like this:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;where the variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is assigned more than once in the program text, into
programs like this:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;where each assignment to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; has been replaced with an assignment to a new
fresh name.&lt;/p&gt;

&lt;p&gt;It’s great because it makes clear the differences between the two &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x + 1&lt;/code&gt;
expressions. Though they textually look similar, they compute different values.
The first computes 1 and the second computes 2. In this example, it is not
possible to substitute in a variable and re-use the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x + 1&lt;/code&gt;, because
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;s are different.&lt;/p&gt;

&lt;p&gt;But what if we see two “textually” identical instructions in SSA? That sounds
much more promising than non-SSA because the transformation into SSA form has
removed (much of) the statefulness of it all. When can we re-use the result?&lt;/p&gt;

&lt;p&gt;Identifying instructions that are known at compile-time to always produce the
same value at run-time is called &lt;em&gt;value numbering&lt;/em&gt;. &lt;!-- This is also called common
subexpression elimination (CSE), though for some reason the two mean slightly
different things to different groups of people. --&gt;&lt;/p&gt;

&lt;h2 id=&quot;eliminating-common-subexpressions&quot;&gt;Eliminating common subexpressions&lt;/h2&gt;

&lt;p&gt;To understand value numbering, let’s extend the above IR snippet with two more
instructions, v3 and v4.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# new
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# new
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this new snippet, v3 looks the same as v1: adding v0 and 1. Assuming our
addition operation is some ideal mathematical addition, we can absolutely
re-use v1; no need to compute the addition again. We can rewrite the IR to
something like:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is kind of similar to the destructive union-find representation that
JavaScriptCore and a couple other compilers use, where the optimizer doesn’t
eagerly re-write all uses but instead leaves a little breadcrumb
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Identity&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Assign&lt;/code&gt; instruction&lt;sup id=&quot;fnref:cinder&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:cinder&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;We could then run our copy propagation pass (“union-find cleanup”?) and get:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;v4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;do_something&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Great. But how does this happen? How does an optimizer identify reusable
instruction candidates that are “textually identical”? Generally, there is &lt;a href=&quot;https://pointersgonewild.com/2011/10/07/optimizing-global-value-numbering/&quot;&gt;no
actual text in the
IR&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One popular solution is to compute a hash of each instruction. Then any
instructions with the same hash (that also compare equal, in case of
collisions) are considered equivalent. This is called &lt;em&gt;hash-consing&lt;/em&gt;.&lt;/p&gt;
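
&lt;p&gt;Here is a minimal Python sketch of that idea. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Instr&lt;/code&gt; class and its
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;value_key&lt;/code&gt; method are made up for illustration and are not from any real
compiler. A Python dict stands in for the hash table, since it already hashes
keys and falls back to an equality check on collisions.&lt;/p&gt;

```python
class Instr:
    """A hypothetical SSA instruction: an opcode plus its operands."""

    def __init__(self, opcode, *operands):
        self.opcode = opcode
        self.operands = operands

    def value_key(self):
        # Instruction operands are keyed by identity, since in SSA each
        # operand points at the single instruction that defines it.
        # Plain ints stand in for immediate constants and are keyed by value.
        return (self.opcode, tuple(
            ("instr", id(op)) if isinstance(op, Instr) else ("imm", op)
            for op in self.operands))


v0 = Instr("Const", 0)
v1 = Instr("Add", v0, 1)
v2 = Instr("Add", v1, 1)
v3 = Instr("Add", v0, 1)

# Map each key to the first instruction that produced it.
seen = {}
for instr in (v0, v1, v2, v3):
    seen.setdefault(instr.value_key(), instr)

print(seen[v3.value_key()] is v1)  # v3 lands on the same entry as v1
```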

&lt;p&gt;When trying to figure all this out, I read through a couple of different
implementations. I particularly like the &lt;a href=&quot;https://maxine-vm.readthedocs.io/en/stable/&quot;&gt;Maxine VM&lt;/a&gt; implementation.
For example, here is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;valueNumber&lt;/code&gt; (hashing) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;valueEqual&lt;/code&gt;
functions for most binary operations, slightly modified for clarity:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Instruction&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// The base class for binary operations&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Op2&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Instruction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Each binary operation has an opcode and two operands&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;opcode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// (IMUL, IADD, ...)&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;Value&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;Value&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;valueNumber&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// There are other fields but only the opcode and operands get hashed.&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// Always set at least one bit in case the hash wraps to zero.&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x20000000&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;opcode&lt;/span&gt;
           &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;System&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;identityHashCode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
           &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;System&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;identityHashCode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;valueEqual&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Instruction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;instanceof&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Op2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;nc&quot;&gt;Op2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Op2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;opcode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;opcode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The rest of the value numbering implementation assumes that if a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;valueNumber&lt;/code&gt;
function returns 0, it does not wish to be considered for value
numbering. Why might an instruction opt out of value numbering?&lt;/p&gt;

&lt;h2 id=&quot;pure-vs-impure&quot;&gt;Pure vs impure&lt;/h2&gt;

&lt;p&gt;An instruction might opt out of value numbering if it is not “pure”.&lt;/p&gt;

&lt;p&gt;Some instructions are not pure. Purity is in the eye of the beholder, but in
general it means that an instruction does not interact with the state of the
outside world, except for trivial computation on its operands. (What does it
mean to de-duplicate/cache/reuse &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;printf&lt;/code&gt;?)&lt;/p&gt;

&lt;p&gt;A load from an array object is also not a pure operation&lt;sup id=&quot;fnref:heap-ssa&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:heap-ssa&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. The load operation
implicitly relies on the state of the memory. Also, even if the array was
known-constant, in some runtime
systems, the load might raise an exception. Changing the source location where
an exception is raised is generally frowned upon. Languages such as Java often
have requirements about where exceptions are raised codified in their
specifications.&lt;/p&gt;

&lt;p&gt;We’ll work only on pure operations for now, but we’ll come back to this later.
We do often want to optimize impure operations as well!&lt;/p&gt;

&lt;p&gt;We’ll start off with the simplest form of value numbering, which operates only
on linear sequences of instructions, like basic blocks or traces.&lt;/p&gt;

&lt;h2 id=&quot;local-value-numbering&quot;&gt;Local value numbering&lt;/h2&gt;

&lt;p&gt;Let’s build a small implementation of local value numbering (LVN). We’ll start with
straight-line code—no branches or anything tricky.&lt;/p&gt;

&lt;p&gt;Most compiler optimizations on control-flow graphs (CFGs) iterate over the
instructions “top to bottom”&lt;sup id=&quot;fnref:order&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:order&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; and it seems like we can do the same thing
here too.&lt;/p&gt;

&lt;p&gt;From what we’ve seen so far optimizing our made-up IR snippet, we can do
something like this:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;initialize a map from value numbers to instruction pointers&lt;/li&gt;
  &lt;li&gt;for each instruction &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt;
    &lt;ul&gt;
      &lt;li&gt;if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt; wants to participate in value numbering
        &lt;ul&gt;
          &lt;li&gt;if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt;’s value number is already in the map, replace all pointers to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt;
in the rest of the program with the corresponding value from the map&lt;/li&gt;
          &lt;li&gt;otherwise, add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt; to the map&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The find-and-replace, remember, is not a literal find-and-replace, but instead
something like:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;instr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;opcode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Assign&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;instr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;operands&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;replacement&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;instr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_equal_to&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;replacement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(if you have been following along with the &lt;a href=&quot;https://pypy.org/categories/toy-optimizer.html&quot;&gt;toy optimizer&lt;/a&gt; series)&lt;/p&gt;
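
&lt;p&gt;Putting the pieces together, the whole pass might look something like the
following sketch. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Instr&lt;/code&gt; class, its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;forwarded&lt;/code&gt; field, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;operand_key&lt;/code&gt; are assumptions made for this example (loosely in the style of
the toy optimizer series), not code from a real compiler.&lt;/p&gt;

```python
class Instr:
    """Hypothetical IR node; `forwarded` is the union-find breadcrumb."""

    def __init__(self, opcode, *operands, pure=True):
        self.opcode = opcode
        self.operands = list(operands)
        self.pure = pure
        self.forwarded = None

    def find(self):
        # Follow breadcrumbs to the canonical instruction.
        instr = self
        while instr.forwarded is not None:
            instr = instr.forwarded
        return instr

    def make_equal_to(self, other):
        root, target = self.find(), other.find()
        if root is not target:
            root.forwarded = target


def operand_key(op):
    # Instructions are keyed by identity; ints stand in for constants.
    return ("instr", id(op)) if isinstance(op, Instr) else ("imm", op)


def local_value_numbering(block, value_map):
    for instr in block:
        # Canonicalize operands so earlier replacements are visible here.
        instr.operands = [op.find() if isinstance(op, Instr) else op
                          for op in instr.operands]
        if not instr.pure:
            continue  # this instruction opted out of value numbering
        key = (instr.opcode, tuple(operand_key(op) for op in instr.operands))
        previous = value_map.setdefault(key, instr)
        if previous is not instr:
            instr.make_equal_to(previous)


v0 = Instr("Const", 0)
v1 = Instr("Add", v0, 1)
v2 = Instr("Add", v1, 1)
v3 = Instr("Add", v0, 1)
v4 = Instr("do_something", v2, v3, pure=False)
local_value_numbering([v0, v1, v2, v3, v4], {})
```

&lt;p&gt;After this runs, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v3.find()&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v1&lt;/code&gt; and the operands of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v4&lt;/code&gt; are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v2&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v1&lt;/code&gt;, matching the hand-optimized IR from earlier.&lt;/p&gt;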

&lt;p&gt;This several-line function (as long as you already have a hash map and a
union-find available to you) is enough to build local value numbering! And real
compilers are built this way, too.&lt;/p&gt;

&lt;p&gt;If you don’t believe me, take a look at this slightly edited snippet from
&lt;a href=&quot;https://maxine-vm.readthedocs.io/en/stable/&quot;&gt;Maxine’s&lt;/a&gt; value numbering implementation. It has all of the components
we just talked about: iterating over instructions, map lookup, and some
substitution.&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// Local value numbering&lt;/span&gt;
&lt;span class=&quot;nc&quot;&gt;BlockBegin&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...;&lt;/span&gt;
&lt;span class=&quot;nc&quot;&gt;ValueMap&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;currentMap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ValueMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;nc&quot;&gt;InstructionSubstituter&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;subst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;InstructionSubstituter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// visit all instructions of this block&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Instruction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;instr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;instr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;instr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;instr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// attempt value numbering (uses valueNumber() and valueEqual())&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;//&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// return a previous instruction if it exists in the map, or insert the&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// current instruction into the map and return it&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;Instruction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;currentMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;findInsert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;instr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;instr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// remember the replacement in the union-find&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;subst&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setSubst&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;instr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This alone will get you pretty far. Code generators of all shapes tend to leave
messy repeated computations all over their generated code and this will make
short work of them.&lt;/p&gt;

&lt;p&gt;Sometimes, though, your computations are spread across control flow—over
multiple basic blocks. What do you do then?&lt;/p&gt;

&lt;!--
## Equivalence classes
--&gt;

&lt;h2 id=&quot;global-value-numbering&quot;&gt;Global value numbering&lt;/h2&gt;

&lt;p&gt;Computing value numbers for an entire function is called &lt;em&gt;global value
numbering&lt;/em&gt; (GVN) and it requires dealing with control flow (if, loops, etc). I
don’t just mean that for an entire function, we run local value numbering
block-by-block. Global value numbering implies that expressions can be
de-duplicated and shared across blocks.&lt;/p&gt;

&lt;p&gt;Let’s tackle control flow case by case.&lt;/p&gt;

&lt;p&gt;First is the simple case from above: one block. In this case, we can go top to
bottom with our value numbering and do alright.&lt;/p&gt;

&lt;figure&gt;
  &lt;object class=&quot;svg&quot; type=&quot;image/svg+xml&quot; data=&quot;/assets/img/gvn-one-block.svg&quot;&gt;&lt;/object&gt;
&lt;/figure&gt;

&lt;p&gt;The second case is also reasonable to handle: one block flowing into another. In this
case, we can still go top to bottom. We just have to find a way to iterate over
the blocks.&lt;/p&gt;

&lt;p&gt;If we’re not going to share value maps between blocks, the order doesn’t
matter. But since the point of global value numbering is to share values, we
have to iterate over them in topological order (reverse post order (RPO)). This
ensures that predecessors get visited before successors. If we have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bb0 -&amp;gt;
bb1&lt;/code&gt;, we have to visit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bb0&lt;/code&gt; first and then &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bb1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Because of how SSA works and how CFGs work, the second block can “look up” into
the first block and use the values from it. To get global value numbering
working, we have to copy &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bb0&lt;/code&gt;’s value map before we start processing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bb1&lt;/code&gt; so
we can re-use the instructions.&lt;/p&gt;

&lt;figure&gt;
  &lt;object class=&quot;svg&quot; type=&quot;image/svg+xml&quot; data=&quot;/assets/img/gvn-two-blocks.svg&quot;&gt;&lt;/object&gt;
&lt;/figure&gt;

&lt;p&gt;Maybe something like:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;value_map&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueMap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reverse_post_order&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;local_value_numbering&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value_map&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then the expressions can accrue across blocks. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bb1&lt;/code&gt; can re-use the
already-computed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Add v0, 1&lt;/code&gt; from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bb0&lt;/code&gt; because it is still in the map.&lt;/p&gt;

&lt;p&gt;…but this breaks as soon as you have control-flow splits. Consider a graph
of the following shape:&lt;/p&gt;

&lt;!--
digraph G {
  node [shape=square];
  A -&gt; B;
  A -&gt; C;
}
--&gt;
&lt;figure&gt;
  &lt;object class=&quot;svg&quot; type=&quot;image/svg+xml&quot; data=&quot;/assets/img/gvn-split.svg&quot;&gt;&lt;/object&gt;
&lt;/figure&gt;

&lt;p&gt;We’re going to iterate over that graph in one of two orders: A B C or A C B. In
either case, we’re going to be adding all this stuff into the value map from
one block (say, B) that is not actually available to its sibling block (say,
C).&lt;/p&gt;

&lt;p&gt;When I say “not available”, I mean “would not have been computed before”. This
is because we execute either A then B or A then C. There’s no world in which we
execute B then C.&lt;/p&gt;
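
&lt;p&gt;To see the problem concretely, here is a small sketch that threads one
shared map through A, B, and C. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Instr&lt;/code&gt; class and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;number_block&lt;/code&gt; are
hypothetical helpers written for this example, and operands are plain names to
keep the keys simple.&lt;/p&gt;

```python
class Instr:
    """Hypothetical IR node; operands are names or ints to keep keys simple."""

    def __init__(self, name, opcode, *operands):
        self.name = name
        self.opcode = opcode
        self.operands = operands
        self.replacement = None  # set when an "equivalent" instruction is found


def number_block(block, value_map):
    for instr in block:
        key = (instr.opcode, instr.operands)
        previous = value_map.setdefault(key, instr)
        if previous is not instr:
            instr.replacement = previous


v0 = Instr("v0", "Param")
v1 = Instr("v1", "Add", "v0", 1)  # lives in B
v2 = Instr("v2", "Add", "v0", 1)  # lives in C
blocks = {"A": [v0], "B": [v1], "C": [v2]}

shared_map = {}
for name in ["A", "B", "C"]:  # one valid reverse post-order
    number_block(blocks[name], shared_map)

# v2 was "replaced" by v1, but v1 is never computed on the path A then C.
print(v2.name, "replaced by", v2.replacement.name)
```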

&lt;p&gt;But alright, look at a third case where there is such a world: a control-flow
join. In this diagram, we have two predecessor blocks B and C each flowing into
D. In this diagram, B &lt;em&gt;always&lt;/em&gt; flows into D and also C &lt;em&gt;always&lt;/em&gt; flows into D.
So the iteration order is fine, right?&lt;/p&gt;

&lt;!--
digraph G {
  node [shape=square];
  A -&gt; B;
  A -&gt; C;
  B -&gt; D;
  C -&gt; D;
}
--&gt;
&lt;figure&gt;
  &lt;object class=&quot;svg&quot; type=&quot;image/svg+xml&quot; data=&quot;/assets/img/gvn-join.svg&quot;&gt;&lt;/object&gt;
&lt;/figure&gt;

&lt;p&gt;Well, still no. We have the same sibling problem as before. B and C still can’t
share value maps.&lt;/p&gt;

&lt;p&gt;We also have a weird question when we enter D: where did we come from? If we
came from B, we can re-use expressions from B. If we came from C, we can re-use
expressions from C. But we cannot in general know which predecessor block we
came from.&lt;/p&gt;

&lt;p&gt;The only block we know &lt;em&gt;for sure&lt;/em&gt; that we executed before D is A. This means we
can re-use A’s value map in D because we can guarantee that all execution paths
that enter D have previously gone through A.&lt;/p&gt;

&lt;p&gt;This relationship is called a &lt;em&gt;dominator&lt;/em&gt; relationship and this is the key to
one style of global value numbering that we’re going to talk about in this
post. A block can always use the value map from any other block that dominates
it. For completeness’ sake, in the diamond diagram, A dominates each of B and
C, too.&lt;/p&gt;
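
&lt;p&gt;For intuition, here is a naive sketch that computes dominator sets for the
diamond by repeatedly intersecting the sets of each block’s predecessors. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dominators&lt;/code&gt; function and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;preds&lt;/code&gt; table are assumptions for this
example; production compilers usually use faster algorithms.&lt;/p&gt;

```python
def dominators(preds, entry):
    """Naive iterative dominator sets.

    Assumes every block other than `entry` has at least one predecessor.
    """
    blocks = set(preds)
    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in sorted(blocks - {entry}):
            # A block is dominated by itself plus whatever dominates
            # all of its predecessors.
            new = set.intersection(*(dom[p] for p in preds[b])) | {b}
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


# The diamond: A flows to B and C, which both flow to D.
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
dom = dominators(preds, "A")
print(sorted(dom["D"]))
```

&lt;p&gt;The result says D is dominated only by A and itself, so A’s value map is the
only one D can safely inherit.&lt;/p&gt;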

&lt;p&gt;We can compute dominators a couple of ways&lt;sup id=&quot;fnref:compute-doms&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:compute-doms&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, but that’s a little
bit out of scope for this blog post. If we assume that we have dominator
information available in our CFG, we can use that for global value numbering.
And that’s just what—you guessed it—Maxine VM does.&lt;/p&gt;

&lt;p&gt;It iterates over all blocks in reverse post-order, doing local value numbering,
threading through value maps from dominator blocks. In this case, their method
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dominator&lt;/code&gt; gets the &lt;em&gt;immediate dominator&lt;/em&gt;: the “closest” dominator block of
all the blocks that dominate the current one.&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;GlobalValueNumberer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;HashMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;BlockBegin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ValueMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;valueMaps&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;InstructionSubstituter&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;subst&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;ValueMap&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;currentMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;GlobalValueNumberer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;IR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;InstructionSubstituter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// reverse post-order&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;BlockBegin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;linearScanOrder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;valueMaps&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;HashMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;BlockBegin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ValueMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;optimize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;subst&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;finish&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;optimize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;BlockBegin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numBlocks&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;BlockBegin&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;startBlock&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// initial value map, with nesting 0&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;valueMaps&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startBlock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ValueMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numBlocks&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;// iterate through all the blocks&lt;/span&gt;
            &lt;span class=&quot;nc&quot;&gt;BlockBegin&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;nc&quot;&gt;BlockBegin&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dominator&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;dominator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;// create new value map with increased nesting&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;currentMap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ValueMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;valueMaps&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dominator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;// &amp;lt;&amp;lt; INSERT LOCAL VALUE NUMBERING HERE &amp;gt;&amp;gt;&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;// remember value map for successors&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;valueMaps&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;currentMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And that’s it! That’s the core of Maxine’s &lt;a href=&quot;https://github.com/beehive-lab/Maxine-VM/blob/e213a842f78983e2ba112ae46de8c64317bc206e/com.sun.c1x/src/com/sun/c1x/opt/GlobalValueNumberer.java&quot;&gt;GVN implementation&lt;/a&gt;. I
love how short it is. For not very much code, you can remove a lot of duplicate
pure SSA instructions.&lt;/p&gt;
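
&lt;p&gt;To make the shape of the algorithm concrete, here is a minimal Python sketch of the same dominator-based idea with the local value numbering step filled in. It is not Maxine code: the tuple-based IR, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gvn&lt;/code&gt; function, and the block names are all made up for illustration, and it assumes every instruction is pure and that blocks arrive in reverse post-order.&lt;/p&gt;

```python
# Hypothetical mini-IR: each block is a list of (dest, op, args) tuples.
# idom maps a block name to its immediate dominator (None for the entry).
# None of these names come from Maxine; every op is assumed to be pure.

def gvn(blocks, idom, order):
    value_maps = {}   # block name to {(op, args): canonical dest}
    replaced = {}     # duplicate dest to the dest it is replaced with
    for name in order:
        parent = idom.get(name)
        # Copy the dominator's map, in the spirit of new ValueMap(parentMap).
        current = dict(value_maps[parent]) if parent is not None else {}
        for dest, op, args in blocks[name]:
            # Rewrite operands through earlier replacements before hashing.
            key = (op, tuple(replaced.get(a, a) for a in args))
            if key in current:
                replaced[dest] = current[key]
            else:
                current[key] = dest
        # Remember the map for blocks that this block dominates.
        value_maps[name] = current
    return replaced


blocks = {
    "entry": [("v0", "add", ("a", "b"))],
    "left": [("v1", "add", ("a", "b")), ("v2", "mul", ("v1", "c"))],
    "right": [("v3", "mul", ("v0", "c"))],
    "join": [("v4", "mul", ("v0", "c"))],
}
idom = {"entry": None, "left": "entry", "right": "entry", "join": "entry"}
result = gvn(blocks, idom, ["entry", "left", "right", "join"])
```

&lt;p&gt;In this diamond, only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v1&lt;/code&gt; gets replaced (with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v0&lt;/code&gt;). The three multiplies compute the same thing, but none of the blocks holding them dominates another, so the dominator-scoped maps never see each other.&lt;/p&gt;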

&lt;p&gt;This does still work with loops, but with some caveats. From p7 of &lt;a href=&quot;/assets/img/briggs-gvn.pdf&quot;&gt;Briggs GVN&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The φ-functions require special treatment. Before the compiler can analyze
the φ-functions in a block, it must previously have assigned value numbers to
all of the inputs. This is not possible in all cases; specifically, any
φ-function input whose value flows along a back edge (with respect to the
dominator tree) cannot have a value number. If any of the parameters of a
φ-function have not been assigned a value number, then the compiler cannot
analyze the φ-function, and it must assign a unique, new value number to the
result.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The paper also talks about eliminating useless phis, which is optional, but
would strengthen the global value numbering pass: it makes more information
transparent.&lt;/p&gt;
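
&lt;p&gt;The phi rule from the quote can be sketched in a few lines of Python. The function name, the counter, and the per-block phi table are assumptions for illustration, not something lifted from the paper or from Maxine. The useless-phi check is the optional bit: if every input already has the same value number, the phi just takes that number.&lt;/p&gt;

```python
import itertools

# Hypothetical source of new value numbers; 100 is an arbitrary start.
fresh = itertools.count(100)


def number_phi(inputs, value_number, phis_in_block):
    # value_number maps SSA names to the numbers assigned so far.
    numbers = [value_number.get(name) for name in inputs]
    if any(n is None for n in numbers):
        # Some input flows along a back edge and has no number yet:
        # give up and hand out a unique, new value number.
        return next(fresh)
    if len(set(numbers)) == 1:
        # Useless phi: every input is the same value.
        return numbers[0]
    # Otherwise, phis in the same block with the same inputs match.
    key = tuple(numbers)
    if key not in phis_in_block:
        phis_in_block[key] = next(fresh)
    return phis_in_block[key]


value_number = {"x0": 1, "y0": 2}
phis = {}
useless = number_phi(["x0", "x0"], value_number, phis)
first = number_phi(["x0", "y0"], value_number, phis)
second = number_phi(["x0", "y0"], value_number, phis)
loop_carried = number_phi(["x0", "x_loop"], value_number, phis)
```

&lt;p&gt;Here &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;first&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;second&lt;/code&gt; share a number, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;loop_carried&lt;/code&gt; gets its own because &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x_loop&lt;/code&gt; has not been numbered yet.&lt;/p&gt;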

&lt;p&gt;But what if we want to handle impure instructions?&lt;/p&gt;

&lt;h2 id=&quot;state-management-and-invalidation&quot;&gt;State management and invalidation&lt;/h2&gt;

&lt;p&gt;Languages such as Java allow for reading fields from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;this&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;self&lt;/code&gt; object within
methods as if the field were a variable name. This makes code like the
following common:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;CPU&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;exec_adc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regA&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fetched_data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;flagCARRY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;byte&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;^&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regA&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;^&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fetched_data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;regA&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each of these references to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regA&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fetched_data&lt;/code&gt; is an implicit reference
to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;this.regA&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;this.fetched_data&lt;/code&gt;, which is semantically a field load off
an object. You can see it in &lt;a href=&quot;https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:&apos;1&apos;,fontScale:14,fontUsePx:&apos;0&apos;,j:1,lang:java,selection:(endColumn:19,endLineNumber:14,positionColumn:19,positionLineNumber:14,selectionStartColumn:19,selectionStartLineNumber:14,startColumn:19,startLineNumber:14),source:&apos;class+CPU+%7B%0A++++private+void+exec_adc()+%7B%0A++++++++int+result_int+%3D+regA+%2B+fetched_data+%2B+flagCARRY%3B%0A++++++++byte+result+%3D+(byte)+result_int%3B%0A++++++++//+...%0A++++++++int+a+%3D+result_int+%5E+regA%3B%0A++++++++int+b+%3D+result_int+%5E+fetched_data%3B%0A++++++++//+...%0A++++++++regA+%3D+result%3B%0A++++%7D%0A%0A++++int+regA%3B%0A++++int+fetched_data%3B%0A++++int+flagCARRY%3B%0A%7D%0A&apos;),l:&apos;5&apos;,n:&apos;0&apos;,o:&apos;Java+source+%231&apos;,t:&apos;0&apos;)),k:50,l:&apos;4&apos;,n:&apos;0&apos;,o:&apos;&apos;,s:0,t:&apos;0&apos;),(g:!((h:compiler,i:(compiler:java2501,filters:(b:&apos;0&apos;,binary:&apos;1&apos;,binaryObject:&apos;1&apos;,commentOnly:&apos;0&apos;,debugCalls:&apos;1&apos;,demangle:&apos;0&apos;,directives:&apos;0&apos;,execute:&apos;1&apos;,intel:&apos;0&apos;,libraryCode:&apos;0&apos;,trim:&apos;1&apos;,verboseDemangling:&apos;0&apos;),flagsViewOpen:&apos;1&apos;,fontScale:14,fontUsePx:&apos;0&apos;,j:1,lang:java,libs:!(),options:&apos;&apos;,overrides:!(),selection:(endColumn:19,endLineNumber:40,positionColumn:1,positionLineNumber:1,selectionStartColumn:19,selectionStartLineNumber:40,startColumn:1,startLineNumber:1),source:1),l:&apos;5&apos;,n:&apos;0&apos;,o:&apos;+jdk+25.0.1+(Editor+%231)&apos;,t:&apos;0&apos;)),k:50,l:&apos;4&apos;,n:&apos;0&apos;,o:&apos;&apos;,s:0,t:&apos;0&apos;)),l:&apos;2&apos;,n:&apos;0&apos;,o:&apos;&apos;,t:&apos;0&apos;)),version:4&quot;&gt;the bytecode&lt;/a&gt; (thanks, Matt Godbolt):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;class CPU {
  int regA;

  int fetched_data;

  int flagCARRY;

  CPU();
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object.&quot;&amp;lt;init&amp;gt;&quot;:()V
         4: return


  private void exec_adc();
         0: aload_0
         1: getfield      #7                  // Field regA:I
         4: aload_0
         // ...
        20: getfield      #7                  // Field regA:I
        23: ixor
        24: istore_3
        25: iload_1
        26: aload_0
        27: getfield      #13                 // Field fetched_data:I
        30: ixor
        31: istore        4
        33: aload_0
        34: iload_2
        35: putfield      #7                  // Field regA:I
        38: return
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When straightforwardly building an SSA IR from the JVM bytecode for this
method, you will end up with a bunch of IR that looks like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;v0 = LoadField self, :regA
v1 = LoadField self, :fetched_data
v2 = LoadField self, :flagCARRY
v3 = IntAdd v0, v1
v4 = IntAdd v3, v2
// ...
v7 = LoadField self, :regA
v8 = IntXor v4, v7
v9 = LoadField self, :fetched_data
v10 = IntXor v4, v9
// ...
StoreField self, :regA, ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Pretty much the same as the bytecode. Even though no code in the middle could
modify the field &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regA&lt;/code&gt; (which would require a re-load), we still have a
duplicate load. Bummer.&lt;/p&gt;

&lt;p&gt;I don’t want to re-hash this too much, but it’s possible to fold &lt;a href=&quot;/blog/toy-load-store/&quot;&gt;load and store
forwarding&lt;/a&gt; into your GVN implementation by either:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;doing load-store forwarding as part of local value numbering and clearing
memory information from the value map at the end of each block, or&lt;/li&gt;
  &lt;li&gt;keeping track of effects across blocks&lt;/li&gt;
&lt;/ul&gt;
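
&lt;p&gt;The first option might look something like the following Python sketch. The tuple encoding of loads and stores is invented for this example, as are the names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v_result&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v11&lt;/code&gt;. It assumes two objects might alias, so a store to a field forgets every cached load of that field. A real pass would also have to forget things at calls and other instructions with unknown effects.&lt;/p&gt;

```python
def forward_loads(block):
    # block is a list of hypothetical instruction tuples:
    #   ("load", dest, obj, field) or ("store", obj, field, value)
    known = {}      # (obj, field) to the SSA name holding its value
    replaced = {}   # redundant load dest to the earlier name
    for insn in block:
        if insn[0] == "load":
            _, dest, obj, field = insn
            if (obj, field) in known:
                replaced[dest] = known[(obj, field)]
            else:
                known[(obj, field)] = dest
        elif insn[0] == "store":
            _, obj, field, value = insn
            # Conservatively forget every cached load of this field on
            # any object (they may alias), then remember the new value.
            known = {k: v for k, v in known.items() if k[1] != field}
            known[(obj, field)] = value
    # known is local, so memory facts are dropped at the end of the block.
    return replaced


block = [
    ("load", "v0", "self", "regA"),
    ("load", "v1", "self", "fetched_data"),
    ("load", "v2", "self", "flagCARRY"),
    ("load", "v7", "self", "regA"),
    ("load", "v9", "self", "fetched_data"),
    ("store", "self", "regA", "v_result"),  # hypothetical stored value
    ("load", "v11", "self", "regA"),        # hypothetical extra load
]
forwarded = forward_loads(block)
```

&lt;p&gt;With this, the duplicate loads &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v7&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v9&lt;/code&gt; are replaced by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v0&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v1&lt;/code&gt;, and the load after the store reads the stored value directly.&lt;/p&gt;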

&lt;p&gt;See, there’s nothing fundamentally stopping you from tracking the state of your
heap at compile-time across blocks. You just have to do a little more
bookkeeping. In our dominator-based GVN implementation, for example, you can:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;track heap write effects for each block&lt;/li&gt;
  &lt;li&gt;at the start of each block B, union all of the “kill” sets for every block
on a path from B back to its immediate dominator&lt;/li&gt;
  &lt;li&gt;finally, remove the stuff that got killed from the dominator’s value map&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Not so bad.&lt;/p&gt;
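
&lt;p&gt;As a rough Python sketch of those three steps: the helper names, the shape of the value-map keys, and the decision to kill by field name alone (ignoring which object was written) are all assumptions for illustration. It also assumes the dominator’s value map already reflects the dominator’s own stores.&lt;/p&gt;

```python
def kills_between(block, idom, preds, kill):
    # Union the kill sets of every block on a path from idom[block]
    # to block, excluding the dominator itself.
    stop = idom[block]
    seen = set()
    worklist = list(preds[block])
    while worklist:
        b = worklist.pop()
        if b == stop or b in seen:
            continue
        seen.add(b)
        worklist.extend(preds[b])
    killed = set()
    for b in seen:
        killed |= kill[b]
    return killed


def starting_map(block, idom, preds, kill, value_maps):
    killed = kills_between(block, idom, preds, kill)
    # Keys are hypothetical (op, ...) tuples; loads look like
    # ("load", field). Drop the loads whose field got killed.
    return {
        key: dest
        for key, dest in value_maps[idom[block]].items()
        if not (key[0] == "load" and key[1] in killed)
    }


preds = {"entry": [], "left": ["entry"], "right": ["entry"],
         "join": ["left", "right"]}
idom = {"entry": None, "left": "entry", "right": "entry", "join": "entry"}
kill = {"entry": set(), "left": {"regA"}, "right": set(), "join": set()}
value_maps = {"entry": {
    ("load", "regA"): "v0",
    ("load", "fetched_data"): "v1",
    ("add", "v0", "v1"): "v3",
}}
join_map = starting_map("join", idom, preds, kill, value_maps)
```

&lt;p&gt;In this diamond, the left branch writes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regA&lt;/code&gt;, so the join block starts without the cached &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regA&lt;/code&gt; load but keeps the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fetched_data&lt;/code&gt; load and the pure add.&lt;/p&gt;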

&lt;p&gt;Maxine doesn’t do global memory tracking, but they do a limited form of
load-store forwarding while building their HIR from bytecode: see
&lt;a href=&quot;https://github.com/beehive-lab/Maxine-VM/blob/e213a842f78983e2ba112ae46de8c64317bc206e/com.sun.c1x/src/com/sun/c1x/graph/GraphBuilder.java#L871&quot;&gt;GraphBuilder&lt;/a&gt; which uses the &lt;a href=&quot;https://github.com/beehive-lab/Maxine-VM/blob/e213a842f78983e2ba112ae46de8c64317bc206e/com.sun.c1x/src/com/sun/c1x/graph/MemoryMap.java&quot;&gt;MemoryMap&lt;/a&gt; to help track this stuff. At least
they would not have the same duplicate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LoadField&lt;/code&gt; instructions in the example
above!&lt;/p&gt;

&lt;!--
```ruby
module Psych
  module Visitors
    class YAMLTree &lt; Psych::Visitors::Visitor
      def initialize emitter, ss, options
        # ...
        @line_width = options[:line_width]
        if @line_width &amp;&amp; @line_width &lt; 0
          if @line_width == -1
            # Treat -1 as unlimited line-width, same as libyaml does.
            @line_width = nil
          else
            fail(...)
          end
        end
        # ...
    end
  end
end
```
--&gt;

&lt;p&gt;We’ve now looked at one kind of value numbering and one implementation of it.
What else is out there?&lt;/p&gt;

&lt;h2 id=&quot;out-in-the-world&quot;&gt;Out in the world&lt;/h2&gt;

&lt;p&gt;Apparently, you can get better results by having a unified hash table (p9 of
&lt;a href=&quot;/assets/img/briggs-gvn.pdf&quot;&gt;Briggs GVN&lt;/a&gt;) of expressions, not limiting the
value map to dominator-available expressions. Not 100% on how this works yet.
&lt;!-- TODO What do you do in the second pass for available expressions? --&gt;
They note:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Using a unified hash-table has one important algorithmic consequence.
Replacements cannot be performed on-line because the table no longer reflects
availability.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Which is the first time that it occurred to me that hash-based value numbering
with dominators was an approximation of available expression analysis.&lt;/p&gt;

&lt;p&gt;There’s also a totally different kind of value numbering called value
partitioning (p12 of &lt;a href=&quot;/assets/img/briggs-gvn.pdf&quot;&gt;Briggs GVN&lt;/a&gt;). See also a nice
blog post about this by Allen Wang from the &lt;a href=&quot;https://www.cs.cornell.edu/courses/cs6120/2025sp/blog/global-value-numbering/&quot;&gt;Cornell compiler
course&lt;/a&gt;.
I think this mostly replaces the hashing bit, and you still need some other
thing for the available expressions bit.&lt;/p&gt;

&lt;p&gt;Ben Titzer and Seth Goldstein have some good &lt;a href=&quot;https://www.cs.cmu.edu/~411/slides/s25-24-gvn-inlining.pdf&quot;&gt;slides from
CMU&lt;/a&gt; where they
talk about the worklist dataflow approach. Apparently this is slower but gets
you more available expressions than just looking to dominator blocks. I wonder
how much it differs from dominator+unified hash table.&lt;/p&gt;

&lt;p&gt;While Maxine uses hash table cloning to copy value maps from dominator blocks,
there are also compilers such as Cranelift that use
&lt;a href=&quot;https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/scoped_hash_map.rs&quot;&gt;scoped hash maps&lt;/a&gt;
to track this information more efficiently. (Though &lt;a href=&quot;https://github.com/bytecodealliance/wasmtime/issues/4371#issuecomment-1255956651&quot;&gt;Amanieu
notes&lt;/a&gt; that you may
not need a scoped hash map and instead can tag values in your value map with the
block they came from, ignoring non-dominating values with a quick check. The
dominance check makes sense but I haven’t internalized how this affects the set
of available expressions yet.)&lt;/p&gt;
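
&lt;p&gt;One possible reading of the tagged-value idea, sketched in Python: keep a single table, tag each entry with its defining block, and treat an entry as a miss when that block does not dominate the current one. This is a guess at the scheme based on the one-sentence summary above, not Cranelift code. The dominance check here is a slow walk up the dominator tree rather than anything quick, and the sketch assumes blocks are visited in dominator-tree preorder; with other orders, overwriting a stale entry could throw away one that a later block wanted.&lt;/p&gt;

```python
def dominates(a, b, idom):
    # Walk up the dominator tree from b looking for a.
    while b is not None:
        if a == b:
            return True
        b = idom[b]
    return False


def gvn_tagged(blocks, idom, order):
    table = {}      # (op, args) to (dest, defining block)
    replaced = {}   # duplicate dest to the dest it is replaced with
    for name in order:
        for dest, op, args in blocks[name]:
            key = (op, tuple(replaced.get(a, a) for a in args))
            hit = table.get(key)
            if hit is not None and dominates(hit[1], name, idom):
                replaced[dest] = hit[0]
            else:
                # Missing or stale (non-dominating) entry: overwrite it.
                table[key] = (dest, name)
    return replaced


# Hypothetical diamond-shaped CFG with made-up pure instructions.
blocks = {
    "entry": [("v0", "add", ("a", "b"))],
    "left": [("v1", "add", ("a", "b")), ("v2", "mul", ("v1", "c"))],
    "right": [("v3", "mul", ("v0", "c"))],
    "join": [("v4", "mul", ("v0", "c"))],
}
idom = {"entry": None, "left": "entry", "right": "entry", "join": "entry"}
tagged = gvn_tagged(blocks, idom, ["entry", "left", "right", "join"])
```

&lt;p&gt;On this small diamond, the only replacement is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v1&lt;/code&gt; by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v0&lt;/code&gt;, which is what you would expect from a dominator-scoped map too, without any map copying.&lt;/p&gt;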

&lt;p&gt;You may be wondering if this kind of algorithm even helps at all in a dynamic
language JIT context. Surely everything is too dynamic, right? Actually, no!
The JIT hopes to eliminate a lot of method calls and dynamic behaviors,
replacing them with guards, assumptions, and simpler operations. These strength
reductions often leave behind a lot of repeated instructions. Just the other
day, Kokubun filed a &lt;a href=&quot;https://github.com/ruby/ruby/pull/16654&quot;&gt;value-numbering-like
PR&lt;/a&gt; to clean up some of the waste.&lt;/p&gt;

&lt;p&gt;ART has a recent &lt;a href=&quot;https://android-developers.googleblog.com/2025/12/18-faster-compiles-0-compromises.html&quot;&gt;blog
post&lt;/a&gt;
about speeding up GVN.&lt;/p&gt;

&lt;h3 id=&quot;implementations&quot;&gt;Implementations&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/beehive-lab/Maxine-VM/blob/e213a842f78983e2ba112ae46de8c64317bc206e/com.sun.c1x/src/com/sun/c1x/opt/GlobalValueNumberer.java&quot;&gt;Maxine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://android.googlesource.com/platform/art/+/refs/heads/main/compiler/optimizing/gvn.cc&quot;&gt;ART&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/tekknolagi/v8/blob/f030838700a83cde6992cb8ebcb3facc6a8fc1f1/src/crankshaft/hydrogen-gvn.cc&quot;&gt;V8 Hydrogen&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/facebook/hhvm/blob/1a885fae7421c759d70a8ed85aab1defcf5cc68f/hphp/runtime/vm/jit/gvn.cpp&quot;&gt;HHVM&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/openjdk/jdk/blob/f21e47db805b56d5bf183d7a2cfba076f380612a/src/hotspot/share/c1/c1_ValueMap.cpp#L517&quot;&gt;HotSpot C1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;wrapping-up-bits-and-bobbles&quot;&gt;Wrapping up; bits and bobbles&lt;/h2&gt;

&lt;p&gt;Go forth and give your values more numbers.&lt;/p&gt;

&lt;p&gt;There’s been an ongoing discussion with Phil Zucker on SSI, GVN, acyclic
egraphs, and scoped union-find. TODO summarize&lt;/p&gt;

&lt;h3 id=&quot;acyclic-e-graphs&quot;&gt;Acyclic e-graphs&lt;/h3&gt;

&lt;p&gt;Commutativity; canonicalization&lt;/p&gt;

&lt;p&gt;Seeding alternative representations into the GVN&lt;/p&gt;

&lt;p&gt;Aegraphs and union-find during GVN &lt;a href=&quot;https://cfallin.org/blog/2026/04/09/aegraph/&quot;&gt;https://cfallin.org/blog/2026/04/09/aegraph/&lt;/a&gt;
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;canonicalize&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-egraph.md&quot;&gt;https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-egraph.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/bytecodealliance/wasmtime/issues/9049&quot;&gt;https://github.com/bytecodealliance/wasmtime/issues/9049&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/bytecodealliance/wasmtime/issues/4371&quot;&gt;https://github.com/bytecodealliance/wasmtime/issues/4371&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;partial-redundancy-elimination&quot;&gt;Partial redundancy elimination&lt;/h3&gt;
&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:cinder&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Writing this post is roughly the time when I realized that the whole
time I was wondering why Cinder did not use union-find for rewriting, it
actually did! Optimizing instruction &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X = A + 0&lt;/code&gt; by replacing with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X =
Assign A&lt;/code&gt; followed by copy propagation is equivalent to union-find. &lt;a href=&quot;#fnref:cinder&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:heap-ssa&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In some forms of SSA, like heap-array SSA or sea of nodes, it’s
possible to more easily de-duplicate loads because the memory
representation has been folded into (modeled in) the IR. &lt;a href=&quot;#fnref:heap-ssa&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:order&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The order is a little more complicated than that: &lt;a href=&quot;https://stackoverflow.com/questions/36131500/what-is-the-reverse-postorder&quot;&gt;reverse
post-order&lt;/a&gt;
(RPO). And there’s a paper called “A Simple Algorithm for Global Data Flow
Analysis Problems” that I don’t yet have a PDF for that claims that RPO is
optimal for solving dataflow problems. &lt;a href=&quot;#fnref:order&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:compute-doms&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;There’s the iterative dataflow way (described in the &lt;a href=&quot;/assets/img/dominators-engineered.pdf&quot;&gt;Cooper
paper&lt;/a&gt; (PDF)),
&lt;a href=&quot;/assets/img/dominators-lengauer-tarjan.pdf&quot;&gt;Lengauer-Tarjan&lt;/a&gt; (PDF), the
&lt;a href=&quot;/assets/img/dominators-engineered.pdf&quot;&gt;Engineered Algorithm&lt;/a&gt; (PDF),
&lt;a href=&quot;/assets/img/dominators-practice.pdf&quot;&gt;hybrid/Semi-NCA approach&lt;/a&gt; (PDF), … &lt;a href=&quot;#fnref:compute-doms&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
            <pubDate>Sat, 04 Apr 2026 00:00:00 +0000</pubDate>
            <niceDate>April 4, 2026</niceDate>
            <link>https://bernsteinbear.com/blog/value-numbering/?utm_source=rss</link>
            <guid isPermaLink="true">https://bernsteinbear.com/blog/value-numbering/</guid>
        </item>
        
        <item>
            <title>Using Perfetto in ZJIT</title>
            <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href=&quot;https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/&quot;&gt;Rails At Scale&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Look! A trace of slow events in a benchmark! Hover over the image to see it get bigger.&lt;/p&gt;

&lt;style&gt;
img {
    max-width: 100%;
}
img:hover {
  transform: scale(2);
  transition: transform 0.1s ease-in;
}
img:not(:hover) {
  transition: transform 0.1s ease-out;
}
&lt;/style&gt;

&lt;figure&gt;

  &lt;p&gt;&lt;img src=&quot;/assets/img/perfetto-demo.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

  &lt;figcaption&gt;
A sneak preview of what the trace looks like.
&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Now read on to see what the slow events are and how we got this pretty picture.&lt;/p&gt;

&lt;h2 id=&quot;the-rules&quot;&gt;The rules&lt;/h2&gt;

&lt;p&gt;The first rule of just-in-time compilers is: you stay in JIT code. The second
rule of JIT is: you STAY in JIT code!&lt;/p&gt;

&lt;p&gt;When control leaves the compiled code to run in the interpreter—what the ZJIT
team calls either a “side-exit” or a “deopt”, depending on who you talk
to—things slow down. In a well-tuned system, this should happen pretty
rarely. Right now, because we’re still bringing up the compiler and runtime
system, it happens more than we would like.&lt;/p&gt;

&lt;p&gt;We’re reducing the number of exits over time.&lt;/p&gt;

&lt;h2 id=&quot;lies-damned-lies-and-statistics&quot;&gt;Lies, damned lies, and statistics&lt;/h2&gt;

&lt;p&gt;We can track our side-exit reduction progress with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--zjit-stats&lt;/code&gt;, which,
on process exit, prints out a tidy summary of the counters for all of the bad
stuff we track. It’s got side-exits. It’s got calls to C code. It’s got calls
to slow-path runtime helpers. It’s got everything.&lt;/p&gt;

&lt;p&gt;Here is a chopped-up sample of stats output for the Lobsters benchmark,
which is a large Rails app:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ WARMUP_ITRS=0 MIN_BENCH_ITRS=20 MIN_BENCH_TIME=0 ruby --zjit-stats benchmarks/lobsters/benchmark.rb
...
***ZJIT: Printing ZJIT statistics on exit***
...
Top-20 side exit reasons (100.0% of total 12,549,876):
                   guard_type_failure: 6,020,734 (48.0%)
                  guard_shape_failure: 5,556,147 (44.3%)
  block_param_proxy_not_iseq_or_ifunc:   445,358 ( 3.5%)
                   unhandled_hir_insn:   215,168 ( 1.7%)
                        compile_error:   181,474 ( 1.4%)
...
compiled_iseq_count:                               5,581
failed_iseq_count:                                     2
compile_time:                                    1,443ms
...
guard_type_count:                            133,425,094
guard_type_exit_ratio:                              4.5%
guard_shape_count:                            49,386,694
guard_shape_exit_ratio:                            11.3%
...
code_region_bytes:                            31,571,968
side_exit_size_ratio:                              33.1%
zjit_alloc_bytes:                             19,329,659
total_mem_bytes:                              50,901,627
...
ratio_in_zjit:                                     82.8%
$
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(I’ve cut out significant chunks of the stats output and replaced them with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;...&lt;/code&gt; because it’s overwhelming the first time you see it.)&lt;/p&gt;

&lt;p&gt;The first thing you might note is that the thing I just described as terrible
for performance is happening &lt;em&gt;over twelve million times&lt;/em&gt;. The second thing you
might notice is that despite this, we seem to be staying in JIT code a high
percentage of the time. Or are we? Is 80% high? Is a 4.5% class guard miss
ratio high? What about 11% for shapes? It’s hard to say.&lt;/p&gt;
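
&lt;p&gt;For what it’s worth, the ratios in the output appear to be plain divisions of the counters shown above. Here is a quick Python check using only numbers copied from that output. The formula is an assumption inferred from the numbers lining up, not taken from the ZJIT source.&lt;/p&gt;

```python
# Numbers copied from the stats output above.
side_exits = 12_549_876
guard_type_failures = 6_020_734
guard_type_count = 133_425_094
guard_shape_failures = 5_556_147
guard_shape_count = 49_386_694


def percent(part, whole):
    # Assumed formula: failures divided by total, to one decimal place.
    return f"{100 * part / whole:.1f}%"


type_exit_ratio = percent(guard_type_failures, guard_type_count)
shape_exit_ratio = percent(guard_shape_failures, guard_shape_count)
type_share_of_exits = percent(guard_type_failures, side_exits)
print(type_exit_ratio, shape_exit_ratio, type_share_of_exits)
```

&lt;p&gt;The results match the reported 4.5%, 11.3%, and 48.0%, so roughly one in twenty-two type guards and one in nine shape guards fails in this run.&lt;/p&gt;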

&lt;p&gt;The counters are great because they’re &lt;em&gt;quick&lt;/em&gt; and they’re reasonably stable
proxies for performance. There’s no substitute for painstaking measurements on
a quiet machine but if the counter for Bad Slow Thing goes down (and others do
not go up), we’re probably doing a good job.&lt;/p&gt;

&lt;p&gt;But they’re not great for building intuition. For intuition, we want more
tangible feeling numbers. We want to see things.&lt;/p&gt;

&lt;h2 id=&quot;building-intuition&quot;&gt;Building intuition&lt;/h2&gt;

&lt;p&gt;The third thing you might do is ask yourself “self, where are these exits
coming from?” Unfortunately, counters cannot tell you that. For that, we
want stack traces. These let us know which guest (Ruby) code triggers
an exit.&lt;/p&gt;

&lt;p&gt;Ideally, we would also want some notion of time: we would want to know not just
where these events happen but also when. Are the exits happening early, at
application boot? At warmup? Even during what should be steady state
application time? Hard to say.&lt;/p&gt;

&lt;p&gt;So we need more tools. Thankfully, &lt;a href=&quot;https://perfetto.dev/&quot;&gt;Perfetto&lt;/a&gt; exists.
Perfetto is a system for visualizing and analyzing traces and profiles that your
application generates. It has both a web UI and a command-line UI.&lt;/p&gt;

&lt;p&gt;We can emit traces for Perfetto and visualize them there.&lt;/p&gt;

&lt;h2 id=&quot;a-look-at-perfetto&quot;&gt;A look at Perfetto&lt;/h2&gt;

&lt;p&gt;Take a look at this &lt;a href=&quot;https://ui.perfetto.dev/#!/?url=https://bernsteinbear.com/assets/misc/perfetto-36885.fxt&quot;&gt;sample ZJIT Perfetto
trace&lt;/a&gt;
generated by running Ruby with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--zjit-trace-exits&lt;/code&gt;&lt;sup id=&quot;fnref:sampled&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:sampled&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. What do you see?&lt;/p&gt;

&lt;p&gt;I see a couple arrows on the left. Arrows indicate “instant” point-in-time
events. Then I see a mess of purple to the right of that until the end of the
trace.&lt;/p&gt;

&lt;p&gt;Hover over an arrow. Find out that each arrow is a side-exit. Scream silently.&lt;/p&gt;

&lt;p&gt;But it’s a friendly arrow. It tells you what the side-exit reason is. If you
click it, it even tells you the stack trace in the pop-up panel on the bottom.
If we click a couple of them, maybe we can learn more.&lt;/p&gt;

&lt;p&gt;We can also zoom by mousing over the track, holding Ctrl, and scrolling. That
will get us a closer look. But there are so many…&lt;/p&gt;

&lt;p&gt;Fortunately, Perfetto also provides a SQL interface to the traces. We can write
a query to aggregate all of the side exit events from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;slice&lt;/code&gt; table and
line them up with the topmost method from the backtrace arguments in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;args&lt;/code&gt;
table:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reason&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;display_value&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;method&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;slice&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg_set_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg_set_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;0&apos;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;display_value&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This pulls up a query box at the bottom showing us that there are a couple big
hotspots:&lt;/p&gt;

&lt;figure&gt;

  &lt;p&gt;&lt;img src=&quot;/assets/img/perfetto-method-query.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

  &lt;figcaption&gt;
Query results showing in columns left to right: reason for side-exit, method
that exited, and count. The top three are above 1k but it quickly falls off
after that.
&lt;/figcaption&gt;

&lt;/figure&gt;

&lt;p&gt;It even has a helpful option to export the results as a Markdown table so I can
paste (an edited version) into this blog post:&lt;/p&gt;

&lt;div style=&quot;overflow-x: auto; font-size: 0.75em; margin-left: max(-10em, calc(-50vw + 50%)); margin-right: max(-10em, calc(-50vw + 50%));&quot;&gt;

  &lt;table&gt;
    &lt;thead&gt;
      &lt;tr&gt;
        &lt;th&gt;reason&lt;/th&gt;
        &lt;th&gt;method&lt;/th&gt;
        &lt;th&gt;count&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardShape(ShapeId(2475))&lt;/td&gt;
        &lt;td&gt;ActiveModel::AttributeRegistration::ClassMethods#attribute_types&lt;/td&gt;
        &lt;td&gt;5119&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardShape(ShapeId(2099268))&lt;/td&gt;
        &lt;td&gt;ActiveRecord::ConnectionAdapters::AbstractAdapter#extended_type_map_key&lt;/td&gt;
        &lt;td&gt;2295&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardType(FalseClass)&lt;/td&gt;
        &lt;td&gt;ActiveModel::Type::Value#cast&lt;/td&gt;
        &lt;td&gt;1025&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardShape(ShapeId(2099698))&lt;/td&gt;
        &lt;td&gt;ActiveRecord::Associations#association_instance_get&lt;/td&gt;
        &lt;td&gt;904&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;BlockParamProxyNotIseqOrIfunc&lt;/td&gt;
        &lt;td&gt;ActiveRecord::AttributeMethods::Read#_read_attribute&lt;/td&gt;
        &lt;td&gt;902&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardShape(ShapeId(526450))&lt;/td&gt;
        &lt;td&gt;Rack::Request::Env#get_header&lt;/td&gt;
        &lt;td&gt;636&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardType(Class[class_exact*:Class@VALUE(0x128c60100)])&lt;/td&gt;
        &lt;td&gt;ActiveRecord::Base._reflections&lt;/td&gt;
        &lt;td&gt;622&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardType(ObjectSubclass[class_exact:Story])&lt;/td&gt;
        &lt;td&gt;ActiveRecord::Associations#association&lt;/td&gt;
        &lt;td&gt;565&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardShape(ShapeId(2098982))&lt;/td&gt;
        &lt;td&gt;ActiveRecord::Reflection::AssociationReflection#polymorphic?&lt;/td&gt;
        &lt;td&gt;510&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardType(StringSubclass[class_exact:ActiveSupport::SafeBuffer])&lt;/td&gt;
        &lt;td&gt;ActionView::OutputBuffer#&amp;lt;&amp;lt;&lt;/td&gt;
        &lt;td&gt;500&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardShape(ShapeId(2475))&lt;/td&gt;
        &lt;td&gt;ActiveRecord::AttributeMethods::PrimaryKey::ClassMethods#primary_key&lt;/td&gt;
        &lt;td&gt;492&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardType(ObjectSubclass[class_exact:ActiveModel::Type::String])&lt;/td&gt;
        &lt;td&gt;ActiveModel::Type::Value#deserialize&lt;/td&gt;
        &lt;td&gt;442&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardShape(ShapeId(2098982))&lt;/td&gt;
        &lt;td&gt;ActiveRecord::Reflection::AssociationReflection#deprecated?&lt;/td&gt;
        &lt;td&gt;376&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;GuardType(ObjectSubclass[class_exact:Bundler::Dependency])&lt;/td&gt;
        &lt;td&gt;Gem::Dependency#matches_spec?&lt;/td&gt;
        &lt;td&gt;355&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;UnhandledHIRInvokeBuiltin&lt;/td&gt;
        &lt;td&gt;Time#initialize&lt;/td&gt;
        &lt;td&gt;346&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;

&lt;/div&gt;

&lt;p&gt;Looks like we should figure out why we’re having so many shape misses; fixing that will
clear up a lot of exits. (Hint: it’s because once we make our first guess about
what we think the object shape will be, we don’t re-assess… &lt;strong&gt;yet&lt;/strong&gt;.)&lt;/p&gt;

&lt;p&gt;This has been a taste of Perfetto. There’s probably a lot more to explore.
Please join the &lt;a href=&quot;https://zjit.zulipchat.com&quot;&gt;ZJIT Zulip&lt;/a&gt; and let us know if you have any cool
tracing or exploring tricks.&lt;/p&gt;

&lt;p&gt;Now I’ll explain how you too can use Perfetto from your system. Adding support
to ZJIT was pretty straightforward.&lt;/p&gt;

&lt;h2 id=&quot;implementation&quot;&gt;Implementation&lt;/h2&gt;

&lt;p&gt;The first thing is that you’ll need some way to get trace data out of your
system. We write to a file with a well-known location
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/perfetto-PID.fxt&lt;/code&gt;), but you could do any number of things. Perhaps you
can stream events over a socket to another process, or to a server that
aggregates them, or store them internally and expose a webserver that serves
them over the internet, or… anything, really.&lt;/p&gt;

&lt;p&gt;Once you have that, you need a couple lines of code to emit the data. Perfetto
accepts a number of formats. For example, in his &lt;a href=&quot;https://thume.ca/2023/12/02/tracing-methods/&quot;&gt;excellent blog post&lt;/a&gt;,
Tristan Hume opens with a simple snippet of code for logging Chromium
Trace JSON-formatted events (lightly modified by me):&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;event_name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;duration&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;trace.json&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;a&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;[&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# ... emit some events here ...
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# Log a single event
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;{&quot;name&quot;: &quot;%s&quot;, &quot;ts&quot;: %d, &quot;dur&quot;: %d, &quot;cat&quot;: &quot;hi&quot;, &quot;ph&quot;: &quot;X&quot;, &quot;pid&quot;: 1, &quot;tid&quot;: 1, &quot;args&quot;: {}},&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;duration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# ... emit some events here ...
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# ... at process exit, close the file ...
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;]&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# this closing ] isn&apos;t actually required
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;close&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This snippet is great. It shows, end-to-end, writing a stream of one event. It
is a &lt;em&gt;complete&lt;/em&gt; (X) event, as opposed to either:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;two discrete timestamped &lt;em&gt;begin&lt;/em&gt; (B) and &lt;em&gt;end&lt;/em&gt; (E) events that book-end
something, or&lt;/li&gt;
  &lt;li&gt;an &lt;em&gt;instant&lt;/em&gt; (i) event that has no duration, or&lt;/li&gt;
  &lt;li&gt;a couple other event types in the &lt;a href=&quot;https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview&quot;&gt;Chromium Trace Event Format doc&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
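
&lt;p&gt;For a sense of what the other event types look like, here is a minimal sketch
that builds a begin/end pair and an instant event as Python dictionaries. The
helper names, the event names, and the fixed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pid&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tid&lt;/code&gt; of 1 are made up
for illustration; only the field names come from the trace event format.&lt;/p&gt;

```python
import json


def begin_event(name, ts_us):
    # "B" opens a duration; a later "E" on the same pid/tid closes it.
    return {"name": name, "ph": "B", "ts": ts_us, "pid": 1, "tid": 1}


def end_event(name, ts_us):
    # "E" closes the most recently opened "B" on this pid/tid.
    return {"name": name, "ph": "E", "ts": ts_us, "pid": 1, "tid": 1}


def instant_event(name, ts_us):
    # "i" has no duration; "s": "t" scopes the marker to its thread.
    return {"name": name, "ph": "i", "ts": ts_us, "pid": 1, "tid": 1, "s": "t"}


# Hypothetical events: a compile that spans 300 microseconds with one
# point-in-time side exit in the middle of it.
events = [
    begin_event("compile", 100),
    instant_event("side_exit", 150),
    end_event("compile", 400),
]
trace_json = json.dumps(events)
```

&lt;p&gt;The complete-event snippet needs none of this pairing, which is part of its
appeal.&lt;/p&gt;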

&lt;p&gt;It was enough to get me started. Since it’s JSON, and we have a lot of side
exits, the trace quickly ballooned to 8GB for a benchmark lasting several seconds.
Not great. Now, part of this is our fault—we should side exit less—and part
of it is just the verbosity of JSON.&lt;/p&gt;

&lt;p&gt;Thankfully, Perfetto ingests more compact binary formats, such as the &lt;a href=&quot;https://fuchsia.dev/fuchsia-src/reference/tracing/trace-format&quot;&gt;Fuchsia
trace format&lt;/a&gt;.
In addition to being more compact, FXT even supports string interning. After
modifying the tracer to emit FXT, we ended up with closer to 100MB for the same
benchmark.&lt;/p&gt;
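
&lt;p&gt;To see why interning helps when the same side-exit reasons and method names
repeat thousands of times, here is a rough sketch of the idea. It is
&lt;em&gt;not&lt;/em&gt; the FXT wire format and not ZJIT’s code: the class, the method names,
and the tuple-shaped records are all hypothetical stand-ins for real binary
records.&lt;/p&gt;

```python
class StringTable:
    """Assigns each distinct string a small integer the first time it is seen."""

    def __init__(self):
        self.index_of = {}
        self.records = []  # stand-in for the bytes written to the trace file

    def intern(self, text):
        index = self.index_of.get(text)
        if index is None:
            index = len(self.index_of) + 1
            self.index_of[text] = index
            # First sighting: write the full text exactly once.
            self.records.append(("string", index, text))
        return index

    def emit_event(self, name, timestamp):
        # Every event carries only the small index, never the text itself.
        self.records.append(("event", self.intern(name), timestamp))


table = StringTable()
table.emit_event("GuardShape", 1)
table.emit_event("GuardShape", 2)
table.emit_event("GuardType", 3)
```

&lt;p&gt;After those three calls, the text of each name appears in the output once, no
matter how many events refer to it.&lt;/p&gt;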

&lt;p&gt;We can reduce further by &lt;em&gt;sampling&lt;/em&gt;—not writing every exit to the trace, but
instead only every &lt;em&gt;K&lt;/em&gt;th exit (for some (probably prime) K). This is why we provide
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--zjit-trace-exits-sample-rate=K&lt;/code&gt; option.&lt;/p&gt;
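
&lt;p&gt;A counter-based sampler can be as small as the sketch below. This is an
assumption about the general technique, not a copy of what ZJIT does: the class
name is invented, and it simply keeps the last exit in each group of K.&lt;/p&gt;

```python
class ExitSampler:
    """Keeps one out of every sample_rate events (hypothetical helper)."""

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate
        self.seen = 0

    def should_record(self):
        # Count every exit, but only report the ones that land on a multiple of K.
        self.seen += 1
        return self.seen % self.sample_rate == 0


sampler = ExitSampler(sample_rate=7)
# Pretend exits are numbered 1 through 21 and collect the ones we would trace.
recorded = [exit_number for exit_number in range(1, 22) if sampler.should_record()]
```

&lt;p&gt;Presumably a prime K is attractive because it is less likely to stay in
lockstep with a loop whose exits repeat at some fixed period.&lt;/p&gt;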

&lt;p&gt;Check out the &lt;a href=&quot;https://github.com/ruby/ruby/blob/eb8051185122d4b7bc9c6a6df694a85f34ced681/zjit/src/stats.rs#L988&quot;&gt;trace writer&lt;/a&gt; implementation as of when this article
was written.&lt;/p&gt;

&lt;h2 id=&quot;tracing-more-things&quot;&gt;Tracing more things&lt;/h2&gt;

&lt;p&gt;We could trace:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;When methods get compiled&lt;/li&gt;
  &lt;li&gt;How big the generated code is&lt;/li&gt;
  &lt;li&gt;How long each compile phase takes&lt;/li&gt;
  &lt;li&gt;When (and where) invalidation events happen&lt;/li&gt;
  &lt;li&gt;When (and where) allocations happen from JITed code&lt;/li&gt;
  &lt;li&gt;Garbage collection events&lt;/li&gt;
  &lt;li&gt;and more!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Visualizations are awesome. Get your data in the right format so you can ask
the right questions easily. Thanks for Perfetto!&lt;/p&gt;

&lt;p&gt;Also, looks like visualizations are now available in Perfetto canary. Time to
go make some fun histograms…&lt;/p&gt;
&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:sampled&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;This is also sampled/strobed, so not every exit is in there. This
is just 1/K of them for some K that I don’t remember. &lt;a href=&quot;#fnref:sampled&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
            <pubDate>Fri, 27 Mar 2026 00:00:00 +0000</pubDate>
            <niceDate>March 27, 2026</niceDate>
            <link>https://bernsteinbear.com/blog/zjit-perfetto/?utm_source=rss</link>
            <guid isPermaLink="true">https://bernsteinbear.com/blog/zjit-perfetto/</guid>
        </item>
        
        <item>
            <title>A fuzzer for the Toy Optimizer</title>
            <description>&lt;p&gt;&lt;em&gt;Another entry in the &lt;a href=&quot;https://pypy.org/categories/toy-optimizer.html&quot;&gt;Toy Optimizer series&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It’s hard to get compiler optimizers right. Even if you build up a painstaking test
suite by hand, you will likely miss corner cases, especially corner cases at
the interactions of multiple components or multiple optimization passes.&lt;/p&gt;

&lt;p&gt;I wanted to see if I could write a fuzzer to catch some of these bugs
automatically. But a fuzzer alone isn’t much use without some correctness
oracle—in this case, we want a more interesting bug than accidentally
crashing the optimizer. We want to see if the optimizer introduces a
correctness bug in the program.&lt;/p&gt;

&lt;p&gt;So I set off in the most straightforward way possible, inspired by my
hazy memories of a former &lt;a href=&quot;https://pypy.org/posts/2024/03/fixing-bug-incremental-gc.html&quot;&gt;CF blog post&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;generating-programs&quot;&gt;Generating programs&lt;/h2&gt;

&lt;p&gt;Generating random programs isn’t so bad. We have program generation APIs and we
can dynamically pick which ones we want to call. I wrote a small loop that
generates &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;load&lt;/code&gt;s from and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store&lt;/code&gt;s to the arguments at random offsets and with
random values, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;escape&lt;/code&gt;s to random instructions with outputs. The idea
with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;escape&lt;/code&gt; is to keep track of the values as if there were some other
function relying on them.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;generate_program&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getarg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;num_ops&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ops_with_values&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;num_ops&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;choice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;load&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;store&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;escape&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;choice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;a_value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;choice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ops_with_values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;load&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ops_with_values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;store&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;escape&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;escape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a_value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;raise&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NotImplementedError&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Unknown operation &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This generates random programs. Here is an example stringified random program:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;var0 = getarg(0)
var1 = getarg(1)
var2 = getarg(2)
var3 = load(var2, 0)
var4 = load(var0, 1)
var5 = load(var1, 1)
var6 = escape(var0)
var7 = store(var0, 2, 3)
var8 = store(var2, 0, 7)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;No idea what would generate something like this, but oh well.&lt;/p&gt;

&lt;h2 id=&quot;verifying-programs&quot;&gt;Verifying programs&lt;/h2&gt;

&lt;p&gt;Then we want to come up with our invariants. I picked the invariant that, under
the same preconditions, the heap will look the same after running an optimized
program as it would under an un-optimized program&lt;sup id=&quot;fnref:equivalence&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:equivalence&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. So we can delete
instructions, but if we drop a load-bearing store, store the wrong
information, or cache stale loads, we will probably catch that.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;verify_program&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;before_no_alias&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpret_program&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;a&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;b&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;c&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;a&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;before_alias&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpret_program&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;optimized&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize_load_store&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;after_no_alias&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpret_program&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;optimized&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;a&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;b&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;c&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;after_alias&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpret_program&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;optimized&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;before_no_alias&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;after_no_alias&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;before_alias&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;after_alias&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I have a very silly verifier that tests two cases: one where the arguments do
not alias and one where they are all the same object. Generating partial
aliases would be a good extension here.&lt;/p&gt;
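
&lt;p&gt;One possible way to get partial aliases, sketched under the assumption that
arguments are still represented as string labels like in the verifier above, is
to enumerate every distinct way the arguments can share objects. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;alias_patterns&lt;/code&gt; helper is hypothetical and not part of the Toy Optimizer.&lt;/p&gt;

```python
import string


def alias_patterns(num_args):
    """Return every distinct way num_args arguments can alias one another."""
    patterns = []

    def extend(prefix, num_labels):
        if len(prefix) == num_args:
            patterns.append([string.ascii_lowercase[i] for i in prefix])
            return
        # Reuse any label seen so far, or introduce exactly one new label.
        # Never skipping ahead keeps renamed duplicates out of the result.
        for label in range(num_labels + 1):
            extend(prefix + [label], max(num_labels, label + 1))

    extend([], 0)
    return patterns


patterns = alias_patterns(3)
```

&lt;p&gt;For three arguments this yields five patterns, including the all-distinct and
all-same cases already tested, plus the three where exactly two arguments are
the same object. The verifier could then interpret the program before and after
optimization once per pattern.&lt;/p&gt;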

&lt;p&gt;Last, we have the interpreter.&lt;/p&gt;

&lt;h2 id=&quot;running-programs&quot;&gt;Running programs&lt;/h2&gt;

&lt;p&gt;The interpreter is responsible for keeping track of the heap (as indexed by
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(object, offset)&lt;/code&gt; pairs) as well as the results of the various instructions.&lt;/p&gt;

&lt;p&gt;We keep track of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;escape&lt;/code&gt;d values so we can see results of some
instructions even if they do not get written back to the heap. Maybe we should
be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;escape&lt;/code&gt;ing all instructions with output instead of only random ones. Who
knows.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;interpret_program&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;heap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{}&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ssa&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{}&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;escaped&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;getarg&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ssa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;store&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;obj&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ssa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;obj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;load&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;obj&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ssa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;obj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;unknown&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ssa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;escape&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;isinstance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Constant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;escaped&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;escaped&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ssa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;raise&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NotImplementedError&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Unknown operation &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;escaped&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;escaped&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heap&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

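&lt;p&gt;Then we return the heap, with the escaped values stored under the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;escaped&quot;&lt;/code&gt; key, so that the verifier can compare the results before and after optimization.&lt;/p&gt;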

&lt;h2 id=&quot;the-harness&quot;&gt;The harness&lt;/h2&gt;

&lt;p&gt;Then we run a bunch of random tests through the verifier!&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;test_random_programs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Remove random.seed if using in CI... instead print the seed out so you
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;# can reproduce crashes if you find them
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;num_programs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;num_programs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;program&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_program&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;verify_program&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;program&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The number of programs is configurable. Or you could make this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;while True&lt;/code&gt;.
But due to how simple the optimizer is, we will find all the possible bugs
pretty quickly.&lt;/p&gt;
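
&lt;p&gt;The comment in the harness suggests printing the seed instead of hard-coding it. A minimal sketch of that idea follows; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FUZZ_SEED&lt;/code&gt; environment variable and both function names are assumptions made for illustration, not part of the original code:&lt;/p&gt;

```python
import os
import random


def pick_seed():
    # FUZZ_SEED is a hypothetical environment variable for replaying a failure.
    replay = os.environ.get("FUZZ_SEED")
    if replay is not None:
        return int(replay)
    # Otherwise draw a fresh seed from the operating system.
    return int.from_bytes(os.urandom(8), "little")


def seed_fuzzer():
    seed = pick_seed()
    # Print the seed so that a failing run can be reproduced later.
    print(f"random seed: {seed}")
    random.seed(seed)
    return seed
```

&lt;p&gt;Calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;seed_fuzzer()&lt;/code&gt; at the top of the test in place of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;random.seed(0)&lt;/code&gt; would give each CI run different programs while keeping failures replayable.&lt;/p&gt;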

&lt;p&gt;I initially started writing this post because I thought I had found a bug, but
it turns out that I had, with CF’s help, in 2022, walked through every possible
case in the “buggy” situation, and the optimizer handles those cases correctly.
That explains why the verifier didn’t find that bug!&lt;/p&gt;

&lt;h2 id=&quot;testing-the-verifier&quot;&gt;Testing the verifier&lt;/h2&gt;

&lt;p&gt;So does it work? If you run it, it’ll hang for a bit and then report no issues.
That’s helpful, in a sense… it’s revealing that it is unable to find a
certain class of bug in the optimizer.&lt;/p&gt;

&lt;p&gt;Let’s comment out the main load-bearing pillar of correctness in the
optimizer—removing aliasing writes—and see what happens.&lt;/p&gt;

&lt;p&gt;We get a crash nearly instantly:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ uv run --with pytest pytest loadstore.py -k random
...
=========================================== FAILURES ============================================
_____________________________________ test_random_programs ______________________________________

    def test_random_programs():
        random.seed(0)
        num_programs = 100000
        for i in range(num_programs):
            program = generate_program()
&amp;gt;           verify_program(program)

loadstore.py:617:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

bb = [Operation(getarg, [Constant(0)], None, None), Operation(getarg, [Constant(1)], None, None), Operation(getarg, [Consta...], None, None)], None, None), Operation(load, [Operation(getarg, [Constant(0)], None, None), Constant(0)], None, None)]

    def verify_program(bb):
        before_no_alias = interpret_program(bb, [&quot;a&quot;, &quot;b&quot;, &quot;c&quot;])
        a = &quot;a&quot;
        before_alias = interpret_program(bb, [a, a, a])
        optimized = optimize_load_store(bb)
        after_no_alias = interpret_program(optimized, [&quot;a&quot;, &quot;b&quot;, &quot;c&quot;])
        after_alias = interpret_program(optimized, [a, a, a])
        assert before_no_alias == after_no_alias
&amp;gt;       assert before_alias == after_alias
E       AssertionError: assert {(&apos;a&apos;, 0): 4,...&apos;, 3): 1, ...} == {(&apos;a&apos;, 0): 9,...&apos;, 3): 1, ...}
E
E         Omitting 4 identical items, use -vv to show
E         Differing items:
E         {(&apos;a&apos;, 0): 4} != {(&apos;a&apos;, 0): 9}
E         Use -v to get more diff

loadstore.py:610: AssertionError
==================================== short test summary info ====================================
FAILED loadstore.py::test_random_programs - AssertionError: assert {(&apos;a&apos;, 0): 4,...&apos;, 3): 1, ...} == {(&apos;a&apos;, 0): 9,...&apos;, 3): 1, ...}
=============================== 1 failed, 15 deselected in 0.04s ================================
$
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We should probably use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bb_to_str(bb)&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bb_to_str(optimized)&lt;/code&gt; to print out
the un-optimized and optimized traces in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;assert&lt;/code&gt; failure messages. But we
get a nice diff of the heap automatically, which is neat. And it points to an
aliasing problem!&lt;/p&gt;

&lt;h2 id=&quot;full-code&quot;&gt;Full code&lt;/h2&gt;

&lt;p&gt;See the &lt;a href=&quot;https://github.com/tekknolagi/tekknolagi.github.com/blob/fbccf9696e98721ca77c8d5ec5f828a11492b04c/loadstore.py&quot;&gt;full code&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;extensions&quot;&gt;Extensions&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Synthesize (different) types for non-aliasing objects and add them in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;info&lt;/code&gt;
  &lt;!--
    * CF notes that we could maybe do this by, instead of adding `.info`, have a
      `checktype` guard instruction that the optimizer can use to learn types and
      change aliasing from inside the trace
  --&gt;&lt;/li&gt;
  &lt;li&gt;Shrink/reduce failing examples down for easier debugging&lt;/li&gt;
  &lt;li&gt;Use Hypothesis for property-based testing, which CF notes also gives you
shrinking&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://pypy.org/posts/2022/12/jit-bug-finding-smt-fuzzing.html&quot;&gt;Use Z3 to encode&lt;/a&gt; the generated programs instead of randomly interpreting them&lt;/li&gt;
&lt;/ul&gt;
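
&lt;p&gt;As a rough illustration of the shrinking idea, here is a greedy reducer that drops one operation at a time. It is a sketch under assumptions: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shrink&lt;/code&gt; and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;still_fails&lt;/code&gt; predicate are hypothetical names, and this is much simpler than what Hypothesis does:&lt;/p&gt;

```python
def shrink(program, still_fails):
    """Greedily drop operations while the failure keeps reproducing.

    still_fails is a caller-supplied predicate (hypothetical here): it should
    return True only when the candidate program is well-formed and still
    triggers the original failure.
    """
    current = list(program)
    changed = True
    while changed:
        changed = False
        # Walk backwards so uses are tried for removal before definitions.
        for index in reversed(range(len(current))):
            candidate = current[:index] + current[index + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
    return current


# Toy usage: the "failure" needs both 3 and 7 to be present.
reduced = shrink(list(range(10)), lambda ops: 3 in ops and 7 in ops)
print(reduced)
```

&lt;p&gt;For this fuzzer, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;still_fails&lt;/code&gt; could run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;verify_program&lt;/code&gt; on the candidate and report whether it raises &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AssertionError&lt;/code&gt;, treating any other exception (such as a dangling operand) as a non-failure.&lt;/p&gt;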

&lt;h2 id=&quot;thanks&quot;&gt;Thanks&lt;/h2&gt;

&lt;p&gt;Thank you to &lt;a href=&quot;https://cfbolz.de/&quot;&gt;CF Bolz-Tereick&lt;/a&gt; for feedback on this post!&lt;/p&gt;
&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:equivalence&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;CF notes that this notion of equivalence works for this
optimizer but not for one that does allocation removal (escape analysis).
If we removed allocations and writes to them, we would be changing the heap
results and our verifier would appear to fail. This means that, if we are to
delete allocations, we have to pick a more subtle definition of equivalence.&lt;/p&gt;

      &lt;p&gt;Perhaps something that looks like escape analysis in the verifier’s
interpreter? &lt;a href=&quot;#fnref:equivalence&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
            <pubDate>Wed, 25 Feb 2026 00:00:00 +0000</pubDate>
            <niceDate>February 25, 2026</niceDate>
            <link>https://bernsteinbear.com/blog/toy-fuzzer/?utm_source=rss</link>
            <guid isPermaLink="true">https://bernsteinbear.com/blog/toy-fuzzer/</guid>
        </item>
        
        <item>
            <title>Type-based alias analysis in the Toy Optimizer</title>
            <description>&lt;p&gt;&lt;em&gt;Another entry in the &lt;a href=&quot;https://pypy.org/categories/toy-optimizer.html&quot;&gt;Toy Optimizer series&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Last time, we did &lt;a href=&quot;/blog/toy-load-store/&quot;&gt;load-store forwarding&lt;/a&gt; in the context
of our Toy Optimizer. We managed to cache the results of both reads from and
writes to the heap—at compile-time!&lt;/p&gt;

&lt;p&gt;We were careful to mind object aliasing: we separated our heap information into
alias classes based on what offset the reads/writes referenced. This way, if we
didn’t know whether objects &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b&lt;/code&gt; aliased, we could at least know that
different offsets would never alias (assuming our objects don’t overlap and
memory accesses are on word-sized slots). This is a coarse-grained heuristic.&lt;/p&gt;

&lt;p&gt;Fortunately, we often have much more information available at compile-time than
just the offset, so we should use it. I mentioned in a footnote that we could
use type information, for example, to improve our alias analysis. We’ll add
a lightweight form of &lt;a href=&quot;/assets/img/tbaa.pdf&quot;&gt;type-based alias analysis (TBAA)&lt;/a&gt;
(PDF) in this post.&lt;/p&gt;

&lt;h2 id=&quot;representing-types&quot;&gt;Representing types&lt;/h2&gt;

&lt;p&gt;We return once again to Fil Pizlo land, specifically &lt;a href=&quot;https://gist.github.com/pizlonator/cf1e72b8600b1437dda8153ea3fdb963&quot;&gt;How I implement SSA
form&lt;/a&gt;.
We’re going to be using the hierarchical heap effect representation from the
post in our implementation, but you can use your own type representation if you
have one already.&lt;/p&gt;

&lt;p&gt;This representation divides the heap into disjoint regions by type. Consider,
for example, that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Array&lt;/code&gt; objects and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;String&lt;/code&gt; objects do not overlap. A
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LinkedList&lt;/code&gt; pointer is never going to alias an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Integer&lt;/code&gt; pointer. They can
therefore be reasoned about separately.&lt;/p&gt;

&lt;p&gt;But sometimes you don’t have perfect type information available. If you have in
your language an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Object&lt;/code&gt; base class of all objects, then the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Object&lt;/code&gt; heap
overlaps with, say, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Array&lt;/code&gt; heap. So you need some way to represent that
too—just having an enum doesn’t work cleanly.&lt;/p&gt;

&lt;p&gt;Here is an example simplified type hierarchy:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Any
  Object
    Array
    String
  Other
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Other&lt;/code&gt; might represent different parts of the runtime’s data structures,
and could be further segmented into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GC&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Thread&lt;/code&gt;, etc.&lt;/p&gt;

&lt;p&gt;Fil’s idea is that we can represent each node in that hierarchy with a tuple of
integers &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[start, end)&lt;/code&gt; (inclusive, exclusive) that represent the pre- and
post-order traversals of the tree. Or, if tree traversals are not engraved into
your bones, they represent the range of all the nested objects within them.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Any [0, 3)
  Object [0, 2)
    Array [0, 1)
    String [1, 2)
  Other [2, 3)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then the “does this write interfere with this read” check—the aliasing
check—is a range overlap query.&lt;/p&gt;
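
&lt;p&gt;To make that concrete, here is a minimal sketch that uses plain tuples for the ranges in the listing above. The name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ranges_overlap&lt;/code&gt; and the tuple encoding are illustrative assumptions, and the sketch assumes every range is non-empty:&lt;/p&gt;

```python
# Ranges copied from the hierarchy above, as (start, end) half-open tuples.
ANY = (0, 3)
OBJECT = (0, 2)
ARRAY = (0, 1)
STRING = (1, 2)
OTHER = (2, 3)


def ranges_overlap(left, right):
    # Two non-empty half-open ranges overlap when each one ends after the
    # other one starts.
    return left[1] > right[0] and right[1] > left[0]


print(ranges_overlap(ARRAY, STRING))   # False: sibling heaps are disjoint
print(ranges_overlap(OBJECT, STRING))  # True: a parent overlaps its child
print(ranges_overlap(OBJECT, OTHER))   # False: separate subtrees
print(ranges_overlap(ANY, OTHER))      # True: Any overlaps everything
```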

&lt;p&gt;Here’s a perhaps over-engineered Python implementation of the range and heap
hierarchy based on the Ruby generator and C++ runtime code from JavaScriptCore:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;HeapRange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__repr__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;[&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;)&quot;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;is_empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;overlaps&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;HeapRange&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# Empty ranges interfere with nothing
&lt;/span&gt;        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;is_empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;is_empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;


&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AbstractHeap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;children&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;add_child&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;AbstractHeap&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AbstractHeap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;children&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;compute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;current&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;children&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HeapRange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;child&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;children&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;child&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;current&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;child&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HeapRange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;


&lt;span class=&quot;n&quot;&gt;Any&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AbstractHeap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Any&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Object&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Any&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;add_child&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Object&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Array&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;add_child&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Array&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;add_child&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;String&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Other&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Any&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;add_child&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Other&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Any&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The final call, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Any.compute(0)&lt;/code&gt;, kicks off the tree-numbering scheme.&lt;/p&gt;
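
&lt;p&gt;To see what numbers come out, here is a minimal, self-contained sketch that
repeats the class with a stand-in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HeapRange&lt;/code&gt;. The shape of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HeapRange&lt;/code&gt; (a
half-open interval), the constructor fields, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;numbered_hierarchy&lt;/code&gt;
helper are assumptions made for illustration:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class HeapRange:
    # Assumed shape: a half-open interval [start, end).
    start: int
    end: int


class AbstractHeap:
    def __init__(self, name: str) -> None:
        # Assumed constructor; only the fields the methods below need.
        self.name = name
        self.parent: Optional["AbstractHeap"] = None
        self.children: List["AbstractHeap"] = []
        self.range: Optional[HeapRange] = None

    def add_child(self, name: str) -> "AbstractHeap":
        result = AbstractHeap(name)
        result.parent = self
        self.children.append(result)
        return result

    def compute(self, start: int) -> None:
        current = start
        if not self.children:
            # A leaf claims exactly one slot.
            self.range = HeapRange(start, current + 1)
            return
        for child in self.children:
            child.compute(current)
            current = child.range.end
        # An inner node spans all of its children.
        self.range = HeapRange(start, current)


def numbered_hierarchy() -> Dict[str, Tuple[int, int]]:
    # Hypothetical helper: build the hierarchy and report each range.
    any_heap = AbstractHeap("Any")
    obj = any_heap.add_child("Object")
    array = obj.add_child("Array")
    string = obj.add_child("String")
    other = any_heap.add_child("Other")
    any_heap.compute(0)
    heaps = (any_heap, obj, array, string, other)
    return {h.name: (h.range.start, h.range.end) for h in heaps}
```

&lt;p&gt;Under these assumptions, each leaf gets a one-slot range (Array is [0, 1),
String is [1, 2), Other is [2, 3)) and each inner node covers its children
(Object is [0, 2), Any is [0, 3)).&lt;/p&gt;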

&lt;p&gt;Fil’s implementation also covers a bunch of abstract heaps such as SSAState and
Control because his is used for code motion and whatnot. That can be added on
later but we will not do so in this post.&lt;/p&gt;

&lt;p&gt;So there you have it: a type representation. Now we need to use it in our
load-store forwarding.&lt;/p&gt;

&lt;h2 id=&quot;load-store-forwarding&quot;&gt;Load-store forwarding&lt;/h2&gt;

&lt;p&gt;Recall that our load-store optimization pass looks like this:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;optimize_load_store&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;opt_bb&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Stores things we know about the heap at... compile-time.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;# Key: an object and an offset pair acting as a heap address
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;# Value: a previous SSA value we know exists at that address
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;compile_time_heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;store&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;obj&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;store_info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;obj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;current_value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;compile_time_heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;new_value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eq_value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current_value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;new_value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;compile_time_heap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;load_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;load_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;compile_time_heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;items&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;load_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;compile_time_heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;new_value&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;load&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;load_info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;load_info&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;compile_time_heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_equal_to&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compile_time_heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;compile_time_heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;opt_bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;opt_bb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;At its core, it iterates over the instructions, keeping a representation of the
heap at compile-time. Reads get cached, writes get cached, and writes also
invalidate the state of compile-time information about fields that may alias.&lt;/p&gt;

&lt;p&gt;In this case, our &lt;em&gt;may alias&lt;/em&gt; asks only whether the offsets are equal. This means that
the following unit test will fail:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;test_store_to_same_offset_different_heaps_does_not_invalidate_load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;var0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getarg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;var0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Array&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;var1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getarg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;var1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;var2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;var3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;var4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;escape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;opt_bb&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize_load_store&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;bb_to_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;opt_bb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\
&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;var0 = getarg(0)
var1 = getarg(1)
var2 = store(var0, 0, 3)
var3 = store(var1, 0, 4)
var4 = escape(3)&quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This test is expecting the write to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;var0&lt;/code&gt; to remain cached even though
we wrote to the same offset in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;var1&lt;/code&gt;—because we have annotated &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;var0&lt;/code&gt; as
being an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Array&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;var1&lt;/code&gt; as being a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;String&lt;/code&gt;. If we account for type
information in our alias analysis, we can get this test to pass.&lt;/p&gt;

&lt;p&gt;After doing a bunch of fussing around with the load-store forwarding (many
rewrites), I eventually got it down to a very short diff:&lt;/p&gt;

&lt;div class=&quot;language-diff highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gi&quot;&gt;+def may_alias(left: Value, right: Value) -&amp;gt; bool:
+    return (left.info or Any).range.overlaps((right.info or Any).range)
+
+
&lt;/span&gt; def optimize_load_store(bb: Block):
     opt_bb = Block()
     # Stores things we know about the heap at... compile-time.
&lt;span class=&quot;p&quot;&gt;@@ -138,6 +210,10 @@&lt;/span&gt; def optimize_load_store(bb: Block):
                 load_info: value
                 for load_info, value in compile_time_heap.items()
                 if load_info[1] != offset
&lt;span class=&quot;gi&quot;&gt;+                or not may_alias(load_info[0], obj)
&lt;/span&gt;             }
             compile_time_heap[store_info] = new_value
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we don’t have any type/alias information, we default to “I know nothing”
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Any&lt;/code&gt;) for each object. Then we check range overlap.&lt;/p&gt;
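
&lt;p&gt;As a rough, standalone illustration of that default, the sketch below
hard-codes a few already-numbered heaps and queries them. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Heap&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Value&lt;/code&gt; classes and the body of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;overlaps&lt;/code&gt; are simplified stand-ins
(assumptions), not the real IR classes:&lt;/p&gt;

```python
from typing import Optional


class HeapRange:
    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def overlaps(self, other: "HeapRange") -> bool:
        # Assumed definition: half-open intervals that share a slot.
        return self.end > other.start and other.end > self.start


class Heap:
    # Stand-in for an AbstractHeap whose range is already computed.
    def __init__(self, name: str, start: int, end: int) -> None:
        self.name = name
        self.range = HeapRange(start, end)


class Value:
    # Stand-in for an SSA value; info is None when nothing is known.
    def __init__(self, info: Optional[Heap] = None) -> None:
        self.info = info


Any = Heap("Any", 0, 3)
Object = Heap("Object", 0, 2)
Array = Heap("Array", 0, 1)
String = Heap("String", 1, 2)


def may_alias(left: Value, right: Value) -> bool:
    return (left.info or Any).range.overlaps((right.info or Any).range)


array_value = Value(Array)
string_value = Value(String)
object_value = Value(Object)
untyped_value = Value()
```

&lt;p&gt;With these definitions, an Array value and a String value are reported as
not aliasing, while an untyped value may alias anything, because it falls back
to the full &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Any&lt;/code&gt; range.&lt;/p&gt;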

&lt;p&gt;The boolean logic in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimize_load_store&lt;/code&gt; looks a little weird, maybe. But we
can also rewrite it (via De Morgan’s law) as:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt;
            &lt;span class=&quot;ow&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;may_alias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;load_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;obj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That is, we keep the cached state for every field that is known, either by
offset or by type, not to alias. Maybe that is clearer (but not as nice a diff).&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Note that the type representation is not so important here! You could use a
bitset version of the type information if you want. The important things are
that you can cheaply construct types and check overlap between them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;!--
Note that we do not currently have a notion of &quot;must-alias&quot; other than if two
SSA values are equal. Therefore we can&apos;t make use of writes to object A for
loads from object B even if A and B must alias.
--&gt;

&lt;p&gt;Nice, now our test passes! We can differentiate between memory accesses on
objects of different types.&lt;/p&gt;

&lt;p&gt;But what if we knew more?&lt;/p&gt;

&lt;h2 id=&quot;object-provenance--allocation-site&quot;&gt;Object provenance / allocation site&lt;/h2&gt;

&lt;p&gt;Sometimes we know where an object came from. For example, we may have seen it
get allocated in the trace. If we saw an object’s allocation, we know that it
does not alias (for example) any object that was passed in via a parameter. We
can use this kind of information to our advantage.&lt;/p&gt;

&lt;p&gt;For example, in the following made-up IR snippet:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trace(arg0):
  v0 = malloc(8)
  v1 = malloc(16)
  ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We know that (among other facts) &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v0&lt;/code&gt; doesn’t alias &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;arg0&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v1&lt;/code&gt; because we
have seen its allocation site.&lt;/p&gt;

&lt;p&gt;I saw this in the old V8 IR Hydrogen’s lightweight alias analysis&lt;sup id=&quot;fnref:fork&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:fork&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c++ highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HAliasing&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;kMustAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;kMayAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;kNoAlias&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;HAliasing&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// The same SSA value always references the same object.&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMustAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsAllocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsInnerAllocatedObject&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Two non-identical allocations can never be aliases.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsAllocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kNoAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsInnerAllocatedObject&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kNoAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// An allocation can never alias a parameter or a constant.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsParameter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kNoAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsConstant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kNoAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsAllocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsInnerAllocatedObject&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// An allocation can never alias a parameter or a constant.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsParameter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kNoAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsConstant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kNoAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;// Constant objects can be distinguished statically.&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsConstant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsConstant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Equals&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMustAlias&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kNoAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kMayAlias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is plenty of other useful information such as:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;If we know at compile-time that object A has 5 at offset 0 and object B has 7
at offset 0, then A and B don’t alias (thanks, CF)
    &lt;ul&gt;
      &lt;li&gt;In the RPython JIT in PyPy, this is used to determine if two user (Python)
objects don’t alias because we know the contents of the user (Python) class
field&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Object size (though perhaps that is a special case of the above bullet)&lt;/li&gt;
  &lt;li&gt;Field size/type&lt;/li&gt;
  &lt;li&gt;Deferring alias checks to run-time
    &lt;ul&gt;
      &lt;li&gt;Have a branch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if (a == b) { ... } else { ... }&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have other fun ones, please write in.&lt;/p&gt;
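
&lt;p&gt;The first bullet can be sketched in a few lines. This is a minimal sketch and not code from the post: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;may_alias_by_contents&lt;/code&gt; and the offset-to-value dictionaries are hypothetical, and it assumes both sets of facts are known to hold at the same program point.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def may_alias_by_contents(known_a, known_b):
    # Hypothetical helper: known_a and known_b map field offsets to values
    # known at compile time, both valid at the same program point.
    for offset, value in known_a.items():
        if offset in known_b and known_b[offset] != value:
            # Same offset, different contents: cannot be the same object.
            return False
    # No contradiction found, so fall back to the conservative answer.
    return True


a_and_b = may_alias_by_contents({0: 5}, {0: 7})
a_and_c = may_alias_by_contents({0: 5}, {8: 7})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a_and_b&lt;/code&gt; is false (5 and 7 disagree at offset 0) while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a_and_c&lt;/code&gt; stays true because the known fields do not overlap.&lt;/p&gt;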

&lt;h2 id=&quot;interacting-with-other-instructions&quot;&gt;Interacting with other instructions&lt;/h2&gt;

&lt;p&gt;We only handle loads and stores in our optimizer. Unfortunately, this means we
may accidentally cache stale information. Consider: what happens if a function
call (or any other opaque instruction) writes into an object we are tracking?&lt;/p&gt;

&lt;p&gt;The conservative approach is to invalidate all cached information on a function
call. This is definitely correct, but it’s a bummer for the optimizer. Can we
do anything?&lt;/p&gt;

&lt;p&gt;Well, perhaps we are calling a well-known function or a specific IR
instruction. In that case, we can annotate it with effects in the same abstract
heap model: if the instruction does not write, or only writes to some heaps, we
can at least only partially invalidate our heap.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;known_builtin_functions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;&quot;Array_length&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Effects&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reads&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;writes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()),&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;&quot;Object_setShape&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Effects&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reads&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;writes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;&quot;String_setEncoding&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Effects&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reads&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;writes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;However, if the function is unknown or otherwise opaque, we need at least more
advanced alias information and perhaps even (partial) escape analysis.&lt;/p&gt;

&lt;p&gt;Consider: even if an instruction takes no operands, we have no idea what state
it has access to. If it writes to any object A, we cannot safely cache
information about any other object B unless we know &lt;em&gt;for sure&lt;/em&gt; that A and B do
not alias. And we don’t know what the instruction writes to. So we may only
know we can cache information about B because it was allocated locally and has
not escaped.&lt;/p&gt;
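
&lt;p&gt;Both cases fit in one small sketch. It is not the code from the full post: the cache keyed by (heap, object, offset), the flat string-named heaps, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL_HEAPS&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;invalidate_for_call&lt;/code&gt; are all assumptions made for illustration, and the heaps are assumed to be disjoint.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Assumption: a flat set of disjoint abstract heaps, named by strings.
ALL_HEAPS = frozenset({'Array', 'Object', 'String'})


class Effects:
    def __init__(self, reads=frozenset(), writes=frozenset()):
        self.reads = frozenset(reads)
        self.writes = frozenset(writes)


known_builtin_functions = {
    'Array_length': Effects(reads={'Array'}),
    'Object_setShape': Effects(writes={'Object'}),
    'String_setEncoding': Effects(writes={'String'}),
}


def invalidate_for_call(cache, name, unescaped=frozenset()):
    # Unknown callees are assumed to write to every heap.
    effects = known_builtin_functions.get(name, Effects(writes=ALL_HEAPS))
    # Keep an entry if the call cannot write its heap, or if the object is a
    # local allocation that has not escaped. An object passed to the call
    # counts as escaped and must not be listed in unescaped.
    return {
        (heap, obj, offset): value
        for (heap, obj, offset), value in cache.items()
        if heap not in effects.writes or obj in unescaped
    }


cache = {
    ('Array', 'v0', 0): 'v5',
    ('Object', 'v1', 8): 'v6',
    ('Object', 'v2', 8): 'v7',
}
after_known = invalidate_for_call(cache, 'Object_setShape')
after_unknown = invalidate_for_call(cache, 'mystery', unescaped={'v2'})
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The known call only drops the two &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Object&lt;/code&gt; entries. The unknown call drops everything except the entry for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v2&lt;/code&gt;, which we assumed was allocated locally and has not escaped.&lt;/p&gt;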

&lt;h2 id=&quot;storing-vs-computing-on-the-fly&quot;&gt;Storing vs computing on the fly&lt;/h2&gt;

&lt;p&gt;Some runtimes such as ART &lt;a href=&quot;https://github.com/LineageOS/android_art/blob/8ce603e0c68899bdfbc9cd4c50dcc65bbf777982/compiler/optimizing/load_store_analysis.h#L395&quot;&gt;pre-compute all of their alias information&lt;/a&gt; in a bit
matrix. This makes more sense if you are using alias information in a full
control-flow graph, where you might need to iterate over the graph a few times.
In a trace context, you can do a lot in one single pass—no need to make a
matrix.&lt;/p&gt;

&lt;h2 id=&quot;when-is-this-useful-how-much&quot;&gt;When is this useful? How much?&lt;/h2&gt;

&lt;p&gt;As usual, this is a toy IR and a toy optimizer, so it’s hard to say how much
faster it makes its toy programs.&lt;/p&gt;

&lt;p&gt;In general, though, there is a dial for analysis and optimization that goes
between precision and speed. This is a happy point on that dial: it costs only
a little more analysis time than offset-only invalidation and gives higher
precision in return. I like that tradeoff.&lt;/p&gt;

&lt;p&gt;Also, it is very useful in JIT compilers where generally the managed language
is a little &lt;a href=&quot;https://blog.regehr.org/archives/959&quot;&gt;better-behaved than a C-like
language&lt;/a&gt;. Somewhere in your IR there
will be a lot of duplicate loads and stores from a strength reduction pass, and
this can clean up the mess.&lt;/p&gt;

&lt;!--
## In other languages

Taking address of objects throws a wrench in it

Can&apos;t really do it in C, even though UB
--&gt;

&lt;!--
https://github.com/WebKit/WebKit/blob/main/Source/JavaScriptCore/dfg/DFGObjectAllocationSinkingPhase.cpp
--&gt;

&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping up&lt;/h2&gt;

&lt;p&gt;See the &lt;a href=&quot;https://github.com/tekknolagi/tekknolagi.github.com/blob/67a1c5cbcf81d96cc63f8b3904619c018d1f2be1/loadstore.py&quot;&gt;full code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for joining as I work through a small use of type-based alias analysis
for myself. I hope you enjoyed.&lt;/p&gt;

&lt;p&gt;See also &lt;a href=&quot;https://wingolog.org/archives/2026/02/18/two-mechanisms-for-dynamic-type-checks&quot;&gt;two mechanisms for dynamic type
checks&lt;/a&gt;
by Andy Wingo. CRuby uses the latter technique described in the article.&lt;/p&gt;

&lt;h2 id=&quot;thanks&quot;&gt;Thanks&lt;/h2&gt;

&lt;p&gt;Thank you to &lt;a href=&quot;https://www.chrisgregory.me/&quot;&gt;Chris Gregory&lt;/a&gt; for helpful feedback.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:fork&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I made &lt;a href=&quot;https://github.com/tekknolagi/v8&quot;&gt;a fork of V8&lt;/a&gt; to go spelunk
around the Hydrogen IR. I reset the V8 repo to the last commit before they
deleted it in favor of their new Sea of Nodes based IR called TurboFan. &lt;a href=&quot;#fnref:fork&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
            <pubDate>Mon, 16 Feb 2026 00:00:00 +0000</pubDate>
            <niceDate>February 16, 2026</niceDate>
            <link>https://bernsteinbear.com/blog/toy-tbaa/?utm_source=rss</link>
            <guid isPermaLink="true">https://bernsteinbear.com/blog/toy-tbaa/</guid>
        </item>
        
        <item>
            <title>A multi-entry CFG design conundrum</title>
            <description>&lt;h2 id=&quot;background-and-bytecode-design&quot;&gt;Background and bytecode design&lt;/h2&gt;

&lt;p&gt;The ZJIT compiler compiles Ruby bytecode (YARV) to machine code. It starts by
transforming the stack machine bytecode into a high-level graph-based
intermediate representation called HIR.&lt;/p&gt;

&lt;p&gt;We use a more or less typical&lt;sup id=&quot;fnref:ebb&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:ebb&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; control-flow graph (CFG) in HIR. We have a
compilation unit, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Function&lt;/code&gt;, which has multiple basic blocks, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt;. Each
block contains multiple instructions, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Insn&lt;/code&gt;. HIR is always in SSA form, and we
use the variant of SSA with block parameters instead of phi nodes.&lt;/p&gt;

&lt;p&gt;Where it gets weird, though, is our handling of multiple entrypoints. See, YARV
handles default positional parameters (but &lt;em&gt;not&lt;/em&gt; default keyword parameters) by
embedding the code to compute the defaults inside the callee bytecode. Then
callers are responsible for figuring out what offset in the bytecode they
should start running the callee, depending on the number of arguments the
caller provides.&lt;sup id=&quot;fnref:keywords&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:keywords&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;In the following example, we have a function that takes two optional positional
parameters &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b&lt;/code&gt;. If neither is provided, we start at offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0000&lt;/code&gt;. If
just &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; is provided, we start at offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0005&lt;/code&gt;. If both are provided, we can
start at offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0010&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ ruby --dump=insns -e &apos;def foo(a=compute_a, b=compute_b) = a + b&apos;
...
== disasm: #&amp;lt;ISeq:foo@-e:1 (1,0)-(1,41)&amp;gt;
local table (size: 2, argc: 0 [opts: 2, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] a@0&amp;lt;Opt=0&amp;gt; [ 1] b@1&amp;lt;Opt=5&amp;gt;
0000 putself                                                          (   1)
0001 opt_send_without_block   &amp;lt;calldata!mid:compute_a, argc:0, FCALL|VCALL|ARGS_SIMPLE&amp;gt;
0003 setlocal_WC_0            a@0
0005 putself
0006 opt_send_without_block   &amp;lt;calldata!mid:compute_b, argc:0, FCALL|VCALL|ARGS_SIMPLE&amp;gt;
0008 setlocal_WC_0            b@1
0010 getlocal_WC_0            a@0[Ca]
0012 getlocal_WC_0            b@1
0014 opt_plus                 &amp;lt;calldata!mid:+, argc:1, ARGS_SIMPLE&amp;gt;[CcCr]
0016 leave                    [Re]
$
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(See the jump table debug output: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[ 2] a@0&amp;lt;Opt=0&amp;gt; [ 1] b@1&amp;lt;Opt=5&amp;gt;&lt;/code&gt;)&lt;/p&gt;
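
&lt;p&gt;The caller-side choice amounts to a table lookup. Here is a minimal Python sketch of it, assuming a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;entry_offset&lt;/code&gt; helper and a table built by hand from the three offsets in the disassembly above. It is not how CRuby or ZJIT spell it.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def entry_offset(opt_table, required, argc):
    # Number of optional positional arguments the caller actually passed.
    provided = argc - required
    if provided not in range(len(opt_table)):
        # Ruby would raise ArgumentError here.
        raise TypeError('wrong number of arguments')
    return opt_table[provided]


# Offsets read off the disassembly: compute a at 0000, compute b at 0005,
# and the method body at 0010.
FOO_OPT_TABLE = [0, 5, 10]

offsets = [entry_offset(FOO_OPT_TABLE, 0, argc) for argc in range(3)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Passing zero, one, or two arguments picks offsets 0, 5, and 10 respectively, so each extra argument skips one more default computation.&lt;/p&gt;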

&lt;p&gt;Unlike in Python, where default arguments are evaluated &lt;em&gt;at function creation
time&lt;/em&gt;, Ruby computes the default values &lt;em&gt;at function call time&lt;/em&gt;. This includes
arbitrary function calls, raising exceptions, doing long I/O, or whatever your
heart desires. For this reason, embedding the default code inside the callee
makes a lot of sense; we have a full call frame already set up, so any
optimizations (!), side-exits, exception handling machinery, profiling, etc.
don’t need special treatment.&lt;/p&gt;
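
&lt;p&gt;For comparison, here is the Python behavior in a small runnable example. The function names are made up to mirror the Ruby snippet above, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bar&lt;/code&gt; shows the usual Python idiom for getting call-time defaults by hand.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;calls = []


def compute_a():
    calls.append('compute_a')
    return 1


def foo(a=compute_a()):  # The default is evaluated once, right here.
    return a


count_after_def = len(calls)  # compute_a already ran before any call to foo
foo()
foo()
count_after_foo = len(calls)  # calling foo does not run compute_a again


def bar(a=None):
    # Emulate call-time defaults with a sentinel value.
    if a is None:
        a = compute_a()
    return a


bar()
bar()
count_after_bar = len(calls)  # each call to bar ran compute_a
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The counts come out as 1, 1, and 3: one evaluation at definition time for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foo&lt;/code&gt;, then one per call for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bar&lt;/code&gt;.&lt;/p&gt;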

&lt;p&gt;Since the caller knows what arguments it is passing, and often to what
function, we can efficiently support this in the JIT. We just need to know what
offset in the compiled callee to call into. The interpreter can also call into
the compiled function, which just has a stub to do dispatch to the appropriate
entry block.&lt;/p&gt;

&lt;p&gt;This has led us to design the HIR to support &lt;em&gt;multiple function entrypoints&lt;/em&gt;.
Instead of having just a single entry block, as most control-flow graphs do,
each of our functions now has an array of function entries: one for the
interpreter, at least one for the JIT, and more for default parameter handling.
Each of these entry blocks is separately callable from the outside world.&lt;/p&gt;

&lt;p&gt;Here is what the (slightly cleaned up) HIR looks like for the above example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Optimized HIR:
fn foo@tmp/branchnil.rb:4:
bb0():
  EntryPoint interpreter
  v1:BasicObject = LoadSelf
  v2:BasicObject = GetLocal :a, l0, SP@5
  v3:BasicObject = GetLocal :b, l0, SP@4
  v4:CPtr = LoadPC
  v5:CPtr[CPtr(0x16d27e908)] = Const CPtr(0x16d282120)
  v6:CBool = IsBitEqual v4, v5
  IfTrue v6, bb2(v1, v2, v3)
  v8:CPtr[CPtr(0x16d27e908)] = Const CPtr(0x16d282120)
  v9:CBool = IsBitEqual v4, v8
  IfTrue v9, bb4(v1, v2, v3)
  Jump bb6(v1, v2, v3)
bb1(v13:BasicObject):
  EntryPoint JIT(0)
  v14:NilClass = Const Value(nil)
  v15:NilClass = Const Value(nil)
  Jump bb2(v13, v14, v15)
bb2(v27:BasicObject, v28:BasicObject, v29:BasicObject):
  v65:HeapObject[...] = GuardType v27, HeapObject[class_exact*:Object@VALUE(0x1043aed00)]
  v66:BasicObject = SendWithoutBlockDirect v65, :compute_a (0x16d282148)
  Jump bb4(v27, v66, v29)
bb3(v18:BasicObject, v19:BasicObject):
  EntryPoint JIT(1)
  v20:NilClass = Const Value(nil)
  Jump bb4(v18, v19, v20)
bb4(v38:BasicObject, v39:BasicObject, v40:BasicObject):
  v69:HeapObject[...] = GuardType v38, HeapObject[class_exact*:Object@VALUE(0x1043aed00)]
  v70:BasicObject = SendWithoutBlockDirect v69, :compute_b (0x16d282148)
  Jump bb6(v38, v39, v70)
bb5(v23:BasicObject, v24:BasicObject, v25:BasicObject):
  EntryPoint JIT(2)
  Jump bb6(v23, v24, v25)
bb6(v49:BasicObject, v50:BasicObject, v51:BasicObject):
  v73:Fixnum = GuardType v50, Fixnum
  v74:Fixnum = GuardType v51, Fixnum
  v75:Fixnum = FixnumAdd v73, v74
  CheckInterrupts
  Return v75
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you’re not a fan of text HIR, here is an embedded clickable visualization of
HIR thanks to our former intern &lt;a href=&quot;https://aidenfoxivey.com/&quot;&gt;Aiden&lt;/a&gt; porting
Firefox’s &lt;a href=&quot;https://github.com/mozilla-spidermonkey/iongraph&quot;&gt;Iongraph&lt;/a&gt;:&lt;/p&gt;

&lt;iframe width=&quot;100%&quot; height=&quot;400&quot; src=&quot;/assets/zjit-multi-entry-iongraph.html&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;(You might have to scroll sideways and down and zoom around. Or you can &lt;a href=&quot;/assets/zjit-multi-entry-iongraph.html&quot;&gt;open it
in its own window&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;Each entry block also comes with block parameters which mirror the function’s
parameters. These get passed in (roughly) the System V ABI registers.&lt;/p&gt;

&lt;p&gt;This is kind of gross. We have to handle these blocks specially in reverse
post-order (RPO) graph traversal. And, recently, I ran into an even worse case
when trying to implement the Cooper-style “engineered” dominator algorithm: if
we walk backwards in block dominators, the walk is not guaranteed to converge.
All non-entry blocks are dominated by all entry blocks, which are only
dominated by themselves. There is no one “start block”. So what is there to do?&lt;/p&gt;

&lt;h2 id=&quot;the-design-conundrum&quot;&gt;The design conundrum&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Approach 1&lt;/strong&gt; is to keep everything as-is, but handle entry blocks specially
in the dominator algorithm too. I’m not exactly sure what would be needed, but
it seems possible. Most of the existing block infra could be left alone, but
it’s not clear how much this would “spread” within the compiler. What else in
the future might need to be handled specially?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach 2&lt;/strong&gt; is to synthesize a super-entry block and make it a predecessor
of every interpreter and JIT entry block. Inside this approach there are two
ways to do it: one (&lt;strong&gt;2.a&lt;/strong&gt;) is to fake it and report some non-existent block.
Another (&lt;strong&gt;2.b&lt;/strong&gt;) is to actually make a block and a new instruction that is a
quasi-jump instruction. In this approach, we would either need to synthesize
fake block arguments for the JIT entry block parameters or add some kind of new
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LoadArg&amp;lt;i&amp;gt;&lt;/code&gt; instruction that reads the argument &lt;em&gt;i&lt;/em&gt; passed in.&lt;/p&gt;

&lt;p&gt;(suggested by Iain Ireland, as seen in the IBM COBOL compiler)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach 3&lt;/strong&gt; is to duplicate the entire CFG per entrypoint. This would return
us to having one entry block per CFG at the expense of code duplication. It
handles the problem pretty cleanly but then &lt;em&gt;forces&lt;/em&gt; code duplication. I think
I want the duplication to be opt-in instead of having it be the only way we
support multiple entrypoints. What if it increases memory too much? The
specialization probably would make the generated code faster, though.&lt;/p&gt;

&lt;p&gt;(suggested by Ben Titzer)&lt;/p&gt;

&lt;p&gt;None of these approaches feel great to me. The probable candidate is &lt;strong&gt;2.b&lt;/strong&gt;
where we have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LoadArg&lt;/code&gt; instructions. That gives us flexibility to also later
add full specialization without forcing it.&lt;/p&gt;
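
&lt;p&gt;To see why a single root helps, here is a small Python sketch (not ZJIT’s Rust code) of the Cooper-Harvey-Kennedy iterative dominator algorithm run on the CFG from the HIR dump above, with a synthetic super-entry block as the root. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;superentry&lt;/code&gt; name, the successor dictionary, and the helper names are all made up for illustration.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def reverse_postorder(succs, entry):
    seen = set()
    order = []

    def visit(block):
        seen.add(block)
        for succ in succs.get(block, []):
            if succ not in seen:
                visit(succ)
        order.append(block)

    visit(entry)
    order.reverse()
    return order


def immediate_dominators(succs, entry):
    rpo = reverse_postorder(succs, entry)
    index = {block: i for i, block in enumerate(rpo)}
    preds = {block: [] for block in rpo}
    for block in rpo:
        for succ in succs.get(block, []):
            preds[succ].append(block)
    idom = {entry: entry}

    def intersect(a, b):
        # Walk whichever block is later in RPO up to its immediate dominator.
        # With a single root, the two walks always meet.
        while a != b:
            if index[a] == max(index[a], index[b]):
                a = idom[a]
            else:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for block in rpo[1:]:
            done = [pred for pred in preds[block] if pred in idom]
            new_idom = done[0]
            for pred in done[1:]:
                new_idom = intersect(new_idom, pred)
            if idom.get(block) != new_idom:
                idom[block] = new_idom
                changed = True
    return idom


# Edges taken from the HIR dump; bb0, bb1, bb3, and bb5 are the entry blocks.
succs = {
    'bb0': ['bb2', 'bb4', 'bb6'],
    'bb1': ['bb2'],
    'bb2': ['bb4'],
    'bb3': ['bb4'],
    'bb4': ['bb6'],
    'bb5': ['bb6'],
    'bb6': [],
}
entries = ['bb0', 'bb1', 'bb3', 'bb5']
idom = immediate_dominators(dict(succs, superentry=entries), 'superentry')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this example every block ends up with the super-entry as its immediate dominator, and the backwards walk in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;intersect&lt;/code&gt; always has somewhere to stop.&lt;/p&gt;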

&lt;p&gt;Cameron Zwarich also notes that this is an analogue to the common problem
people have when implementing the reverse: postdominators. This is because
often functions have multiple return IR instructions. He notes the usual
solution is to transform them into branches to a single return instruction.&lt;/p&gt;

&lt;p&gt;Do you have this problem? What does your compiler do?&lt;/p&gt;

&lt;h2 id=&quot;update-a-conclusion&quot;&gt;Update: a conclusion&lt;/h2&gt;

&lt;p&gt;We have decided to go with &lt;a href=&quot;https://github.com/ruby/ruby/pull/16200&quot;&gt;the superblock
approach&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:ebb&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;We use extended basic blocks (EBBs), but this doesn’t matter for this
post. It makes dominators and predecessors slightly more complicated (now
you have dominating &lt;em&gt;instructions&lt;/em&gt;), but that’s about it as far as I can
tell. We’ll see how they fare in the face of more complicated analysis
later. &lt;a href=&quot;#fnref:ebb&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:keywords&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Keyword parameters are handled by a mix of caller-side and callee-side
presence checks because they are passed in unordered. The caller handles
simple constant defaults whereas the callee handles anything that may
raise. Check out &lt;a href=&quot;https://kddnewton.com/2022/12/17/advent-of-yarv-part-17&quot;&gt;Kevin Newton’s awesome overview&lt;/a&gt;. &lt;a href=&quot;#fnref:keywords&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
            <pubDate>Thu, 22 Jan 2026 00:00:00 +0000</pubDate>
            <niceDate>January 22, 2026</niceDate>
            <link>https://bernsteinbear.com/blog/multiple-entry/?utm_source=rss</link>
            <guid isPermaLink="true">https://bernsteinbear.com/blog/multiple-entry/</guid>
        </item>
        
        <item>
            <title>The GDB JIT interface</title>
            <description>&lt;p&gt;GDB is great for stepping through machine code to figure out what is going on.
It uses debug information under the hood to present you with a tidy backtrace
and also determine how much machine code to print when you type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;disassemble&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This debug information comes from your compiler. Clang, GCC, rustc, etc. all
produce debug data in a format called &lt;a href=&quot;https://dwarfstd.org/&quot;&gt;DWARF&lt;/a&gt; and then embed that debug
information inside the binary (ELF, Mach-O, …) when you do &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-ggdb&lt;/code&gt; or
equivalent.&lt;/p&gt;

&lt;p&gt;Unfortunately, this means that by default, GDB has no idea what is going on if
you break in a JIT-compiled function. You can step instruction-by-instruction
and whatnot, but that’s about it. This is because the current instruction
pointer is nowhere to be found in any of the existing debug info tables from
the host runtime code, so your terminal is filled with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;???&lt;/code&gt;. See this example
from the V8 docs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#8  0x08281674 in v8::internal::Runtime_SetProperty (args=...) at src/runtime.cc:3758
#9  0xf5cae28e in ?? ()
#10 0xf5cc3a0a in ?? ()
#11 0xf5cc38f4 in ?? ()
#12 0xf5cbef19 in ?? ()
#13 0xf5cb09a2 in ?? ()
#14 0x0809e0a5 in v8::internal::Invoke (...) at src/execution.cc:97
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Fortunately, there is a &lt;em&gt;JIT interface&lt;/em&gt; to GDB. If you implement a couple of
functions in your JIT and run them every time you finish compiling a function,
you can get the debugging niceties for your JIT code too. See again a V8
example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#6  0x082857fc in v8::internal::Runtime_SetProperty (args=...) at src/runtime.cc:3758
#7  0xf5cae28e in ?? ()
#8  0xf5cc3a0a in loop () at test.js:6
#9  0xf5cc38f4 in test.js () at test.js:13
#10 0xf5cbef19 in ?? ()
#11 0xf5cb09a2 in ?? ()
#12 0x0809e1f9 in v8::internal::Invoke (...) at src/execution.cc:97
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Unfortunately, the GDB docs are &lt;a href=&quot;https://sourceware.org/gdb/current/onlinedocs/gdb.html/JIT-Interface.html&quot;&gt;somewhat sparse&lt;/a&gt;. So I went
spelunking through a bunch of different projects to try and understand what is
going on.&lt;/p&gt;

&lt;h2 id=&quot;the-big-picture-and-the-old-interface&quot;&gt;The big picture (and the old interface)&lt;/h2&gt;

&lt;p&gt;GDB expects your runtime to expose a function called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__jit_debug_register_code&lt;/code&gt; and a global variable called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__jit_debug_descriptor&lt;/code&gt;. GDB automatically sets its own internal breakpoint
on the function, if it exists. Then, when you compile code, you call this
function from your runtime.&lt;/p&gt;

&lt;p&gt;In slightly more detail:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Compile a function in your JIT compiler. This gives you a function name,
maybe other metadata, an executable code address, and a code size&lt;/li&gt;
  &lt;li&gt;Generate an &lt;em&gt;entire&lt;/em&gt; ELF/Mach-O/… object in-memory (!) for that one
function, describing its name, code region, maybe other DWARF metadata such
as line number maps&lt;/li&gt;
  &lt;li&gt;Write a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jit_code_entry&lt;/code&gt; linked list node that points at your object
(“symfile”)&lt;/li&gt;
  &lt;li&gt;Link it into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__jit_debug_descriptor&lt;/code&gt; linked list&lt;/li&gt;
  &lt;li&gt;Call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__jit_debug_register_code&lt;/code&gt;, which gives GDB control of the process so it can
pick up the new function’s metadata&lt;/li&gt;
  &lt;li&gt;Optionally, break into (or crash inside) one of your JITed functions&lt;/li&gt;
  &lt;li&gt;At some point, later, when your function gets GCed, unregister your code by
editing the linked list and calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__jit_debug_register_code&lt;/code&gt; again&lt;/li&gt;
&lt;/ol&gt;
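
&lt;p&gt;The bookkeeping in steps 3 through 5 is small. Real runtimes do it in C, C++, or Rust; what follows is only a Python &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ctypes&lt;/code&gt; mirror of the two structs from the GDB documentation, used as a model of the linked-list updates. GDB will not see any of it, since the descriptor here is not the process-wide &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__jit_debug_descriptor&lt;/code&gt; symbol. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register&lt;/code&gt; helper and the fake object-file bytes are made up for illustration.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import ctypes

# Values of jit_actions_t from the GDB documentation.
JIT_NOACTION, JIT_REGISTER_FN, JIT_UNREGISTER_FN = 0, 1, 2


class JitCodeEntry(ctypes.Structure):
    pass  # Fields are filled in below because the struct refers to itself.


JitCodeEntry._fields_ = [
    ('next_entry', ctypes.POINTER(JitCodeEntry)),
    ('prev_entry', ctypes.POINTER(JitCodeEntry)),
    ('symfile_addr', ctypes.c_char_p),
    ('symfile_size', ctypes.c_uint64),
]


class JitDescriptor(ctypes.Structure):
    _fields_ = [
        ('version', ctypes.c_uint32),
        ('action_flag', ctypes.c_uint32),
        ('relevant_entry', ctypes.POINTER(JitCodeEntry)),
        ('first_entry', ctypes.POINTER(JitCodeEntry)),
    ]


def register(descriptor, symfile):
    # Step 3: a list node describing the in-memory object file.
    entry = JitCodeEntry()
    entry.symfile_addr = symfile
    entry.symfile_size = len(symfile)
    # Step 4: link the node in at the head of the list.
    if descriptor.first_entry:
        entry.next_entry = descriptor.first_entry
        descriptor.first_entry.contents.prev_entry = ctypes.pointer(entry)
    descriptor.first_entry = ctypes.pointer(entry)
    # Step 5: record what changed. A real runtime would now call
    # __jit_debug_register_code(), where GDB has its breakpoint.
    descriptor.relevant_entry = ctypes.pointer(entry)
    descriptor.action_flag = JIT_REGISTER_FN
    return entry


descriptor = JitDescriptor(version=1)
first = register(descriptor, b'fake object file 1')
second = register(descriptor, b'another fake object file')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The sketch skips step 2, generating the object file itself, which is by far the most work.&lt;/p&gt;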

&lt;p&gt;This is why you see compiler projects such as V8 including large swaths of code
just to make object files:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/v8/v8/blob/5668ed57de1c7c8dd5c3dc1598bf071e17d29c8c/src/diagnostics/gdb-jit.cc&quot;&gt;V8&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/facebookincubator/cinderx/blob/e6e925b20e6fa3fe1e100f147e1c8cd03076ebfb/cinderx/Jit/jit_gdb_support.cpp&quot;&gt;Cinder&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/zendtech/php-src/blob/f82e5b3abe1ff1d3ffc7954b0810bc584fd650a5/ext/opcache/jit/zend_jit_gdb.c#L473&quot;&gt;Zend PHP&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dotnet/runtime/blob/3c040478f19e0f317790acab05dbe3ada9f52dc4/src/coreclr/vm/gdbjit.cpp&quot;&gt;CoreCLR/.NET&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/qemu/qemu/blob/942b0d378a1de9649085ad6db5306d5b8cef3591/tcg/tcg.c#L7064&quot;&gt;QEMU&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/WebKit/WebKit/blob/0afc2a867ab45651ac6c353c7b6ade5482b7bba7/Source/JavaScriptCore/jit/GdbJIT.cpp&quot;&gt;JavaScriptCore&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/LuaJIT/LuaJIT/blob/7152e15489d2077cd299ee23e3d51a4c599ab14f/src/lj_gdbjit.c&quot;&gt;LuaJIT&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/LineageOS/android_art/blob/8ce603e0c68899bdfbc9cd4c50dcc65bbf777982/runtime/jit/debugger_interface.cc#L187&quot;&gt;ART&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;which looks like it does something smart about grouping the JIT code
entries together (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RepackEntries&lt;/code&gt;), but I’m not sure exactly what it does&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/facebook/hhvm/blob/b1c47dcfbc574b508fd084f27ba4a06bcf4ba188/hphp/runtime/vm/debug/elfwriter.cpp#L622&quot;&gt;HHVM&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/TomatOrg/TomatoDotNet/blob/80266bb8dc0e7f0644f0638ecd98dfad4fb74427/src/dotnet/jit/gdb.c&quot;&gt;TomatoDotNet&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jatovm/jato/blob/bb1c7d4fd987e016b2e0379182c4bfbb8c1c1a78/jit/elf.c#L164&quot;&gt;Jato JVM&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://gist.github.com/yyny/4a012029b5889853c18b1efc19bb598e&quot;&gt;a minimal example&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/sisshiki1969/jit-debug/blob/213c72512761f815fc0b067ce68ee0ae12962e2a/src/main.rs&quot;&gt;monoruby&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/mono/mono/blob/0f53e9e151d92944cacab3e24ac359410c606df6/mono/mini/dwarfwriter.c&quot;&gt;Mono&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;It looks like Dart &lt;a href=&quot;https://github.com/dart-lang/sdk/commit/c4238c71da13d61ff32332058d371c5b2e92694b&quot;&gt;used to&lt;/a&gt;
have support for this but has since removed it&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/bytecodealliance/wasmtime/blob/b5272a5f103053f5ada2a38d5302a8d1e2de442d/crates/wasmtime/src/runtime/code_memory.rs#L509&quot;&gt;wasmtime&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because this is a huge hassle, GDB also has a newer interface that does not
require making an ELF/Mach-O/…+DWARF object.&lt;/p&gt;

&lt;h2 id=&quot;custom-debug-info-the-new-interface&quot;&gt;Custom debug info (the new interface)&lt;/h2&gt;

&lt;p&gt;This new interface requires writing a binary format of your choice. You make
the writer and you make the reader. Then, when you are in GDB, you load your
reader as a shared object.&lt;/p&gt;

&lt;p&gt;The reader must implement &lt;a href=&quot;https://sourceware.org/gdb/current/onlinedocs/gdb.html/Writing-JIT-Debug-Info-Readers.html#Writing-JIT-Debug-Info-Readers&quot;&gt;the interface specified by GDB&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;GDB_DECLARE_GPL_COMPATIBLE_READER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;extern&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gdb_reader_funcs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;gdb_init_reader&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gdb_reader_funcs&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;cm&quot;&gt;/* Must be set to GDB_READER_INTERFACE_VERSION.  */&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reader_version&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;cm&quot;&gt;/* For use by the reader.  */&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;priv_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;gdb_read_debug_info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;gdb_unwind_frame&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unwind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;gdb_get_frame_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_frame_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;gdb_destroy_reader&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;destroy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read&lt;/code&gt; function pointer does the bulk of the work and is responsible for
matching code ranges to function names, line numbers, and more.&lt;/p&gt;

&lt;p&gt;Here are &lt;a href=&quot;https://pwparchive.wordpress.com/2011/11/20/new-jit-interface-for-gdb/&quot;&gt;some details from Sanjoy Das&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Only a few runtimes implement this interface. Most of them stub out the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unwind&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get_frame_id&lt;/code&gt; function pointers:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/ykjit/yk/blob/755e533aa74ef5fa82a6586147727e23146b95fc/ykrt/src/compile/jitc_yk/gdb.rs#L216&quot;&gt;yk write&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://github.com/ykjit/yk/blob/755e533aa74ef5fa82a6586147727e23146b95fc/ykrt/yk_gdb_plugin/yk_gdb_plugin.c#L22&quot;&gt;yk read&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/tetzank/asmjit-utilities/blob/2fdbb99f7e002df4f8d7aa97c29910743adfc991/gdb/gdbjit.cpp&quot;&gt;asmjit-utilities write&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://github.com/tetzank/asmjit-utilities/blob/2fdbb99f7e002df4f8d7aa97c29910743adfc991/gdb/jit-reader/gdbjit-reader.c&quot;&gt;asmjit-utilities read&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/erlang/otp/blob/28a44634fb04b95ea666abb8aac7254e2c87ae05/erts/emulator/beam/jit/beam_jit_metadata.cpp#L123&quot;&gt;Erlang/OTP write&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://github.com/erlang/otp-gdb-tools/blob/7b864f58c534699e4124e31ecfda86041b941037/jit-reader.c&quot;&gt;Erlang/OTP read&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/FEX-Emu/FEX/blob/c8d72eabe589392b962bec94d002c5ffdb7381c2/FEXCore/Source/Interface/GDBJIT/GDBJIT.cpp#L110&quot;&gt;FEX write&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://github.com/FEX-Emu/FEX/blob/c8d72eabe589392b962bec94d002c5ffdb7381c2/Source/Tools/FEXGDBReader/FEXGDBReader.cpp#L8&quot;&gt;FEX read&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/bullno1/buxn-jit/blob/69effb96d5fe9725258fe367efcefd6911ef32fd/src/gdb/hook.c&quot;&gt;buxn-jit write&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://github.com/bullno1/buxn-jit/blob/69effb96d5fe9725258fe367efcefd6911ef32fd/src/gdb/reader.c&quot;&gt;buxn-jit read&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/KreitinnSoftware/box64/blob/f224a93cc83f9da34bc85ebb5414168d476a135d/src/tools/gdbjit.c#L45&quot;&gt;box64 write&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://github.com/KreitinnSoftware/box64/blob/f224a93cc83f9da34bc85ebb5414168d476a135d/gdbjit/reader.c&quot;&gt;box64 read&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/no-defun-allowed/ccl/blob/094a9ec5bf203db118e0ffc8ce2b5b80fc1c91dd/lisp-kernel/gdb.c&quot;&gt;ccl write&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://gist.github.com/no-defun-allowed/32d38c5e664586c724cf2e0e97f0d2b1&quot;&gt;ccl read&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GDB also requires the reader to declare that it is released under a
GPL-compatible license via the macro &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GDB_DECLARE_GPL_COMPATIBLE_READER&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Since I wrote about the &lt;a href=&quot;/blog/jit-perf-map/&quot;&gt;perf map interface&lt;/a&gt; recently, I
have it on my mind. Why can’t we reuse it in GDB?&lt;/p&gt;

&lt;h2 id=&quot;adapting-to-the-linux-perf-interface&quot;&gt;Adapting to the Linux perf interface&lt;/h2&gt;

&lt;p&gt;I suppose it would be possible to try and upstream a patch to GDB to support
the Linux perf map interface for JITs. After all, why shouldn’t it be able to
automatically pick up symbols from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/perf-...&lt;/code&gt;? That would be great
baseline debug info for “free”.&lt;/p&gt;

&lt;p&gt;In the meantime, maybe it is reasonable to create a re-usable custom debug
reader:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;When registering code, write the address and name to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/perf-...&lt;/code&gt; as you normally would&lt;/li&gt;
  &lt;li&gt;Write the filename as the symfile (does this make &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp&lt;/code&gt; the magic number?)&lt;/li&gt;
  &lt;li&gt;Have the debug info reader just parse the perf map file&lt;/li&gt;
&lt;/ul&gt;
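
&lt;p&gt;As a rough illustration of the last step, here is a minimal sketch of the
parsing logic, in Python for brevity (a real GDB reader would be written in C
against the struct above). It assumes the usual perf map layout of one
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;START SIZE name&lt;/code&gt; entry per line, with the first two fields in hexadecimal.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PerfMapEntry&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parse_perf_map&lt;/code&gt; are hypothetical names, not part of GDB
or perf.&lt;/p&gt;

```python
from typing import Iterable, List, NamedTuple


class PerfMapEntry(NamedTuple):
    start: int  # first address of the code region
    size: int   # length of the region in bytes
    name: str   # symbol name; may contain spaces


def parse_perf_map(lines: Iterable[str]) -> List[PerfMapEntry]:
    """Parse perf map lines of the form 'START SIZE name' (hex fields)."""
    entries = []
    for line in lines:
        # Split only twice so that names containing spaces stay intact.
        fields = line.strip().split(maxsplit=2)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start, size, name = fields
        try:
            entries.append(PerfMapEntry(int(start, 16), int(size, 16), name))
        except ValueError:
            continue  # skip lines whose address or size is not hex
    return entries


sample = [
    "7f0000001000 40 jit_add",
    "7f0000001040 1a Object#to_s (jit)",
    "",
]
for entry in parse_perf_map(sample):
    print(hex(entry.start), entry.size, entry.name)
```

&lt;p&gt;A reader could walk entries like these and report each one to GDB as a
named function covering the addresses from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;start&lt;/code&gt; up to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;start + size&lt;/code&gt;.&lt;/p&gt;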

&lt;p&gt;It would be less flexible than what both DWARF and custom readers support:
it would only be able to handle function names and code regions. No embedding
source code for GDB to display in your debugger. But maybe that is okay for a
partial solution?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; Here is &lt;a href=&quot;https://github.com/tekknolagi/gdb-jit-linux-perf-map&quot;&gt;my small attempt&lt;/a&gt;
at such a plugin.&lt;/p&gt;

&lt;h2 id=&quot;the-n-squared-problem&quot;&gt;The n-squared problem&lt;/h2&gt;

&lt;p&gt;V8 notes in their &lt;a href=&quot;https://v8.dev/docs/gdb-jit&quot;&gt;GDB JIT docs&lt;/a&gt; that because the JIT interface is
a linked list and we only keep a pointer to the head, we get O(n&lt;sup&gt;2&lt;/sup&gt;)
behavior. Bummer. This becomes especially noticeable since they register
additional code objects not just for functions, but also trampolines, cache
stubs, etc.&lt;/p&gt;
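
&lt;p&gt;To get a feel for the quadratic growth, here is a toy cost model in Python.
It assumes, purely for illustration, that handling each registration means
walking every entry already in the list, starting from the head.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;total_steps&lt;/code&gt; is a hypothetical helper, not a measurement of GDB.&lt;/p&gt;

```python
def total_steps(num_entries: int) -> int:
    """Count list nodes visited while registering num_entries code objects.

    Assumption for illustration only: the i-th registration walks the i
    entries that are already in the list before adding its own.
    """
    steps = 0
    for already_registered in range(num_entries):
        steps += already_registered
    return steps


for n in (10, 100, 1000):
    # The closed form of the sum 0 + 1 + ... + (n - 1) is n * (n - 1) / 2.
    print(n, total_steps(n), n * (n - 1) // 2)
```

&lt;p&gt;Under this model, ten times as many entries costs roughly a hundred times
as many steps, which is why registering trampolines and stubs on top of
functions adds up quickly.&lt;/p&gt;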

&lt;h2 id=&quot;garbage-collection&quot;&gt;Garbage collection&lt;/h2&gt;

&lt;p&gt;Since GDB expects the code pointer in your symbol object file not to move, you
have to make sure to have a stable symbol file pointer and stable executable
code pointer. To make this happen, V8 disables its moving GC.&lt;/p&gt;

&lt;p&gt;Additionally, if your compiled function gets collected, you have to make sure
to unregister the function. Instead of doing this eagerly, ART treats the GDB
JIT linked list as a weakref and periodically removes dead code entries from
it.&lt;/p&gt;
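
&lt;p&gt;The lazy approach can be sketched in Python with the standard
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;weakref&lt;/code&gt; module. This illustrates the idea and is not ART’s actual
implementation. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JitRegistry&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CompiledFunction&lt;/code&gt;, and the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unregister&lt;/code&gt; callback are hypothetical names standing in for the runtime’s
own bookkeeping and for removing an entry from the GDB JIT linked list.&lt;/p&gt;

```python
import gc
import weakref


class CompiledFunction:
    """Stand-in for a JIT-compiled code object owned by the runtime."""

    def __init__(self, name):
        self.name = name


class JitRegistry:
    """Tracks registered code without keeping it alive."""

    def __init__(self, unregister):
        self._entries = []  # list of (weak reference, name) pairs
        self._unregister = unregister  # called once per dead entry

    def register(self, func):
        self._entries.append((weakref.ref(func), func.name))

    def sweep(self):
        """Drop entries whose code object has been collected."""
        live = []
        for ref, name in self._entries:
            if ref() is None:
                self._unregister(name)
            else:
                live.append((ref, name))
        self._entries = live

    def names(self):
        return [name for _, name in self._entries]


removed = []
registry = JitRegistry(removed.append)
add = CompiledFunction("jit_add")
mul = CompiledFunction("jit_mul")
registry.register(add)
registry.register(mul)

del mul       # the runtime drops its last strong reference
gc.collect()  # make collection explicit for non-refcounting interpreters
registry.sweep()
print(registry.names(), removed)
```

&lt;p&gt;The sweep can run on whatever schedule suits the runtime, so collecting a
function never has to touch the debugger interface directly.&lt;/p&gt;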
</description>
            <pubDate>Tue, 30 Dec 2025 00:00:00 +0000</pubDate>
            <niceDate>December 30, 2025</niceDate>
            <link>https://bernsteinbear.com/blog/gdb-jit/?utm_source=rss</link>
            <guid isPermaLink="true">https://bernsteinbear.com/blog/gdb-jit/</guid>
        </item>
        
    </channel>
</rss>
