You can use C-Reduce for any language

November 15, 2024

C-Reduce is a tool by Regehr and friends for minimizing C compiler bug reproducers. Imagine if you had a 10,000 line long C file that triggered a Clang bug. You don’t want to send a massive blob to the compiler developers because that’s unhelpful, but you also don’t want to cut it down to size by hand. The good news is that C-Reduce can do that for you. The bad news is that everyone thinks it only works for C.

It’s pretty widely applicable. You only need:

I ran into a bug with RustPython running scrapscript and wanted to report it. So I ran wrote a script interesting.sh to reproduce the bug:

#!/bin/bash
# No -o pipefail; we don't want rustpython failures to cause the script to fail
set -eu

# Note the absolute path to the binary, which is not in $PATH
/path/to/RustPython/target/release/rustpython scrapscript.py 2>&1 | grep \
    "tried to push value onto stack but overflowed max_stackdepth"

And then I ran C-Reduce. This all happened within a couple of seconds:

$ creduce --not-c interesting.sh scrapscript.py
===< 2263604 >===
running 4 interestingness tests in parallel
===< pass_blank :: 0 >===
(0.5 %, 200799 bytes)
(0.6 %, 200607 bytes)
===< pass_lines :: 0 >===
(9.2 %, 183225 bytes)
(18.1 %, 165228 bytes)
(26.5 %, 148382 bytes)
(29.3 %, 142674 bytes)
(34.6 %, 131961 bytes)
(38.1 %, 124960 bytes)
(40.6 %, 119872 bytes)
(42.3 %, 116504 bytes)
(44.4 %, 112161 bytes)
(46.4 %, 108180 bytes)
(47.5 %, 105950 bytes)
...

What you see is C-Reduce cutting down the file by 50% nearly instantly… and I don’t even have a very fast computer.

We use --not-c because otherwise C-Reduce uses a bunch of C-specific passes. If we’re working on Python, it will likely just slow things down (but not materially change the outcome).

There you have it. Fast and easy. As I finish typing these next couple of sentences, we’re already at 96.9% reduced.

See also

Other implementations of/similar to C-Reduce:

I saw somewhere (but can no longer find the link to) someone’s implementation of delta debugging on commit logs (more advanced than git bisect). If you find this, please send it my way.

  1. Or a looping wrapper that you can use to probabilistically use to fake it