Using fuzz testing in physics research

2025-04-26 — Solano Felicio

First, thanks to those who talked to me after my last post. Your help is appreciated! I’ve received tips on using Julia’s environments and Manifest.toml to get things working on other machines with little fuss; I’ll definitely check that out, although for now what I want is to make a library, not a standalone program. It should be easy to call from Python and C, easy to vendor, and so on. I was also pointed to Julia’s Discourse forum; I’ll ask there when I get the time.

Now, on to the actual topic: fuzz testing / property-based testing, Supposition.jl, and how they’re helping me do research.

If you don’t know what those are, here’s a quick recap:

Writing unit tests for your code takes a long time, is boring, and mostly catches the bugs you anticipate anyway, so it’s not a great investment of your time. Fuzz testing means that you instead have the computer randomly generate thousands of tests, exploring many possible combinations of inputs that you couldn’t think of. Traditionally, a fuzzer just checks whether your program crashes, but you can also ask it other kinds of questions. You can ask things like: is the output of this function always a valid input to that function? Is FooError the only possible exception raised by this method? Is this equation always satisfied?

Asking those questions is checking properties of your code, and that’s where the name “property-based testing” comes from. In practice, to do this, you have to:

  1. Tell the computer how to generate inputs to your code
  2. Tell it what properties the code should satisfy
  3. Let it do its thing. It will find counterexamples and automatically shrink them to give you the smallest, simplest one that reproduces the bug.
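
To make the three steps concrete, here is a minimal hand-rolled sketch in Python, using only the standard library. It is not Supposition.jl's API, and real property-based testing libraries generate and shrink far more cleverly. Every name in it (`buggy_max`, `gen_list`, `shrink`, `check`, and so on) is made up for illustration, and the function under test is deliberately wrong.

```python
import random

def buggy_max(xs):
    """Hypothetical function under test. Deliberately wrong: it starts
    from 0, so it fails whenever every element is negative."""
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best

# Step 1: tell the computer how to generate inputs
# (here: non-empty lists of small integers).
def gen_list(rng):
    return [rng.randint(-100, 100) for _ in range(rng.randint(1, 10))]

# Step 2: tell it what property the code should satisfy.
def prop_matches_builtin_max(xs):
    return buggy_max(xs) == max(xs)

# Step 3a: propose "simpler" variants of a failing input:
# drop one element, or move one element toward zero.
def shrink_candidates(xs):
    if len(xs) > 1:
        for i in range(len(xs)):
            yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        if x != 0:
            step = 1 if x > 0 else -1
            for smaller in (0, int(x / 2), x - step):
                yield xs[:i] + [smaller] + xs[i + 1:]

# Step 3b: keep accepting simpler variants as long as they still fail.
def shrink(xs, prop):
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(xs):
            if not prop(candidate):
                xs = candidate
                improved = True
                break
    return xs

# Step 3c: run many random tests; return a shrunk counterexample, or None.
def check(prop, gen, tries=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(tries):
        xs = gen(rng)
        if not prop(xs):
            return shrink(xs, prop)
    return None

counterexample = check(prop_matches_builtin_max, gen_list)
print(counterexample)  # [-1]
```

Whatever all-negative list the random search stumbles on first, the shrinking loop reduces it to `[-1]`, which points straight at the bad initial value.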

Finally, if you write the tests first and the code later, this is called property-driven development (like TDD, test-driven development, but smarter). And Supposition.jl is a Julia package designed for property-based testing.

I started using Supposition.jl this Tuesday (4 days ago).

I’m having a great time with it! It didn’t take long to learn, and the API is clean. It has also found so many useful counterexamples that when a test passes I’m really confident that the property in question is satisfied.

It has found trivial and shallow bugs, where I just didn’t understand Julia well enough (for example, type instabilities). It has found important, subtle bugs in my code, where the logic was wrong but it took just the right combination of inputs to see it. And it has found bugs in the math itself: I’m implementing stuff from a paper and it seems that some of its mathematical claims are wrong, because Supposition.jl found counterexamples.

This is great because I would never have thought of those counterexamples in so little time, or at all. It led me down a rabbit hole of math that was ultimately very productive: I now have a much deeper understanding of what I’m doing, and I know what the authors of the mistaken papers missed, how to fix it, and even how to make it more practical. Overall, a great investment of my time as a researcher.

And to emphasize: I didn’t spend this week playing with Supposition.jl! I learned all I needed in about half an hour in total of looking things up in the documentation, and that was it. All my brainpower went to fixing bugs and understanding my math, while the fuzzer was the one trying to find clever counterexamples. I’m honestly impressed with how well it worked and how easy it was. It finds bugs faster than you can fix them.

It looks like fuzz testing is useful not just for software but for research. I’m definitely going to put any general claims I make in a future publication through a fuzzer before submitting it. It’s a cheaper, faster version of having your local mathematician check your stuff to find counterexamples; it’s not as rigorous, but damn does it catch edge cases.
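
As a rough illustration of putting a general claim through a fuzzer, here is a small Python sketch using only the standard library. The claim being tested ("3D rotations commute") is a textbook example I picked because it is false, not something from the paper mentioned above, and all the function names are hypothetical. There is no shrinking here; the sketch only searches for a counterexample.

```python
import math
import random

def rot_x(a):
    """3x3 rotation matrix about the x axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(a):
    """3x3 rotation matrix about the z axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def close(p, q, tol=1e-9):
    """Compare matrices entry by entry, allowing for rounding error."""
    return all(abs(p[i][j] - q[i][j]) <= tol
               for i in range(3) for j in range(3))

# False claim: rotations about different axes commute.
def claim_rotations_commute(a, b):
    return close(matmul(rot_x(a), rot_z(b)), matmul(rot_z(b), rot_x(a)))

# True claim: rotations about the same axis commute.
def claim_same_axis_commute(a, b):
    return close(matmul(rot_z(a), rot_z(b)), matmul(rot_z(b), rot_z(a)))

def find_counterexample(claim, tries=1000, seed=0):
    """Try random angle pairs; return the first pair that breaks the claim."""
    rng = random.Random(seed)
    for _ in range(tries):
        a = rng.uniform(-math.pi, math.pi)
        b = rng.uniform(-math.pi, math.pi)
        if not claim(a, b):
            return a, b
    return None

print(find_counterexample(claim_rotations_commute))  # some pair of angles
print(find_counterexample(claim_same_axis_commute))  # None
```

A result of `None` is not a proof, only the absence of a counterexample in a thousand random tries, which is exactly the "not as rigorous" caveat.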

A good physicist, like a good programmer, always takes the time to check that their reasoning is correct, at least in the common case. But it’s rare to have the time to rigorously verify that every step is completely correct in every possible case. Given that we are not going to use formal mathematics to prove our stuff correct anyway, fuzz testing is the cheapest, most effective option to find those mistakes. And I’m glad I’m using it.

tags: software, research