Risk Estimation in Differential Fuzzing via Extreme Value Theory
Rafael Baez, Alejandro Olivas, Nathan K. Diamond, Marcelo Frias, Yannic Noller, Saeid Tizpaz-Niari
TL;DR
This work tackles the problem of estimating the risk of missing large differential outputs in differential fuzzing by leveraging Extreme Value Theory (EVT). It introduces a tail-centered framework that fits a Generalized Pareto Distribution to exceedances over a threshold and uses a Generalized Extreme Value extrapolation to forecast return levels for a future fuzzing horizon, complemented by Statistical Testing of Tail to validate the tail assumptions. The approach, validated on real-world Java libraries and the DifFuzz side-channel analysis, shows that EVT-based extrapolations outperform traditional concentration inequalities and Bayes-based methods in 14.3% of cases and tie with them in 64.2% of cases, and that early stopping saves tens of millions of bytecode executions on average. The findings highlight the practical potential of EVT to provide principled guarantees for worst-case outcomes in dynamic fuzzing campaigns, with implications for reliable vulnerability discovery and efficient fuzzing workflows.
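As a rough illustration of the peaks-over-threshold idea summarized above, the following Python sketch fits a Generalized Pareto Distribution to exceedances and evaluates the standard return-level formula for a future horizon. The method-of-moments estimator, the function names (`fit_gpd_moments`, `return_level`), the 90th-percentile threshold, and the synthetic exponential data are all assumptions made for illustration; the fitting procedure and hyperparameters actually used in the paper are not stated in this summary.

```python
import math
import random
from statistics import mean, pvariance


def fit_gpd_moments(exceedances):
    """Method-of-moments GPD fit (an illustrative choice, valid for shape < 0.5).

    Returns (xi, sigma), the shape and scale of the fitted distribution.
    """
    m = mean(exceedances)
    v = pvariance(exceedances)
    if v == 0:
        raise ValueError("exceedances must not all be equal")
    ratio = m * m / v                 # equals 1 - 2*xi for an exact GPD
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)
    return xi, sigma


def return_level(samples, threshold, horizon):
    """Level expected to be exceeded once in `horizon` future observations."""
    exc = [x - threshold for x in samples if x > threshold]
    if len(exc) < 2:
        raise ValueError("need at least two exceedances over the threshold")
    xi, sigma = fit_gpd_moments(exc)
    rate = len(exc) / len(samples)    # empirical P(X > threshold)
    m = horizon * rate                # expected exceedances within the horizon
    if abs(xi) < 1e-9:                # exponential-tail limit of the formula
        return threshold + sigma * math.log(m)
    return threshold + (sigma / xi) * (m ** xi - 1.0)


# Hypothetical stand-in for observed output differences from a campaign.
random.seed(0)
diffs = [random.expovariate(1.0) for _ in range(20000)]
u = sorted(diffs)[int(0.9 * len(diffs))]   # assumed 90th-percentile threshold
print(return_level(diffs, u, horizon=10**6))
```

A campaign could, under these assumptions, compare the printed return level against the largest difference seen so far: if the two are close, running the fuzzer for the chosen horizon is unlikely to reveal a much larger difference.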
Abstract
Differential testing is a highly effective technique for automatically detecting software bugs and vulnerabilities when the specifications involve an analysis over multiple executions simultaneously. Differential fuzzing, in particular, operates as a guided randomized search, aiming to find (similar) inputs that lead to a maximum difference in software outputs or their behaviors. However, fuzzing, as a dynamic analysis, lacks any guarantees on the absence of bugs: from a differential fuzzing campaign that has observed no bugs (or a minimal difference), what is the risk of observing a bug (or a larger difference) if we run the fuzzer for one or more additional steps? This paper investigates the application of Extreme Value Theory (EVT) to address the risk of missing or underestimating bugs in differential fuzzing. The key observation is that differential fuzzing, viewed as a random process, tracks the distribution of the maximum of the observed differences. Hence, EVT, a branch of statistics dealing with extreme values, is an ideal framework to analyze the tail of the differential fuzzing campaign to contain the risk. We perform experiments on a set of real-world Java libraries and use differential fuzzing to find information leaks via side channels in these libraries. We first explore the feasibility of EVT for this task and the optimal hyperparameters for EVT distributions. We then compare EVT-based extrapolation against baseline statistical methods: Markov's and Chebyshev's inequalities, and the Bayes factor. EVT-based extrapolations outperform the baseline techniques in 14.3% of cases and tie with the baselines in 64.2% of cases. Finally, we evaluate the accuracy and performance gains of EVT-enabled differential fuzzing in real-world Java libraries, where we report an average saving of tens of millions of bytecode executions through early stopping.
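For context on the baselines named above, the sketch below computes plug-in versions of Markov's and Chebyshev's tail bounds for the probability that an observed difference reaches a level t. The function names and the use of sample moments in place of true moments are assumptions for illustration, so the values are estimates rather than strict guarantees, and the abstract does not say how the paper configures these baselines.

```python
from statistics import mean, pvariance


def markov_bound(samples, t):
    """Markov: P(X >= t) <= E[X] / t, for non-negative X and t > 0."""
    if t <= 0 or min(samples) < 0:
        raise ValueError("Markov's inequality needs X >= 0 and t > 0")
    return min(1.0, mean(samples) / t)


def chebyshev_bound(samples, t):
    """Chebyshev: P(X >= t) <= Var[X] / (t - E[X])**2, for t > E[X]."""
    mu = mean(samples)
    if t <= mu:
        return 1.0                    # the bound is vacuous at or below the mean
    return min(1.0, pvariance(samples) / (t - mu) ** 2)


# Hypothetical observed differences (not data from the paper).
observed = [0.0, 1.0, 2.0, 3.0, 4.0]
print(markov_bound(observed, 8.0), chebyshev_bound(observed, 6.0))
```

Both bounds depend only on the mean and variance of the whole sample, not on the shape of the tail, which is one reason a tail-focused model such as EVT can give sharper extrapolations when its assumptions hold.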
