Risk Estimation in Differential Fuzzing via Extreme Value Theory

Rafael Baez, Alejandro Olivas, Nathan K. Diamond, Marcelo Frias, Yannic Noller, Saeid Tizpaz-Niari

TL;DR

This work estimates the risk of missing large output differences in differential fuzzing using Extreme Value Theory (EVT). It introduces a tail-centered framework that fits a Generalized Pareto Distribution to exceedances over a threshold and uses a Generalized Extreme Value extrapolation to forecast return levels for a future fuzzing horizon, complemented by Statistical Testing of Tail to validate the tail assumptions. Evaluated on real-world Java libraries with the DifFuzz side-channel analysis, EVT-based extrapolations outperform the baselines (Markov's and Chebyshev's inequalities and the Bayes factor) in 14.3% of cases and tie with them in 64.2% of cases, and early stopping saves tens of millions of bytecode executions on average. The findings suggest that EVT can provide principled risk estimates for worst-case outcomes in fuzzing campaigns, supporting more reliable vulnerability discovery and more efficient fuzzing workflows.

Abstract

Differential testing is a highly effective technique for automatically detecting software bugs and vulnerabilities when the specifications involve an analysis over multiple executions simultaneously. Differential fuzzing, in particular, operates as a guided randomized search that aims to find (similar) inputs leading to a maximum difference in software outputs or behaviors. However, fuzzing, as a dynamic analysis, offers no guarantees on the absence of bugs: given a differential fuzzing campaign that has observed no bugs (or only a minimal difference), what is the risk of observing a bug (or a larger difference) if we run the fuzzer for one or more additional steps? This paper investigates the application of Extreme Value Theory (EVT) to address the risk of missing or underestimating bugs in differential fuzzing. The key observation is that differential fuzzing, viewed as a random process, resembles the distribution of the maximum of the observed differences. Hence, EVT, a branch of statistics dealing with extreme values, is a natural framework for analyzing the tail of a differential fuzzing campaign and containing the risk. We perform experiments on a set of real-world Java libraries and use differential fuzzing to find information leaks via side channels in these libraries. We first explore the feasibility of EVT for this task and the optimal hyperparameters for EVT distributions. We then compare EVT-based extrapolation against baseline statistical methods: Markov's and Chebyshev's inequalities and the Bayes factor. EVT-based extrapolations outperform the baseline techniques in 14.3% of cases and tie with the baseline in 64.2% of cases. Finally, we evaluate the accuracy and performance gains of EVT-enabled differential fuzzing in real-world Java libraries, where we report an average saving of tens of millions of bytecode executions through early stopping.
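To make the tail-extrapolation idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest tail model, an exponential tail (a Generalized Pareto Distribution with shape parameter 0), fitted to exceedances over an empirical quantile threshold, and uses the standard return-level formula for that model. The function names, the 0.9 threshold quantile, the 1,000-step horizon, and the synthetic data are all illustrative assumptions. The Chebyshev-style level is included only to show how a distribution-free bound can be computed for comparison; how the paper configures its baselines is not stated here.

```python
import math
import random
import statistics


def exponential_tail_return_level(diffs, threshold_quantile=0.9, horizon=1000):
    """Level expected to be exceeded about once in `horizon` future observations.

    Hypothetical helper: fits an exponential tail (GPD with shape 0) to the
    exceedances over a high empirical threshold.
    """
    ordered = sorted(diffs)
    # Threshold u: an empirical quantile of the observed cost differences.
    u = ordered[int(threshold_quantile * (len(ordered) - 1))]
    exceedances = [d - u for d in diffs if d > u]
    if not exceedances:
        raise ValueError("no observations above the threshold")
    # The maximum-likelihood scale of an exponential tail is the mean exceedance.
    sigma = sum(exceedances) / len(exceedances)
    # Empirical probability that an observation exceeds the threshold.
    zeta = len(exceedances) / len(diffs)
    # Return level for shape 0: u + sigma * log(horizon * zeta).
    return u + sigma * math.log(horizon * zeta)


def chebyshev_level(diffs, horizon=1000):
    """Level t with P(|X - mean| >= t - mean) <= 1/horizon by Chebyshev's inequality."""
    mean = statistics.fmean(diffs)
    std = statistics.pstdev(diffs)
    # Chebyshev: P(|X - mean| >= k * std) <= 1 / k**2, so choose k = sqrt(horizon).
    return mean + math.sqrt(horizon) * std


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for cost differences observed during a campaign.
    observed = [random.expovariate(0.5) for _ in range(3000)]
    print("EVT (exponential tail) level:", exponential_tail_return_level(observed))
    print("Chebyshev level:", chebyshev_level(observed))
```

In this sketch, a campaign could stop early once the extrapolated level for the next horizon does not exceed the largest difference already observed. The distribution-free bound is typically far looser than the fitted tail, which matches the motivation for EVT given above.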

Paper Structure

This paper contains 16 sections, 4 equations, 4 figures, 5 tables, 3 algorithms.

Figures (4)

  • Figure 1: String equality in Apache WSS4J (s1 secret, s2 public). The code snippet is the implementation of the secret comparison with a given public guess.
  • Figure 2: Overview Example. (a) The cost differences over training samples (the first 3,226 samples in DifFuzz). (b) The empirical tail distribution of DifFuzz. (c) m-return level plot of cost differences with expected values (and their 95% CI).
  • Figure 3: Temporal Plot of Prediction. We use the training-set size (x-axis) to predict the max difference in the next 1,000 fuzzing iterations (green), compared to the ground truth (red), with Exponential and PP distributions.
  • Figure 4: The Conceptual Diagram of the Overall Algorithm: Steps to infer the worst-case differences.

Theorems & Definitions (1)

  • Definition 4.1: Tail Distributions of Differential Fuzzing