Gray-Box Fuzzing via Gradient Descent and Boolean Expression Coverage (Technical Report)
Martin Jonáš, Jan Strejček, Marek Trtík, Lukáš Urban
TL;DR
This work presents FIzzer, a gray-box fuzzing framework that instruments and monitors instructions converting values to Booleans, enabling input generation driven by gradient descent. By replacing switch instructions with instrumented Boolean ones, collecting predicate values, and building an execution tree, FIzzer targets thorough coverage of Boolean predicates via four interacting analyses: Sensitivity, Bitshare, Typed minimization, and Minimization. The approach applies gradient-based search in both typed and untyped input spaces, combined with a node-selection strategy, loop-head detection, and Monte Carlo exploration, to progressively invert predicate evaluations and expand coverage. Experimental results on Test-Comp 2023 show competitive performance against tools that combine fuzzing with symbolic execution or model checking, even though FIzzer relies solely on gray-box fuzzing. The framework emphasizes precise data collection and a modular design that integrates multiple input-generation strategies with fast execution caching, enabling scalable coverage improvement in real-world programs.
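The Sensitivity analysis above infers which input bits a predicate depends on. The following is a minimal sketch of one way to do this, assuming sensitivity is probed by flipping single input bits and re-running the program. The `Program` interface, `flip_bit`, and `sensitive_bits` are hypothetical names rather than FIzzer's API, and identifying a predicate by its position in the trace is a simplification (a real tool would key predicates by program location and occurrence).

```python
from typing import Callable, List, Set

# Hypothetical interface (an assumption, not FIzzer's API): running the
# instrumented program on raw input bytes yields the numeric values observed
# at Boolean-converting instructions, in execution order.
Program = Callable[[bytes], List[float]]


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of `data` with one bit flipped (bit 0 = LSB of byte 0)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def sensitive_bits(program: Program, data: bytes, position: int) -> Set[int]:
    """Collect input bits whose flip changes the value at trace `position`.

    Flipping one bit at a time approximates which input bits a predicate
    depends on, without any static taint analysis.
    """
    baseline = program(data)[position]
    sensitive = set()
    for bit in range(len(data) * 8):
        trace = program(flip_bit(data, bit))
        # Ignore flips after which no predicate is reached at this position.
        if position < len(trace) and trace[position] != baseline:
            sensitive.add(bit)
    return sensitive


if __name__ == "__main__":
    # Toy program with one comparison `data[0] < 10`, monitored as data[0] - 10.
    def toy(data: bytes) -> List[float]:
        return [float(data[0]) - 10.0]

    print(sorted(sensitive_bits(toy, b"\x00\x00", 0)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Only the bits of the first byte are reported, since the second byte never influences the monitored value.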
Abstract
We present a novel gray-box fuzzing algorithm that monitors executions of instructions converting numerical values to Boolean ones. An important class of such instructions evaluates predicates, e.g., the *cmp instructions (icmp, fcmp) in LLVM. Monitoring these instructions alone allows us to infer input dependency (cf. taint analysis) on the fly during fuzzing with reasonable accuracy, which in turn enables effective use of gradient descent on these instructions to invert the result of their evaluation. Although the fuzzing aims to maximize coverage of these instructions, this coverage correlates with standard branch coverage, which we thus achieve indirectly. The evaluation on Test-Comp 2023 benchmarks shows that our approach, despite being pure gray-box fuzzing, can compete with the leading tools in the competition, which combine fuzzing with other powerful techniques such as model checking, symbolic execution, or abstract interpretation.
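To illustrate how gradient descent can invert the outcome of a comparison, here is a minimal sketch assuming real-valued inputs and a central finite-difference gradient. Each comparison is modelled as a quantity `f(x)` tested against zero (for `a < b`, take `f = a - b`), and the search moves the input along the gradient until the sign of `f` changes. The names `numeric_gradient` and `invert_predicate`, the step size, and the step limit are illustrative assumptions; FIzzer applies gradient-based search in both typed and untyped input spaces, so its actual procedure likely differs.

```python
from typing import Callable, List, Optional

Vector = List[float]


def numeric_gradient(f: Callable[[Vector], float], x: Vector, h: float = 1e-3) -> Vector:
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


def invert_predicate(f: Callable[[Vector], float], x: Vector,
                     lr: float = 0.1, max_steps: int = 1000) -> Optional[Vector]:
    """Find an input on which the predicate `f(x) < 0` takes the opposite outcome.

    `f` is the numeric quantity the comparison checks against zero, e.g.
    a - b for `a < b`. Returns None if no flip is found within `max_steps`.
    """
    initially_true = f(x) < 0
    # Ascend on f if the predicate currently holds, descend otherwise.
    direction = 1.0 if initially_true else -1.0
    x = list(x)
    for _ in range(max_steps):
        if (f(x) < 0) != initially_true:
            return x
        g = numeric_gradient(f, x)
        x = [xi + lr * direction * gi for xi, gi in zip(x, g)]
    return x if (f(x) < 0) != initially_true else None


if __name__ == "__main__":
    # Toy predicate `3*a - b < 20`, monitored as 3*a - b - 20.
    def value(v: Vector) -> float:
        return 3 * v[0] - v[1] - 20

    print(invert_predicate(value, [0.0, 0.0]))
```

Descending on the signed numeric value rather than on the Boolean outcome is what makes gradient information usable: the outcome alone is flat almost everywhere, while the distance to the threshold tells the search which way to move.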
