SoK: Where to Fuzz? Assessing Target Selection Methods in Directed Fuzzing
Felix Weissberg, Jonas Möller, Tom Ganz, Erik Imgrund, Lukas Pirch, Lukas Seidel, Moritz Schloegel, Thorsten Eisenhofer, Konrad Rieck
TL;DR
The paper tackles the question of where to fuzz in directed fuzzing by framing target selection as an information retrieval task. It systematically surveys 25 directed-fuzzing papers, develops a taxonomy of information sources, scoring mechanisms, and granularity, and evaluates target-selection methods on a ground-truth corpus of over 1,600 OSS-Fuzz crashes using an information retrieval metric, normalized discounted cumulative gain (NDCG). Key findings show that simple software metrics often outperform pattern-based heuristics and that language models such as CodeT5+ can approach the efficacy of these metrics, highlighting target selection as a distinct lever for boosting fuzzing performance. The work provides practical guidance for selecting targets, suggests that ML-based approaches may further enhance directed fuzzing, and offers publicly available data and artifacts to spur future research.
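The following minimal sketch shows how NDCG scores a ranking of candidate targets. It rewards a ranking that places ground-truth crash locations near the top. The sketch assumes binary relevance (1 if a ranked function is a known crash location, 0 otherwise) and no rank cutoff. The function names and example data are hypothetical, and the paper's exact relevance definition or cutoff may differ.

```python
import math
from typing import Sequence, Set


def dcg(relevances: Sequence[float]) -> float:
    # Discounted cumulative gain: the item at rank r (starting at 1) is
    # discounted by log2(r + 1), so hits near the top count more.
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(ranking: Sequence[str], crash_sites: Set[str]) -> float:
    # Binary relevance (an assumption): 1 if the function is a crash location.
    gains = [1.0 if fn in crash_sites else 0.0 for fn in ranking]
    # Ideal DCG: the same gains with all relevant functions ranked first.
    ideal_dcg = dcg(sorted(gains, reverse=True))
    # Rankings that contain no crash location get a score of 0.
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: the only crash site is ranked third of four.
print(ndcg(["parse_header", "main", "decode_chunk", "log_msg"], {"decode_chunk"}))  # 0.5
```

Placing the crash site first would yield 1.0. A method that ranks crash sites lower is penalized logarithmically rather than all-or-nothing, which is why an IR metric suits comparing target-selection rankings.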
Abstract
A common paradigm for improving fuzzing performance is to focus on selected regions of a program rather than its entirety. While previous work has largely explored how these locations can be reached, their selection, that is, the where, has received little attention so far. In this paper, we fill this gap and present the first comprehensive analysis of target selection methods for fuzzing. To this end, we examine papers from leading security and software engineering conferences, identifying prevalent methods for choosing targets. By modeling these methods as general scoring functions, we can compare and measure their efficacy on a corpus of more than 1,600 crashes from the OSS-Fuzz project. Our analysis provides new insights for target selection in practice: First, we find that simple software metrics significantly outperform other methods, including common heuristics used in directed fuzzing, such as recently modified code or locations with sanitizer instrumentation. In addition, we identify language models as a promising choice for target selection. In summary, our work offers a new perspective on directed fuzzing, emphasizing the role of target selection as an orthogonal dimension to improve performance.
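The abstract models target selection methods as general scoring functions. The following minimal sketch illustrates that idea. It assumes each method assigns a numeric score to every function and that targets are then ranked by descending score. The `Function` fields, the line-count metric, and the recency flag are hypothetical stand-ins for "a simple software metric" and "a common heuristic". They are not the paper's exact metrics or implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Function:
    name: str
    lines_of_code: int       # hypothetical software metric
    recently_modified: bool  # hypothetical heuristic signal


# A target selection method, modeled as a scoring function over functions.
ScoringFn = Callable[[Function], float]


def loc_score(fn: Function) -> float:
    # Software-metric style: larger functions receive higher scores.
    return float(fn.lines_of_code)


def recency_score(fn: Function) -> float:
    # Heuristic style: recently modified code scores 1, everything else 0.
    return 1.0 if fn.recently_modified else 0.0


def rank_targets(functions: List[Function], score: ScoringFn) -> List[str]:
    # Sort by descending score; Python's sort stays stable with reverse=True,
    # so tied functions keep their input order.
    return [fn.name for fn in sorted(functions, key=score, reverse=True)]


funcs = [
    Function("main", 40, True),
    Function("decode_chunk", 180, False),
    Function("log_msg", 12, True),
]
print(rank_targets(funcs, loc_score))      # ['decode_chunk', 'main', 'log_msg']
print(rank_targets(funcs, recency_score))  # ['main', 'log_msg', 'decode_chunk']
```

Expressing every method as a score-then-rank function puts metrics, heuristics, and learned models behind one interface. Their rankings can then be compared against the same ground-truth crash corpus with a ranking metric.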
