SoK: Where to Fuzz? Assessing Target Selection Methods in Directed Fuzzing

Felix Weissberg, Jonas Möller, Tom Ganz, Erik Imgrund, Lukas Pirch, Lukas Seidel, Moritz Schloegel, Thorsten Eisenhofer, Konrad Rieck

TL;DR

The paper tackles the problem of where to fuzz in directed fuzzing by framing target selection as an information retrieval task. It systematically surveys 25 directed-fuzzing papers, develops a taxonomy of information sources, scoring mechanisms, and granularity, and evaluates target-selection methods on a ground-truth crash corpus of over 1,600 OSS-Fuzz crashes using an IR-based metric (NDCG). Key findings show that simple software metrics often outperform pattern-based heuristics and that large language models like CodeT5+ can approach this efficacy, highlighting target selection as a distinct lever to boost fuzzing performance. The work provides practical guidance for selecting targets and suggests that ML-based approaches may further enhance directed fuzzing, while offering publicly available data and artifacts to spur future research.

Abstract

A common paradigm for improving fuzzing performance is to focus on selected regions of a program rather than its entirety. While previous work has largely explored how these locations can be reached, their selection, that is, the where, has received little attention so far. In this paper, we fill this gap and present the first comprehensive analysis of target selection methods for fuzzing. To this end, we examine papers from leading security and software engineering conferences, identifying prevalent methods for choosing targets. By modeling these methods as general scoring functions, we are able to compare and measure their efficacy on a corpus of more than 1,600 crashes from the OSS-Fuzz project. Our analysis provides new insights for target selection in practice: First, we find that simple software metrics significantly outperform other methods, including common heuristics used in directed fuzzing, such as recently modified code or locations with sanitizer instrumentation. Next to this, we identify language models as a promising choice for target selection. In summary, our work offers a new perspective on directed fuzzing, emphasizing the role of target selection as an orthogonal dimension to improve performance.

Paper Structure

This paper contains 16 sections, 2 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Target selection. The initial step in target selection is the extraction of code locations from the SUT (Step ❶). The granularity of this extraction depends on the specifics of the selection method and could, for example, be at the function level or the basic-block level. After extraction, the locations and (optionally) the SUT or external information (e.g., code change timestamps) are forwarded to the target selection method. This method then assigns a score to each location (Step ❷), which is either (a) continuous or (b) discrete. Finally, the fuzzer utilizes the annotated code locations for guidance (Step ❸).
  • Figure 2: Overview of a retrieval. We compute a ranking using target selection method $\rho$, which assigns each function $\mathsf{f} \in \mathbf{F}$ a relevance score $\hat{r}_\mathsf{f}$. To measure the quality of a target selection method, we compute the $NDCG_k$ for a retrieval $F$ with cardinality $k$. As ground truth, we use the oracle $\mathrm{O}$ to assign relevance scores $r_\mathsf{f}$ to each location $\mathsf{f} \in \mathbf{F}$.
  • Figure 3: Dataset generation process. As the basis for our analysis, we collect 1,621 reproducible crashes. We crawl OSS-Fuzz, which yields a crashing input and a fuzzing configuration for a project (❶). To reproduce the crash, we search for a commit of the project that crashes (❷) when executed under the input from OSS-Fuzz (❸). Once we reproduce a crash, we extract functions from the project's code and label them according to the stack trace (❹).
  • Figure 4: Crash types. We show the top ten crash types in our ground truth dataset.
  • Figure 5: Stack trace lengths. We show the frequency of stack trace lengths from reproduced crashes. We observe the dominant peak at 7 functions per stack trace.
  • ...and 4 more figures
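
The retrieval setup described in the captions of Figures 2 and 3 can be illustrated with a small sketch. It is not the paper's implementation: it assumes binary relevance (a function counts as relevant if it appears in the crash stack trace) and the common formulation of NDCG with a log2 position discount, either of which may differ from the authors' exact definition. All function names, scores, and helper names below are hypothetical.

```python
import math
from typing import Dict, List


def label_from_stack_trace(functions: List[str], stack_trace: List[str]) -> Dict[str, float]:
    """Ground-truth relevance r_f: 1.0 if the function is in the stack trace, else 0.0.

    Assumption: binary labels; the paper's oracle may grade relevance differently.
    """
    in_trace = set(stack_trace)
    return {f: 1.0 if f in in_trace else 0.0 for f in functions}


def dcg(relevances: List[float]) -> float:
    """Discounted cumulative gain with a log2 position discount (rank starts at 1)."""
    return sum(r / math.log2(rank + 1) for rank, r in enumerate(relevances, start=1))


def ndcg_at_k(predicted: Dict[str, float], truth: Dict[str, float], k: int) -> float:
    """NDCG_k of the top-k functions ranked by predicted score, against truth."""
    # Retrieval: the k functions with the highest predicted scores.
    retrieved = sorted(predicted, key=lambda f: predicted[f], reverse=True)[:k]
    gains = [truth.get(f, 0.0) for f in retrieved]
    # Ideal ranking: the k highest ground-truth relevance values.
    ideal = sorted(truth.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical project with four functions and a two-frame stack trace.
functions = ["parse_header", "decode_frame", "main", "log_msg"]
truth = label_from_stack_trace(functions, ["decode_frame", "main"])

# Hypothetical scores from some target selection method (e.g., a size metric).
predicted = {"decode_frame": 120.0, "parse_header": 80.0, "main": 30.0, "log_msg": 5.0}

print(round(ndcg_at_k(predicted, truth, k=2), 3))
```

With `k=2`, this toy ranking retrieves one relevant and one irrelevant function, so the score falls below 1.0; a ranking that places both stack-trace functions on top would reach exactly 1.0.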