Human in the Loop for Fuzz Testing: Literature Review and the Road Ahead

Jiongchi Yu, Xiaolin Wen, Sizhe Cheng, Xiaofei Xie, Qiang Hu, Yong Wang

Abstract

Fuzz testing is one of the most effective techniques for detecting bugs and vulnerabilities in software. However, as the basis of fuzz testing, automated heuristics often fail to uncover deep or complex vulnerabilities. As a result, the performance of fuzz testing remains limited. One promising way to address this limitation is to integrate human expert guidance into the paradigm of fuzz testing. Even though some works have been proposed in this direction, there is still a lack of a systematic research roadmap for combining Human-in-the-Loop (HITL) and fuzz testing, hindering the potential for further enhancing fuzzing effectiveness. To bridge this gap, this paper outlines a forward-looking research roadmap for HITL for fuzz testing. Specifically, we highlight the promise of visualization techniques for interpretable fuzzing processes, as well as on-the-fly interventions that enable experts to guide fuzzing toward hard-to-reach program behaviors. Moreover, the rise of Large Language Models (LLMs) introduces new opportunities and challenges, raising questions about how humans can efficiently provide actionable knowledge, how expert meta-knowledge can be leveraged, and what roles humans should play in the intelligent fuzzing loop with LLMs. To address these questions, we survey existing work on HITL fuzz testing and propose a research agenda emphasizing future opportunities in (1) human monitoring, (2) human steering, and (3) human-LLM collaboration. We call for a paradigm shift toward interactive, human-guided fuzzing systems that integrate expert insight with AI-powered automation in the next-generation fuzzing ecosystem.

Paper Structure

This paper contains 22 sections, 3 figures, and 1 table.

Figures (3)

  • Figure 1: A typical workflow of five key phases for fuzz testing.
  • Figure 2: Distribution of collected articles. The left panel shows the temporal trend of published works, grouped by year. The right panel details how articles on HITL for fuzzing are distributed across different fuzzing stages, further distinguishing between human monitoring and human steering roles. Note that the totals may exceed the overall number of reviewed papers, since some works contribute to multiple phases of human monitoring or human steering.
  • Figure 3: The interaction design space for human involvement in fuzzing across three stages. Numbered circles mark six future research directions and their corresponding phases in the fuzzing pipeline. Gray arrows denote traditional offline feedback after execution, while black arrows denote in-situ runtime intervention. Stage ii highlights the transition from offline post-execution analysis to real-time supervision and steering.