Following Dragons: Code Review-Guided Fuzzing

Viet Hoang Luu, Amirmohammad Pasdar, Wachiraphan Charoenwet, Toby Murray, Shaanan Cohney, Van-Thuan Pham

TL;DR

EyeQ converts developers' code-review insights into annotation-guided fuzzing, steering the fuzzer toward high-risk program states that conventional fuzzers often miss. The approach pairs a human-guided feasibility study with an automated LLM-powered pipeline that classifies reviews, localizes the implicated code, and instruments it with IJON annotations for AFL++-based fuzzing. On the PHP codebase, EyeQ substantially improves vulnerability discovery over baseline fuzzing, uncovering more than 40 previously unknown bugs and generalizing to new review data. The work highlights the value of code-review discourse as semantic guidance for dynamic analysis, while acknowledging limitations in localization accuracy and the potential for further gains via directed fuzzing and broader deployment.

Abstract

Modern fuzzers scale to large, real-world software but often fail to exercise the program states developers consider most fragile or security-critical. Such states are typically deep in the execution space, gated by preconditions, or overshadowed by lower-value paths that consume limited fuzzing budgets. Meanwhile, developers routinely surface risk-relevant insights during code review, yet this information is largely ignored by automated testing tools. We present EyeQ, a system that leverages developer intelligence from code reviews to guide fuzzing. EyeQ extracts security-relevant signals from review discussions, localizes the implicated program regions, and translates these insights into annotation-based guidance for fuzzing. The approach operates atop existing annotation-aware fuzzing, requiring no changes to program semantics or developer workflows. We first validate EyeQ through a human-guided feasibility study on a security-focused dataset of PHP code reviews, establishing a strong baseline for review-guided fuzzing. We then automate the workflow using a large language model with carefully designed prompts. EyeQ significantly improves vulnerability discovery over standard fuzzing configurations, uncovering more than 40 previously unknown bugs in the security-critical PHP codebase.

Paper Structure

This paper contains 47 sections and 10 figures.

Figures (10)

  • Figure 1: Code review discussion on fiber stack protection strategies in PHP. Underlines highlight developer reasoning and domain knowledge expressed during the review.
  • Figure 2: End-to-end workflow of code review-guided fuzzing. The pipeline transforms security-relevant reviews into localized code annotations and uses annotation-aware fuzzing to guide exploration toward vulnerability-prone program behaviors.
  • Figure 3: System prompt used for Stage 1, Call 1. Coarse-grained security-relevant review filtering in EyeQLLM.
  • Figure 4: System prompt used for Stage 1, Call 2. Fine-grained CWE assignment using the filtered CWE context pack in EyeQLLM.
  • Figure 5: System prompt used for Stage 2, Step 1 candidate function selection using names and file paths in EyeQLLM.
  • ...and 5 more figures