Fuzzing with Agents? Generators Are All You Need

Vasudev Vikram, Rohan Padhye

Abstract

Modern generator-based fuzzing techniques combine lightweight input generators with coverage-guided mutation as a method of exploring deep execution paths in a target program. A complementary approach in prior research focuses on creating highly customized, domain-specific generators that encode structural and semantic logic sufficient to reach deep program states; the challenge comes from the overhead of writing and testing these complex generators. We investigate whether AI coding agents can automatically synthesize such target-specific generators, and whether the resulting generators are strong enough to obviate the need for coverage guidance and mutation entirely. Our approach, Gentoo, comprises an LLM coding agent (provided terminal access and source code of the fuzz target and its library) instructed to iteratively synthesize and refine an input generator, and optionally provided fine-grained predicate-level coverage feedback. We evaluate three configurations of Gentoo against human-written generators on fuzz targets for 7 real-world Java libraries. Our findings show that agent-synthesized generators achieve statistically significantly higher branch coverage than human-written baseline generators on 4 of 7 benchmarks. Critically, the use of coverage guidance and mutation strategies is not statistically significantly beneficial for agent-synthesized generators, but is significant for all human-written generators, suggesting that the structural and semantic logic encoded in the agent generators makes coverage guidance largely unnecessary.

Paper Structure

This paper contains 38 sections, 1 equation, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Comparison of prior approaches (coverage-guided parametric fuzzing) with our proposed approach Gentoo: agentic synthesis of input generators with fine-grained predicate feedback.
  • Figure 2: Skeleton of a JQF Generator for ChocoPy. The generator uses SourceOfRandomness to make random choices that produce a structured string input.
  • Figure 3: Static (a) and dynamic (b) predicate records for a predicate in ChocoPy's TypeChecker. The static record identifies that line 208 is the high-value branch; the dynamic record reveals the generator rarely produces inputs that take it, making it a direct target for refinement.
  • Figure 4: Branch coverage (mean $\pm$ std across five repetitions) for each technique and benchmark, normalized relative to the human-written generator baseline. Higher is better; * indicates statistical significance under a Mann-Whitney U test. In 4 out of 7 benchmarks, at least one agent-based approach achieves significantly higher coverage.
  • Figure 5: Simplified excerpt from a Gentoo-L agent-synthesized ChocoPy generator. The generator encodes ChocoPy type semantics directly: self is typed with the enclosing class name and return values are constrained to match the method's declared return type. By always returning a type-consistent literal, the generator trades input diversity for guaranteed well-typedness.
  • ...and 2 more figures
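
To make the generator pattern described in Figure 2 concrete, the sketch below models a grammar-based generator for ChocoPy-like program strings. It is a self-contained analogue, not the paper's actual generator: `java.util.Random` stands in for JQF's `SourceOfRandomness` so the code runs without the JQF dependency, and the grammar productions (`genStatement`, `genExpression`, the identifier pool) are illustrative assumptions. The key idea it demonstrates is the one the figure describes: each random choice selects a grammar production, so every generated input is syntactically structured by construction.

```java
import java.util.Random;

// Self-contained sketch of the generator pattern from Figure 2.
// In JQF the class would extend Generator<String> and its generate()
// method would receive a SourceOfRandomness and GenerationStatus;
// here java.util.Random is a stand-in, and the mini-grammar below
// is illustrative, not the paper's actual ChocoPy generator.
public class ChocoPyGeneratorSketch {
    private final Random random;

    public ChocoPyGeneratorSketch(Random random) {
        this.random = random;
    }

    // Produce a small "program": a sequence of 1-3 statements.
    // Each call to the Random object is one generator choice point.
    public String generate() {
        StringBuilder program = new StringBuilder();
        int numStatements = 1 + random.nextInt(3);
        for (int i = 0; i < numStatements; i++) {
            program.append(genStatement()).append("\n");
        }
        return program.toString();
    }

    // Choose between a typed variable definition and a print statement.
    private String genStatement() {
        if (random.nextBoolean()) {
            return genIdentifier() + ":int = " + random.nextInt(100);
        } else {
            return "print(" + genExpression() + ")";
        }
    }

    // Recursively build an expression: literal, identifier, or sum.
    private String genExpression() {
        switch (random.nextInt(3)) {
            case 0:  return String.valueOf(random.nextInt(100));
            case 1:  return genIdentifier();
            default: return genExpression() + " + " + random.nextInt(10);
        }
    }

    private String genIdentifier() {
        String[] names = {"x", "y", "count"};  // illustrative pool
        return names[random.nextInt(names.length)];
    }

    public static void main(String[] args) {
        // A fixed seed makes the random choices reproducible.
        String input = new ChocoPyGeneratorSketch(new Random(42)).generate();
        System.out.println(input);
    }
}
```

In a real JQF harness, the fuzzer (rather than a fixed seed) supplies the byte stream behind `SourceOfRandomness`, which is what lets coverage-guided mutation of those bytes steer the generator's choices.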