Augmenting Greybox Fuzzing with Generative AI

Jie Hu; Qian Zhang; Heng Yin

TL;DR

Structured-input fuzzing is hampered by invalid mutations and rigid grammars. ChatFuzz integrates a generative AI mutator with AFL++ to produce format-conforming inputs, exploring key hyper-parameters of the LLM-based mutator. Empirical results across 12 programs show a 12.77% edge-coverage improvement over AFL++, with AI-generated seeds contributing a meaningful portion of the seed queue; results vary by input format and endpoint. Latency, syntax reliability, and stateful targets limit universal gains, but the approach demonstrates a viable path for AI-assisted fuzzing of real-world, structured-input programs and guides future dynamic tuning and format expansion.

Abstract

Real-world programs that expect structured inputs often have a format-parsing stage that gates the deeper program space. Neither a mutation-based approach nor a generative approach can provide a solution that is both effective and scalable. Large language models (LLMs) pre-trained on enormous natural-language corpora have proved effective at understanding implicit format syntax and generating format-conforming inputs. In this paper, we propose ChatFuzz, a greybox fuzzer augmented by generative AI. More specifically, we pick a seed from the fuzzer's seed pool and prompt ChatGPT generative models to produce variations of it, which are more likely to be format-conforming and thus of high quality. We conduct extensive experiments to explore best practices for harnessing the power of generative LLMs. The experimental results show that our approach improves edge coverage by 12.77% over the state-of-the-art greybox fuzzer (AFL++) on 12 target programs from three well-tested benchmarks. For vulnerability detection, ChatFuzz performs comparably to or better than AFL++ on programs with explicit syntax rules, but not on programs with non-trivial syntax.
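The seed-to-variation step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ChatFuzz implementation: the prompt wording, the `MutatorConfig` defaults, the de-duplication step, and the `fake_backend` stand-in are all hypothetical, and no real model API or AFL++ integration is shown. Only the hyper-parameter names (`max_tokens`, `n`, `temperature`) come from the paper's outline.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MutatorConfig:
    # Names follow the paper's outline; the default values are placeholders,
    # not the settings the authors selected.
    max_tokens: int = 256
    n: int = 4
    temperature: float = 1.0


# Hypothetical backend signature: (prompt, config) -> list of text completions.
LLMBackend = Callable[[str, MutatorConfig], List[str]]


def build_prompt(seed: bytes) -> str:
    """Wrap a seed in an illustrative prompt (the wording is an assumption)."""
    text = seed.decode("utf-8", errors="replace")
    return "Generate a variation of the following input that keeps its format:\n" + text


def llm_mutate(seed_pool: List[bytes], backend: LLMBackend,
               config: MutatorConfig, rng: random.Random) -> List[bytes]:
    """Pick one seed, ask the backend for variations, and keep the new ones."""
    seed = rng.choice(seed_pool)
    completions = backend(build_prompt(seed), config)
    seen = set(seed_pool)
    fresh = []
    for completion in completions[:config.n]:
        candidate = completion.encode("utf-8")
        # Assumed filtering step: drop empty outputs and exact duplicates.
        if candidate and candidate not in seen:
            seen.add(candidate)
            fresh.append(candidate)
    return fresh


def fake_backend(prompt: str, config: MutatorConfig) -> List[str]:
    """Stand-in for a model call: rewrites the digit 1 in the seed text."""
    body = prompt.split("\n", 1)[1]
    return [body.replace("1", str(i)) for i in range(config.n)]


pool = [b'{"a": 1}']
new_seeds = llm_mutate(pool, fake_backend, MutatorConfig(n=3), random.Random(0))
print(new_seeds)
```

In a real setup, `fake_backend` would be replaced by a call to a model endpoint, and the returned seeds would be handed to the fuzzer's queue; both of those pieces are deliberately left out here.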

Paper Structure (20 sections, 11 figures, 11 tables)

Introduction
Background and Related Work
Large Language Models
A Motivating Example
Design and Implementation
Hyper-Parameter Selection
Model Endpoint Choice
Prompt Design
max_tokens - Maximum number of tokens
n - Completion choices number
temperature - Sampling Temperature
Prompt Ablation Study
Evaluation
Evaluation Plan
RQ1: Coverage Efficiency
...and 5 more sections

Figures (11)

Figure 1: ChatFuzz Overview
Figure 2: Model Latency and max_tokens
Figure 3: Model Latency and n for CT Endpoint
Figure 4: Model Latency and n for CP Endpoint
Figure 5: Seed unique ratio of all generated seeds. Note that the result of AI_CT is in a solid line while that of AI_CP is in a dashed line.
...and 6 more figures
