Augmenting Greybox Fuzzing with Generative AI

Jie Hu, Qian Zhang, Heng Yin

TL;DR

Structured-input fuzzing is hampered by invalid mutations and rigid grammars. ChatFuzz integrates a generative AI mutator with AFL++ to produce format-conforming inputs, exploring key hyper-parameters of the LLM-based mutator. Empirical results across 12 programs show a 12.77% edge-coverage improvement over AFL++, with AI-generated seeds contributing a meaningful portion of the seed queue; results vary by input format and endpoint. Latency, syntax reliability, and stateful targets limit universal gains, but the approach demonstrates a viable path for AI-assisted fuzzing of real-world, structured-input programs and guides future dynamic tuning and format expansion.

Abstract

Real-world programs that expect structured inputs often have a format-parsing stage that gates the deeper program space. Neither a mutation-based approach nor a generative approach alone provides a solution that is both effective and scalable. Large language models (LLMs) pre-trained on enormous natural-language corpora have proved effective at understanding implicit format syntax and generating format-conforming inputs. In this paper, we propose ChatFuzz, a greybox fuzzer augmented by generative AI. More specifically, we pick a seed from the fuzzer's seed pool and prompt ChatGPT generative models to produce variations of it, which are more likely to be format-conforming and thus of high quality. We conduct extensive experiments to explore best practices for harnessing the power of generative LLMs. The experimental results show that our approach improves edge coverage by 12.77% over the state-of-the-art greybox fuzzer (AFL++) on 12 target programs from three well-tested benchmarks. As for vulnerability detection, ChatFuzz performs similarly to or better than AFL++ on programs with explicit syntax rules, but not on programs with non-trivial syntax.
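
The seed-to-variations step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not ChatFuzz's actual implementation: the prompt wording, the `build_prompt` and `ai_mutate` helpers, and the `generate(prompt, n, max_tokens)` callable are all hypothetical, and `fake_generate` is an offline stand-in so the example runs without any model access. The parameter names `n` and `max_tokens` mirror the hyper-parameters named in the figure captions.

```python
import hashlib
from typing import Callable, List

# Hypothetical interface: (prompt, n, max_tokens) -> list of n completions.
Generator = Callable[[str, int, int], List[str]]


def build_prompt(seed: bytes) -> str:
    """Wrap a seed in a simple instruction; the wording is an assumption."""
    text = seed.decode("utf-8", errors="replace")
    return "Generate variations of the following input, keeping its format:\n" + text


def ai_mutate(seed: bytes, generate: Generator,
              n: int = 4, max_tokens: int = 256) -> List[bytes]:
    """Request n variations of a seed; drop duplicates and echoes of the seed."""
    seen = {hashlib.sha256(seed).hexdigest()}
    unique: List[bytes] = []
    for completion in generate(build_prompt(seed), n, max_tokens):
        candidate = completion.encode("utf-8")
        digest = hashlib.sha256(candidate).hexdigest()
        if candidate and digest not in seen:
            seen.add(digest)
            unique.append(candidate)
    return unique


def fake_generate(prompt: str, n: int, max_tokens: int) -> List[str]:
    """Offline stand-in for a model call, for demonstration only.

    Truncating by characters is a crude approximation of a token limit.
    """
    seed_text = prompt.split("\n", 1)[1]
    outputs = [
        seed_text,                    # echo of the seed
        seed_text.replace("1", "2"),  # changed value
        seed_text.replace("1", "2"),  # duplicate of the previous output
        seed_text.replace("a", "b"),  # changed key
    ]
    return [o[:max_tokens] for o in outputs[:n]]


if __name__ == "__main__":
    for variant in ai_mutate(b'{"a": 1}', fake_generate):
        print(variant)
```

In a real setup, the surviving variants would be handed back to the greybox fuzzer, which decides whether to keep them based on coverage feedback. The deduplication step reflects why the ratio of unique generated seeds is worth tracking: repeated or echoed outputs cost model latency without adding anything to the queue.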

Paper Structure (20 sections, 11 figures, 11 tables)

Figures (11)

  • Figure 1: ChatFuzz Overview
  • Figure 2: Model Latency and max_tokens
  • Figure 3: Model Latency and n for CT Endpoint
  • Figure 4: Model Latency and n for CP Endpoint
  • Figure 5: Seed unique ratio of all generated seeds. Note that the result of AI_CT is in a solid line while that of AI_CP is in a dashed line.
  • ...and 6 more figures