
Prompt Fuzzing for Fuzz Driver Generation

Yunlong Lyu, Yuxuan Xie, Peng Chen, Hao Chen

TL;DR

PromptFuzz is a coverage-guided, prompt-based fuzzing framework that uses large language models (LLMs) to iteratively generate fuzz drivers for libraries. It combines instructive program generation, erroneous program validation, and coverage-guided prompt mutation to steer LLMs toward diverse and deep API usage. Each generated program is validated with runtime sanitizers and critical-path coverage checks. The system also infers API argument constraints and uses them to convert validated programs into viable fuzzable seeds, which are compiled into a fuzz driver and scheduled for fuzzing with conventional fuzzing engines. On 14 real-world libraries, PromptFuzz's fuzz drivers achieve 1.61 and 1.63 times higher branch coverage than those of OSS-Fuzz and Hopper, respectively. They also uncover 33 new bugs, 30 of which have been confirmed by their communities. The work further discusses limitations, potential improvements, and applicability to other software domains and languages.
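The constraint-inference step can be illustrated with a minimal Python sketch of one plausible heuristic. This is an assumption for illustration, not PromptFuzz's actual algorithm. The sketch looks at the literal argument values that several validated seed programs pass to the same API. An integer argument that always equals the length of a byte-buffer argument stays tied to that buffer, and every other argument is handed to the fuzzer. The `CallSite` type, the `lib_parse` API, and the `len(argN)` notation are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union

Arg = Union[int, bytes]

@dataclass
class CallSite:
    """One call to a library API with the literal arguments a seed program used (hypothetical model)."""
    api: str
    args: List[Arg]

def infer_length_pairs(calls: List[CallSite]) -> Dict[str, Set[Tuple[int, int]]]:
    """Keep (len_idx, buf_idx) pairs where args[len_idx] == len(args[buf_idx]) in every observed call."""
    pairs: Dict[str, Set[Tuple[int, int]]] = {}
    for call in calls:
        seen = {
            (i, j)
            for i, a in enumerate(call.args) if isinstance(a, int)
            for j, b in enumerate(call.args) if isinstance(b, bytes) and len(b) == a
        }
        # Intersect with earlier observations so only relations that always hold survive.
        pairs[call.api] = pairs[call.api] & seen if call.api in pairs else seen
    return pairs

def plan_arguments(calls: List[CallSite]) -> Dict[str, List[str]]:
    """Mark each argument slot as fuzzer-controlled or derived from a buffer's length."""
    pairs = infer_length_pairs(calls)
    plan: Dict[str, List[str]] = {}
    for call in calls:
        if call.api in plan:
            continue
        slots = ["fuzz"] * len(call.args)
        for len_idx, buf_idx in sorted(pairs[call.api]):
            slots[len_idx] = f"len(arg{buf_idx})"
        plan[call.api] = slots
    return plan

if __name__ == "__main__":
    seeds = [CallSite("lib_parse", [b"abc", 3, 7]), CallSite("lib_parse", [b"hello", 5, 9])]
    print(plan_arguments(seeds))  # {'lib_parse': ['fuzz', 'len(arg0)', 'fuzz']}
```

Observing more seeds narrows the inferred relations. A single call such as `lib_parse(b"ab", 2, 2)` cannot tell which integer is the length. Intersecting it with `lib_parse(b"xyz", 3, 4)` leaves only argument 1 tied to the buffer.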

Abstract

Crafting high-quality fuzz drivers is not only time-consuming but also requires a deep understanding of the library. Yet state-of-the-art automatic fuzz driver generation techniques fall short of expectations. While fuzz drivers derived from consumer code can reach deep states, they have limited coverage. Conversely, interpretative fuzzing can explore most API calls but requires numerous attempts within a large search space. We propose PromptFuzz, a coverage-guided fuzzer for prompt fuzzing that iteratively generates fuzz drivers to explore undiscovered library code. To explore API usage in fuzz drivers during prompt fuzzing, we propose several key techniques: instructive program generation, erroneous program validation, coverage-guided prompt mutation, and constrained fuzzer scheduling. We implemented PromptFuzz and evaluated it on 14 real-world libraries. The fuzz drivers generated by PromptFuzz achieved 1.61 and 1.63 times higher branch coverage than those from OSS-Fuzz and Hopper (the state-of-the-art fuzz driver generation tool), respectively. Moreover, these fuzz drivers triggered 49 crashes, 33 of which are genuine new bugs, and 30 of those bugs have been confirmed by their respective communities.
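Coverage-guided prompt mutation can be pictured as choosing which APIs the next prompt should ask the LLM to combine. The Python sketch below is a hedged illustration of that idea and does not reproduce the paper's formula. It assigns each API an assumed "energy": the fraction of its branches still uncovered, damped by how many seeds and prompts have already used it. It then draws distinct APIs with probability proportional to that energy. `ApiStats`, its fields, and the prompt wording are hypothetical. `LLVMFuzzerTestOneInput` is libFuzzer's standard entry point.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class ApiStats:
    """Per-API bookkeeping a coverage-guided prompt mutator might keep (hypothetical fields)."""
    name: str
    covered_branches: int      # branches inside the API's code reached so far
    total_branches: int        # branches inside the API's code
    seeds_using: int = 0       # accepted seed programs that already call the API
    prompts_using: int = 0     # prompts that have already requested the API

def energy(s: ApiStats) -> float:
    """Assumed weight: uncovered fraction, damped by how often the API was already tried."""
    if s.total_branches == 0:
        return 0.0
    uncovered = 1.0 - s.covered_branches / s.total_branches
    return uncovered / ((1 + s.seeds_using) * (1 + s.prompts_using))

def select_apis(stats: List[ApiStats], k: int, rng: random.Random) -> List[str]:
    """Draw up to k distinct APIs with probability proportional to their energy."""
    pool = list(stats)
    chosen: List[str] = []
    while pool and len(chosen) < k:
        weights = [energy(s) for s in pool]
        if sum(weights) == 0:
            break  # only fully covered (or branch-free) APIs remain
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick.name)
        pool.remove(pick)
    return chosen

def build_prompt(api_names: List[str]) -> str:
    """Hypothetical instruction asking the LLM to combine the selected APIs in one driver."""
    return ("Write a libFuzzer driver (LLVMFuzzerTestOneInput) that uses these APIs together: "
            + ", ".join(api_names))

if __name__ == "__main__":
    stats = [
        ApiStats("lib_open", 10, 10),
        ApiStats("lib_decode", 2, 20),
        ApiStats("lib_close", 1, 4, seeds_using=3),
    ]
    print(build_prompt(select_apis(stats, 2, random.Random(0))))
```

The fully covered `lib_open` has zero energy and is never drawn. The prompt therefore always combines `lib_decode` and `lib_close`, in a seed-dependent order. In a full loop, the counters and coverage figures would be updated after every prompt and every accepted seed, so that the weights keep shifting toward unexplored code.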

Paper Structure (39 sections, 4 equations, 9 figures, 3 tables, 1 algorithm)

Figures (9)

  • Figure 1: A fuzz driver for libvpx
  • Figure 2: Fuzz driver generation in PromptFuzz. Seed represents a program instance generated by LLMs.
  • Figure 3: Prompt template
  • Figure 4: Erroneous program validation. corpora represents a program input, and fuzzing corpus represents a collection of program inputs.
  • Figure 5: Branches covered by seed programs generated by PromptFuzz under coverage-guided mutation and blind mutation
  • ...and 4 more figures
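Figure 4's caption describes validating each generated program against a fuzzing corpus. The Python sketch below shows one way such a filter could be written. It rests on two assumptions that are ours rather than the paper's. First, any sanitizer report while running the corpus marks the program as erroneous. Second, the program is kept only if at least one corpus input reaches every API call on its critical path. `RunResult`, `validate_program`, and the call-site identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

@dataclass
class RunResult:
    """Outcome of running the candidate driver on one corpus input."""
    sanitizer_report: Optional[str] = None                # e.g. "heap-buffer-overflow"; None if clean
    covered_calls: Set[str] = field(default_factory=set)  # ids of API call sites that executed

def validate_program(critical_path: List[str], runs: List[RunResult]) -> Tuple[bool, str]:
    """Accept a generated program only if it runs cleanly and exercises its critical path."""
    # Check 1: any sanitizer report is treated as a sign the program misuses the library.
    for r in runs:
        if r.sanitizer_report is not None:
            return False, f"sanitizer: {r.sanitizer_report}"
    # Check 2: some single input must reach every call on the critical path.
    needed = set(critical_path)
    if not any(needed <= r.covered_calls for r in runs):
        return False, "critical path not fully covered by any input"
    return True, "ok"

if __name__ == "__main__":
    path = ["open", "decode", "close"]
    corpus = [RunResult(covered_calls={"open"}), RunResult(covered_calls={"open", "decode", "close"})]
    print(validate_program(path, corpus))  # (True, 'ok')
```

Discarding a program on any sanitizer report is a conservative choice. It keeps API misuse out of the seed pool, but it also drops programs whose crash reflects a genuine library bug, so such reports may deserve separate triage.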