Harnessing Large Language Models for Seed Generation in Greybox Fuzzing

Wenxuan Shi, Yunhang Zhang, Xinyu Xing, Jun Xu

TL;DR

SeedMind addresses the challenge of generating high-quality seeds for greybox fuzzing, especially for targets with non-standard input formats. Instead of asking an LLM for seeds directly, it asks the LLM for seed-generating programs, and it combines a coverage-guided, iterative refinement loop with a state-driven realignment mechanism to cope with context limits and unreliable model behavior. The resulting seeds are close in quality to human-created seeds and consistently outperform prior LLM-based baselines on the OSS-Fuzz and MAGMA benchmarks, across multiple models and at practical cost. The work demonstrates that LLM-assisted seed generation is practical and general across diverse software targets and input modalities, enabling more effective and scalable fuzzing.

Abstract

Greybox fuzzing has emerged as a preferred technique for discovering software bugs, striking a balance between efficiency and depth of exploration. While research has focused on improving fuzzing techniques, the importance of high-quality initial seeds remains critical yet often overlooked. Existing methods for seed generation are limited, especially for programs with non-standard or custom input formats. Large Language Models (LLMs) have revolutionized numerous domains, showcasing unprecedented capabilities in understanding and generating complex patterns across various fields of knowledge. This paper introduces SeedMind, a novel system that leverages LLMs to boost greybox fuzzing through intelligent seed generation. Unlike previous approaches, SeedMind employs LLMs to create test case generators rather than directly producing test cases. Our approach implements an iterative, feedback-driven process that guides the LLM to progressively refine test case generation, aiming for increased code coverage depth and breadth. In developing SeedMind, we addressed key challenges including input format limitations, context window constraints, and ensuring consistent, progress-aware behavior. Intensive evaluations with real-world applications show that SeedMind effectively harnesses LLMs to generate high-quality test cases and facilitate bug finding through fuzzing, presenting utility comparable to human-created seeds and significantly outperforming the existing LLM-based solutions.
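
The core idea in the abstract (the model writes generators rather than seeds, and coverage feedback steers each refinement round) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not SeedMind's implementation: `toy_target`, `stub_llm_propose`, and `refine` are hypothetical names, the LLM call is replaced by a fixed lookup table, and coverage is a hand-written set of branch labels rather than real instrumentation.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

# A "generator" is a small program that emits one seed per call.
Generator = Callable[[random.Random], bytes]

BRANCH_ORDER = ["entry", "magic_ok", "version_1", "has_payload"]


def toy_target(data: bytes) -> Set[str]:
    """Stand-in for an instrumented target: returns the branch labels hit."""
    covered = {"entry"}
    if data[:4] == b"SEED":
        covered.add("magic_ok")
        if len(data) > 4 and data[4] == 1:
            covered.add("version_1")
            if len(data) >= 8:
                covered.add("has_payload")
    return covered


# Hypothetical canned "model outputs": one generator aimed at each branch.
CANNED_GENERATORS: Dict[str, Generator] = {
    "entry": lambda rng: b"\x00" * rng.randrange(1, 5),
    "magic_ok": lambda rng: b"SEED" + b"\x00" * rng.randrange(0, 4),
    "version_1": lambda rng: b"SEED\x01" + b"\x00" * rng.randrange(0, 3),
    "has_payload": lambda rng: b"SEED\x01" + b"\x00" * rng.randrange(3, 8),
}


def stub_llm_propose(uncovered: Set[str]) -> Generator:
    """Assumed placeholder for the LLM call. A real system would prompt a
    model with target code and coverage feedback; here we simply return the
    canned generator for the first branch that is still uncovered."""
    for branch in BRANCH_ORDER:
        if branch in uncovered:
            return CANNED_GENERATORS[branch]
    return CANNED_GENERATORS[BRANCH_ORDER[-1]]


def refine(rounds: int, seeds_per_round: int = 4,
           rng_seed: int = 0) -> Tuple[List[bytes], Set[str]]:
    """Coverage-guided loop: request a generator, run it, keep its seeds
    only if they reach branches that earlier rounds did not."""
    rng = random.Random(rng_seed)
    all_branches = set(BRANCH_ORDER)
    covered: Set[str] = set()
    corpus: List[bytes] = []
    for _ in range(rounds):
        generator = stub_llm_propose(all_branches - covered)
        new_seeds = [generator(rng) for _ in range(seeds_per_round)]
        hit = set().union(*(toy_target(seed) for seed in new_seeds))
        gained = hit - covered
        if gained:  # feedback: only coverage-increasing rounds are kept
            corpus.extend(new_seeds)
            covered |= gained
        if covered == all_branches:
            break
    return corpus, covered


if __name__ == "__main__":
    corpus, covered = refine(rounds=10)
    print(len(corpus), sorted(covered))
```

Returning a generator rather than raw bytes is what lets each round emit many structurally valid variants (here, different payload lengths) from a single model response, which is the property the abstract attributes to generating test case generators.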

Paper Structure

This paper contains 27 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Illustration of challenges incurred by input formats.
  • Figure 2: Workflow of SeedMind.
  • Figure 3: An illustration of code coverage on dynamic call graph.
  • Figure 4: State Machine of SeedMind.
  • Figure 5: Results of bug-finding evaluation with MAGMA. NONE means no seeds are used, and DEFAULT represents the default seed corpus shipped with the fuzzing target. The numbers are the average time-to-trigger of the corresponding bug. Values highlighted in green indicate the shortest time-to-trigger among the four solutions.
  • ...and 3 more figures