Large Language Model assisted Hybrid Fuzzing

Ruijie Meng, Gregory J. Duck, Abhik Roychoudhury

TL;DR

The paper tackles the inefficiency of traditional concolic execution within hybrid fuzzing by introducing HyLLfuzz, an approach that leverages Large Language Models to generate inputs guiding exploration at coverage roadblocks. Rather than translating and solving symbolic path constraints, HyLLfuzz constructs a code slice from the concrete execution trace and prompts an LLM to modify or craft inputs that satisfy the sliced constraints, enabling efficient path exploration. Empirical results across multiple benchmarks show HyLLfuzz delivering 40–50% more branch coverage than state-of-the-art hybrid fuzzers and reducing concolic execution time by 4–19x, while maintaining usability by avoiding environment modeling and heavy symbolic tooling. A case study on a Java Jenkins subject demonstrates the approach's ability to expose complex constraints that hinder traditional concolic engines. Overall, the work suggests that integrating LLM-driven input generation into hybrid fuzzing can significantly improve effectiveness, efficiency, and accessibility in real-world software testing.

Abstract

Greybox fuzzing, which conducts a biased random search within the program input space, is one of the most popular methods for detecting software vulnerabilities. To enhance its effectiveness in achieving deep coverage of program behaviors, greybox fuzzing is often combined with concolic execution, which performs a path-sensitive search over the domain of program inputs. In hybrid fuzzing, conventional greybox fuzzing is followed by concolic execution in an iterative loop, where reachability roadblocks encountered by greybox fuzzing are tackled by concolic execution. However, such hybrid fuzzing still suffers from difficulties conventionally faced by symbolic execution, such as the need for environment modeling and system call support. In this work, we show how to achieve the effect of concolic execution without having to compute and solve symbolic path constraints. When coverage-based greybox fuzzing hits a roadblock that prevents it from reaching certain branches, we slice the execution trace and suggest modifications of the input to reach the relevant branches. A Large Language Model (LLM) is used as a solver to generate the modified input for reaching the desired branches. Compared with both the vanilla greybox fuzzer AFL and the hybrid fuzzers Intriguer and QSYM, our LLM-based hybrid fuzzer HyLLfuzz (pronounced "hill fuzz") demonstrates superior coverage. Furthermore, the LLM-based concolic execution in HyLLfuzz runs 4-19 times faster than the concolic execution in existing hybrid fuzzing tools. This experience shows that LLMs can be effectively inserted into the iterative loop of hybrid fuzzers to efficiently expose more program behaviors.
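The loop described in the abstract can be illustrated with a small sketch. This is not the HyLLfuzz implementation: the toy target, the branch names, the mutation operator, and the `stub_llm_solver` function are all hypothetical, and the stub stands in for a real LLM call by hard-coding an answer for the toy target. The sketch only shows the control flow: random mutation first, then a solver that receives a seed input plus a code slice for each branch that remains uncovered.

```python
import random
from typing import Callable, Dict, List, Set, Tuple


def toy_target(data: bytes) -> Set[str]:
    """Hypothetical instrumented program: returns the branch ids an input covers."""
    covered = {"entry"}
    if data.startswith(b"{"):
        covered.add("open_brace")
        if data.endswith(b"}"):
            covered.add("close_brace")  # unlikely to be hit by random mutation
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Greybox-style mutation: overwrite one randomly chosen byte."""
    if not data:
        return bytes([rng.randrange(256)])
    i = rng.randrange(len(data))
    return data[:i] + bytes([rng.randrange(256)]) + data[i + 1:]


def stub_llm_solver(seed: bytes, code_slice: str) -> bytes:
    """Placeholder for the LLM call (assumption). A real solver would put the
    seed and the slice into a prompt; this stub hard-codes the toy answer."""
    return b"{" + seed.strip(b"{}") + b"}"


def hybrid_fuzz(
    target: Callable[[bytes], Set[str]],
    seeds: List[bytes],
    solver: Callable[[bytes, str], bytes],
    slices: Dict[str, str],
    rounds: int = 200,
    rng_seed: int = 0,
) -> Tuple[List[bytes], Set[str]]:
    rng = random.Random(rng_seed)
    corpus = list(seeds)
    covered: Set[str] = set()
    for s in corpus:
        covered |= target(s)

    # Phase 1: greybox fuzzing keeps inputs that reach new branches.
    for _ in range(rounds):
        candidate = mutate(rng.choice(corpus), rng)
        new_branches = target(candidate) - covered
        if new_branches:
            corpus.append(candidate)
            covered |= new_branches

    # Phase 2: for each roadblock (a branch that is still uncovered), ask the
    # solver to rewrite a seed using the slice associated with that branch.
    for branch, code_slice in slices.items():
        if branch in covered:
            continue
        for seed in list(corpus):
            candidate = solver(seed, code_slice)
            reached = target(candidate)
            if branch in reached:
                corpus.append(candidate)
                covered |= reached
                break
    return corpus, covered


if __name__ == "__main__":
    slices = {"close_brace": "if data.startswith(b'{'): if data.endswith(b'}'): ..."}
    corpus, covered = hybrid_fuzz(toy_target, [b"{abc"], stub_llm_solver, slices)
    print(sorted(covered))
```

In a real setting the two phases would alternate repeatedly, and the solver's output would be validated by re-running the target, as the sketch does, since an LLM-generated input is only a suggestion until the coverage feedback confirms it.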

Paper Structure

This paper contains 19 sections, 10 figures, 2 tables, and 2 algorithms.

Figures (10)

  • Figure 1: Code fragment adapted from the cJSON.c file within the cJSON subject.
  • Figure 2: Workflow of solving the roadblock at cJSON.c:24 based on a seed input (selected from the seed corpus of the greybox fuzzer) and the constraints related to the roadblock (sliced from source code), and the new input is generated by the LLM.
  • Figure 3: Overall workflow of HyLLfuzz including greybox fuzzing and LLM-based concolic execution.
  • Figure 4: Prompt template for generating new input based on the given input and relevant code slice.
  • Figure 5: Average code coverage over time by AFL, Intriguer, QSYM and HyLLfuzz across 10 runs of 24 hours.
  • ...and 5 more figures