Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback

Shiyin Lin

TL;DR

This work presents a hybrid fuzzing framework that fuses static analysis, dynamic execution, and Large Language Model (LLM) guided input mutation with semantic feedback to explore deep program states beyond traditional coverage. By converting static analysis outputs into structured LLM prompts and augmenting coverage with embedding-based semantic novelty signals, the approach prioritizes inputs that provoke novel behavioral changes rather than merely increasing code coverage. Implemented on AFL++ and evaluated on real-world targets (libpng, tcpdump, sqlite), the method achieves faster time-to-first-bug and greater semantic diversity while maintaining competitive bug yields compared to state-of-the-art fuzzers. The study demonstrates that semantic-aware guidance, combined with context-driven mutation, can accelerate vulnerability discovery and deepen behavioral exploration, albeit with overhead and cost considerations associated with LLM usage.

Abstract

Software fuzzing has become a cornerstone of automated vulnerability discovery, yet existing mutation strategies often lack semantic awareness, leading to redundant test cases and slow exploration of deep program states. In this work, we present a hybrid fuzzing framework that integrates static and dynamic analysis with Large Language Model (LLM)-guided input mutation and semantic feedback. Static analysis extracts control-flow and data-flow information, which is transformed into structured prompts for the LLM to generate syntactically valid and semantically diverse inputs. During execution, we augment traditional coverage-based feedback with semantic feedback signals (derived from program state changes, exception types, and output semantics), allowing the fuzzer to prioritize inputs that trigger novel program behaviors beyond mere code coverage. We implement our approach atop AFL++, combining program instrumentation with embedding-based semantic similarity metrics to guide seed selection. Evaluation on real-world open-source targets, including libpng, tcpdump, and sqlite, demonstrates that our method achieves faster time-to-first-bug, higher semantic diversity, and a competitive number of unique bugs compared to state-of-the-art fuzzers. This work highlights the potential of combining LLM reasoning with semantic-aware feedback to accelerate and deepen vulnerability discovery.
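The abstract does not spell out how the embedding-based similarity signal is combined with coverage, so the following is only a minimal sketch of one plausible reading: novelty is taken as one minus the highest cosine similarity between a candidate's behavior embedding and the embeddings already in the corpus, and a seed score blends that novelty with a binary new-coverage flag. The function names, the weight `alpha`, and the blending rule are all assumptions made for illustration, not the paper's definitions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_novelty(candidate: Sequence[float],
                     seen: Sequence[Sequence[float]]) -> float:
    """Assumed novelty: 1 - max similarity to any previously seen embedding.

    An empty corpus makes every candidate maximally novel (1.0).
    """
    if not seen:
        return 1.0
    return 1.0 - max(cosine_similarity(candidate, s) for s in seen)


def seed_score(new_edges: int, novelty: float, alpha: float = 0.5) -> float:
    """Hypothetical blend of coverage and semantic feedback.

    alpha weights a binary "found new coverage" flag; (1 - alpha) weights
    the semantic novelty. Both the form and the default are illustrative.
    """
    coverage_term = 1.0 if new_edges > 0 else 0.0
    return alpha * coverage_term + (1.0 - alpha) * novelty


if __name__ == "__main__":
    corpus = [[1.0, 0.0]]            # embeddings of behaviors already observed
    repeat = [1.0, 0.0]              # same behavior as before
    fresh = [0.0, 1.0]               # orthogonal, i.e. unseen behavior
    print(seed_score(0, semantic_novelty(repeat, corpus)))
    print(seed_score(0, semantic_novelty(fresh, corpus)))
```

Under this sketch, an input that adds no new edges can still be kept in the queue if its observed behavior is far from everything seen so far, which is the prioritization effect the abstract describes.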

Paper Structure

This paper contains 34 sections, 3 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Overall architecture of the proposed hybrid fuzzing framework, integrating static analysis, LLM-guided input mutation, and semantic feedback.
  • Figure 2: Time-to-first-bug comparison between AFL++ and our approach over a 72-hour fuzzing campaign. Lower values indicate faster bug discovery.
  • Figure 3: Mean semantic novelty score over time for AFL++ and our approach. Higher novelty indicates greater behavioral diversity in generated test cases.