Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback
Shiyin Lin
TL;DR
This work presents a hybrid fuzzing framework that fuses static analysis, dynamic execution, and Large Language Model (LLM)-guided input mutation with semantic feedback to reach deep program states that coverage-only guidance tends to miss. By converting static analysis outputs into structured LLM prompts and augmenting coverage with embedding-based semantic novelty signals, the approach prioritizes inputs that provoke novel program behavior rather than merely increasing code coverage. Implemented on AFL++ and evaluated on real-world targets (libpng, tcpdump, sqlite), the method achieves faster time-to-first-bug and greater semantic diversity while maintaining competitive bug yields compared to state-of-the-art fuzzers. The study demonstrates that semantic-aware guidance, combined with context-driven mutation, can accelerate vulnerability discovery and deepen behavioral exploration, at the price of the runtime overhead and monetary cost of LLM usage.
Abstract
Software fuzzing has become a cornerstone of automated vulnerability discovery, yet existing mutation strategies often lack semantic awareness, leading to redundant test cases and slow exploration of deep program states. In this work, I present a hybrid fuzzing framework that integrates static and dynamic analysis with Large Language Model (LLM)-guided input mutation and semantic feedback. Static analysis extracts control-flow and data-flow information, which is transformed into structured prompts from which the LLM generates syntactically valid and semantically diverse inputs. During execution, I augment traditional coverage-based feedback with semantic feedback signals (derived from program state changes, exception types, and output semantics), allowing the fuzzer to prioritize inputs that trigger novel program behaviors beyond mere code coverage. I implement the approach atop AFL++, combining program instrumentation with embedding-based semantic similarity metrics to guide seed selection. Evaluation on real-world open-source targets, including libpng, tcpdump, and sqlite, demonstrates that the method achieves faster time-to-first-bug, higher semantic diversity, and a competitive number of unique bugs compared to state-of-the-art fuzzers. This work highlights the potential of combining LLM reasoning with semantic-aware feedback to accelerate and deepen vulnerability discovery.
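
To make the idea of embedding-based semantic feedback concrete, the following is a minimal sketch of how a seed's priority might combine a coverage signal with a semantic novelty signal. It is an illustration under stated assumptions, not the implementation described in this work: the function names (`semantic_novelty`, `seed_score`), the use of cosine similarity, the "one minus maximum similarity" novelty definition, and the weighting parameter `alpha` are all hypothetical choices, and the embedding vectors are assumed to have been computed elsewhere from observed program behavior.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_novelty(embedding: Sequence[float],
                     corpus: Sequence[Sequence[float]]) -> float:
    """Hypothetical novelty: 1 minus the highest similarity to any stored behavior.

    An empty corpus makes the first observed behavior maximally novel.
    """
    if not corpus:
        return 1.0
    return 1.0 - max(cosine_similarity(embedding, seen) for seen in corpus)


def seed_score(new_edges: int,
               embedding: Sequence[float],
               corpus: Sequence[Sequence[float]],
               alpha: float = 0.5) -> float:
    """Blend a binary coverage signal with semantic novelty.

    alpha is an assumed weight, not a value taken from the paper.
    """
    coverage_term = 1.0 if new_edges > 0 else 0.0
    return alpha * coverage_term + (1.0 - alpha) * semantic_novelty(embedding, corpus)
```

Under this sketch, an input that reaches no new edges but whose behavior embedding is orthogonal to everything seen so far still receives a nonzero score, which is the property that lets semantically novel inputs survive seed selection when coverage alone would discard them.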
