SAFuzz: Semantic-Guided Adaptive Fuzzing for LLM-Generated Code
Ziyi Yang, Kalit Inani, Keshav Kabra, Vima Gupta, Anand Padmanabha Iyer
TL;DR
SAFuzz tackles the problem of testing AI-generated code at scale through semantic-guided adaptive fuzzing. It has three core components: prompt variant generation to capture prompt-induced behavioral diversity, an LLM-driven fuzz-harness generator with problem-specific semantic oracles, and a vulnerability predictor that allocates fuzzing resources adaptively. On 96 CSES algorithmic problems, SAFuzz improves vulnerability discrimination precision from $77.9\%$ to $85.7\%$ and reduces total fuzzing time by $1.71\times$ relative to GreenFuzz; when combined with unit test generation, it increases bug-detection recall from $67.3\%$ to $79.5\%$. These results show that fuzzing and unit testing have complementary strengths, and they point to a scalable path to safer AI-generated code.
Abstract
While AI coding assistants accelerate software development, current testing frameworks struggle to keep pace with the resulting volume of AI-generated code. Traditional fuzzing techniques often allocate resources uniformly and lack semantic awareness of algorithmic vulnerability patterns, leading to inefficient resource usage and missed vulnerabilities. To address these limitations, we present a hybrid testing framework that leverages LLM-guided adaptive fuzzing to detect algorithmic vulnerabilities efficiently. Our system, SAFuzz, integrates prompt-based behavioral diversification, harness generation with problem-specific oracles, and an LLM-based predictor to enable adaptive resource allocation and dynamic early stopping. On CSES algorithmic problems, SAFuzz improves vulnerability discrimination precision from 77.9% to 85.7% and achieves a 1.71x reduction in time cost compared to the state-of-the-art GreenFuzz while maintaining comparable recall. We further observe that combining our approach with existing unit test generation methods yields complementary gains, increasing bug-detection recall from 67.3% to 79.5%.
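
To make the idea of adaptive resource allocation with early stopping concrete, the following is a minimal illustrative sketch, not the SAFuzz implementation. It assumes a hypothetical per-target risk score in [0, 1] produced by some predictor, splits a fixed time budget in proportion to that score, and stops a campaign once several consecutive rounds yield no new coverage. The names `Target`, `allocate_budget`, and `should_stop_early`, the proportional rule, and the `patience` parameter are all assumptions made for illustration; the abstract does not specify the actual allocation or stopping rule.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    risk: float  # hypothetical predictor score in [0, 1]; higher means more likely vulnerable


def allocate_budget(targets, total_seconds, min_seconds=1.0):
    """Split a total fuzzing budget across targets in proportion to risk.

    Every target first receives a small floor so that low-risk code is still
    exercised; the remainder is divided proportionally to the risk scores.
    """
    if not targets:
        return {}
    # Cap the floor so the floors alone never exceed the total budget.
    floor = min(min_seconds, total_seconds / len(targets))
    remaining = total_seconds - floor * len(targets)
    total_risk = sum(t.risk for t in targets)
    budgets = {}
    for t in targets:
        # Fall back to a uniform split when every score is zero.
        share = t.risk / total_risk if total_risk > 0 else 1.0 / len(targets)
        budgets[t.name] = floor + remaining * share
    return budgets


def should_stop_early(new_coverage_history, patience=3):
    """Return True when the last `patience` rounds discovered no new coverage."""
    if len(new_coverage_history) < patience:
        return False
    return all(n == 0 for n in new_coverage_history[-patience:])
```

In this sketch, a uniform allocator corresponds to giving every target the same risk score, which is the baseline behavior the abstract contrasts against.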
