Towards Reliable LLM-Driven Fuzz Testing: Vision and Road Ahead
Yiran Cheng, Hong Jin Kang, Lwin Khin Shar, Chaopeng Dong, Zhiqiang Shi, Shichao Lv, Limin Sun
TL;DR
This paper investigates the reliability bottlenecks in LLM-driven fuzz testing (LLM4Fuzz) and surveys current driver- and seed-generation approaches. It highlights low driver validity rates, such as 27.43% on PyTorch and 17.41% on TensorFlow. It then outlines a road map toward reliable LLM4Fuzz with four directions: API-knowledge enhancement via program analysis, robust driver verification, hybrid fuzzing that combines traditional techniques with LLMs, and the use of multimodal LLMs. The road map is accompanied by concrete reliability metrics. The authors identify four key challenges (complex API dependencies, gaps in driver quality assurance, seed quality trade-offs, and semantic consistency) and propose research directions S1–S4 to address them. If realized, these developments could enable fast, private, and affordable security testing at industry scale, broadening access and shortening the time to vulnerability discovery.
Abstract
Fuzz testing is a crucial component of software security assessment, yet its effectiveness relies heavily on valid fuzz drivers and diverse seed inputs. Recent advances in Large Language Models (LLMs) offer transformative potential for automating fuzz testing (LLM4Fuzz), particularly in generating drivers and seeds. However, current LLM4Fuzz solutions face critical reliability challenges, including low driver validity rates and seed quality trade-offs, which hinder their practical adoption. This paper examines the reliability bottlenecks of LLM-driven fuzzing and explores potential research directions to address these limitations. It begins with an overview of the current development of LLMs for software engineering (LLM4SE) and emphasizes the need for reliable LLM4Fuzz solutions. Following this, the paper envisions a future in which reliable LLM4Fuzz transforms software testing and security, with benefits for industry, for software development practitioners, and for economic accessibility. It then outlines a road ahead for future research, identifying key challenges and offering specific suggestions for researchers to consider. This work strives to spark innovation in the field, positioning reliable LLM4Fuzz as a fundamental component of modern software testing.
