On the Challenges of Fuzzing Techniques via Large Language Models

Linghan Huang, Peizhou Zhao, Huaming Chen, Lei Ma

TL;DR

The paper addresses the automation of fuzzing for software security by surveying how large language models are used to generate and drive fuzzing inputs. It groups methods into two main strands, prompt-based fuzzers and fine-tuning-based fuzzers, and reviews benchmarks, metrics, and evaluations. It reports that LLM-based fuzzers can boost API and code coverage and uncover deeper vulnerabilities while reducing manual effort, but face challenges from hallucinations, data quality, and computational cost. The work provides a roadmap for future research toward scalable, automated, human-in-the-loop fuzzing systems and potential hardware testing applications.

Abstract

In the modern era where software plays a pivotal role, software security and vulnerability analysis are essential for secure software development. Fuzz testing, an efficient and long-established software testing method, has been widely adopted across various domains. Meanwhile, the rapid development of Large Language Models (LLMs) has facilitated their application in software testing, where they have demonstrated remarkable performance. Because existing fuzz testing techniques are not fully automated and software vulnerabilities continue to evolve, there is growing interest in leveraging LLMs to generate fuzz tests. In this paper, we present a systematic overview of approaches that use LLMs for fuzz testing. To the best of our knowledge, this is the first work that covers the intersection of three areas: LLMs, fuzz testing, and LLM-based fuzz test generation. We conduct a statistical analysis and discussion of the literature by summarizing the state-of-the-art methods up to the date of submission. Our work also investigates the potential for widespread future deployment and application of LLM-generated fuzz testing techniques, highlighting their promise for advancing automated software testing practices.

Paper Structure (26 sections, 1 figure, 1 table)