FuzzCoder: Byte-level Fuzzing Test via Large Language Model

Liqun Yang; Jian Yang; Chaoren Wei; Guanglin Niu; Ge Zhang; Yunli Wang; Linzheng Chai; Wanxu Xia; Hongcheng Guo; Shun Zhang; Jiaheng Liu; Yuwei Yin; Junran Peng; Jiaxin Ma; Liang Sun; Zhoujun Li

TL;DR

FuzzCoder tackles the challenge of learning effective input mutations for byte-level fuzzing by fine-tuning code LLMs on a domain-specific Fuzz-Instruct corpus. It reframes fuzzing as a sequence-to-sequence task that predicts mutation positions and strategies, integrates with AFL, and is evaluated on the eight-program Fuzz-Bench suite. The results show consistent improvements in the effective proportion of mutations (EPM) and the number of crashes (NC) across multiple data formats, demonstrating improved coverage and vulnerability discovery. The work contributes a dataset, a benchmark, and an instruction-tuning approach that enhances practical fuzzing performance, with potential for broader applicability and further gains via adapters.

Abstract

Fuzzing is an important dynamic program analysis technique designed for finding vulnerabilities in complex software. Fuzzing involves presenting a target program with crafted malicious input to cause crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs efficiently is a difficult open problem, and the best approaches often apply uniform random mutations to pre-existing valid inputs. In this work, we propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks to guide future fuzzing explorations. Specifically, we develop a framework that leverages code LLMs to guide the mutation process of inputs in fuzzing. The mutation process is formulated as sequence-to-sequence modeling, where the LLM receives a sequence of bytes and then outputs the mutated byte sequence. FuzzCoder is fine-tuned on the created instruction dataset (Fuzz-Instruct), where the successful fuzzing history is collected from a heuristic fuzzing tool. FuzzCoder can predict mutation locations and strategies in input files to trigger abnormal behaviors of the program. Experimental results show that FuzzCoder based on AFL (American Fuzzy Lop) gains significant improvements in terms of effective proportion of mutation (EPM) and number of crashes (NC) for various input formats including ELF, JPG, MP3, and XML.
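
To make the idea of predicting mutation locations and strategies concrete, the following is a minimal Python sketch of how a list of predicted (position, strategy) pairs might be applied to a seed input. The strategy names (flip_bit, add_one, set_ff), the apply_mutations helper, and the prediction format are hypothetical and chosen for illustration only; they are not taken from the paper or from AFL's actual mutator set, and the predictions are hard-coded here rather than produced by a model.

```python
from typing import Callable, Dict, List, Tuple


def flip_bit(data: bytearray, pos: int) -> None:
    """Flip the lowest bit of the byte at pos (hypothetical strategy)."""
    data[pos] ^= 0x01


def add_one(data: bytearray, pos: int) -> None:
    """Increment the byte at pos, wrapping around at 256 (hypothetical strategy)."""
    data[pos] = (data[pos] + 1) & 0xFF


def set_ff(data: bytearray, pos: int) -> None:
    """Overwrite the byte at pos with 0xFF (hypothetical strategy)."""
    data[pos] = 0xFF


# Illustrative strategy table; the names are assumptions, not the paper's labels.
STRATEGIES: Dict[str, Callable[[bytearray, int], None]] = {
    "flip_bit": flip_bit,
    "add_one": add_one,
    "set_ff": set_ff,
}


def apply_mutations(seed: bytes, predictions: List[Tuple[int, str]]) -> bytes:
    """Apply each predicted (position, strategy) pair to a copy of the seed.

    Out-of-range positions and unknown strategy names are skipped, since a
    generative model may emit invalid predictions.
    """
    mutated = bytearray(seed)
    for pos, name in predictions:
        if 0 <= pos < len(mutated) and name in STRATEGIES:
            STRATEGIES[name](mutated, pos)
    return bytes(mutated)


# The first four bytes of an ELF file, used as a toy seed.
seed = bytes([0x7F, 0x45, 0x4C, 0x46])
# Stand-in for model output; position 9 is deliberately out of range.
predictions = [(0, "flip_bit"), (3, "add_one"), (9, "flip_bit")]
mutated = apply_mutations(seed, predictions)
print(mutated.hex())
```

In a real integration the mutated bytes would be handed back to the fuzzer for execution against the target program; this sketch only covers the step of turning predictions into a mutated input.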

Paper Structure (30 sections, 6 equations, 5 figures, 4 tables)

Introduction
Preliminary: Fuzzing Test
Fuzz-Bench
Data Construction
Data Split
Simulation Environment
Fuzzing Test via Generation Model
Input Encoding
Encoder-Decoder Framework
Decoder-only Framework
Mutation Strategy Prediction
Jointly Training
Incorporating LLMs into Fuzzing Test
Experiments
Implementation Details
...and 15 more sections

Figures (5)

Figure 1: Comparison between the standard byte-level fuzz test and our proposed method.
Figure 2: The workflow of the fuzzing test with fine-tuned LLMs FuzzCoder.
Figure 3: The prompt to get mutation positions and strategies of FuzzCoder.
Figure 4: Comparison between the baselines and FuzzCoder.
Figure 5: Comparison between the original JPG file and the JPG file after blur test.
