HarnessAgent: Scaling Automatic Fuzzing Harness Construction with Tool-Augmented LLM Pipelines

Kang Yang, Yunhang Zhang, Zichuan Li, Guanhong Tao, Jun Xu, Xiaojing Liao

TL;DR

This work tackles the scalability gap in automatic fuzzing harness construction for internal functions by addressing context provisioning, validation integrity, and compilation reliability. It presents HarnessAgent, a tool-augmented agent framework featuring a compilation-error triage system, a hybrid LSP/Tree-Sitter tool pool for robust code retrieval, and an enhanced validation module to detect fake definitions. Across 243 OSS-Fuzz target functions, HarnessAgent improves three-shot harness success rates by about 20% over state-of-the-art baselines and achieves substantial gains in target-function coverage during one-hour fuzzing, with source-code retrieval rates exceeding 90%. The results indicate that coordinated context routing, strong tooling, and disciplined validation enable scalable, reliable harness generation in large, real-world codebases. The datasets and code are released alongside the paper.

Abstract

Large language model (LLM)-based techniques have achieved notable progress in generating harnesses for program fuzzing. However, applying them to arbitrary functions (especially internal functions) at scale remains challenging because harness generation requires sophisticated contextual information, such as specifications, dependencies, and usage examples. State-of-the-art methods rely heavily on static or incomplete context provisioning, which causes them to fail to generate functional harnesses. Furthermore, LLMs tend to exploit harness validation metrics, producing plausible yet logically useless code. Therefore, harness generation across large and diverse projects continues to face challenges in reliable compilation, robust code retrieval, and comprehensive validation. To address these challenges, we present HarnessAgent, a tool-augmented agentic framework that achieves fully automated, scalable harness construction over hundreds of OSS-Fuzz targets. HarnessAgent introduces three key innovations: 1) a rule-based strategy to identify and minimize various compilation errors; 2) a hybrid tool pool for precise and robust symbol source code retrieval; and 3) an enhanced harness validation pipeline that detects fake definitions. We evaluate HarnessAgent on 243 target functions from OSS-Fuzz projects (65 C projects and 178 C++ projects). It improves the three-shot success rate by approximately 20% compared to state-of-the-art techniques, reaching 87% for C and 81% for C++. Our one-hour fuzzing results show that more than 75% of the harnesses generated by HarnessAgent increase the target function coverage, surpassing the baselines by over 10%. In addition, the hybrid tool-pool system of HarnessAgent achieves a response rate of over 90% for source code retrieval, outperforming Fuzz Introspector by more than 30%.

Paper Structure

This paper contains 28 sections, 7 figures, 8 tables.

Figures (7)

  • Figure 1: The General Harness Generation Workflow with Large Language Model (LLM).
  • Figure 2: The number of successful harnesses generated by existing methods.
  • Figure 3: The agentic framework with tool augmentation for harness generation.
  • Figure 4: The overlapped successful project between HarnessAgent and the rest of the evaluated methods.
  • Figure 5: The one-hour fuzzing results for all evaluated methods.
  • ...and 2 more figures