FalseCrashReducer: Mitigating False Positive Crashes in OSS-Fuzz-Gen Using Agentic AI

Paschal C. Amusuo, Dongge Liu, Ricardo Andres Calvo Mendez, Jonathan Metzman, Oliver Chang, James C. Davis

TL;DR

FalseCrashReducer addresses false positive crashes in OSS-Fuzz-Gen by introducing two complementary AI-driven strategies: constraint-based fuzz driver generation and context-based crash validation. The constraint-based approach proactively enforces input and state constraints during fuzz driver generation, while the context-based approach reactively validates crashes against feasible execution paths from program entry points. Evaluated on 1,555 OSS-Fuzz benchmarks, the methods reduce false positives by over 50% and decrease total crashes by up to 8%, with constraint satisfaction improving to about 64% of drivers meeting all constraints. The results demonstrate practical gains in scalability and reliability for automated fuzz driver pipelines, with manageable overhead (~9% of total API cost) and clear guidance on integrating AI agents into large-scale testing ecosystems. The work also provides open-source tooling and points to future directions in real-time constraint retrieval, improved crash analysis, and broader applicability beyond OSS-Fuzz-Gen.

Abstract

Fuzz testing has become a cornerstone technique for identifying software bugs and security vulnerabilities, with broad adoption in both industry and open-source communities. Directly fuzzing a function requires fuzz drivers, which translate random fuzzer inputs into valid arguments for the target function. Given the cost and expertise required to manually develop fuzz drivers, methods exist that leverage program analysis and Large Language Models to automatically generate these drivers. However, the generated fuzz drivers frequently lead to false positive crashes, particularly in functions with highly structured input and complex state requirements. This problem is especially critical in industry-scale fuzz driver generation efforts like OSS-Fuzz-Gen, as reporting false positive crashes to maintainers impedes trust in both the system and the team. This paper presents two AI-driven strategies to reduce false positives in OSS-Fuzz-Gen, a multi-agent system for automated fuzz driver generation. First, constraint-based fuzz driver generation proactively enforces constraints on a function's inputs and state to guide driver creation. Second, context-based crash validation reactively analyzes function callers to determine whether reported crashes are feasible from program entry points. Using 1,500 benchmark functions from OSS-Fuzz, we show that these strategies reduce spurious crashes by up to 8%, cut reported crashes by more than half, and demonstrate that frontier LLMs can serve as reliable program analysis agents. Our results highlight the promise and challenges of integrating AI into large-scale fuzzing pipelines.
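
To make the idea of a constraint-respecting fuzz driver concrete, here is a minimal sketch in Python. It is not output from OSS-Fuzz-Gen, whose drivers target real OSS-Fuzz projects. The target `parse_header`, both drivers, and the `crashes` helper are hypothetical names invented for illustration. The toy target assumes an implicit precondition (`declared_len <= len(buf)`), which is typical of length-carrying APIs. A driver that ignores the precondition reports a crash that no correct caller could trigger, while a driver that enforces it does not.

```python
# Hypothetical toy target with an implicit precondition, standing in for a
# library function whose callers are expected to pass a valid length.
def parse_header(buf: bytes, declared_len: int) -> int:
    """Return the sum of the first `declared_len` bytes of `buf`.

    Assumed precondition: 0 <= declared_len <= len(buf).
    """
    total = 0
    for i in range(declared_len):
        total += buf[i]  # raises IndexError if the precondition is violated
    return total


def naive_driver(data: bytes) -> int:
    """Unconstrained driver: uses the first fuzzer byte as the length verbatim."""
    if not data:
        return 0
    declared_len = data[0]
    return parse_header(data[1:], declared_len)


def constrained_driver(data: bytes) -> int:
    """Driver that enforces the constraint declared_len <= len(buf)."""
    if not data:
        return 0
    buf = data[1:]
    # Map the fuzzer-chosen byte into the valid range [0, len(buf)].
    declared_len = data[0] % (len(buf) + 1)
    return parse_header(buf, declared_len)


def crashes(driver, data: bytes) -> bool:
    """Report whether running `driver` on `data` raises an IndexError."""
    try:
        driver(data)
        return False
    except IndexError:
        return True
```

With the input `bytes([200, 1, 2, 3])`, the naive driver requests 200 bytes from a 3-byte buffer and crashes, which would count as a false positive under the assumed precondition. The constrained driver maps the same byte into the valid range and runs cleanly. Both drivers agree on inputs that already satisfy the constraint.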

Paper Structure

This paper contains 40 sections, 2 figures, 8 tables.

Figures (2)

  • Figure 1: OSS-Fuzz-Gen design showing its agents. A bottom-up approach is taken, targeting functions with low coverage. This design exhibits a high false positive rate.
  • Figure 2: Agent-driven strategies to mitigate false positive crashes in OSS-Fuzz-Gen (cf. Figure 1). Semantic constraints developed by the function analyzer improve fuzz driver quality and prevent false crashes. The crash validator analyzes project context to determine a crash's feasibility and filter false positives. Agents use tools to access the project’s codebase.