ZeroFalse: Improving Precision in Static Analysis with LLMs

Mohsen Iranmanesh; Sina Moradi Sabet; Sina Marefat; Ali Javidi Ghasr; Allison Wilson; Iman Sharafaldin; Mohammad A. Tayebi

TL;DR

Static analysis tools suffer from high false-positive rates, which erode developer trust. ZeroFalse couples static analysis with LLM adjudication: it enriches SARIF alerts with flow-sensitive dataflow traces and CWE-specific knowledge, then queries the model with deterministic, schema-constrained prompts. Experiments with ten LLMs on two datasets show that CWE-aware prompting and reasoning-oriented models achieve strong F1-scores while maintaining high recall, which makes CI/CD integration practical. The work demonstrates that structured context and domain-specific reasoning are critical for robust false-positive mitigation in real-world codebases, and points toward more reliable, scalable SAST-assisted security in large-scale development pipelines.
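To make the "enriched SARIF alert plus schema-constrained prompt" idea concrete, here is a minimal sketch under stated assumptions; it is not ZeroFalse's implementation. It reads standard SARIF 2.1.0 fields (`ruleId`, `message.text`, `codeFlows[].threadFlows[].locations[]`), flattens the first dataflow trace into `uri:line` steps, attaches CWE guidance, and serializes everything with sorted keys so the same alert always yields the same prompt. The guidance table `CWE_HINTS`, the payload field names, and the two-key verdict schema are hypothetical, and the actual LLM call is omitted.

```python
import json

# Hypothetical CWE guidance table; the paper's actual knowledge base is not shown here.
CWE_HINTS = {
    "CWE-89": "Does tainted input reach the SQL sink without parameterization?",
    "CWE-79": "Does tainted input reach HTML output without context-aware encoding?",
}

# Hypothetical response schema: exactly these keys, verdict from a closed set.
VERDICT_KEYS = {"verdict", "justification"}
ALLOWED_VERDICTS = {"true_positive", "false_positive"}


def extract_trace(result: dict) -> list:
    """Flatten the first SARIF threadFlow into 'uri:line' steps, source to sink."""
    steps = []
    for flow in result.get("codeFlows", [])[:1]:
        for thread in flow.get("threadFlows", [])[:1]:
            for step in thread.get("locations", []):
                phys = step["location"]["physicalLocation"]
                uri = phys["artifactLocation"]["uri"]
                line = phys.get("region", {}).get("startLine", "?")
                steps.append(f"{uri}:{line}")
    return steps


def build_prompt(result: dict, cwe: str) -> str:
    """Serialize the enriched alert with sorted keys so identical alerts give identical prompts."""
    payload = {
        "rule_id": result["ruleId"],
        "cwe": cwe,
        "message": result["message"]["text"],
        "dataflow_trace": extract_trace(result),
        "cwe_guidance": CWE_HINTS.get(cwe, "No CWE-specific guidance available."),
        "answer_format": {
            "verdict": sorted(ALLOWED_VERDICTS),
            "justification": "string",
        },
    }
    return json.dumps(payload, sort_keys=True, indent=2)


def parse_verdict(raw: str) -> dict:
    """Accept the model's answer only if it matches the hypothetical schema exactly."""
    obj = json.loads(raw)
    if not isinstance(obj, dict) or set(obj) != VERDICT_KEYS:
        raise ValueError(f"schema violation: {raw!r}")
    if obj["verdict"] not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {obj['verdict']!r}")
    return obj


# Example alert shaped like SARIF 2.1.0 output (values are illustrative only).
alert = {
    "ruleId": "java/sql-injection",
    "message": {"text": "Query built from a user-controlled source."},
    "codeFlows": [{"threadFlows": [{"locations": [
        {"location": {"physicalLocation": {
            "artifactLocation": {"uri": "Servlet.java"}, "region": {"startLine": 12}}}},
        {"location": {"physicalLocation": {
            "artifactLocation": {"uri": "Dao.java"}, "region": {"startLine": 40}}}},
    ]}]}],
}
print(build_prompt(alert, "CWE-89"))
print(parse_verdict('{"verdict": "false_positive", "justification": "query is parameterized"}'))
```

Rejecting any answer outside the closed verdict set is one simple way to keep LLM output machine-checkable in a CI/CD gate; how ZeroFalse handles malformed responses is not described in this section.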

Abstract

Static Application Security Testing (SAST) tools are integral to modern software development, yet their adoption is undermined by excessive false positives that weaken developer trust and demand costly manual triage. We present ZeroFalse, a framework that integrates static analysis with large language models (LLMs) to reduce false positives while preserving coverage. ZeroFalse treats static analyzer outputs as structured contracts, enriching them with flow-sensitive traces, contextual evidence, and CWE-specific knowledge before adjudication by an LLM. This design preserves the systematic reach of static analysis while leveraging the reasoning capabilities of LLMs. We evaluate ZeroFalse across both benchmarks and real-world projects using ten state-of-the-art LLMs. Our best-performing models achieve F1-scores of 0.912 on the OWASP Java Benchmark and 0.955 on the OpenVuln dataset, maintaining recall and precision above 90%. Results further show that CWE-specialized prompting consistently outperforms generic prompts, and reasoning-oriented LLMs provide the most reliable precision-recall balance. These findings position ZeroFalse as a practical and scalable approach for enhancing the reliability of SAST and supporting its integration into real-world CI/CD pipelines.
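The reported F1-scores combine precision and recall through their harmonic mean, F1 = 2PR / (P + R). Because a harmonic mean always lies between the smaller and larger of its inputs, precision and recall both above 90% imply an F1 above 0.9 as well. The sketch below computes the three metrics from confusion counts, assuming (the section does not define it) that the positive class is "alert is a real vulnerability": a true positive is a real vulnerability the LLM keeps, a false positive is a false alarm it keeps, and a false negative is a real vulnerability it discards. The counts in the example are hypothetical, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall, and F1 for one triage run (assumed positive = real vulnerability)."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # kept alerts that are real
    recall = tp / (tp + fn) if tp + fn else 0.0     # real vulnerabilities that were kept
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


# Hypothetical counts for illustration only.
s = score(tp=90, fp=6, fn=8)
print(f"P={s.precision:.3f} R={s.recall:.3f} F1={s.f1:.3f}")
```

Algebraically, F1 also equals 2TP / (2TP + FP + FN), which makes clear why a false-positive filter cannot raise F1 by discarding alerts indiscriminately: every real vulnerability it drops moves from TP to FN.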
