LLM-Driven Adaptive Source-Sink Identification and False Positive Mitigation for Static Analysis

Shiyin Lin

TL;DR

Static analysis often suffers from incomplete source–sink specifications and high false-positive rates. AdaTaint couples an LLVM-based taint analyzer with LLM-driven adaptive source–sink inference and neuro-symbolic path validation. This grounds model outputs in program facts, which reduces hallucinations and preserves determinism. On Juliet 1.3, SV-COMP-style C benchmarks, and three real-world projects, it reduces false positives by 43.7% on average and improves recall by 11.2% over state-of-the-art baselines, with manageable runtime overhead. This hybrid approach offers a practical path toward more accurate, scalable, and developer-trustworthy vulnerability analysis in real software pipelines.

Abstract

Static analysis is effective for discovering software vulnerabilities but notoriously suffers from incomplete source–sink specifications and excessive false positives (FPs). We present AdaTaint, an LLM-driven taint analysis framework that adaptively infers source/sink specifications and filters spurious alerts through neuro-symbolic reasoning. Unlike LLM-only detectors, AdaTaint grounds model suggestions in program facts and constraint validation, ensuring both adaptability and determinism. We evaluate AdaTaint on Juliet 1.3, SV-COMP-style C benchmarks, and three large real-world projects. Results show that AdaTaint reduces false positives by 43.7% on average and improves recall by 11.2% compared to state-of-the-art baselines (CodeQL, Joern, and LLM-only pipelines), while maintaining competitive runtime overhead. These findings demonstrate that combining LLM inference with symbolic validation offers a practical path toward more accurate and reliable static vulnerability analysis.
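The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea of grounding LLM output in program facts, under stated assumptions. LLM-proposed source/sink candidates are kept only if they name a function and argument slot that actually exist in the program. Taint paths are then enumerated over a toy value-flow graph, and paths whose branch guards contradict each other are discarded. This is not AdaTaint's implementation. `SpecCandidate`, `ground_specs`, `taint_paths`, `path_feasible`, the guard format, and the graph nodes `check` and `copy` are all hypothetical. The interval intersection is a deliberately simple stand-in for a real constraint solver.

```python
# Hypothetical sketch: grounding LLM-proposed taint specs in program facts and
# filtering infeasible paths. Not AdaTaint's implementation; every name and data
# shape below is an illustrative assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecCandidate:
    """One source/sink suggestion in an assumed LLM output format."""
    function: str
    role: str       # "source" or "sink"
    arg_index: int  # -1 marks the return value (assumed convention)


def ground_specs(candidates, program_functions):
    """Keep suggestions that name a real function and a real argument slot.

    program_functions maps function name -> parameter count, standing in for
    facts extracted from the program (e.g. from LLVM IR declarations).
    """
    grounded = []
    for c in candidates:
        arity = program_functions.get(c.function)
        if arity is None:
            continue  # function does not exist in the program: drop it
        if c.role not in ("source", "sink"):
            continue  # malformed role label
        if c.arg_index != -1 and not 0 <= c.arg_index < arity:
            continue  # argument index outside the real signature
        grounded.append(c)
    return grounded


def taint_paths(flow_edges, sources, sinks):
    """Enumerate acyclic source-to-sink paths in a toy value-flow graph (DFS)."""
    paths = []

    def dfs(node, path):
        if node in sinks:
            paths.append(list(path))
        for nxt in flow_edges.get(node, ()):
            if nxt not in path:  # avoid cycles
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    for src in sorted(sources):  # sorted for deterministic output order
        dfs(src, [src])
    return paths


def path_feasible(path, guards):
    """Return False if the integer guards along the path contradict each other.

    guards maps an edge (u, v) to a list of (var, op, const), op in {"<", ">=", "=="}.
    Intersecting per-variable intervals is a toy stand-in for an SMT solver.
    """
    bounds = {}
    for edge in zip(path, path[1:]):
        for var, op, k in guards.get(edge, ()):
            lo, hi = bounds.get(var, (float("-inf"), float("inf")))
            if op == "<":
                hi = min(hi, k - 1)
            elif op == ">=":
                lo = max(lo, k)
            elif op == "==":
                lo, hi = max(lo, k), min(hi, k)
            else:
                raise ValueError(f"unsupported operator: {op!r}")
            if lo > hi:
                return False  # empty interval: no input satisfies the path
            bounds[var] = (lo, hi)
    return True


if __name__ == "__main__":
    facts = {"recv": 4, "memcpy": 3, "system": 1}
    llm_output = [
        SpecCandidate("recv", "source", 1),
        SpecCandidate("get_user_input", "source", -1),  # not in the program
        SpecCandidate("memcpy", "sink", 2),
        SpecCandidate("system", "sink", 3),  # system takes one argument
    ]
    specs = ground_specs(llm_output, facts)
    sources = {c.function for c in specs if c.role == "source"}
    sinks = {c.function for c in specs if c.role == "sink"}
    edges = {"recv": ["check", "copy"], "check": ["memcpy"], "copy": ["memcpy"]}
    guards = {("recv", "check"): [("n", ">=", 64)],
              ("check", "memcpy"): [("n", "<", 64)]}
    reported = [p for p in taint_paths(edges, sources, sinks)
                if path_feasible(p, guards)]
    print(reported)  # [['recv', 'copy', 'memcpy']]
```

In this sketch, the LLM only influences which specifications are proposed. Whether a specification or an alarm is accepted is decided by deterministic checks against program facts, which is one way to read the abstract's claim of combining adaptability with determinism.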

Paper Structure

This paper contains 25 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Overview of the proposed framework.