From Trace to Line: LLM Agent for Real-World OSS Vulnerability Localization

Haoran Xi, Minghao Shao, Brendan Dolan-Gavitt, Muhammad Shafique, Ramesh Karri

TL;DR

The paper tackles the challenge of line-level vulnerability localization in real-world OSS using LLMs guided by runtime evidence. It introduces T2L-Agent with the Agentic Trace Analyzer (ATA), Divergence Tracing, and a two-stage refinement process to move from module-level detection to exact line-level localization. A new T2L-ARVO benchmark of 50 expert-verified cases across five crash families enables realistic evaluation. Results show up to 58% chunk-level detection and 54.8% exact line localization, demonstrating the viability of deployable, precision diagnostics for open-source software workflows.

Abstract

Large language models show promise for vulnerability discovery, yet prevailing methods inspect code in isolation, struggle with long contexts, and focus on coarse function- or file-level detection, which offers limited actionable guidance to engineers who need precise line-level localization and targeted patches in real-world software development. We present T2L-Agent (Trace-to-Line Agent), a project-level, end-to-end framework that plans its own analysis and progressively narrows scope from modules to exact vulnerable lines. T2L-Agent couples multi-round feedback with an Agentic Trace Analyzer (ATA) that fuses run-time evidence such as crash points, stack traces, and coverage deltas with AST-based code chunking, enabling iterative refinement beyond single-pass predictions and translating symptoms into actionable, line-level diagnoses. To benchmark line-level vulnerability discovery, we introduce T2L-ARVO, a diverse, expert-verified 50-case benchmark spanning five crash families and real-world projects. T2L-ARVO is specifically designed to support both coarse-grained detection and fine-grained localization, enabling rigorous evaluation of systems that aim to move beyond file-level predictions. On T2L-ARVO, T2L-Agent achieves up to 58.0% detection and 54.8% line-level localization, substantially outperforming baselines. Together, the framework and benchmark push LLM-based vulnerability detection from coarse identification toward deployable, robust, precision diagnostics that reduce noise and accelerate patching in open-source software workflows.
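
To make the idea of pairing AST-based chunking with trace evidence concrete, here is a minimal sketch. It is not the paper's implementation: it uses Python's standard `ast` module on a toy Python source for brevity, and the names `Chunk`, `chunk_source`, and `rank_chunks`, along with the hit-count scoring, are hypothetical choices made only for illustration.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    """A function-level code chunk (hypothetical structure for this sketch)."""
    name: str
    start: int  # first source line of the function (1-indexed)
    end: int    # last source line of the function (inclusive)


def chunk_source(source: str) -> list[Chunk]:
    """Split a module into function-level chunks using its AST."""
    tree = ast.parse(source)
    chunks = [
        Chunk(node.name, node.lineno, node.end_lineno)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return sorted(chunks, key=lambda c: c.start)


def rank_chunks(chunks: list[Chunk], trace_lines: list[int]) -> list[Chunk]:
    """Rank chunks by how many stack-trace line numbers fall inside them.

    Assumption: a plain hit count stands in for whatever evidence
    weighting a real analyzer would apply.
    """
    scored = []
    for chunk in chunks:
        hits = sum(1 for ln in trace_lines if chunk.start <= ln <= chunk.end)
        if hits > 0:
            scored.append((hits, chunk))
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep source order
    return [chunk for _, chunk in scored]


SOURCE = '''\
def parse(buf):
    n = buf[0]
    return buf[1:1 + n]

def main(data):
    out = parse(data)
    return out[3]
'''

# Hypothetical line numbers taken from stack frames of a crash.
TRACE_LINES = [7, 3, 7]

ranked = rank_chunks(chunk_source(SOURCE), TRACE_LINES)
print([c.name for c in ranked])
```

In this sketch the candidate set shrinks from the whole file to the chunks touched by the trace; a line-level stage would then inspect only the lines inside the top-ranked chunks.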

Paper Structure

This paper contains 22 sections, 33 figures, 3 tables.

Figures (33)

  • Figure 1: T2L-Agent Framework overview.
  • Figure 2: Related Works
  • Figure 3: ATA Components
  • Figure 4: Partial T2L-Agent logs showing how the three proposed techniques (Detection Refinement, Divergence Tracing, and the Agentic Trace Analyzer) work and contribute to the task.
  • Figure 5: Crash types in T2L-ARVO Bench.
  • ...and 28 more figures