
A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion

Md Nakhla Rafi, Dong Jae Kim, Tse-Hsun Chen, Shaowei Wang

TL;DR

This work introduces LLM4FL, a three-agent fault localization framework that combines order-aware data division, Graph-RAG-based code navigation through an inter-procedural call graph, and verbal reinforcement learning to iteratively refine fault rankings. Evaluated on Defects4J v2.0.0, LLM4FL outperforms strong baselines including AutoFL and SoapFL in Top-1 accuracy and remains cost-effective, while ablation studies attribute the largest gains to coverage division and graph navigation. The approach addresses token limitations and repository-scale reasoning by decomposing large artifacts and guiding LLM reasoning with structured, graph-informed prompts. The results suggest that integrating multi-agent coordination with graph-based code analysis and self-critique can meaningfully improve scalable fault localization in real-world Java projects.
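
To make the graph-navigation idea concrete, here is one way a debugging agent could pull context from an inter-procedural call graph: start at a method and collect the callees reachable within a bounded number of call edges, in breadth-first order. This is a minimal sketch under stated assumptions, not LLM4FL's implementation; the dict-based `CallGraph`, the `neighbors_within` function, the callee-only traversal, and the example method names are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

# Hypothetical representation: each method maps to the methods it calls.
CallGraph = Dict[str, List[str]]

def neighbors_within(graph: CallGraph, start: str, max_depth: int) -> List[str]:
    """Return `start` plus every callee reachable within `max_depth` call
    edges, in breadth-first discovery order (callee edges only; assumption)."""
    seen: Set[str] = {start}
    order: List[str] = [start]
    queue: Deque[Tuple[str, int]] = deque([(start, 0)])
    while queue:
        method, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth bound
        for callee in graph.get(method, []):
            if callee not in seen:
                seen.add(callee)
                order.append(callee)
                queue.append((callee, depth + 1))
    return order

# Illustrative graph with made-up method names.
EXAMPLE_GRAPH: CallGraph = {
    "ParserTest.testParse": ["Parser.parse"],
    "Parser.parse": ["Lexer.next", "Parser.expr"],
    "Parser.expr": ["Parser.term"],
}

print(neighbors_within(EXAMPLE_GRAPH, "ParserTest.testParse", 2))
# → ['ParserTest.testParse', 'Parser.parse', 'Lexer.next', 'Parser.expr']
```

Bounding the depth keeps the retrieved context small enough to fit in a prompt; a fuller tool would presumably also follow caller edges and return method bodies, which this sketch omits.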

Abstract

Identifying and resolving software faults remains a challenging and resource-intensive process. Traditional fault localization techniques, such as Spectrum-Based Fault Localization (SBFL), leverage statistical analysis of test coverage but often suffer from limited accuracy. Learning-based approaches improve fault localization, but they demand extensive training datasets and substantial computational resources. Recent advances in Large Language Models (LLMs) offer new opportunities through improved code understanding and reasoning. However, existing LLM-based fault localization techniques face significant challenges, including token limitations, performance degradation with long inputs, and poor scalability to complex software systems. To overcome these obstacles, we propose LLM4FL, a multi-agent fault localization framework built on three specialized LLM agents. First, the Context Extraction Agent applies an order-sensitive segmentation strategy to partition large coverage data within the LLM's token limit, analyzes the failure context, and prioritizes failure-related methods. The Debugger Agent then processes the extracted data, using graph-based retrieval-augmented code navigation to reason about failure causes and rank suspicious methods. Finally, the Reviewer Agent re-evaluates the identified faulty methods using verbal reinforcement learning, engaging in self-criticism and iterative refinement. Evaluated on the Defects4J (v2.0.0) benchmark, which includes 675 faults from 14 Java projects, LLM4FL achieves an 18.55% improvement in Top-1 accuracy over AutoFL and 4.82% over SoapFL. It also outperforms supervised techniques such as DeepFL and Grace without requiring task-specific training. Furthermore, its coverage segmentation and prompt chaining strategies increase Top-1 accuracy by up to 22%.
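
The order-sensitive segmentation step can be sketched as follows, assuming covered methods are first sorted by some key (here, a hypothetical suspiciousness `score`, highest first) and then packed greedily, in that order, into chunks that fit a token budget. The `CoveredMethod` type, the `segment` function, and the 4-characters-per-token estimate are illustrative assumptions, not the paper's code or tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CoveredMethod:
    signature: str
    body: str
    score: float  # hypothetical ordering key, e.g. an SBFL suspiciousness score

def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token (not a real tokenizer).
    return max(1, len(text) // 4)

def segment(methods: List[CoveredMethod], budget: int,
            key: Callable[[CoveredMethod], float] = lambda m: -m.score,
            ) -> List[List[CoveredMethod]]:
    """Sort methods by `key`, then greedily pack them, in sorted order, into
    consecutive chunks whose estimated token total stays within `budget`.
    A method that alone exceeds `budget` still gets a chunk by itself."""
    chunks: List[List[CoveredMethod]] = []
    current: List[CoveredMethod] = []
    used = 0
    for m in sorted(methods, key=key):
        cost = estimate_tokens(m.signature + m.body)
        if current and used + cost > budget:
            chunks.append(current)  # close the full chunk, start a new one
            current, used = [], 0
        current.append(m)
        used += cost
    if current:
        chunks.append(current)
    return chunks

methods = [
    CoveredMethod("A.f()", "x" * 35, 0.9),  # ~10 tokens
    CoveredMethod("B.g()", "x" * 35, 0.1),  # ~10 tokens
    CoveredMethod("C.h()", "x" * 75, 0.5),  # ~20 tokens
]
print([[m.signature for m in chunk] for chunk in segment(methods, budget=30)])
# → [['A.f()', 'C.h()'], ['B.g()']]
```

Because packing preserves the sort order, the first chunk holds the highest-ranked methods, and changing `key` changes which methods end up together in a chunk; this is the kind of choice Figure 3 compares across sorting strategies.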

Paper Structure

This paper contains 15 sections, 1 equation, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Overview of LLM4FL for Multi-Agent Fault Localization, illustrating how agents collaborate to analyze software artifacts, extract failure reasoning, perform graph-based retrieval-augmented code navigation, and rank faulty methods using verbal reinforcement learning.
  • Figure 2: The Context Extraction Agent uses tool chains to preprocess the software artifacts of Lang-5 and emphasize the test failure context: (i) for the stack trace, the agent prunes frames from external libraries, and (ii) for the test code, the agent prunes statements that follow the failing assertion.
  • Figure 3: Fault localization results when using different method sorting strategies during the segmentation process.
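
The two pruning steps described in the Figure 2 caption can be illustrated with a small sketch: drop stack-trace frames whose class lies outside the project's packages, and drop test statements after the failing assertion. The `project_prefixes` parameter, the regular expression for Java frames, the `com.example` package, and the assumption that the failing line is already mapped to an index into the test body are hypothetical simplifications, not the agent's actual tool chain.

```python
import re
from typing import List

# Matches a Java frame line such as "at com.example.Foo.bar(Foo.java:42)".
FRAME_RE = re.compile(r"^at\s+([\w$.<>]+)\(")

def prune_stack_trace(trace: str, project_prefixes: List[str]) -> List[str]:
    """Keep exception/message lines and frames from project classes; drop
    frames from external libraries (JUnit, JDK, reflection) and '... N more'."""
    kept: List[str] = []
    for line in trace.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("..."):
            continue  # blank lines and elided-frame markers
        match = FRAME_RE.match(stripped)
        if match is None or match.group(1).startswith(tuple(project_prefixes)):
            kept.append(stripped)
    return kept

def prune_test_after_failure(test_lines: List[str], failing_line: int) -> List[str]:
    """Keep test lines up to and including the failing assertion.
    Assumption: `failing_line` is a 1-based index into `test_lines`."""
    return test_lines[:failing_line]

trace = """junit.framework.AssertionFailedError: expected:<a> but was:<b>
\tat junit.framework.Assert.fail(Assert.java:47)
\tat com.example.FooTest.testBar(FooTest.java:3)
\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
\t... 20 more"""
print(prune_stack_trace(trace, ["com.example."]))

test_body = ["Foo foo = new Foo();", "foo.load();",
             'assertEquals("a", foo.name());', "foo.close();"]
print(prune_test_after_failure(test_body, 3))
```

Both steps shrink the prompt while keeping the lines most likely to explain the failure: the project frame that raised the error and the statements that led up to the assertion.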