FlexFL: Flexible and Effective Fault Localization with Open-Source Large Language Models

Chuyang Xu, Zhongxin Liu, Xiaoxue Ren, Gehao Zhang, Ming Liang, David Lo

TL;DR

FlexFL addresses two limitations of prior LLM-based fault localization (FL): reliance on bug-triggering test cases as the only input, and dependence on proprietary LLMs. It uses a two-stage design. A space-reduction stage combines Agent4SR with traditional FL methods to generate a diverse candidate set, and a localization-refinement stage uses Agent4LR to reason over the code snippets of those candidates. On Defects4J, FlexFL surpasses GPT-3.5-based baselines in Top-N, MAP, and MRR, and it generalizes across multiple open-source LLMs and to the GHRB dataset, with benefits for privacy, cost, and scalability. The authors provide open-source replication materials.
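The three metrics named above can be made concrete with a short sketch. It follows the standard textbook definitions of Top-N, MRR, and MAP for method-level fault localization; the function names, the input layout (a ranked list of method names paired with the set of truly buggy methods), and the choice to normalize average precision by the number of buggy methods are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Optional, Sequence, Set, Tuple

# One result per bug: (ranked list of suspicious methods, set of truly buggy methods).
# This layout is an assumption made for illustration.
Result = Tuple[Sequence[str], Set[str]]


def first_hit_rank(ranked: Sequence[str], buggy: Set[str]) -> Optional[int]:
    """Return the 1-based rank of the first buggy method, or None if absent."""
    for rank, method in enumerate(ranked, start=1):
        if method in buggy:
            return rank
    return None


def top_n(results: List[Result], n: int) -> int:
    """Count bugs with at least one buggy method among the first n candidates."""
    return sum(1 for ranked, buggy in results
               if any(m in buggy for m in ranked[:n]))


def mrr(results: List[Result]) -> float:
    """Mean reciprocal rank of the first buggy method (0 when none is ranked)."""
    total = 0.0
    for ranked, buggy in results:
        rank = first_hit_rank(ranked, buggy)
        total += 1.0 / rank if rank is not None else 0.0
    return total / len(results)


def average_precision(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Average of precision@k over the ranks k at which a buggy method appears."""
    hits = 0
    precision_sum = 0.0
    for rank, method in enumerate(ranked, start=1):
        if method in buggy:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(buggy) if buggy else 0.0


def mean_average_precision(results: List[Result]) -> float:
    """Mean of average_precision over all bugs."""
    return sum(average_precision(r, b) for r, b in results) / len(results)


if __name__ == "__main__":
    demo: List[Result] = [
        (["a", "b", "c"], {"b"}),
        (["x", "y", "z"], {"x", "z"}),
    ]
    print(top_n(demo, 1), mrr(demo), mean_average_precision(demo))
```

Top-N counts bugs rather than averaging, which is why results in this area are often reported as "N more bugs located" rather than as a percentage.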

Abstract

Due to the impressive code comprehension ability of Large Language Models (LLMs), a few studies have proposed to leverage LLMs to locate bugs, i.e., LLM-based fault localization (FL), and demonstrated promising performance. However, these methods have two limitations. First, they are limited in flexibility: they rely on bug-triggering test cases to perform FL and cannot make use of other available bug-related information, e.g., bug reports. Second, they are built upon proprietary LLMs, which, although powerful, pose risks to data privacy. To address these limitations, we propose a novel LLM-based FL framework named FlexFL, which can flexibly leverage different types of bug-related information and work effectively with open-source LLMs. FlexFL is composed of two stages. In the first stage, FlexFL reduces the search space of buggy code using state-of-the-art FL techniques of different families and provides a candidate list of bug-related methods. In the second stage, FlexFL leverages LLMs to double-check the code snippets of the methods suggested by the first stage and refine the fault localization results. In each stage, FlexFL constructs agents based on open-source LLMs. These agents share a single pipeline that does not presuppose any particular type of bug-related information and can interact with function calls even when the underlying LLM lacks built-in function-calling capability. Extensive experimental results on Defects4J demonstrate that FlexFL outperforms the baselines and can work with different open-source LLMs. Specifically, FlexFL with the lightweight open-source LLM Llama3-8B locates 42 and 63 more bugs than two state-of-the-art LLM-based FL approaches, AutoFL and AgentFL, respectively, both of which use GPT-3.5.
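The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration only: the round-robin merge of candidate lists, the function names `merge_candidates` and `refine`, and the scoring callback standing in for the LLM agent are all hypothetical and are not the authors' implementation.

```python
from typing import Callable, List, Sequence


def merge_candidates(rankings: Sequence[Sequence[str]], k: int) -> List[str]:
    """Stage 1 (hypothetical): interleave several FL rankings round-robin,
    dropping duplicates, until k candidate methods have been collected."""
    merged: List[str] = []
    seen = set()
    depth = max((len(r) for r in rankings), default=0)
    for i in range(depth):
        for ranking in rankings:
            if i < len(ranking) and ranking[i] not in seen:
                seen.add(ranking[i])
                merged.append(ranking[i])
                if len(merged) == k:
                    return merged
    return merged


def refine(candidates: Sequence[str], score: Callable[[str], float]) -> List[str]:
    """Stage 2 (hypothetical): re-rank candidates by a suspiciousness score.
    In FlexFL this role is played by an LLM agent inspecting code snippets;
    here it is an arbitrary callback."""
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    # Toy rankings from three imaginary FL techniques.
    rankings = [["a", "b", "c"], ["b", "d"], ["e"]]
    candidates = merge_candidates(rankings, k=4)
    toy_scores = {"a": 0.1, "b": 0.9, "e": 0.3, "d": 0.5}
    print(refine(candidates, lambda m: toy_scores.get(m, 0.0)))
```

Drawing candidates from several technique families is what gives the second stage a diverse set to examine; the interleaving used here is just one simple way to combine lists.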

Paper Structure

This paper contains 36 sections, 3 equations, 4 figures, 12 tables, 1 algorithm.

Figures (4)

  • Figure 1: The Overall Framework of FlexFL
  • Figure 2: The pipeline of agents. Bold text in <> indicates placeholders for input contents or descriptions of the function calls designed in the paper's section on function calls
  • Figure 3: Overlap Analysis of FlexFL and (a) non-LLM-based FL techniques (b) LLM-based FL approaches
  • Figure 4: Comparison of FlexFL with LBFL approaches on Defects4J (v1.0)