Enhancing IR-based Fault Localization using Large Language Models
Shuai Shao, Tingting Yu
TL;DR
This paper improves IR-based fault localization (IRFL) by using large language models for bug report analysis and query construction. It introduces LLmiRQ, which categorizes bug reports by their content (programming entities, stack traces, or plain text) and tailors query construction to each category, and LLmiRQ+, which adds iterative, user-guided query reformulation. A learning-to-rank model then ranks candidate files using class name match and call graph features alongside traditional text-based features. Evaluated on 46 projects and 6,340 bug reports, LLmiRQ and LLmiRQ+ surpass seven state-of-the-art IRFL techniques, achieving $MRR=0.6770$ and $MAP=0.5118$.
Abstract
Information Retrieval-based Fault Localization (IRFL) techniques aim to identify the source files that contain the root causes of reported failures. While existing techniques excel at ranking source files, challenges persist in bug report analysis and query construction, which can lead to information loss. Leveraging large language models such as GPT-4, this paper enhances IRFL by categorizing bug reports based on programming entities, stack traces, and natural language text. The initial step of our approach (LLmiRQ) applies a tailored query strategy to each category. To address inaccuracies in the resulting queries, we introduce a conversational, user-guided query reformulation approach, termed LLmiRQ+. To make fuller use of the queries, we also implement a learning-to-rank model that leverages key features such as class name match score and call graph score. This approach significantly improves the relevance and accuracy of queries. Evaluation on 46 projects with 6,340 bug reports yields an MRR of 0.6770 and a MAP of 0.5118, surpassing seven state-of-the-art IRFL techniques.
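
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MRR and MAP are commonly computed over ranked file lists. It assumes the standard textbook definitions (average precision normalized by the number of buggy files per report); the abstract does not spell out the exact variant used in the evaluation, and the function names and file names below are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# One evaluated bug report: (ranked candidate files, set of truly buggy files).
Query = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1 / rank of the first buggy file, or 0.0 if none is ranked."""
    for position, path in enumerate(ranked, start=1):
        if path in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average the precision at each rank where a buggy file appears.

    Assumption: normalized by the total number of buggy files.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, path in enumerate(ranked, start=1):
        if path in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant)


def mean_reciprocal_rank(queries: List[Query]) -> float:
    """Mean of reciprocal ranks over all bug reports."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def mean_average_precision(queries: List[Query]) -> float:
    """Mean of average precisions over all bug reports."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical toy data: two bug reports with ranked candidate files.
example_queries: List[Query] = [
    (["A.java", "B.java", "C.java"], {"B.java"}),
    (["X.java", "Y.java", "Z.java"], {"X.java", "Z.java"}),
]

mrr = mean_reciprocal_rank(example_queries)    # (1/2 + 1/1) / 2 = 0.75
map_score = mean_average_precision(example_queries)  # (0.5 + 0.8333...) / 2
```

Under these definitions, MRR rewards placing the first buggy file near the top of the ranking, while MAP also accounts for how highly the remaining buggy files are ranked.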
