A Quantitative and Qualitative Evaluation of LLM-Based Explainable Fault Localization
Sungmin Kang, Gabin An, Shin Yoo
TL;DR
AutoFL addresses the need for explainable fault localization by combining LLM-based reasoning with repository navigation via function calls, which lets it work around context length limits. It outputs root-cause explanations and method-level fault locations for Java and Python projects. The study shows AutoFL matching or surpassing spectrum-based (SBFL) and mutation-based (MBFL) fault localization baselines on real-world bugs, and finds that developers value natural language explanations and prefer a few high-quality explanations over many. It also introduces a confidence-aware mechanism that estimates result reliability and filters outputs, pointing to practical adoption paths for explainable debugging.
Abstract
Fault Localization (FL), in which a developer seeks to identify which part of the code is malfunctioning and needs to be fixed, is a recurring challenge in debugging. To reduce developer burden, many automated FL techniques have been proposed. However, prior work has noted that existing techniques fail to provide rationales for the suggested locations, hindering developer adoption of these techniques. With this in mind, we propose AutoFL, a Large Language Model (LLM)-based FL technique that generates an explanation of the bug along with a suggested fault location. AutoFL prompts an LLM to use function calls to navigate a repository, so that it can effectively localize faults over a large software repository and overcome the limit of the LLM context length. Extensive experiments on 798 real-world bugs in Java and Python show that AutoFL improves method-level acc@1 by up to 233.3% over baselines. Furthermore, interviews with developers about their impressions of AutoFL-generated explanations showed that developers generally liked the natural language explanations of AutoFL, and that they preferred reading a few high-quality explanations over many.
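To make the headline metric concrete, the following is a minimal sketch of how method-level acc@k and a relative improvement percentage are commonly computed. It assumes the usual definition of acc@k (the number of bugs for which at least one truly faulty method appears among the top k ranked suggestions). The function names, bug identifiers, and method names are hypothetical, and the numbers are toy values for illustration, not results from the study.

```python
from typing import Dict, List, Set


def acc_at_k(rankings: Dict[str, List[str]],
             faulty: Dict[str, Set[str]],
             k: int = 1) -> int:
    """Count bugs whose top-k ranked methods include a truly faulty method."""
    hits = 0
    for bug_id, ranked_methods in rankings.items():
        # A bug counts as localized if any ground-truth method is in the top k.
        if faulty.get(bug_id, set()) & set(ranked_methods[:k]):
            hits += 1
    return hits


def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Toy data (hypothetical bug IDs and method names).
rankings = {
    "bug-1": ["Parser.parse", "Lexer.next"],
    "bug-2": ["Cache.get", "Cache.put"],
    "bug-3": ["Util.trim", "Util.pad"],
}
faulty = {
    "bug-1": {"Parser.parse"},
    "bug-2": {"Cache.put"},
    "bug-3": {"Util.trim"},
}

acc1 = acc_at_k(rankings, faulty, k=1)  # bug-1 and bug-3 are hits
acc2 = acc_at_k(rankings, faulty, k=2)  # all three bugs are hits
gain = relative_improvement(30, 20)     # toy counts, not paper results
```

Under this definition, a figure such as "up to 233.3%" is a relative change in acc@1 between a technique and a baseline, not an absolute difference in percentage points.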
