Leveraging Large Language Model for Information Retrieval-based Bug Localization
Moumita Asad, Rafed Muhammad Yasir, Sam Malek
TL;DR
GenLoc addresses the vocabulary mismatch and metadata dependence of traditional IRBL by combining semantic retrieval between bug reports and code with iterative, LLM-driven code exploration through external functions. It uses embeddings, a vector database, and the ReAct framework so that the model can reason over the code base and examine only the relevant components. On large real-world datasets and a recent-bug benchmark, it improves accuracy and ranking quality over both traditional IRBL and current LLM-based methods while keeping cost and runtime practical, and it remains robust on unseen bugs. The work also includes an ablation study and replication resources.
Abstract
Information Retrieval-based Bug Localization (IRBL) aims to identify buggy source files for a given bug report. Traditional and deep-learning-based IRBL techniques often suffer from vocabulary mismatch and dependence on project-specific metadata, while recent Large Language Model (LLM)-based approaches are limited by insufficient contextual information. To address these issues, we propose GenLoc, an LLM-based technique that combines semantic retrieval with code-exploration functions to iteratively analyze the code base and identify potentially buggy files. We evaluate GenLoc on two diverse datasets: a benchmark of 9,097 bugs from six large open-source projects and the GHRB (GitHub Recent Bugs) dataset of 131 recent bugs across 16 projects. Results demonstrate that GenLoc substantially outperforms traditional IRBL techniques, deep-learning-based approaches, and recent LLM-based methods, while also localizing bugs that other techniques fail to detect.
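The retrieval step described above can be illustrated with a minimal sketch. It is not GenLoc's implementation: the `embed` function is a toy bag-of-words stand-in for a learned embedding model, the in-memory dictionary stands in for a vector database, and all names (`embed`, `cosine`, `rank_files`, the file paths) are hypothetical. A lexical toy like this would still suffer from the vocabulary mismatch the abstract mentions; it only shows the shape of ranking files against a bug report before any LLM-driven exploration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    # Split camelCase identifiers so "CacheManager" yields "cache" and "manager".
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_files(bug_report: str, files: dict, top_k: int = 3) -> list:
    """Return the top_k (path, score) pairs most similar to the bug report."""
    query = embed(bug_report)
    scored = [(path, cosine(query, embed(src))) for path, src in files.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical two-file code base and bug report, for illustration only.
files = {
    "src/CacheManager.java": "class CacheManager { void evictExpiredEntries() {} }",
    "src/HttpClient.java": "class HttpClient { Response sendRequest(Request request) {} }",
}
report = "Cache entries are not evicted after they expire"

ranked = rank_files(report, files)
for path, score in ranked:
    print(f"{score:.3f}  {path}")
```

In a pipeline of the kind the abstract describes, a ranked list like this would serve only as the starting candidate set that the LLM then inspects through its code-exploration functions.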
