LocAgent: Graph-Guided LLM Agents for Code Localization
Zhaoling Chen, Xiangru Tang, Gangda Deng, Fang Wu, Jialong Wu, Zhiwei Jiang, Viktor Prasanna, Arman Cohan, Xingyao Wang
TL;DR
This paper tackles code localization, which means linking natural-language issue descriptions to the precise code locations that need to change. It does so with LocAgent, a framework that represents a codebase as a directed heterogeneous graph of files, classes, and functions connected by dependency edges (imports, invocations, inheritance). It introduces a lightweight sparse indexing scheme and a unified set of agent tools (SearchEntity, TraverseGraph, RetrieveEntity) that let LLMs perform multi-hop reasoning over this graph, yielding strong localization performance at reduced cost. For more robust evaluation, the authors propose Loc-Bench, a diverse benchmark designed to mitigate data contamination and to cover maintenance tasks beyond bug fixes. The paper also shows that an open-source model (Qwen-2.5-Coder-Instruct) fine-tuned with LoRA can rival proprietary models. The results show improved localization accuracy and positive downstream effects on GitHub issue resolution, underscoring the method’s potential for real-world software maintenance and scalable deployment.
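To make the three tools concrete, here is a minimal in-memory sketch, assuming an adjacency-list graph with typed nodes and edges. Everything named here is hypothetical. The class `CodeGraph`, the helpers `tokenize` and `build_sparse_index`, and the functions `search_entity`, `traverse_graph`, and `retrieve_entity` are illustrative stand-ins for SearchEntity, TraverseGraph, and RetrieveEntity, not LocAgent's actual API. The `contain` edge type and the `path:Class.method` ID format are also assumptions. The "sparse index" is only a toy inverted index over entity-name tokens.

```python
import re
from collections import deque
from dataclasses import dataclass, field

# Assumed node/edge vocabularies, loosely following the paper's description of
# files, classes, functions and their dependency relations ("contain" is assumed).
NODE_TYPES = {"file", "class", "function"}
EDGE_TYPES = {"contain", "import", "invoke", "inherit"}


@dataclass
class CodeGraph:
    nodes: dict = field(default_factory=dict)      # id -> {"type", "code"}
    out_edges: dict = field(default_factory=dict)  # id -> [(edge_type, target)]
    in_edges: dict = field(default_factory=dict)   # id -> [(edge_type, source)]

    def add_node(self, node_id, node_type, code=""):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = {"type": node_type, "code": code}
        self.out_edges.setdefault(node_id, [])
        self.in_edges.setdefault(node_id, [])

    def add_edge(self, src, dst, edge_type):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        self.out_edges[src].append((edge_type, dst))
        self.in_edges[dst].append((edge_type, src))


def tokenize(text):
    # "app/models.py:User.save" -> ["app", "models", "py", "user", "save"]
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_sparse_index(graph):
    """Toy inverted index from name tokens to entity ids (hypothetical)."""
    index = {}
    for node_id in graph.nodes:
        for token in tokenize(node_id):
            index.setdefault(token, set()).add(node_id)
    return index


def search_entity(index, query):
    """Rank entities by how many query tokens hit their id (ties broken by id)."""
    hits = {}
    for token in set(tokenize(query)):
        for node_id in index.get(token, ()):
            hits[node_id] = hits.get(node_id, 0) + 1
    return sorted(hits, key=lambda n: (-hits[n], n))


def traverse_graph(graph, start, edge_types=None, direction="downstream", max_hops=2):
    """Breadth-first multi-hop traversal; returns {entity_id: hop distance}."""
    adjacency = graph.out_edges if direction == "downstream" else graph.in_edges
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for edge_type, neighbor in adjacency.get(node, []):
            if edge_types is not None and edge_type not in edge_types:
                continue
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def retrieve_entity(graph, node_id):
    """Return the stored type and code for a single entity."""
    return graph.nodes[node_id]


# Usage on a tiny hand-built graph.
g = CodeGraph()
g.add_node("app/models.py", "file")
g.add_node("app/models.py:User", "class", "class User(Base): ...")
g.add_node("app/models.py:User.save", "function", "def save(self): return validate(self)")
g.add_node("app/utils.py", "file")
g.add_node("app/utils.py:validate", "function", "def validate(obj): ...")
g.add_edge("app/models.py", "app/models.py:User", "contain")
g.add_edge("app/models.py:User", "app/models.py:User.save", "contain")
g.add_edge("app/utils.py", "app/utils.py:validate", "contain")
g.add_edge("app/models.py:User.save", "app/utils.py:validate", "invoke")

index = build_sparse_index(g)
print(search_entity(index, "user save"))
# ['app/models.py:User.save', 'app/models.py:User']
print(traverse_graph(g, "app/utils.py:validate", {"invoke"}, "upstream"))
# {'app/utils.py:validate': 0, 'app/models.py:User.save': 1}
```

Keeping both outgoing and incoming adjacency lists lets one traversal routine answer "what does this call?" and "who calls this?" questions. An agent can chain a keyword hit into dependency neighbors this way, which is the kind of multi-hop step the tools are meant to support.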
Abstract
Code localization, identifying precisely where in a codebase changes need to be made, is a fundamental yet challenging task in software maintenance. Existing approaches struggle to efficiently navigate complex codebases when identifying relevant code sections. The challenge lies in bridging natural language problem descriptions with the appropriate code elements, often requiring reasoning across hierarchical structures and multiple dependencies. We introduce LocAgent, a framework that addresses code localization through graph-based representation. By parsing codebases into directed heterogeneous graphs, LocAgent creates a lightweight representation that captures code structures (files, classes, functions) and their dependencies (imports, invocations, inheritance), enabling LLM agents to effectively search and locate relevant entities through powerful multi-hop reasoning. Experimental results on real-world benchmarks demonstrate that our approach significantly enhances accuracy in code localization. Notably, our method with the fine-tuned Qwen-2.5-Coder-Instruct-32B model achieves results comparable to SOTA proprietary models at greatly reduced cost (approximately 86% reduction), reaching up to 92.7% accuracy on file-level localization while improving downstream GitHub issue resolution success rates by 12% for multiple attempts (Pass@10). Our code is available at https://github.com/gersteinlab/LocAgent.
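The parsing step described above can be illustrated with Python's standard `ast` module. This is a sketch under stated assumptions, not LocAgent's parser. `parse_file`, `resolve`, and `build_graph` are hypothetical helpers. Entity IDs follow an assumed `path:Class.method` format, and hierarchy is encoded with an assumed `contain` edge. Call and base-class targets are resolved naively by matching bare names, which a real implementation would need to refine with scope and import analysis.

```python
import ast


def parse_file(path, source):
    """Extract entities and raw (unresolved) edges from one Python file."""
    tree = ast.parse(source)
    nodes = {path: "file"}
    edges = []  # (source_id, edge_type, target); target may still be a bare name

    def qualify(parent_id, name):
        # Assumed id format: "path:Class" at file level, "path:Class.method" below it.
        return f"{parent_id}:{name}" if parent_id == path else f"{parent_id}.{name}"

    def visit(body, parent_id):
        for stmt in body:
            if isinstance(stmt, ast.Import):
                for alias in stmt.names:
                    edges.append((parent_id, "import", alias.name))
            elif isinstance(stmt, ast.ImportFrom) and stmt.module:
                edges.append((parent_id, "import", stmt.module))
            elif isinstance(stmt, ast.ClassDef):
                node_id = qualify(parent_id, stmt.name)
                nodes[node_id] = "class"
                edges.append((parent_id, "contain", node_id))
                for base in stmt.bases:
                    if isinstance(base, ast.Name):
                        edges.append((node_id, "inherit", base.id))
                visit(stmt.body, node_id)  # methods and nested classes
            elif isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
                node_id = qualify(parent_id, stmt.name)
                nodes[node_id] = "function"
                edges.append((parent_id, "contain", node_id))
                # Simplification: every call anywhere in the body (including
                # nested definitions) is attributed to this function.
                for sub in ast.walk(stmt):
                    if isinstance(sub, ast.Call):
                        func = sub.func
                        if isinstance(func, ast.Name):
                            edges.append((node_id, "invoke", func.id))
                        elif isinstance(func, ast.Attribute):
                            edges.append((node_id, "invoke", func.attr))

    visit(tree.body, path)
    return nodes, edges


def resolve(nodes, raw_edges):
    """Naively link bare names to entity ids; unresolved targets are dropped."""
    by_name = {}
    for node_id, node_type in nodes.items():
        if node_type != "file":
            short = node_id.rsplit(":", 1)[-1].rsplit(".", 1)[-1]
            by_name.setdefault(short, []).append(node_id)
    edges = []
    for src, edge_type, target in raw_edges:
        if edge_type == "contain":
            edges.append((src, edge_type, target))
        elif edge_type == "import":
            module_path = target.replace(".", "/") + ".py"  # "app.utils" -> "app/utils.py"
            if module_path in nodes:
                edges.append((src, edge_type, module_path))
        else:  # invoke / inherit: match on the bare name only
            for candidate in by_name.get(target, []):
                edges.append((src, edge_type, candidate))
    return edges


def build_graph(files):
    """files: {path: source}. Returns (nodes, resolved edges)."""
    nodes, raw_edges = {}, []
    for path, source in files.items():
        file_nodes, file_edges = parse_file(path, source)
        nodes.update(file_nodes)
        raw_edges.extend(file_edges)
    return nodes, resolve(nodes, raw_edges)


files = {
    "app/utils.py": "def validate(obj):\n    return obj is not None\n",
    "app/models.py": (
        "from app.utils import validate\n"
        "class Base:\n"
        "    pass\n"
        "class User(Base):\n"
        "    def save(self):\n"
        "        return validate(self)\n"
    ),
}
nodes, edges = build_graph(files)
for edge in edges:
    print(edge)
```

Resolving in a second pass, after every file is parsed, lets an edge point at an entity defined in another file. Matching on bare names over-links, however: any method named `save` would match a `self.save()` call. The sketch accepts that imprecision in exchange for brevity.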
