Improving Code Localization with Repository Memory

Boshi Wang, Weijian Xu, Yunsheng Li, Mei Gao, Yujia Xie, Huan Sun, Dongdong Chen

TL;DR

This work tackles the lack of long-term memory in repository-level code localization by introducing repository memory built from a project's commit history. It defines two memory stores—episodic memory of past commits and semantic memory of active code functionality—and integrates them with LocAgent to form memory-guided localization workflows. Empirical results on SWE-bench-verified and SWE-bench-live benchmarks show that memory-augmented localization significantly improves accuracy, with combined episodic and semantic memory offering the strongest gains. The findings demonstrate the practical value of long-term, repository-specific memory for expert-like reasoning in software engineering tasks and point to future work on adaptive memory usage and interface design.

Abstract

Code localization is a fundamental challenge in repository-level software engineering tasks such as bug fixing. While existing methods equip language agents with comprehensive tools and interfaces to fetch information from the repository, they overlook memory: each instance is typically handled from scratch, assuming no prior knowledge of the repository. In contrast, human developers naturally build long-term repository memory, such as the functionality of key modules and associations between various bug types and their likely fix locations. In this work, we augment language agents with such memory by leveraging a repository's commit history, a rich yet underutilized resource that chronicles the codebase's evolution. We introduce tools that allow the agent to retrieve from a non-parametric memory encompassing recent historical commits and linked issues, as well as functionality summaries of actively evolving parts of the codebase identified via commit patterns. We demonstrate that adding such a memory significantly improves LocAgent, a state-of-the-art localization framework, on both SWE-bench-verified and the more recent SWE-bench-live benchmarks. Our research contributes towards developing agents that can accumulate and leverage past experience for long-horizon tasks, more closely emulating the expertise of human developers.
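To make the idea of retrieving from past commits and linked issues more concrete, here is a minimal sketch of a keyword-overlap search over commit records. It is an illustration only, not the paper's implementation: the `CommitRecord` schema, the `tokenize` and `search_commits` helpers, and the sample data are all hypothetical, and a real system would likely use a stronger retriever.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CommitRecord:
    """One memory entry for a past commit (hypothetical schema)."""
    sha: str
    message: str
    issue_text: str = ""                      # text of the linked issue, if any
    files: List[str] = field(default_factory=list)


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of word-like tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def search_commits(records: List[CommitRecord], query: str, top_k: int = 3) -> List[CommitRecord]:
    """Rank commits by how many query tokens appear in message + issue text."""
    query_tokens = tokenize(query)
    scored = []
    for rec in records:
        overlap = len(query_tokens & tokenize(rec.message + " " + rec.issue_text))
        if overlap > 0:                       # drop commits with no shared tokens
            scored.append((overlap, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:top_k]]


# Made-up sample data, for illustration only.
records = [
    CommitRecord("a1", "Fix missing import in migration writer",
                 "Generated migration lacks models import",
                 ["db/migrations/writer.py"]),
    CommitRecord("b2", "Update docs for admin site", "", ["docs/admin.txt"]),
]
hits = search_commits(records, "migration missing import statement")
for hit in hits:
    print(hit.sha, hit.files)
```

The files touched by the matching commits give the agent candidate locations to inspect before it explores the repository more broadly.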

Paper Structure

This paper contains 14 sections, 8 figures, 4 tables.

Figures (8)

  • Figure 1: An overview of our repository memory design. (a) We construct the memory by leveraging the recent commit history of the repository. This involves creating a searchable database of past commits and their linked issues, and identifying frequently edited files to let LLMs generate high-level functionality summaries. (b) The memory is accessed by the language agent via a set of tools that perform search based on custom queries and support closer examination of individual memory entries. Details are in the repository memory section.
  • Figure 2: (Left) Localization trajectory of a failure case of LocAgent on SWE-bench-verified (django__django-14580). While the agent successfully traces some initial key entities, it fails to reason in greater depth and granularity to pinpoint the error source, resulting in wrong localizations. (Right) The original issue description (top), accompanying history commits obtained via simple keyword search on commit messages (middle), the source code and LLM-generated functionality summary of the ground truth target file containing the error source (bottom).
  • Figure 3: Tool use distribution for LocAgent vs. RepoMem. The introduction of memory-based tools drastically alters agent behavior. RepoMem significantly reduces its reliance on exhaustive exploration tools like TraverseGraph and direct code reading (RetrieveEntity), indicating a strategic shift from brute-force navigation to a more targeted, hypothesis-driven investigation guided by memory.
  • Figure 4: Per-example cost comparison (LA: LocAgent, RM: RepoMem). This scatter plot shows the LLM API cost for each example, where the $x$ and $y$ coordinates correspond to the cost of LocAgent and RepoMem, respectively. Points below the diagonal line indicate RepoMem was cheaper, while points above indicate it was more expensive. The high variance reveals that the efficiency impact of integrating memory is problem-dependent: it provides significant savings on some tasks but incurs overhead on others, a nuance missed by average cost metrics.
  • Figure 5: Documentation and example outputs from the memory tools.
  • ...and 3 more figures
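
The step in Figure 1(a) that identifies frequently edited files can be sketched as a simple count over per-commit file lists. This is a hedged illustration, not the authors' pipeline: the function name `frequently_edited_files`, the `top_n` cutoff, and the sample history are assumptions, and in practice the file lists might be extracted with something like `git log --name-only`.

```python
from collections import Counter
from typing import List, Tuple


def frequently_edited_files(commit_files: List[List[str]], top_n: int = 2) -> List[Tuple[str, int]]:
    """Return the top_n files ranked by the number of commits that touched them."""
    counts: Counter = Counter()
    for files in commit_files:
        counts.update(set(files))             # count each file at most once per commit
    return counts.most_common(top_n)


# Made-up history: each inner list holds the files changed by one commit.
history = [
    ["core/writer.py", "core/serializer.py"],
    ["core/writer.py"],
    ["docs/index.txt"],
    ["core/writer.py", "core/serializer.py"],
]
hot_files = frequently_edited_files(history, top_n=2)
print(hot_files)
```

Files selected this way would be the ones handed to an LLM for a high-level functionality summary. The summarization step is omitted here because it depends on the model and prompt used.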