Prometheus: Towards Long-Horizon Codebase Navigation for Repository-Level Problem Solving
Yue Pan, Zimin Chen, Siyu Lu, Zhaoyang Chu, Xiang Li, Han Li, Yang Feng, Claire Le Goues, Federica Sarro, Martin Monperrus, He Ye
TL;DR
Prometheus tackles long-horizon codebase navigation with a memory-centric framework that converts a code repository into a unified knowledge graph and augments context retrieval with working memory. This combination enables persistent, coherent reasoning across multiple steps and tasks, orchestrated through a four-agent pipeline for issue classification, bug reproduction, patch generation, and patch verification. Evaluated with GPT-5 on SWE-bench Verified and SWE-PolyBench Verified, it shows state-of-the-art performance, with resolution rates of 74.4% and 33.8%, respectively, and strong multilingual generalization; ablations highlight the importance of memory, multi-patch selection, reproduction, and regression testing. The results indicate that repository-level knowledge and memory-augmented retrieval can significantly improve the reliability and efficiency of automated software maintenance, with practical benefits for real-world development workflows.
Abstract
Large Language Models (LLMs) have shown remarkable capabilities in automating software engineering tasks, spurring the emergence of coding agents that scaffold LLMs with external tools to resolve repository-level problems. However, existing agents still struggle to navigate large-scale codebases: the Needle-in-a-Haystack problem persists even with million-token context windows, because relevant evidence is often overwhelmed by large volumes of irrelevant code and documentation. Prior codebase navigation approaches, including embedding-based retrieval, file-system exploration, and graph-based retrieval, address parts of this challenge but fail to capture the temporal continuity of agent reasoning. This leaves agents stateless and causes repeated repository traversals that hinder scalable planning and reasoning. To address these limitations, we present Prometheus, a memory-centric coding agent framework for long-horizon codebase navigation. Prometheus represents the repository as a unified knowledge graph to encode semantic dependencies and employs a context engine augmented with working memory that retains and reuses previously explored contexts to ensure continuity across reasoning steps. Built upon this engine, Prometheus integrates memory-enhanced navigation into a multi-agent system for automated issue resolution, encompassing issue classification, bug reproduction, patch generation, and verification. We conduct comprehensive experiments on two widely used issue resolution benchmarks, SWE-bench Verified and SWE-PolyBench Verified. Powered by GPT-5, Prometheus achieves state-of-the-art performance with 74.4% and 33.8% resolution rates on the two benchmarks, ranking Top-6 and Top-1 among open-source agent systems, respectively. Our data and code are available at https://github.com/EuniAI/Prometheus.
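To make the idea of memory-augmented retrieval concrete, the following is a minimal, illustrative sketch and not the Prometheus implementation. It assumes a toy repository graph stored as a plain adjacency dictionary and a working memory that is a simple cache keyed by node name; the class `ContextEngine`, its fields, and the example node names are all hypothetical. The point it shows is only that a second retrieval touching already-explored nodes can reuse them rather than re-reading the repository.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class ContextEngine:
    """Toy context engine (hypothetical): graph traversal plus working memory."""

    graph: dict[str, list[str]]    # node -> neighbouring nodes (assumed schema)
    content: dict[str, str]        # node -> source snippet stored in the "repository"
    memory: dict[str, str] = field(default_factory=dict)  # working memory (cache)
    reads: int = 0                 # number of actual repository reads performed

    def _read(self, node: str) -> str:
        # Reuse a previously explored context if it is already in working memory.
        if node not in self.memory:
            self.reads += 1
            self.memory[node] = self.content[node]
        return self.memory[node]

    def retrieve(self, start: str, max_hops: int = 1) -> dict[str, str]:
        # Breadth-first traversal of the graph, limited to max_hops edges from start.
        seen = {start}
        queue = deque([(start, 0)])
        context: dict[str, str] = {}
        while queue:
            node, depth = queue.popleft()
            context[node] = self._read(node)
            if depth < max_hops:
                for neighbour in self.graph.get(node, []):
                    if neighbour not in seen:
                        seen.add(neighbour)
                        queue.append((neighbour, depth + 1))
        return context


# Hypothetical three-node repository graph: a file, a function, and its callee.
graph = {"utils.py": ["parse"], "parse": ["tokenize"], "tokenize": []}
content = {
    "utils.py": "# module utils",
    "parse": "def parse(s): return tokenize(s)",
    "tokenize": "def tokenize(s): return s.split()",
}

engine = ContextEngine(graph, content)
first = engine.retrieve("utils.py")   # explores utils.py and parse
second = engine.retrieve("parse")     # parse comes from memory; only tokenize is read
```

In this sketch the two retrievals cover four node visits but trigger only three repository reads, since `parse` is served from working memory on the second call. A real system would need eviction, relevance ranking, and a much richer graph schema, none of which is modeled here.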
