Your Code Agent Can Grow Alongside You with Structured Memory

Yi-Xuan Deng, Xiaoqin Liu, Yi Zhang, Guo-Wei Yang, Shuojin Yang

Abstract

While intent-oriented programming (or "vibe coding") is redefining software engineering, existing code agents remain tethered to static code snapshots. Consequently, they struggle to model the critical information embedded in a project's temporal evolution and fail to leverage the "reasoning trajectories" implicit in past successful practices. This limitation results in rigid behavioral logic and a lack of autonomous adaptability, ultimately hindering their ability to tackle complex, repository-level problems. To bridge this static-dynamic mismatch, we propose MemCoder, a framework designed to enable continual human-AI co-evolution. MemCoder first structures historical human experience to distill latent intent-to-code mappings from past commits. It then employs a self-refinement mechanism driven by verification feedback to correct agent behavior in real time. Crucially, it introduces an experience self-internalization mechanism that crystallizes human-validated solutions into long-term knowledge, thereby supporting sustained evolution. Experimental results on SWE-bench Verified demonstrate that MemCoder not only achieves state-of-the-art (SOTA) performance but also improves the resolved rate by 9.4% over the general-purpose foundation model DeepSeek-V3.2. These findings indicate that equipping agents to co-evolve with humans through project history and real-time feedback effectively unlocks the potential of general models in complex software engineering tasks.
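The abstract describes a closed loop: distill past commits into structured memory, refine a candidate fix against verification feedback, and internalize only human-validated solutions. The Python sketch below illustrates that loop under stated assumptions and is not MemCoder's implementation. `Experience`, `MemoryStore`, `refine`, and every field name are hypothetical; the "distillation" step simply takes a commit's summary line as its intent, a far cruder step than the paper's; and `generate`/`verify` are stand-in callables rather than a real agent or test harness.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Experience:
    """Hypothetical structured memory entry: a high-level intent plus the files it changed."""
    intent: str        # assumed: distilled intent (here, just the commit summary line)
    files: List[str]   # files touched by the change
    source: str        # "history" (past commit) or "internalized" (validated agent fix)


@dataclass
class MemoryStore:
    entries: List[Experience] = field(default_factory=list)

    def distill_commit(self, message: str, files: List[str]) -> Experience:
        # Hypothetical distillation: use the first line of the commit message as the intent.
        text = message.strip()
        intent = text.splitlines()[0] if text else ""
        entry = Experience(intent=intent, files=list(files), source="history")
        self.entries.append(entry)
        return entry

    def internalize(self, task: str, files: List[str], human_approved: bool) -> bool:
        # Only human-validated solutions are written back into long-term memory.
        if not human_approved:
            return False
        self.entries.append(Experience(intent=task, files=list(files), source="internalized"))
        return True


def refine(generate: Callable[[Optional[str]], str],
           verify: Callable[[str], Tuple[bool, str]],
           max_rounds: int = 3) -> Tuple[Optional[str], int]:
    """Regenerate a patch, feeding the verifier's feedback back in, until it passes."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        patch = generate(feedback)
        ok, feedback = verify(patch)
        if ok:
            return patch, round_no
    return None, max_rounds


# Usage with toy stand-ins for the agent and the test runner (both hypothetical).
store = MemoryStore()
store.distill_commit("Fix off-by-one in paginator\n\nDetails...", ["paginator.py"])

def generate(feedback: Optional[str]) -> str:
    # First attempt ignores feedback; the second attempt "uses" it.
    return "patch-v2" if feedback else "patch-v1"

def verify(patch: str) -> Tuple[bool, str]:
    return patch == "patch-v2", "test_last_page failed"

patch, rounds = refine(generate, verify)
store.internalize("Fix last-page index", ["paginator.py"], human_approved=True)
```

Gating `internalize` on an explicit approval flag mirrors the abstract's emphasis on human-validated solutions: an unverified agent patch never becomes reusable experience.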

Paper Structure (27 sections, 10 equations, 5 figures, 2 tables, 1 algorithm)

Figures (5)

  • Figure 1: Comparison of MemCoder with existing methods. MemCoder facilitates evolution by learning the intrinsic mapping from high-level intent to concrete code implementation, derived from structured memory.
  • Figure 2: Architectural overview of MemCoder, illustrating a closed-loop human-AI co-evolution paradigm. In Stage 1, MemCoder reconstructs developer cognition by distilling raw commit histories into structured long-term memory, capturing latent intent-to-code mappings from historical human practices. In Stage 2, the agent performs context-aware dual-stage retrieval to access relevant experience, while a Refining Sub-agent enables execution-time self-refinement through prompt concretization, automated test generation, and verification feedback. Crucially, human-validated solutions are then internalized into long-term memory, closing the evolutionary loop and enabling the agent to progressively align with repository-specific conventions across iterations.
  • Figure 3: Comparison of MemCoder with the top 6 methods on the SWE-bench Verified leaderboard as of January 20, 2026.
  • Figure 4: Performance of MemCoder across various top-k values. The metrics include resolved rate, average number of retrieved historical experiences, and average tool call frequency. All experiments are conducted using DeepSeek-V3.2 as the backbone model and evaluated on a randomly sampled 200-instance subset of the SWE-bench Verified dataset.
  • Figure 5: Performance of MemCoder across various top-k values with the retrieval frequency restricted to one. The metric shown is the resolved rate (%). All experiments are conducted using DeepSeek-V3.2 as the backbone model and evaluated on a randomly sampled 200-instance subset of the SWE-bench Verified dataset.
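The figure captions mention "context-aware dual-stage retrieval" and a top-k cutoff, but this page does not define the two stages. The Python sketch below shows one plausible shape under explicit assumptions: stage 1 is a coarse filter keeping experiences that touched any suspected file (falling back to the whole memory if none match), and stage 2 ranks the survivors by token-level Jaccard similarity to the issue text before truncating to `top_k`. `Experience`, `dual_stage_retrieve`, `suspect_files`, and the Jaccard scorer are all hypothetical stand-ins; the paper's actual stages and scoring may differ entirely.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Experience:
    """Hypothetical memory entry: a distilled intent and the files its change touched."""
    intent: str
    files: Tuple[str, ...]


def tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens; "off-by-one" becomes {"off", "by", "one"}.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dual_stage_retrieve(issue: str, suspect_files: List[str],
                        memory: List[Experience], top_k: int = 3) -> List[Experience]:
    # Stage 1 (coarse, assumed): keep experiences that touched any suspected file.
    wanted = set(suspect_files)
    candidates = [e for e in memory if wanted & set(e.files)] or list(memory)
    # Stage 2 (fine, assumed): rank survivors by lexical similarity to the issue text.
    query = tokens(issue)
    ranked = sorted(candidates, key=lambda e: jaccard(query, tokens(e.intent)), reverse=True)
    return ranked[:top_k]


memory = [
    Experience("Fix off-by-one in paginator last page", ("paginator.py",)),
    Experience("Add timezone support to date parser", ("dates.py",)),
    Experience("Handle empty page list in paginator", ("paginator.py",)),
]
hits = dual_stage_retrieve("paginator returns wrong last page", ["paginator.py"], memory, top_k=1)
fallback = dual_stage_retrieve("timezone parser", ["unknown.py"], memory, top_k=1)
```

Raising `top_k` returns more experiences per query, which is the knob Figures 4 and 5 sweep; in a real agent, a dense embedding model would likely replace the lexical scorer.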