
Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems

Shanglin Wu, Yuyang Luo, Yueqing Liang, Kaiwen Shi, Yanfang Ye, Ali Payani, Kai Shu

Abstract

Large language model (LLM) multi-agent systems can scale along two distinct dimensions: by increasing the number of agents and by improving through accumulated experience over time. Although prior work has studied these dimensions separately, their interaction under realistic cost constraints remains unclear. In this paper, we introduce a conceptual scaling view of multi-agent systems that jointly considers team size and lifelong learning ability, and we study how memory design shapes this landscape. To this end, we propose \textbf{LLMA-Mem}, a lifelong memory framework for LLM multi-agent systems under flexible memory topologies. We evaluate LLMA-Mem on \textsc{MultiAgentBench} across coding, research, and database environments. Empirically, LLMA-Mem consistently improves long-horizon performance over baselines while reducing cost. Our analysis further reveals a non-monotonic scaling landscape: larger teams do not always produce better long-term performance, and smaller teams can outperform larger ones when memory better supports the reuse of experience. These findings position memory design as a practical path for scaling multi-agent systems more effectively and more efficiently over time.

Paper Structure

This paper contains 29 sections, 8 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Scaling space and cost comparison of LLMA-Mem. Top: LLMA-Mem enabled scaling space for multi-agent systems. Bottom: Average token usage per task, where LLMA-Mem shows substantial cost reduction compared to baselines across models.
  • Figure 2: CMA curves for Claude-Sonnet-4.5 and Qwen3-next-80B, showing limited long-horizon adaptation.
  • Figure 3: LLMA-Mem maintains three memory components: episodic memory for task experiences, procedural memory for consolidated reusable strategies, and transactive memory for agent capabilities and team coordination. The right panel illustrates three memory topology configurations that determine how memory is distributed and accessed across agents.
  • Figure 4: During task execution, the system retrieves relevant memories using a relevance–importance score, updates episodic and transactive statistics after each task, and periodically consolidates episodic experiences into reusable procedural knowledge.
  • Figure 5: Cumulative moving average (CMA) curves on representative settings. LLMA-Mem shows more stable long-horizon improvement than MARBLE and A-Mem, with especially large margins for DeepSeek-v3.2 and Qwen3-32B-Instruct.
  • ...and 1 more figure
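The retrieval step described in the Figure 4 caption, which selects memories by a combined relevance–importance score, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the linear combination with weight `alpha`, the cosine-similarity relevance term, and the `importance` field on each memory entry are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_emb, memories, k=3, alpha=0.7):
    """Rank memory entries by a weighted relevance-importance score.

    Each memory is a dict with an 'embedding' (query relevance) and a
    scalar 'importance' in [0, 1]; alpha trades off the two terms.
    This scoring form is a hypothetical stand-in for LLMA-Mem's
    actual relevance-importance score.
    """
    scored = [
        (alpha * cosine(query_emb, m["embedding"])
         + (1 - alpha) * m["importance"], m)
        for m in memories
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]
```

Under this sketch, a memory that is only moderately similar to the query can still be retrieved if its stored importance is high, which matches the caption's description of updating statistics after each task and reusing consolidated experience.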