ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

Siru Ouyang, Jun Yan, I-Hung Hsu, Yanfei Chen, Ke Jiang, Zifeng Wang, Rujun Han, Long T. Le, Samira Daruki, Xiangru Tang, Vishy Tirumalashetty, George Lee, Mahsan Rofouei, Hangfei Lin, Jiawei Han, Chen-Yu Lee, Tomas Pfister

TL;DR

This paper tackles the problem of agents forgetting accumulated experience across tasks and proposes ReasoningBank to distill transferable reasoning from both successes and failures. It couples ReasoningBank with MaTTS, enabling memory-guided test-time scaling that generates diverse, contrastive experiences to refine memories. Across web-browsing and software-engineering benchmarks, the approach yields systematic gains in effectiveness and efficiency, with a strong synergy between memory quality and scaling. The work highlights memory-driven experience scaling as a new dimension for building self-evolving agents with emergent behaviors.

Abstract

As large language model agents are increasingly adopted in persistent real-world roles, they naturally encounter continuous streams of tasks. A key limitation, however, is their failure to learn from the accumulated interaction history, forcing them to discard valuable insights and repeat past errors. We propose ReasoningBank, a novel memory framework that distills generalizable reasoning strategies from an agent's self-judged successful and failed experiences. At test time, an agent retrieves relevant memories from ReasoningBank to inform its interaction and then integrates new learnings back, enabling it to become more capable over time. Building on this powerful experience learner, we further introduce memory-aware test-time scaling (MaTTS), which accelerates and diversifies this learning process by scaling up the agent's interaction experience. By allocating more compute to each task, the agent generates abundant, diverse experiences that provide rich contrastive signals for synthesizing higher-quality memory. The better memory in turn guides more effective scaling, establishing a powerful synergy between memory and test-time scaling. Across web browsing and software engineering benchmarks, ReasoningBank consistently outperforms existing memory mechanisms that store raw trajectories or only successful task routines, improving both effectiveness and efficiency; MaTTS further amplifies these gains. These findings establish memory-driven experience scaling as a new scaling dimension, enabling agents to self-evolve, with emergent behaviors arising naturally.
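The abstract describes a closed loop: retrieve relevant memories, act, self-judge the outcome, distill the experience (success or failure) into a memory item, and write it back. The Python sketch below is a hypothetical illustration of that loop under stated assumptions, not the authors' implementation. A word-overlap (Jaccard) score stands in for whatever retriever the paper actually uses, `distill` replaces an LLM prompt with a fixed template, consolidation is a plain append, and the names `ReasoningBankSketch`, `run_task`, `similarity`, and `distill` are invented for this example. Only the title/description/content structure of a memory item comes from the Figure 2 caption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryItem:
    # Fields follow the title / description / content structure named in Figure 2.
    title: str
    description: str
    content: str


def _tokens(text: str) -> set[str]:
    # Lowercased bag of words; a crude stand-in for embedding similarity.
    return set(text.lower().split())


def similarity(a: str, b: str) -> float:
    # Jaccard overlap between word sets (hypothetical retrieval score).
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def distill(task: str, trajectory: str, success: bool) -> MemoryItem:
    # Hypothetical distiller: a real system would prompt an LLM to abstract a
    # reusable strategy (success) or a preventive lesson (failure).
    kind = "Strategy" if success else "Pitfall"
    return MemoryItem(
        title=f"{kind} for: {task}",
        description=f"{kind.lower()} observed while solving {task}",
        content=trajectory,
    )


@dataclass
class ReasoningBankSketch:
    items: list[MemoryItem] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 1) -> list[MemoryItem]:
        # Rank items by similarity of the query to each item's title and description.
        ranked = sorted(
            self.items,
            key=lambda m: similarity(query, f"{m.title} {m.description}"),
            reverse=True,
        )
        return ranked[:k]

    def consolidate(self, new_items: list[MemoryItem]) -> None:
        # Simplest consolidation policy (assumption): append without merging or pruning.
        self.items.extend(new_items)


def run_task(
    bank: ReasoningBankSketch,
    task: str,
    agent: Callable[[str, list[MemoryItem]], str],
    judge: Callable[[str, str], bool],
) -> bool:
    memories = bank.retrieve(task, k=1)        # 1. retrieve relevant memory
    trajectory = agent(task, memories)         # 2. act, conditioned on memory
    success = judge(task, trajectory)          # 3. self-judge; no ground-truth label
    bank.consolidate([distill(task, trajectory, success)])  # 4. distill and write back
    return success


if __name__ == "__main__":
    bank = ReasoningBankSketch()
    run_task(bank, "find order history", lambda t, m: "open account page first", lambda t, tr: True)
    run_task(bank, "update shipping address", lambda t, m: "clicked save too early", lambda t, tr: False)
    print(bank.retrieve("check my order history")[0].title)  # Strategy for: find order history
```

Note that the failed task still produces a memory item (a "Pitfall"), mirroring the abstract's point that ReasoningBank learns from failures as well as successes.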

Paper Structure

This paper contains 40 sections, 15 figures, and 4 tables.

Figures (15)

  • Figure 1: ReasoningBank induces reusable reasoning strategies, making memory items more transferable for future use. This enables agents to continuously evolve and achieve higher cumulative success rates than the "No Memory" baseline on the WebArena-Admin subset.
  • Figure 2: Overview of ReasoningBank. Experiences are distilled into structured memory items with a title, description, and content. For each new task, the agent retrieves relevant items to guide its interaction with the environment and constructs new ones from both successful and failed trajectories. These items are then consolidated into ReasoningBank, forming a closed-loop memory process.
  • Figure 3: Comparison of (a) vanilla TTS and MaTTS with (b) parallel scaling, where self-contrast across multiple trajectories curates reliable memory, and (c) sequential scaling, where self-refinement enriches memory with intermediate reasoning signals.
  • Figure 4: Effect of the scaling factor $k$ for MaTTS with ReasoningBank on the WebArena-Shopping subset. We compare (a) parallel and (b) sequential test-time scaling.
  • Figure 5: Snapshot of MaTTS on the WebArena-Shopping subset with different memory mechanisms and $k=3$. We compute BoN over all $3$ trajectories and Pass@1 on one randomly selected trajectory.
  • ...and 10 more figures
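Figure 5 reports Best-of-N (BoN) over all $k$ trajectories and Pass@1 on one randomly selected trajectory, and Figure 3(b) describes parallel scaling as self-contrast across multiple trajectories. The Python sketch below shows one way to compute both metrics from per-task rollout outcomes and to split parallel rollouts into successes and failures for contrast. It assumes binary success labels and an oracle-style BoN that counts a task as solved if any rollout succeeds; the paper's exact selection procedure may differ. The `Rollout` type and all function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Rollout:
    trajectory: str
    success: bool  # binary outcome for one of the k parallel rollouts


def contrast_groups(rollouts: list[Rollout]) -> tuple[list[Rollout], list[Rollout]]:
    # Partition the k rollouts of one task into successes and failures; comparing
    # the two groups is the kind of self-contrast signal Figure 3(b) describes.
    wins = [r for r in rollouts if r.success]
    losses = [r for r in rollouts if not r.success]
    return wins, losses


def best_of_n(per_task: list[list[Rollout]]) -> float:
    # Oracle BoN (assumption): a task is solved if any of its k rollouts succeeds.
    return sum(any(r.success for r in rs) for rs in per_task) / len(per_task)


def expected_pass_at_1(per_task: list[list[Rollout]]) -> float:
    # Expected success rate when one rollout per task is picked uniformly at random.
    return sum(sum(r.success for r in rs) / len(rs) for rs in per_task) / len(per_task)


def sampled_pass_at_1(per_task: list[list[Rollout]], rng: random.Random) -> float:
    # A single random draw per task, i.e. "one randomly selected trajectory".
    return sum(rng.choice(rs).success for rs in per_task) / len(per_task)


if __name__ == "__main__":
    k3 = [
        [Rollout("a1", True), Rollout("a2", False), Rollout("a3", False)],
        [Rollout("b1", False), Rollout("b2", False), Rollout("b3", False)],
    ]
    print(best_of_n(k3))                     # 0.5
    print(round(expected_pass_at_1(k3), 3))  # 0.167
```

With these made-up outcomes, BoN is 0.5 while expected Pass@1 is about 0.167, illustrating why BoN is always at least as high as Pass@1 on the same rollouts: it credits a task whenever any of the $k$ attempts succeeds.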