From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory

Siyu Xia, Zekun Xu, Jiajun Chai, Wentian Fan, Yan Song, Xiaohan Wang, Guojun Yin, Wei Lin, Haifeng Zhang, Jun Wang

TL;DR

This work tackles unstable decision-making and limited experience reuse in LLM-based agents by introducing a trainable, multi-layer graph memory that encodes past queries, decision trajectories, and high-level meta-cognition as a memory-prior. It builds a heterogeneous three-layer graph (queries, canonical transition paths from an FSM, and meta-cognition) and jointly optimizes edge utilities via reinforcement learning, integrating top-k memory strategies into the training loop as prompt augmentations. The approach yields robust cross-task generalization and improved performance, especially for smaller models, across seven QA benchmarks, and demonstrates efficiency gains in RL training by providing transferable strategic priors. Overall, the framework enables adaptive, strategy-aware agents that learn from their own experiences with explicit, interpretable memory that guides learning and reasoning in open-ended tasks.

Abstract

Large Language Model (LLM)-based agents have demonstrated remarkable potential in autonomous task-solving across complex, open-ended environments. A promising approach for improving the reasoning capabilities of LLM agents is to make better use of prior experiences in guiding current decisions. However, LLMs acquire experience either through implicit memory via training, which suffers from catastrophic forgetting and limited interpretability, or through explicit memory via prompting, which lacks adaptability. In this paper, we introduce a novel agent-centric, trainable, multi-layered graph memory framework and evaluate how context memory enhances the ability of LLMs to utilize parametric information. The graph abstracts raw agent trajectories into structured decision paths in a state machine and further distills them into high-level, human-interpretable strategic meta-cognition. To make memory adaptable, we propose a reinforcement-based weight optimization procedure that estimates the empirical utility of each meta-cognition from reward feedback on downstream tasks. These optimized strategies are then dynamically integrated into the LLM agent's training loop through meta-cognitive prompting. Empirically, the learnable graph memory delivers robust generalization, improves LLM agents' strategic reasoning performance, and provides consistent benefits during Reinforcement Learning (RL) training.

Paper Structure

This paper contains 40 sections, 13 equations, 5 figures, 4 tables, 2 algorithms.

Figures (5)

  • Figure 1: Our method and the existing approach ExpeL (Zhao et al., 2024).
  • Figure 2: The framework of the proposed trainable memory. Stage 1 builds a graph from LLM trajectories, encoding queries, decision paths, and meta-cognition. Stage 2 estimates strategy utility via counterfactual rewards and updates graph weights. Stage 3 injects top-k strategies into RL training for policy optimization.
  • Figure 3: (a) Training curve of 4B models. (b) Training curve of 8B models.
  • Figure 4: Ablation studies of the structured memory framework. (a) and (b) show the effect of disabling weight optimization. (c) shows the effect of varying the number of meta-cognitions $k$. (d) shows generalization across LLM backends.
  • Figure 5: Finite State Machine
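
The three stages summarized in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the class `GraphMemory`, the function `build_prompt`, the exact-string query lookup, the product-of-weights scoring, and the additive update with learning rate `lr` are all assumptions made for illustration. The sketch only shows the general shape of the idea: a layered graph linking queries, decision paths, and meta-cognition; a weight update driven by the reward difference with and without a strategy; and top-k strategies injected into the prompt.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GraphMemory:
    """Toy three-layer memory: query -> decision path -> meta-cognition.

    Hypothetical structure for illustration; edge weights are plain floats.
    """
    query_to_path: dict = field(default_factory=lambda: defaultdict(dict))
    path_to_meta: dict = field(default_factory=lambda: defaultdict(dict))

    def add_experience(self, query, path, meta, init_weight=1.0):
        # Stage 1 (assumed form): record one trajectory as two weighted edges.
        self.query_to_path[query].setdefault(path, init_weight)
        self.path_to_meta[path].setdefault(meta, init_weight)

    def score_strategies(self, query):
        # Score each meta-cognition by summing edge-weight products over
        # all paths that connect it to the query (an assumed scoring rule).
        scores = defaultdict(float)
        for path, w_qp in self.query_to_path.get(query, {}).items():
            for meta, w_pm in self.path_to_meta.get(path, {}).items():
                scores[meta] += w_qp * w_pm
        return dict(scores)

    def top_k(self, query, k):
        # Highest score first; ties are broken alphabetically for determinism.
        scores = self.score_strategies(query)
        return sorted(scores, key=lambda m: (-scores[m], m))[:k]

    def update(self, meta, reward_with, reward_without, lr=0.5):
        # Stage 2 (assumed form): counterfactual gain is the reward obtained
        # with the strategy minus the reward obtained without it.
        gain = reward_with - reward_without
        for metas in self.path_to_meta.values():
            if meta in metas:
                metas[meta] += lr * gain


def build_prompt(memory, query, k=2):
    # Stage 3 (assumed form): prepend the top-k strategies to the query.
    hints = "\n".join(f"- {s}" for s in memory.top_k(query, k))
    return f"Strategies from memory:\n{hints}\n\nQuestion: {query}"
```

In this sketch, a strategy whose presence raises the reward gains weight and moves up the ranking, while one that lowers the reward is demoted, so the prompt built by `build_prompt` changes as feedback accumulates.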