Think Before You Act: Decision Transformers with Working Memory

Jikun Kang, Romain Laroche, Xingdi Yuan, Adam Trischler, Xue Liu, Jie Fu

TL;DR

The paper addresses inefficiency and forgetting in multi-task offline RL by integrating an explicit working memory module into a Decision Transformer (DT-Mem). It introduces a content-addressable memory that stores, blends, and retrieves task-relevant information, with memory updates guided by attention and retrieval by content addressing, and uses LoRA to fine-tune memory with limited data. Empirical results on Atari and Meta-World show that DT-Mem achieves better generalization with fewer parameters and faster training than prior memory-based DT methods, and that memory fine-tuning yields strong task adaptability. Overall, the approach enhances cross-task transfer and efficiency, suggesting a practical path toward scalable memory-augmented decision-making.
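The store, blend, and retrieve cycle mentioned above can be illustrated with a generic content-addressable memory. The sketch below is a minimal illustration, not the paper's implementation: the function names (`address`, `read`, `write`), the cosine-similarity scoring, and the fixed `blend` factor are all assumptions made for this example, and the learned attention-based update and LoRA fine-tuning described in the paper are not modeled.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def address(memory, query):
    # Content-based addressing (assumed form): score each memory slot by
    # cosine similarity to the query, then normalize scores into weights.
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    q = query / (np.linalg.norm(query) + 1e-8)
    return softmax(m @ q)

def read(memory, query):
    # Retrieve: weighted sum of slots, dominated by the best-matching slot.
    return address(memory, query) @ memory

def write(memory, query, new_info, blend=0.5):
    # Blend: move each slot toward new_info in proportion to its address
    # weight. `blend` is a hypothetical fixed gate; a learned gate would
    # normally take its place.
    w = address(memory, query)[:, None]
    return (1.0 - blend * w) * memory + blend * w * new_info[None, :]

# Usage: 4 slots of dimension 4, queried with the content of slot 2.
memory = np.eye(4)
weights = address(memory, memory[2])
retrieved = read(memory, memory[2])
updated = write(memory, memory[2], np.ones(4))
```

Because addressing depends only on content, the slot most similar to the query receives both the largest read weight and the largest update, which is the property that lets task-relevant information be retrieved without an explicit task index.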

Abstract

Decision Transformer-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and computation. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Inspired by this, we propose a working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in Atari games and Meta-World object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.

Paper Structure (34 sections, 1 equation, 8 figures, 10 tables, 3 algorithms)

This paper contains 34 sections, 1 equation, 8 figures, 10 tables, 3 algorithms.

Figures (8)

  • Figure 1: Illustrating how a robot can use its memory to guide its playing strategy.
  • Figure 2: An overview of the proposed DT-Mem architecture. The input to the encoder is a fixed-length sequence of trajectories. The encoder, together with its positional encoder module, embeds the inputs and preserves the temporal correlations between states and actions. The primary role of the attention module is to capture dependencies and relationships between states, actions, and returns in a sequence. Note that multiple attention modules are stacked together. Our design deconstructs this module and manages the memory flows between the attention modules within each block. The output of the attention blocks flows to the action decoder, which decodes it back into real actions.
  • Figure 3: Scaling of IQM scores
  • Figure 4: Top: Fine-tuning performance on 10% of dataset in unseen Atari games. For better visualization, the y-axis is the logarithm of DQN-normalized score. Bottom: The performance improvement for the training dataset.
  • Figure 5: This graph shows the prediction accuracy during training. Each curve represents three runs with different random seeds. For better visualization, MDT-200M is displayed in a separate figure.
  • ...and 3 more figures