Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning

Hung Le, Kien Do, Dung Nguyen, Sunil Gupta, Svetha Venkatesh

TL;DR

This work introduces Stable Hadamard Memory (SHM), a memory-augmented framework for reinforcement learning that uses Hadamard-calibrated updates to selectively erase outdated memories and reinforce relevant ones. By formulating memory writing as $M_t = M_{t-1} \odot C_t + U_t$ and proposing a stability-driven calibration $C_{\theta}(x_t) = 1 + \tanh(\theta_t \otimes v_c(x_t))$, SHM achieves bounded memory growth and mitigates gradient vanishing/explosion. The authors provide a closed-form memory update, theoretical propositions about gradient behavior, and empirical demonstrations across meta-RL, long-horizon credit assignment, and POPGym showing consistent performance gains with modest computational overhead. The approach offers a scalable, parallelizable memory mechanism with strong generalization to evolving contexts, advancing the practicality of memory-augmented agents in challenging POMDPs.
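The write rule in the TL;DR can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear maps `W_c`, `W_v`, `W_k`, the outer-product form of the update $U_t$, the choice of one `theta` row per step, and all sizes are hypothetical. Only the recurrence $M_t = M_{t-1} \odot C_t + U_t$ and the calibration $C_{\theta}(x_t) = 1 + \tanh(\theta_t \otimes v_c(x_t))$ come from the text. The sketch also unrolls the recurrence (with $M_0 = 0$) into the sum $\sum_i U_i \odot \prod_{j>i} C_j$ and checks that both routes agree.

```python
import math
import random

random.seed(0)
H, D, T = 3, 4, 6  # memory size, input size, number of steps (illustrative values)

def rand_matrix(rows, cols, scale=0.5):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]

def hadamard(A, B):
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical parameters (assumptions, not taken from the paper).
W_c = rand_matrix(H, D)    # v_c(x) = W_c x
W_v = rand_matrix(H, D)    # used to build the update U_t
W_k = rand_matrix(H, D)
theta = rand_matrix(T, H)  # assumption: one theta_t vector per step

def calibration(theta_t, x):
    # C = 1 + tanh(theta_t outer v_c(x)); tanh keeps every entry within [0, 2].
    return [[1.0 + math.tanh(z) for z in row] for row in outer(theta_t, matvec(W_c, x))]

def update(x):
    # Assumed form of U_t: outer product of two linear projections of x.
    return outer(matvec(W_v, x), matvec(W_k, x))

xs = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
Cs = [calibration(theta[t], xs[t]) for t in range(T)]
Us = [update(x) for x in xs]

# Recurrent form: M_t = M_{t-1} * C_t + U_t, starting from M_0 = 0.
M = [[0.0] * H for _ in range(H)]
for C, U in zip(Cs, Us):
    M = add(hadamard(M, C), U)

# Unrolled form: M_T = sum_i U_i * prod_{j > i} C_j (elementwise products).
M_closed = [[0.0] * H for _ in range(H)]
for i in range(T):
    term = Us[i]
    for j in range(i + 1, T):
        term = hadamard(term, Cs[j])
    M_closed = add(M_closed, term)

max_gap = max(abs(a - b) for ra, rb in zip(M, M_closed) for a, b in zip(ra, rb))
print("largest gap between recurrent and unrolled memory:", max_gap)
```

Because each term of the unrolled sum depends only on precomputed $C_j$ and $U_i$, the products can in principle be evaluated with a parallel scan rather than a sequential loop, which is one way to read the "parallelizable" claim above.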

Abstract

Effective decision-making in partially observable environments demands robust memory management. Despite their success in supervised learning, current deep-learning memory models struggle in reinforcement learning environments that are partially observable and long-horizon. They fail to efficiently capture relevant past information, adapt flexibly to changing observations, and maintain stable updates over long episodes. We theoretically analyze the limitations of existing memory models within a unified framework and introduce the Stable Hadamard Memory, a novel memory model for reinforcement learning agents. Our model dynamically adjusts memory by erasing experiences that are no longer needed and reinforcing crucial ones in a computationally efficient manner. To this end, we leverage the Hadamard product for calibrating and updating memory, specifically designed to enhance memory capacity while mitigating numerical and learning challenges. Our approach significantly outperforms state-of-the-art memory-based methods on challenging partially observable benchmarks, such as meta-reinforcement learning, long-horizon credit assignment, and POPGym, demonstrating superior performance in handling long-term and evolving contexts.

Paper Structure

This paper contains 24 sections, 3 theorems, 37 equations, 4 figures, 4 tables, 1 algorithm.

Key Result

Proposition 3

If calibration is enabled ($C_{\theta}\left(x_{t}\right)\neq\boldsymbol{1}$), yet the calibration matrix is fixed and independent of the input $x_{t}$ ($\forall t:C_{\theta}\left(x_{t}\right)=\theta\in\mathbb{R}^{H\times H}$), numerical instability or learning difficulty will arise.
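
One way to see why a fixed calibration is problematic is to unroll the write rule $M_t = M_{t-1} \odot C_t + U_t$ with $C_t = \theta$ for all $t$. The following is an informal reading based only on that recurrence, not the paper's proof; the notation $\theta^{\odot n}$ (elementwise $n$-th power) is introduced here for illustration.

$$
M_t \;=\; M_0 \odot \theta^{\odot t} \;+\; \sum_{i=1}^{t} U_i \odot \theta^{\odot (t-i)},
\qquad
\frac{\partial M_t}{\partial U_i} \;=\; \theta^{\odot (t-i)} \ \text{(elementwise)}.
$$

Any entry with $|\theta_{jk}| > 1$ makes the corresponding factor grow geometrically in $t - i$, which suggests exploding memory values and gradients, while any entry with $|\theta_{jk}| < 1$ makes it shrink geometrically, so early writes and their gradients fade. An input-dependent $C_{\theta}(x_t)$ is not locked into either regime, since its entries can lie above 1 at some steps and below 1 at others.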

Figures (4)

  • Figure 1: Meta-RL: Wind and Point Robot learning curves. Mean $\pm$ std. over 5 runs.
  • Figure 2: Credit Assignment: Visual Match, Key-to-Door learning curves. Mean $\pm$ std. over 3 runs.
  • Figure 3: (a) Left: Return of calibration designs over 3 runs; Right: Calibration matrix cumulative product over 100 episodes. (b) Return of memory sizes $H$ on Autoencode-Easy (left) and Battleship-Easy (right). (c) Memory ($M$, top) and calibration ($C$, bottom) matrices over timesteps in Visual Match: SHM erases memories that are no longer required and strengthens the important ones.
  • Figure 4: POPGym learning curves: Mean $\pm$ std. over 3 runs.

Theorems & Definitions (13)

  • Remark 1
  • Remark 2
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Proposition 5
  • proof
  • proof
  • Definition 6
  • ...and 3 more