Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions

Frank Wu, Mengye Ren

TL;DR

This work tackles the limitations of backpropagation for biologically plausible reinforcement learning by extending the Forward-Forward paradigm to RL through ARQ, an Action-conditioned Root mean squared Q-function. ARQ estimates $Q_\theta(s,a)$ using a vector-based goodness measure applied to hidden activations and conditions on actions at the model input, enabling backprop-free local learning with arbitrary hidden dimensions. Empirically, ARQ outperforms state-of-the-art backprop-free methods and often exceeds backprop-based baselines on MinAtar and the DeepMind Control Suite, with ablations showing the importance of input-based action conditioning and RMS goodness. The results suggest that decentralized, reward-centered learning with local value estimates can achieve strong decision-making while aligning with biological learning principles, potentially guiding future research at the intersection of RL and neuroscience.

Abstract

The Forward-Forward (FF) Algorithm is a recently proposed learning procedure for neural networks that employs two forward passes instead of the traditional forward and backward passes used in backpropagation. However, FF remains largely confined to supervised settings, leaving a gap in domains such as RL, where learning signals arise more naturally. In this work, inspired by FF's goodness function, which uses layer activity statistics, we introduce Action-conditioned Root mean squared Q-Functions (ARQ), a novel value estimation method that applies a goodness function and action conditioning for local RL using temporal difference learning. Despite its simplicity and biological grounding, our approach achieves superior performance compared to state-of-the-art local backprop-free RL methods on the MinAtar and DeepMind Control Suite benchmarks, while also outperforming algorithms trained with backpropagation on most tasks. Code can be found at https://github.com/agentic-learning-ai-lab/arq.
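
To make the idea of an RMS goodness value trained with temporal difference learning concrete, here is a minimal sketch in plain Python. It is not taken from the released code. The function names `rms_goodness` and `td_error`, the Q-learning-style max over next-state action candidates, and the default discount `gamma=0.99` are all assumptions made for illustration; the abstract only states that a goodness function is combined with temporal difference learning.

```python
import math

def rms_goodness(hidden):
    """Root mean square of a hidden activation vector, read as a scalar value estimate."""
    return math.sqrt(sum(h * h for h in hidden) / len(hidden))

def td_error(hidden, reward, next_hiddens, gamma=0.99, done=False):
    """TD error for one layer: (reward + discounted bootstrap) minus current estimate.

    hidden:       the layer's activations for the (state, action) actually taken.
    next_hiddens: one activation vector per action candidate at the next state.
    The max over candidates is a Q-learning-style target (an assumption here).
    """
    q = rms_goodness(hidden)
    if done:
        bootstrap = 0.0
    else:
        bootstrap = gamma * max(rms_goodness(h) for h in next_hiddens)
    return (reward + bootstrap) - q

# Example: the current estimate is 2.0 and the best next-state estimate is 3.0.
error = td_error([2.0, 2.0, 2.0, 2.0], reward=1.0,
                 next_hiddens=[[1.0, 1.0], [3.0, 3.0]], gamma=0.5)
```

Because the estimate is a norm-like statistic of the activations, it does not depend on the width of `hidden`, which is consistent with the claim that the hidden dimension can be arbitrary. An RMS is also non-negative by construction; how negative returns are handled is not covered by this sketch.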

Paper Structure

This paper contains 27 sections, 9 equations, 6 figures, 3 tables, 2 algorithms.

Figures (6)

  • Figure 1: Local learning paradigms inspired by the Forward-Forward (FF) algorithm (Hinton, 2022). a) The original FF is designed for supervised learning, where each layer models the "goodness" between image $x$ and label $y$. Information is carried forward only through bottom-up and optionally top-down connections, without backpropagation. b) We extend FF local learning to reinforcement learning: each layer takes a state observation $x$ and an action candidate $a$ as input, and estimates the Q value by taking the root mean square of the hidden vector.
  • Figure 2: High-level computation diagrams comparing AD (Guan et al., 2024) and ARQ. Key components of ARQ are highlighted in red. AD cells take activations (highlighted in blue; a darker color means an earlier layer) and the state observation as input and produce a vector of size $n_a$, with each entry giving the value prediction for one action candidate. Our ARQ takes activations, the state observation, and the action candidate as input, and produces a hidden vector of arbitrary size, which is then passed through a root mean squared function to yield a scalar prediction.
  • Figure 3: AD (Guan et al., 2024)
  • Figure 4: Training performance on the MinAtar games for DQN (blue), AD (orange), and ARQ (green). The x-axis denotes the number of training steps (in millions), and the y-axis indicates average episodic returns. Shaded regions represent standard deviations across 3 seeds. We find that ARQ outperforms AD in all MinAtar games and outperforms DQN in most games.
  • Figure 5: Ablation on action conditioning for AD and ARQ. Action conditioning substantially improves performance. The improvement is particularly large for ARQ, with average returns of $\sim$85 vs. $\sim$55, a relative improvement of roughly 50$\%$. This indicates that it is the combination of the RMS function and action conditioning that makes ARQ effective.
  • ...and 1 more figure
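
The structural difference described in the Figure 2 caption can be sketched as follows. This is an illustrative toy, not the authors' implementation: the helper names (`linear`, `ad_style_q`, `arq_style_q`, `greedy`), the single-layer setup, the ReLU nonlinearity, and the one-hot action encoding are all assumptions. The sketch only shows that an AD-style head must output exactly $n_a$ values, whereas an action-conditioned RMS head can use any hidden width and is evaluated once per action candidate.

```python
import math

def linear(weights, x):
    """Matrix-vector product with weights given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def ad_style_q(weights, state):
    """AD-style head: the output width is tied to n_a, one entry per action."""
    return linear(weights, state)

def arq_style_q(weights, state, n_actions):
    """ARQ-style head: append a one-hot action to the input, then take the RMS
    of the hidden vector. The number of rows in `weights` (hidden width) is free."""
    q_values = []
    for a in range(n_actions):
        onehot = [1.0 if i == a else 0.0 for i in range(n_actions)]
        hidden = [max(0.0, v) for v in linear(weights, list(state) + onehot)]  # ReLU (assumed)
        q_values.append(math.sqrt(sum(v * v for v in hidden) / len(hidden)))
    return q_values

def greedy(q_values):
    """Index of the action candidate with the largest value estimate."""
    return max(range(len(q_values)), key=q_values.__getitem__)

# One-dimensional state, two actions, hidden width 2 (hypothetical weights).
state = [3.0]
arq_weights = [[1.0, 0.0, 0.0],
               [0.0, 0.0, 4.0]]
arq_qs = arq_style_q(arq_weights, state, n_actions=2)
best_action = greedy(arq_qs)
```

In this toy setup the action-conditioned head costs one forward pass per action candidate, while the AD-style head produces all values in a single pass; the sketch makes no claim about how the paper batches this computation.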