Real-Time Recurrent Reinforcement Learning
Julian Lemmel, Radu Grosu
TL;DR
This work addresses learning in partially observable environments with Real-Time Recurrent Reinforcement Learning (RTRRL), a biologically plausible RL framework that operates online, without backpropagation through time (BPTT). RTRRL combines a Meta-RL RNN backbone with a TD($\lambda$) actor-critic loop and online gradient computation via RFLO or RTRL (or their LRU variants), using random feedback alignment to avoid weight transport. The approach performs competitively on POMDP benchmarks, memory tasks, and physics simulations. It also offers a model of basal ganglia-like reward pathways and points toward energy-efficient neuromorphic implementations. Overall, RTRRL is a principled, neuroscience-inspired alternative to BPTT-based RL for partially observable settings.
Abstract
We introduce a biologically plausible RL framework for solving tasks in partially observable Markov decision processes (POMDPs). The proposed algorithm combines three integral parts: (1) a Meta-RL architecture, resembling the mammalian basal ganglia; (2) a biologically plausible reinforcement learning algorithm that uses temporal difference learning and eligibility traces to train the policy and the value function; (3) an online automatic differentiation algorithm for computing the gradients with respect to the parameters of a shared recurrent network backbone. Our experimental results show that the method solves a diverse set of partially observable reinforcement learning tasks. The algorithm, which we call real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
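
To make part (2) concrete, the following is a minimal sketch of temporal difference learning with eligibility traces, TD($\lambda$), for a linear value function. It is not the authors' implementation: the function name `td_lambda_episode`, the one-hot features, the three-state chain, and all hyperparameter values are assumptions chosen for illustration. The recurrent backbone and the online gradient computation of part (3) are omitted, and only the critic is shown.

```python
def td_lambda_episode(w, features, rewards, alpha=0.1, gamma=0.9, lam=0.8):
    """One online TD(lambda) pass for a linear value function v(s) = w . x(s).

    features: feature vectors x_0..x_T, where x_T belongs to the terminal state
    rewards:  rewards[t] is received on the transition from x_t to x_{t+1}
    (Hypothetical helper for illustration only.)
    """
    trace = [0.0] * len(w)  # eligibility trace, one entry per weight
    for t, r in enumerate(rewards):
        x, x_next = features[t], features[t + 1]
        terminal = (t + 1 == len(rewards))
        v = sum(wi * xi for wi, xi in zip(w, x))
        # The terminal state has value zero by definition.
        v_next = 0.0 if terminal else sum(wi * xi for wi, xi in zip(w, x_next))
        delta = r + gamma * v_next - v  # TD error
        # Decay past credit and add the gradient of v w.r.t. w (here simply x).
        trace = [gamma * lam * e + xi for e, xi in zip(trace, x)]
        # Move every weight in proportion to its accumulated eligibility.
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w


# Assumed toy problem: a deterministic 3-state chain with reward 1 at the end.
features = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
rewards = [0.0, 0.0, 1.0]

weights = [0.0, 0.0, 0.0]
for _ in range(2000):
    weights = td_lambda_episode(weights, features, rewards)
print(weights)
```

Each update uses only the current transition and the running trace, so no past states need to be stored or replayed. That locality is what makes this family of updates a natural fit for the online setting the abstract describes.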
