Exploring the Promise and Limits of Real-Time Recurrent Learning
Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber
TL;DR
This paper investigates the practical potential of real-time recurrent learning (RTRL) for realistic sequence-processing tasks. It keeps RTRL tractable by using an element-wise recurrent architecture (eLSTM) and integrates it into an online actor-critic framework (R2AC). The results show that exact RTRL, restricted to a one-layer eLSTM, scales to challenging RL benchmarks and, on DMLab memory tasks, is competitive with or outperforms IMPALA and R2D2 baselines while using substantially fewer environment frames (under 1.2 B versus 10 B). The study also discusses limitations, notably the intractability of multi-layer RTRL and the reliance on TBPTT for the vision front-end, and outlines directions for hardware optimization and hybrid approaches. Overall, the results suggest that RTRL can offer practical benefits in realistic tasks, informing future research on architecture design and efficient online gradient computation.
Abstract
Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) offers certain conceptual advantages over backpropagation through time (BPTT). RTRL requires neither caching past activations nor truncating context, and enables online learning. However, RTRL's time and space complexity make it impractical. To overcome this problem, most recent work on RTRL focuses on approximation theories, while experiments are often limited to diagnostic settings. Here we explore the practical promise of RTRL in more realistic settings. We study actor-critic methods that combine RTRL and policy gradients, and test them in several subsets of DMLab-30, ProcGen, and Atari-2600 environments. On DMLab memory tasks, our system trained on fewer than 1.2 B environmental frames is competitive with or outperforms well-known IMPALA and R2D2 baselines trained on 10 B frames. To scale to such challenging tasks, we focus on certain well-known neural architectures with element-wise recurrence, allowing for tractable RTRL without approximation. Importantly, we also discuss rarely addressed limitations of RTRL in real-world applications, such as its complexity in the multi-layer case.
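To illustrate why element-wise recurrence makes exact RTRL tractable, here is a minimal NumPy sketch. It is not the paper's eLSTM or its R2AC system. It assumes a simplified recurrent layer h_t = tanh(W x_t + u * h_{t-1}), where u is a vector and * is element-wise, together with a squared-error loss chosen only for illustration. The function name `rtrl_elementwise` and all variable names are hypothetical. Because unit i depends only on row i of W and on u[i], the sensitivity of the state with respect to W has the same shape as W itself, whereas a fully recurrent layer with N units would need a sensitivity tensor with on the order of N^3 entries.

```python
import numpy as np

def rtrl_elementwise(W, u, xs, targets):
    """Exact online RTRL for h_t = tanh(W x_t + u * h_{t-1}).

    Illustrative loss (an assumption): 0.5 * sum_t ||h_t - y_t||^2.
    Returns the total loss and its gradients w.r.t. W and u.
    """
    n, d = W.shape
    h = np.zeros(n)
    S_W = np.zeros((n, d))  # S_W[i, j] = d h[i] / d W[i, j]
    S_u = np.zeros(n)       # S_u[i]    = d h[i] / d u[i]
    grad_W = np.zeros_like(W)
    grad_u = np.zeros_like(u)
    loss = 0.0
    for x, y in zip(xs, targets):
        h_new = np.tanh(W @ x + u * h)
        dact = 1.0 - h_new ** 2  # derivative of tanh at the pre-activation
        # Forward-propagate sensitivities; no past activations are stored.
        S_W = dact[:, None] * (x[None, :] + u[:, None] * S_W)
        S_u = dact * (h + u * S_u)  # uses the previous state h_{t-1}
        h = h_new
        # The gradient of the current step's loss is available immediately.
        err = h - y
        loss += 0.5 * float(err @ err)
        grad_W += err[:, None] * S_W
        grad_u += err * S_u
    return loss, grad_W, grad_u
```

In this sketch the per-step cost and memory are proportional to the number of parameters, and the gradient at each step is exact rather than truncated. Stacking a second recurrent layer would break this property, since the upper layer's state would then depend on all of the lower layer's parameters through a dense path, which is the multi-layer limitation the abstract refers to.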
