Deep reinforcement learning with time-scale invariant memory

Md Rysul Kabir, James Mochizuki-Freeman, Zoran Tiganj

TL;DR

This work addresses how artificial agents can learn temporal relationships across diverse time scales by introducing a scale-invariant memory mechanism. It builds CogRNN, a recurrent memory module that uses a real-domain Laplace transform and a learned inverse transform to produce a log-compressed, time-cell-like memory, enabling robust, scale-free learning in deep RL. Across interval timing, discrimination, delayed match-to-sample (DMS), and reproduction tasks, CogRNN matches or exceeds LSTM/RNN performance and generalizes across temporal scales, with neural activity mirroring biological time cells. The approach offers a principled bridge between neuroscience and AI, suggesting scale-invariant representations as a path to rapid adaptation in changing temporal environments.

Abstract

The ability to estimate temporal relationships is critical for both animals and artificial agents. Cognitive science and neuroscience provide remarkable insights into behavioral and neural aspects of temporal credit assignment. In particular, scale invariance of learning dynamics, observed in behavior and supported by neural data, is one of the key principles that governs animal perception: proportional rescaling of temporal relationships does not alter the overall learning efficiency. Here we integrate a computational neuroscience model of scale invariant memory into deep reinforcement learning (RL) agents. We first provide a theoretical analysis and then demonstrate through experiments that such agents can learn robustly across a wide range of temporal scales, unlike agents built with commonly used recurrent memory architectures such as LSTM. This result illustrates that incorporating computational principles from neuroscience and cognitive science into deep neural networks can enhance adaptability to complex temporal dynamics, mirroring some of the core properties of human learning.

Paper Structure

This paper contains 28 sections, 5 equations, 18 figures, 2 tables.

Figures (18)

  • Figure 1: A. The six temporal intervals used in the task. At each trial, a random interval is selected and the agent has to indicate whether the interval was long or short. B. Schematic of the environment. After the agent crosses the start line, one of six delay intervals is presented.
  • Figure 2: Architecture of the RL agent. Observations from the environment are processed by a convolutional neural network to extract feature representations. These features are then passed to a recurrent memory module (simple RNN, LSTM or CogRNN), which captures temporal dependencies and provides context for the policy network ($\pi$) and value network ($V$).
  • Figure 3: A. Response of the CogRNN to $\delta$ pulses. Neurons in $F_{s;t}$ decay exponentially at a spectrum of time constants $\mathbf{s}$ implementing a discrete approximation of a real-domain Laplace transform. Neurons in $\tilde{f}_{\overset{*}{\tau};s}$ activate sequentially, resembling time cells. B. Log-compressed memory (bottom) of three signals that are rescaled versions of each other (top) at time $t=250$. Each circle represents the activity of individual neurons and their position along the x-axis corresponds to their peak time. C. Log-compressed memory turns rescaling into translation. The top plot is the same as the bottom plot in B, but with the x-axis corresponding to the neuron index instead of the peak time.
  • Figure 4: The performance (mean with standard error over five runs) across the four tasks for CogRNN and LSTM agents.
  • Figure 5: A. Output of convolution and pooling operations for three signals from Fig. 3B. B. Performance of CogRNN ($\tilde{f}$) and RNN agents trained on the 1D interval timing task. The agents were trained on scale 1 and evaluated on scales 1, 2 and 4.
  • ...and 13 more figures
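
The property described in the Figure 3 caption (exponentially decaying units with a spectrum of time constants, for which rescaling time becomes a translation along the unit index) can be illustrated with a small sketch. This is not the authors' implementation: the function name `laplace_memory`, the geometric spacing of the decay rates `s`, the base `c = 2`, and the pulse timings are assumptions chosen for illustration, and the learned inverse transform that yields the time-cell-like $\tilde{f}$ is not modeled here.

```python
import numpy as np

def laplace_memory(signal, s, dt=1.0):
    """Run a bank of leaky integrators dF/dt = -s*F + f(t) over a 1-D signal.

    Each unit decays with its own rate s[i], so the bank is a discrete
    approximation of a real-domain Laplace transform of the input history.
    (Hypothetical helper written for this illustration.)
    """
    decay = np.exp(-s * dt)      # exact per-step decay factor for each unit
    F = np.zeros_like(s)
    for x in signal:
        F = decay * F + x        # decay the old state, then add the new input
    return F

# Assumed geometric (log-spaced) decay rates: s[i] = c**(-i) with c = 2.
c = 2.0
n_units = 12
s = c ** -np.arange(n_units, dtype=float)

def pulse(elapsed):
    """A delta pulse followed by `elapsed` silent time steps."""
    sig = np.zeros(elapsed + 1)
    sig[0] = 1.0
    return sig

F_short = laplace_memory(pulse(10), s)   # pulse seen 10 steps ago
F_long = laplace_memory(pulse(40), s)    # same pulse, time rescaled by 4 = c**2

# Rescaling time by c**k shifts the activity pattern by k units along the bank.
shift = 2
print(np.allclose(F_long[shift:], F_short[:-shift]))
```

With geometrically spaced rates, stretching the elapsed time by a factor of `c**k` reproduces the same activity pattern `k` units further along the bank. This is the sense in which a log-compressed memory turns rescaling into translation, and it is why a convolution over the unit index followed by pooling, as in Figure 5A, can respond similarly to rescaled inputs.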