Quantifying Memory Use in Reinforcement Learning with Temporal Range

Rodney Lafuente-Mercado; Daniela Rus; T. Konstantin Rusch

TL;DR

This work introduces Temporal Range (TR), a model-agnostic metric that quantifies how far back a trained RL policy effectively looks by aggregating first-order temporal Jacobian norms into a magnitude-weighted average lag. It provides an axiomatic foundation for vector-output linear maps that yields unique unnormalized and normalized forms, and extends them to nonlinear policies via local linearization; TR is invariant to uniform input and output rescalings. Empirically, TR is validated on POPGym diagnostics and control tasks with architectures including Long Expressive Memory (LEM), GRU, LSTM, and LinOSS. It stays small for fully observed control, grows for memory-demanding tasks such as Copy-$k$, and aligns with the minimal history window needed for near-optimal return. A compact LEM proxy enables TR computation when gradients are inaccessible and supports memory-efficient deployment by guiding the choice of minimal sufficient context. Overall, TR offers a practical, per-sequence readout of memory dependence that can inform architecture design, environment analysis, and resource-efficient deployment in reinforcement learning.

Abstract

How much does a trained RL policy actually use its past observations? We propose \emph{Temporal Range}, a model-agnostic metric that treats first-order sensitivities of multiple vector outputs across a temporal window to the input sequence as a temporal influence profile and summarizes it by the magnitude-weighted average lag. Temporal Range is computed via reverse-mode automatic differentiation from the Jacobian blocks $\partial y_s/\partial x_t\in\mathbb{R}^{c\times d}$ averaged over final timesteps $s\in\{t+1,\dots,T\}$ and is well-characterized in the linear setting by a small set of natural axioms. Across diagnostic and control tasks (POPGym; flicker/occlusion; Copy-$k$) and architectures (MLPs, RNNs, SSMs), Temporal Range (i) remains small in fully observed control, (ii) scales with the task's ground-truth lag in Copy-$k$, and (iii) aligns with the minimum history window required for near-optimal return as confirmed by window ablations. We also report Temporal Range for a compact Long Expressive Memory (LEM) policy trained on the task, using it as a proxy readout of task-level memory. Our axiomatic treatment draws on recent work on range measures, specialized here to temporal lag and extended to vector-valued outputs in the RL setting. Temporal Range thus offers a practical per-sequence readout of memory dependence for comparing agents and environments and for selecting the shortest sufficient context.
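As an illustration of the "magnitude-weighted average lag" idea, here is a minimal sketch in pure Python. It is not the authors' implementation, and it makes several assumptions the abstract does not fix: it uses a single output step instead of averaging over final timesteps, it weights each lag by the Frobenius norm of the Jacobian block, it defines the lag of input index t (zero-based) as T-1-t, and it returns 0 when all blocks vanish. The names `frobenius_norm`, `temporal_range`, and `copy_k_blocks` are hypothetical. The Jacobian blocks are passed in directly, so the example works for a linear map whose blocks are known in closed form; for a nonlinear policy they would come from automatic differentiation.

```python
import math


def frobenius_norm(block):
    """Frobenius norm of a c x d matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in block for v in row))


def temporal_range(jacobian_blocks):
    """Magnitude-weighted average lag (illustrative, assumed form).

    jacobian_blocks[t] is the c x d block d y / d x_t for t = 0..T-1,
    where index T-1 is the most recent input (lag 0) and index 0 is the
    oldest input (lag T-1). This layout is an assumption of the sketch.
    """
    T = len(jacobian_blocks)
    weights = [frobenius_norm(b) for b in jacobian_blocks]
    total = sum(weights)
    if total == 0.0:
        # Assumed convention: no input influence means zero range.
        return 0.0
    return sum(w * (T - 1 - t) for t, w in enumerate(weights)) / total


def copy_k_blocks(T, k, d):
    """Jacobian blocks of a linear map that outputs the input seen k steps ago.

    Only the block at lag k is the d x d identity; all others are zero.
    """
    zero = [[0.0] * d for _ in range(d)]
    eye = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    return [eye if (T - 1 - t) == k else zero for t in range(T)]


if __name__ == "__main__":
    # All influence sits at lag 3, so the weighted average lag is 3.
    print(temporal_range(copy_k_blocks(T=8, k=3, d=2)))
    # A memoryless map (k = 0) depends only on the current input.
    print(temporal_range(copy_k_blocks(T=8, k=0, d=2)))
```

Under these assumptions, multiplying every block by the same nonzero constant leaves the result unchanged, because the constant cancels between numerator and denominator. This mirrors the invariance to uniform rescaling described in the TL;DR.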
