Learnability Window in Gated Recurrent Neural Networks
Authors
Lorenzo Livi
Abstract
We develop a theoretical framework that explains how gating mechanisms determine the learnability window of recurrent neural networks, defined as the largest temporal horizon over which gradient information remains statistically recoverable. While classical analyses emphasize numerical stability of Jacobian products, we show that stability alone is insufficient: learnability is governed instead by the \emph{effective learning rates}, per-lag and per-neuron quantities obtained from first-order expansions of gate-induced Jacobian products in Backpropagation Through Time. These effective learning rates act as multiplicative filters that control both the magnitude and anisotropy of gradient transport. Under heavy-tailed ($\alpha$-stable) gradient noise, we prove that the minimal sample size required to detect a dependency at lag~$\tau$ satisfies a lower bound governed by the effective learning rate envelope at that lag. This leads to an explicit formula for the learnability window and closed-form scaling laws for logarithmic, polynomial, and exponential decay of the envelope. The theory shows that the time-scale spectra induced by the effective learning rates are the dominant determinants of learnability. Broader or more heterogeneous spectra slow the decay of the envelope, enlarging the learnability window, while heavy-tailed noise compresses the window by limiting statistical concentration. By integrating gate-induced time-scale geometry with gradient noise and sample complexity, the framework identifies the effective learning rates as the primary objects that determine whether, when, and over what horizons recurrent networks can learn long-range temporal dependencies.
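As a rough numerical illustration (not the paper's exact construction), the sketch below assumes a minimal gated update $h_t = z_t \odot h_{t-1} + (1-z_t) \odot \tanh(W x_t + U h_{t-1})$, approximates the gate-induced Jacobian product over a lag $\tau$ by the elementwise product of gate values (the leading diagonal term of a first-order expansion), and takes the maximum over neurons as a stand-in for the effective learning rate envelope. All variable names, the gate-bias spread, and the choice of envelope are illustrative assumptions.
\begin{verbatim}
# Illustrative sketch: per-lag "effective learning rates" from a diagonal
# first-order approximation of gate-induced Jacobian products in a toy gated RNN.
import numpy as np

rng = np.random.default_rng(0)
T, d = 200, 64                       # sequence length, hidden size
max_lag = 150

# Gate pre-activation biases spread over a range -> heterogeneous time scales
# (larger bias -> gate closer to 1 -> slower forgetting).
bias = np.linspace(-1.0, 3.0, d)
x_drive = rng.normal(size=(T, d))    # surrogate for input-driven gate fluctuations

# Gate values z_t in (0, 1): sigmoid of bias plus small input-driven perturbation.
z = 1.0 / (1.0 + np.exp(-(bias + 0.3 * x_drive)))     # shape (T, d)

# Diagonal first-order Jacobian product over lag tau: elementwise product of the
# gates encountered along the backward path; one value per lag and per neuron.
eta = np.ones((max_lag + 1, d))
for tau in range(1, max_lag + 1):
    eta[tau] = eta[tau - 1] * z[T - tau]

eta_bar = eta.max(axis=1)            # envelope over neurons at each lag

for tau in (1, 10, 50, 100, 150):
    print(f"lag {tau:4d}: envelope = {eta_bar[tau]:.3e}")
\end{verbatim}
Under these assumptions, widening the spread of gate biases (a broader time-scale spectrum) visibly slows the decay of the printed envelope with lag, which is the qualitative mechanism the abstract associates with an enlarged learnability window.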