Revisiting Glorot Initialization for Long-Range Linear Recurrences
Noga Bar, Mariia Seleznova, Yotam Alexander, Gitta Kutyniok, Raja Giryes
TL;DR
The paper analyzes the stability of Glorot initialization for linear RNNs in the long-sequence regime. It shows that, although the bulk of the spectrum lies within the unit disk, the spectral edge typically lies above one, which causes the hidden state to explode in the double-scaling regime (width and sequence length both tending to infinity). It then introduces a simple, dimension-aware rescaling of Glorot that uses a calibrated parameter to shift the spectral radius slightly below one, and supports it with both theoretical guarantees and empirical evidence of improved stability. The authors develop a variance-based lower bound on hidden-state growth for finite width and length, show that the finite and double-scaling regimes behave differently, and demonstrate that the rescaled initializer maintains trainability on long-range benchmarks. These results motivate a separate theoretical treatment of recurrent initialization under long sequences and offer a practical baseline for stable long-range sequence modeling.
Abstract
Proper initialization is critical for Recurrent Neural Networks (RNNs), particularly in long-range reasoning tasks, where repeated application of the same weight matrix can cause vanishing or exploding signals. A common baseline for linear recurrences is Glorot initialization, which is designed to ensure stable signal propagation but is derived under the infinite-width, fixed-length regime, an unrealistic setting for RNNs processing long sequences. In this work, we show that Glorot initialization is in fact unstable: small positive deviations in the spectral radius are amplified through time and cause the hidden state to explode. Our theoretical analysis demonstrates that sequences of length $t = O(\sqrt{n})$, where $n$ is the hidden width, are sufficient to induce instability. To address this, we propose a simple, dimension-aware rescaling of Glorot that shifts the spectral radius slightly below one, preventing rapid signal explosion or decay. These results suggest that standard initialization schemes may break down in the long-sequence regime, motivating a separate line of theory for stable recurrent initialization.
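
The mechanism described in the abstract can be explored numerically. The following is a minimal NumPy sketch, not the authors' code: it assumes a Gaussian Glorot draw for a square $n \times n$ recurrent matrix (entry variance $2/(n+n) = 1/n$) and an input-free recurrence $h_t = W h_{t-1}$. The helper names and the value `gain = 0.95` are hypothetical placeholders chosen for illustration; the paper's calibrated, dimension-aware rescaling factor is not reproduced here.

```python
import numpy as np


def glorot_normal(n, rng):
    """Square Glorot (Xavier) normal init: Var = 2 / (fan_in + fan_out) = 1 / n."""
    return rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))


def spectral_radius(W):
    """Largest eigenvalue modulus of W."""
    return float(np.max(np.abs(np.linalg.eigvals(W))))


def hidden_norms(W, h0, steps):
    """Norms of h_t = W @ h_{t-1} for t = 1..steps (linear recurrence, no input)."""
    h = h0.copy()
    norms = []
    for _ in range(steps):
        h = W @ h
        norms.append(float(np.linalg.norm(h)))
    return norms


rng = np.random.default_rng(0)
n, steps = 256, 200

W = glorot_normal(n, rng)
rho = spectral_radius(W)

# Hypothetical shrink factor for illustration only; NOT the paper's calibrated value.
gain = 0.95
W_scaled = gain * W
rho_scaled = spectral_radius(W_scaled)

# Random initial hidden state with norm close to one.
h0 = rng.normal(size=n) / np.sqrt(n)
norms = hidden_norms(W, h0, steps)
norms_scaled = hidden_norms(W_scaled, h0, steps)

print(f"spectral radius, Glorot:   {rho:.4f}")
print(f"spectral radius, rescaled: {rho_scaled:.4f}")
print(f"hidden-state norm after {steps} steps, Glorot:   {norms[-1]:.3e}")
print(f"hidden-state norm after {steps} steps, rescaled: {norms_scaled[-1]:.3e}")
```

Because scaling $W$ by a constant scales every eigenvalue by the same constant, the rescaled spectral radius is exactly `gain * rho`, and the rescaled hidden-state norm at step $t$ is `gain ** t` times the unscaled one. Varying `n`, `steps`, and `gain` gives a rough feel for how the time horizon at which the norm departs from one depends on the width.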
