Why Deep Jacobian Spectra Separate: Depth-Induced Scaling and Singular-Vector Alignment
Nathanaël Haas, François Gatine, Augustin M Cosse, Zied Bouraoui
TL;DR
The paper identifies two signatures that govern Jacobian spectra in deep networks: depth-induced exponential scaling of the ordered singular values, and spectral separation between them. Using Fixed-Gates Linear Networks and gated products, it proves that Lyapunov exponents exist for the top singular values at initialization, and shows that spectral separation forces the dominant singular directions to align across products. This alignment permits an approximate, deep-linear-like, mode-wise evolution of singular values without a balancing assumption. Experiments on fixed-gates models trained on MNIST exhibit the predicted depth scaling and alignment, suggesting a mechanistic basis for emergent low-rank Jacobian structure and implicit bias. Depth scaling coupled with spectral separation thus offers a tractable route to understanding the biases of gradient-based training in deep architectures, and may inform analyses of generalization in practice.
Abstract
Understanding why gradient-based training in deep networks exhibits strong implicit bias remains challenging, in part because tractable singular-value dynamics are typically available only for balanced deep linear models. We propose an alternative route based on two theoretically grounded and empirically testable signatures of deep Jacobians: depth-induced exponential scaling of ordered singular values and strong spectral separation. Adopting a fixed-gates view of piecewise-linear networks, where Jacobians reduce to products of masked linear maps within a single activation region, we prove the existence of Lyapunov exponents governing the top singular values at initialization, give closed-form expressions in a tractable masked model, and quantify finite-depth corrections. We further show that sufficiently strong separation forces singular-vector alignment in matrix products, yielding an approximately shared singular basis for intermediate Jacobians. Together, these results motivate an approximation regime in which singular-value dynamics become effectively decoupled, mirroring classical balanced deep-linear analyses without requiring balancing. Experiments in fixed-gates settings validate the predicted scaling, alignment, and resulting dynamics, supporting a mechanistic account of emergent low-rank Jacobian structure as a driver of implicit bias.
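The two signatures described above can be probed numerically with a small NumPy sketch. This is an illustration under stated assumptions and not the authors' code: the Gaussian initialization with variance 1/width, the i.i.d. Bernoulli gates, and the names `masked_layer`, `lyapunov_estimates`, `top_alignment`, and `keep_prob` are all hypothetical. The first function estimates the leading Lyapunov exponents of a product of masked linear maps with the standard repeated-QR method; the second builds a matrix with a large gap after its first singular value and measures how closely the top left singular vector of a product follows it.

```python
import numpy as np


def masked_layer(rng, width, keep_prob):
    """One hypothetical fixed-gates layer D @ W, with D a 0/1 diagonal mask."""
    W = rng.standard_normal((width, width)) / np.sqrt(width)  # assumed init
    gates = (rng.random(width) < keep_prob).astype(W.dtype)   # assumed i.i.d. gates
    return gates[:, None] * W  # zero the rows of W whose gate is off


def lyapunov_estimates(width=64, depth=400, k=4, keep_prob=0.5, seed=0):
    """Estimate the top-k Lyapunov exponents of a masked product via repeated QR."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((width, k)))
    log_growth = np.zeros(k)
    for _ in range(depth):
        # Push the orthonormal frame through one layer, then re-orthonormalize.
        Q, R = np.linalg.qr(masked_layer(rng, width, keep_prob) @ Q)
        log_growth += np.log(np.abs(np.diag(R)))
    # Average log-growth per layer approximates the exponents.
    return log_growth / depth


def top_alignment(width=64, gap=1e-3, seed=0):
    """|<u1(AB), u1(A)>| when A has singular values (1, gap, gap, ...)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((width, width)))
    V, _ = np.linalg.qr(rng.standard_normal((width, width)))
    s = np.full(width, gap)
    s[0] = 1.0
    A = (U * s) @ V.T  # A = U diag(s) V^T, so u1(A) = U[:, 0]
    B = rng.standard_normal((width, width)) / np.sqrt(width)
    u_prod = np.linalg.svd(A @ B)[0][:, 0]  # top left singular vector of AB
    return abs(u_prod @ U[:, 0])


if __name__ == "__main__":
    print("estimated exponents:", lyapunov_estimates())
    print("alignment with strong separation:", top_alignment(gap=1e-3))
    print("alignment without separation:", top_alignment(gap=1.0))
```

The k-th singular value of a depth-L product then scales roughly like exp(L times the k-th estimated exponent), which is the sense of depth-induced exponential scaling here. The QR recursion is used instead of forming the full product because the product itself quickly underflows or overflows at large depth.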
