A Generalized Spectral Framework to Explain Neural Scaling and Compression Dynamics

Yizhou Zhang

TL;DR

This work introduces a generalized spectral framework that unifies neural learning dynamics and compression phenomena through a flexible evolution function g(λ,t;β) parameterized by elasticity ρ. By linking the learning frontier, loss decay, and compression robustness within a single template, it recovers kernel/NTK and feature-learning limits as special cases and derives a universal complementarity between learning and compression. The theory recasts pruning as spectral truncation and quantization as spectral perturbation, predicting a consistent loss- and density-based scaling and a Densing Law where effective spectral density grows with compute. The framework offers concrete predictions for how model density, loss decay, and compression sensitivity evolve under compute, and outlines key open questions for extending the theory to multi-modal spectra and inverse identification from data.

Abstract

Empirical scaling laws describe how test loss and other performance metrics depend on model size, dataset size, and compute. While such laws are consistent within specific regimes, apparently distinct scaling behaviors have been reported for related settings such as model compression. Motivated by recent progress in spectral analyses of neural representations, this paper develops a generalized spectral framework that unifies learning dynamics and compression phenomena under a common functional ansatz. We generalize the spectral evolution function from the linear kernel form $g(\lambda t)=\lambda t$ to an asymptotically polynomial function $g(\lambda,t;\beta)$, characterized by an effective spectral-temporal elasticity $\rho(\beta)$. This framework recovers existing lazy and feature-learning theories as special cases and yields an invariant relation between learning and compression

Paper Structure

This paper contains 36 sections, 5 theorems, 100 equations.

Key Result

Proposition 2

Under standard regularity assumptions ($\lambda_k \sim k^{-b}$, $\lambda_k w_k^2 \sim k^{-a}$, $a,b>1$) and for any $g$ satisfying the growth condition \eqref{eq:g_growth}, the loss \eqref{eq:loss_def} obeys a tail-dominance relation, where $\lambda_*(t)$ is defined by \eqref{eq:frontier_def} and $\lambda_{k_*(t)} \asymp \lambda_*(t)$. Hence the asymptotic decay of $L(t)$ is governed by the spectral tail beyond the frontier.
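The tail-dominance statement can be checked numerically in the simplest special case. The sketch below is illustrative only and rests on assumptions not stated in this excerpt: it takes the linear kernel form $g=\lambda t$ from the abstract, assumes each mode's residual decays as $\exp(-2\lambda_k t)$, and places the frontier at $\lambda_*(t)=1/t$. The function names, the exponents `A` and `B`, and the mode count are hypothetical choices. Under these assumptions, the loss should stay within a constant factor of the spectral sum beyond the frontier and decay roughly as $t^{-(a-1)/b}$.

```python
import math

def power_law_spectrum(n_modes, a, b):
    """Return lambda_k ~ k^-b and lambda_k * w_k^2 ~ k^-a for k = 1..n_modes."""
    ks = range(1, n_modes + 1)
    lam = [k ** (-b) for k in ks]
    signal = [k ** (-a) for k in ks]
    return lam, signal

def loss(lam, signal, t):
    # Assumption: mode k contributes signal_k * exp(-2 * g), with g = lambda_k * t.
    return sum(s * math.exp(-2.0 * l * t) for l, s in zip(lam, signal))

def tail_beyond_frontier(lam, signal, t):
    # Assumption: the frontier solves g(lambda_*, t) = 1, i.e. lambda_* = 1 / t.
    frontier = 1.0 / t
    return sum(s for l, s in zip(lam, signal) if l < frontier)

def log_log_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    den = sum((x - mx) ** 2 for x in lx)
    return num / den

A, B = 2.0, 1.5                      # hypothetical exponents with a, b > 1
lam, signal = power_law_spectrum(50_000, A, B)
ts = [10 ** (1 + 0.25 * i) for i in range(9)]   # t from 10 to 1000

losses = [loss(lam, signal, t) for t in ts]
tails = [tail_beyond_frontier(lam, signal, t) for t in ts]
ratios = [l / s for l, s in zip(losses, tails)]
slope = log_log_slope(ts, losses)

print("loss / tail ratios:", [round(r, 3) for r in ratios])
print("fitted exponent:", round(slope, 3), "predicted:", round(-(A - 1.0) / B, 3))
```

If the ratios stay roughly constant across two decades of $t$, the modes beyond the frontier account for the loss up to a constant, which is the qualitative content of the proposition in this special case. It does not test the general $g(\lambda,t;\beta)$.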

Theorems & Definitions (6)

  • Proposition 2: Tail dominance of the loss
  • Theorem 3: Combined Perturbation-Induced Loss Growth under Generalized Spectral Dynamics
  • Corollary 1: Complementary Relationship Between Compression Loss and Original Test Loss
  • Proposition 4: Rigorous expression of the tail dominance around the frontier
  • Lemma 5: Learning frontier induced by Condition \ref{cond:poly}
  • proof