When Does Learning Renormalize? Sufficient Conditions for Power Law Spectral Dynamics

Yizhou Zhang

TL;DR

The paper investigates when neural learning dynamics exhibit power-law scaling by embedding training in the Generalized Resolution-Shell Dynamics (GRSD) framework. It derives a set of sufficient conditions (local Jacobian propagation, weak initialization incoherence, controlled Jacobian evolution, and log-shift invariance of renormalized couplings) under which the GRSD shell dynamics close renormalizably. A rigidity argument that also invokes the time-rescaling covariance of gradient flow then forces the velocity v(λ,t) into a power-law form in spectral scale. A central contribution is showing that residual learning can structurally realize log-shift invariance, which provides the required spectral homogeneity once networks are sufficiently deep (beyond a depth threshold). The work clarifies how architectural features interact with dynamical stability to permit or obstruct power-law scaling, offering a principled lens on why scaling laws appear in some regimes and fail in others.
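A minimal numerical sketch of the "spectral homogeneity" idea follows. It assumes, as an illustrative reading only, that homogeneity under a log-shift by a factor b means the ratio v(bλ,t)/v(λ,t) at a fixed time does not depend on λ. The helper names (`log_shift_ratio`, `is_log_shift_homogeneous`) and the sample velocity profiles are hypothetical and do not come from the paper.

```python
import numpy as np

def log_shift_ratio(v, lam, b):
    """Ratio v(b * lam) / v(lam) evaluated on a grid of spectral scales lam."""
    return v(b * lam) / v(lam)

def is_log_shift_homogeneous(v, lam, b, rtol=1e-10):
    """True if v(b*lam)/v(lam) takes the same value at every grid point."""
    r = log_shift_ratio(v, lam, b)
    return bool(np.allclose(r, r[0], rtol=rtol))

lam = np.logspace(-3, 3, 61)  # hypothetical log-spaced grid of spectral scales
b = 2.0                       # one log-shift: log(lam) -> log(lam) + log(2)

def power_law(x):
    return 0.5 * x ** -1.3    # c * lam**a with hypothetical c = 0.5, a = -1.3

def saturating(x):
    return x / (1.0 + x)      # smooth but not a power law

def log_periodic(x):
    # Power law times a modulation whose period in log(lam) is exactly log(b).
    return x ** -1.3 * (1.0 + 0.1 * np.cos(2 * np.pi * np.log(x) / np.log(b)))

print(is_log_shift_homogeneous(power_law, lam, b))       # True
print(is_log_shift_homogeneous(saturating, lam, b))      # False
print(is_log_shift_homogeneous(log_periodic, lam, b))    # True
print(is_log_shift_homogeneous(log_periodic, lam, 3.0))  # False
```

The log-periodic case shows why a single shift factor is not enough. A profile can be homogeneous under one specific b without being a pure power law. Homogeneity under all shifts, together with a mild regularity assumption, is what rules such profiles out.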

Abstract

Empirical power-law scaling has been widely observed across modern deep learning systems, yet its theoretical origins and scope of validity remain incompletely understood. The Generalized Resolution-Shell Dynamics (GRSD) framework models learning as spectral energy transport across logarithmic resolution shells, providing a coarse-grained dynamical description of training. Within GRSD, power-law scaling corresponds to a particularly simple form of renormalized shell dynamics; however, such behavior is not automatic and requires additional structural properties of the learning process. In this work, we identify a set of sufficient conditions under which the GRSD shell dynamics admits a renormalizable coarse-grained description. These conditions constrain the learning configuration at multiple levels: boundedness of gradient propagation in the computation graph, weak functional incoherence at initialization, controlled Jacobian evolution during training, and log-shift invariance of renormalized shell couplings. We further show that power-law scaling does not follow from renormalizability alone but arises as a rigidity consequence: once log-shift invariance is combined with the intrinsic time-rescaling covariance of gradient flow, the renormalized GRSD velocity field is forced into a power-law form.
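As a rough illustration of the phrase "logarithmic resolution shells", the sketch below groups per-mode energies of a positive spectrum into shells $[b^k, b^{k+1})$. It assumes, hypothetically, that "energy" is the squared residual along each eigenmode of a quadratic loss. The function name, the base $b = 2$, and the toy spectrum are illustrative choices, not the paper's GRSD definitions.

```python
import numpy as np

def shell_energies(eigvals, energies, base=2.0):
    """Sum per-mode energies over logarithmic shells [base**k, base**(k+1)).

    Returns a dict mapping shell index k to the total energy of all modes
    whose eigenvalue falls in that shell, sorted by k.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    energies = np.asarray(energies, dtype=float)
    if eigvals.shape != energies.shape:
        raise ValueError("eigvals and energies must have the same shape")
    if np.any(eigvals <= 0):
        raise ValueError("eigenvalues must be positive to define log shells")
    # Shell index k = floor(log_base(lambda)). Values lying exactly on a shell
    # boundary may land on either side because of floating-point rounding.
    k = np.floor(np.log(eigvals) / np.log(base)).astype(int)
    shells = {}
    for idx, e in zip(k.tolist(), energies.tolist()):
        shells[idx] = shells.get(idx, 0.0) + e
    return dict(sorted(shells.items()))

# Hypothetical toy spectrum: lambda_i = i**-1.5 with initial mode energies i**-1.
i = np.arange(1, 2001)
lam = i ** -1.5
e0 = i ** -1.0

# Under gradient flow on a quadratic loss, the squared residual of mode i decays
# as exp(-2 * lambda_i * t), so high-lambda shells are depleted first.
for t in (0.0, 10.0):
    shells = shell_energies(lam, e0 * np.exp(-2.0 * lam * t))
    print(t, {k: round(v, 4) for k, v in shells.items()})
```

In this quadratic toy each mode decays independently, so the sketch shows only shell-by-shell depletion. The GRSD picture of energy transport between shells involves shell couplings that are not modeled here.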

Paper Structure

This paper contains 50 sections, 6 theorems, and 75 equations.

Key Result

Theorem 1

Fix a learning configuration and consider the induced GRSD shell dynamics defined on logarithmic spectral shells. Suppose the conditions from the banded-Jacobian condition through the log-shift-invariance condition hold on a training horizon $t \in [0,T]$. Assume in addition the standard GRSD structural properties: (i) antisymmetry … is necessarily of power-law form, for some exponent $a \in \mathbb{R}$ and scalar coefficient $c(\cdot)$ …
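The following derivation shows, in a deliberately simplified setting, why a log-shift symmetry can pin down a power law. It rests on a toy assumption that is stronger than, and not identical to, the paper's log-shift invariance condition on renormalized couplings: shifting $\log\lambda$ by $s$ rescales the velocity by a factor $h(s,t)$ that does not depend on $\lambda$. The symbols $h$, $C(t)$ and $\kappa(t)$ are introduced here for illustration only. The argument is the standard one via Cauchy's exponential functional equation.

```latex
% Toy assumption (hypothetical):
%   v(e^{s}\lambda, t) = h(s,t)\, v(\lambda, t)  for all s \in \mathbb{R}, \lambda > 0,
%   with h(\cdot,t) measurable and v(\cdot,t) not identically zero.
% Line 1: shift twice, then pick any \lambda with v(\lambda,t) \neq 0.
% Line 2: a zero of h would force v(\cdot,t) \equiv 0, since e^{s_0}\lambda covers (0,\infty).
\begin{aligned}
v(e^{s_1+s_2}\lambda,t) &= h(s_1,t)\,h(s_2,t)\,v(\lambda,t) = h(s_1+s_2,t)\,v(\lambda,t)
  &&\Rightarrow\ h(s_1+s_2,t) = h(s_1,t)\,h(s_2,t),\\
h(s_0,t) = 0 \ &\Rightarrow\ v(\cdot,t) \equiv 0
  &&\Rightarrow\ h(s,t) = h(s/2,t)^2 > 0,\\
\log h(\cdot,t)\ \text{additive and measurable}\ &\Rightarrow\ h(s,t) = e^{\kappa(t)\,s}
  &&\text{(Cauchy's functional equation)},\\
v(\lambda,t) &= h(\log\lambda,t)\,v(1,t) = C(t)\,\lambda^{\kappa(t)},
  &&C(t) := v(1,t).
\end{aligned}
```

In this toy version the exponent $\kappa(t)$ may still vary with time. According to the abstract, the power-law form stated in the theorem follows from combining log-shift invariance with the time-rescaling covariance of gradient flow.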

Theorems & Definitions (12)

  • Theorem 1: Power-law renormalizability of GRSD shell dynamics
  • Theorem 2: Residual learning induces log-shift invariance beyond a depth threshold
  • Proof (sketch)
  • Proposition 1: Depth averaging yields the log-shift invariance condition
  • Proof (sketch)
  • Proposition 2: RWKV/SSM implies an (effective) graph-banded Jacobian path
  • Proof (sketch)
  • Theorem 3: Power-law renormalizability of GRSD shell dynamics
  • Proof
  • Lemma 1: Time-rescaling covariance of gradient flow
  • ...and 2 more
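The statement of Lemma 1 is not reproduced here. The sketch below assumes it refers to the standard fact about gradient flow $d\theta/dt = -\eta\,\nabla L(\theta)$: the trajectory at rate $\eta$ is the rate-1 trajectory with time rescaled, $\theta_\eta(t) = \theta_1(\eta t)$. It checks this numerically using a hypothetical non-quadratic toy loss $L(\theta) = \sum_j \log\cosh((A\theta - y)_j)$ and a hand-written classical RK4 integrator. The helper names are illustrative.

```python
import numpy as np

def make_grad(A, y):
    """Gradient of the toy loss L(theta) = sum(log(cosh(A @ theta - y)))."""
    def grad(theta):
        return A.T @ np.tanh(A @ theta - y)
    return grad

def gradient_flow(grad, theta0, eta, t_end, n_steps=4000):
    """Integrate d theta/dt = -eta * grad(theta) on [0, t_end] with classical RK4."""
    def f(th):
        return -eta * grad(th)
    h = t_end / n_steps
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        k1 = f(theta)
        k2 = f(theta + 0.5 * h * k1)
        k3 = f(theta + 0.5 * h * k2)
        k4 = f(theta + h * k3)
        theta = theta + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return theta

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
y = rng.normal(size=20)
theta0 = rng.normal(size=5)
grad = make_grad(A, y)

eta, T = 3.0, 0.5
fast = gradient_flow(grad, theta0, eta=eta, t_end=T)        # rate eta, time T
slow = gradient_flow(grad, theta0, eta=1.0, t_end=eta * T)  # rate 1, time eta*T
print(np.max(np.abs(fast - slow)))  # expected to be at round-off level
```

Because changing the rate only reparametrizes time, a power law in $t$ keeps its exponent under $t \to \eta t$: $C t^{-\alpha}$ becomes $C \eta^{-\alpha} t^{-\alpha}$, and only the coefficient changes. The abstract names this covariance as one ingredient of the rigidity argument.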