
Grokking as Dimensional Phase Transition in Neural Networks

Ping Wang

Abstract

Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a dimensional phase transition: effective dimensionality $D$ crosses from sub-diffusive (subcritical, $D < 1$) to super-diffusive (supercritical, $D > 1$) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, $D$ reflects gradient field geometry, not network architecture: synthetic i.i.d. Gaussian gradients maintain $D \approx 1$ regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The grokking-localized $D(t)$ crossing -- robust across topologies -- offers new insight into the trainability of overparameterized networks.
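
To make the finite-size scaling (FSS) estimate concrete, here is a minimal sketch of extracting the exponent $D$ as the slope of $\log s_{\max}$ against $\log N$. Only the relation $s_{\max} \sim N^D$ comes from the paper; the scale list (beyond the reported endpoints $N = 81$--$2001$), the lognormal noise, and all variable names are illustrative assumptions standing in for measured avalanche data.

```python
import numpy as np

# Illustrative stand-in data: eight model scales spanning the paper's
# reported range N = 81--2001 (intermediate values are assumptions), with
# per-scale maximum avalanche sizes generated to follow s_max ~ N^D, D = 1.
rng = np.random.default_rng(0)
N = np.array([81, 121, 201, 401, 601, 1001, 1501, 2001])
s_max = 2.0 * N ** 1.0 * rng.lognormal(0.0, 0.05, size=N.size)

# Finite-size scaling posits s_max ~ N^D, so D is the slope in log-log space.
D, log_c = np.polyfit(np.log(N), np.log(s_max), deg=1)

# R^2 of the linear fit in log-log coordinates (the paper reports R^2 = 1.00).
resid = np.log(s_max) - (D * np.log(N) + log_c)
r2 = 1.0 - resid.var() / np.log(s_max).var()
print(f"D = {D:.2f}, R^2 = {r2:.2f}")
```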


Figures (3)

  • Figure 1: Grokking as transient SOC and dimensional phase transition. (a) Training (blue) and evaluation (purple) accuracies for a representative XOR case ($h = 21$, $N = 85$; train and evaluation share the same four patterns), showing a synchronized abrupt transition at epoch 27. Inset: multi-scale analysis across $h = 20$--$500$ reveals scale-dependent grokking timing spanning epochs 12--134. (b) Time-resolved FSS analysis shows that effective dimensionality $D$ evolves continuously during training. Yellow region: multi-scale grokking window. Orange line: single-scale grokking. Red line: time-averaged $D = 1.00 \pm 0.02$. Inset: FSS fit quality $R^2 > 0.98$. (c) Representative example: weight concentration (Gini coefficient of $|\boldsymbol{\theta}|$, teal; a computation sketch follows this list) exhibits a transient peak coinciding with grokking. Multi-seed statistical validation (1000 seeds) is described in the text.
  • Figure 2: Finite-size scaling analysis of avalanche dynamics. (a) Complementary cumulative distributions (CCDFs) of avalanche sizes across eight model scales ($h = 20$--$500$), showing heavy-tailed, scale-dependent behavior with systematic cutoff growth. (b) X-only data collapse: plotting $P(>s)$ vs $s/N^D$ collapses all scales toward a common curve using a single exponent $D$, validating the FSS exponent without additional fitting parameters (a collapse sketch follows this list). (c) FSS of maximum ($s_{\max} \sim N^D$, left axis) and mean ($\langle s \rangle \sim N^\gamma$, right axis) avalanche sizes, yielding $D = 1.00 \pm 0.02$ ($R^2 = 1.00$) and $\gamma = 1.15 \pm 0.06$ ($R^2 = 0.99$) across eight scales.
  • Figure 3: Gradient geometry determines dimensionality. (a) Bootstrap distributions (10,000 resamples) of the FSS exponent $D$, where each run is phase-split at its own grokking epoch: pre-grokking real gradients (green, $D = 0.90 \pm 0.02$, sub-diffusive), post-grokking real gradients (red, $D = 1.20 \pm 0.02$, super-diffusive), and synthetic i.i.d. Gaussian gradients (blue, $D = 0.99 \pm 0.01$). Three non-overlapping peaks confirm statistically distinct scaling regimes (the resampling loop is sketched after this list). (b) Leave-one-out FSS analysis: removing any single scale preserves $D$, confirming scale invariance across $N = 81$--$2001$. Inset: five network topologies collapse to $D \approx 0.99$ for synthetic gradients, demonstrating topology invariance.
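
Figure 1(c) tracks weight concentration through the Gini coefficient of the parameter magnitudes $|\boldsymbol{\theta}|$. The paper does not spell out its estimator, so the sketch below assumes the standard sorted-values formula; the function name and example inputs are hypothetical.

```python
import numpy as np

def gini(theta: np.ndarray) -> float:
    """Gini coefficient of |theta|: 0 for perfectly even magnitudes,
    approaching 1 when magnitude concentrates in a few parameters."""
    x = np.sort(np.abs(theta).ravel())  # ascending magnitudes
    n = x.size
    i = np.arange(1, n + 1)             # 1-based ranks
    # Standard formula: G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n
    return float(2.0 * np.sum(i * x) / (n * np.sum(x)) - (n + 1.0) / n)

rng = np.random.default_rng(0)
print(gini(np.ones(1000)))          # ~0.0: perfectly even weights
print(gini(rng.normal(size=1000)))  # roughly 0.4 for Gaussian weights
```

A transient rise in this statistic during training corresponds to the teal peak that Figure 1(c) reports at the grokking epoch.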
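
Figure 2(b)'s x-only collapse can be reproduced in miniature: rescale avalanche sizes by $N^D$ and overlay the empirical CCDFs. The truncated power-law generator, the exponent $\tau$, and the scale list below are assumptions chosen only to make the sketch self-contained.

```python
import numpy as np
import matplotlib.pyplot as plt

def ccdf(samples: np.ndarray):
    """Empirical complementary CDF, P(S >= s), at the sorted sample points."""
    s = np.sort(samples)
    p = np.arange(s.size, 0, -1) / s.size
    return s, p

def powerlaw_with_cutoff(rng, tau, s_c, n):
    """Inverse-transform samples from P(s) ~ s^(-tau) on [1, s_c]."""
    u = rng.random(n)
    return (1.0 - u * (1.0 - s_c ** (1.0 - tau))) ** (1.0 / (1.0 - tau))

rng = np.random.default_rng(0)
D, tau = 1.0, 1.5  # D from the FSS fit; tau is an assumed avalanche exponent
for N in [81, 201, 601, 2001]:                   # illustrative scales
    sizes = powerlaw_with_cutoff(rng, tau, s_c=N ** D, n=5000)
    s, p = ccdf(sizes)
    plt.loglog(s / N ** D, p, label=f"N = {N}")  # x-only rescaling s -> s/N^D
plt.xlabel(r"$s / N^D$"); plt.ylabel(r"$P(>s)$"); plt.legend()
plt.show()
```

If the single exponent $D$ is right, the rescaled curves fall onto a common master curve, which is exactly the collapse criterion the figure uses.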
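
Figure 3(a)'s bootstrap distributions and Figure 3(b)'s leave-one-out check can both be sketched with the same refitting loop. The resampling unit (whole runs), the synthetic data, and every name below are assumptions; only the procedure (resample, refit $D$, repeat 10,000 times; drop one scale, refit) mirrors what the captions describe.

```python
import numpy as np

def fit_D(N, s_max):
    """FSS exponent: slope of log s_max versus log N."""
    return np.polyfit(np.log(N), np.log(s_max), deg=1)[0]

# Illustrative stand-in: 50 runs at eight scales, with s_max ~ N^D for D = 1.
rng = np.random.default_rng(1)
N = np.array([81, 121, 201, 401, 601, 1001, 1501, 2001])
s_max_runs = 2.0 * N ** 1.0 * rng.lognormal(0.0, 0.1, size=(50, N.size))

# Bootstrap (Figure 3a): resample runs with replacement, refit D each time.
Ds = np.empty(10_000)
for b in range(Ds.size):
    idx = rng.integers(0, s_max_runs.shape[0], size=s_max_runs.shape[0])
    Ds[b] = fit_D(N, s_max_runs[idx].mean(axis=0))
print(f"D = {Ds.mean():.2f} +/- {Ds.std():.2f}")

# Leave-one-out (Figure 3b): drop each scale in turn; D should barely move.
mean_smax = s_max_runs.mean(axis=0)
for k in range(N.size):
    keep = np.arange(N.size) != k
    print(f"without N = {N[k]:>4}: D = {fit_D(N[keep], mean_smax[keep]):.2f}")
```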