Renormalization group for deep neural networks: Universality of learning and scaling laws
Gorka Peraza Coppola, Moritz Helias, Zohar Ringel
TL;DR
This work introduces a Wilsonian renormalization group (RG) framework for deep neural networks trained on power-law distributed data, linking learning curves to self-similar structure in the data and kernel spectra. Starting from the infinite-width Gaussian Process limit, it adds weak non-Gaussian interactions to capture finite-width effects and develops a continuum, scale-aware RG in kernel (momentum) space that accounts for spectral discreteness and lack of translation invariance. Key results are the replacement of traditional scaling dimensions with scaling intervals and a Gaussian Process-like UV fixed point that governs large-data universality and leads to explicit neural scaling laws with calculable corrections. The framework yields concrete predictions for mean predictors, variances, hyperparameter transfer across model sizes, and the dependence of learning curves on the dataset size $P$, with non-Gaussian corrections decaying as $P$ grows, thereby connecting critical phenomena methods to practical learning dynamics and resource estimates.
Abstract
Self-similarity, where observables at different length scales exhibit similar behavior, is ubiquitous in natural systems. Such systems are typically characterized by power-law correlations and universality, and are studied using the powerful framework of the renormalization group (RG). Intriguingly, power laws and weak forms of universality also pervade real-world datasets and deep learning models, motivating the application of RG ideas to the analysis of deep learning. In this work, we develop an RG framework to analyze self-similarity and its breakdown in learning curves for a class of weakly non-linear (non-lazy) neural networks trained on power-law distributed data. Features often neglected in standard treatments -- such as spectrum discreteness and lack of translation invariance -- lead to both quantitative and qualitative departures from conventional perturbative RG. In particular, we find that the concept of scaling intervals naturally replaces that of scaling dimensions. Despite these differences, the framework retains key RG features: it enables the classification of perturbations as relevant or irrelevant, and reveals a form of universality in the large-data limit, governed by a Gaussian Process-like UV fixed point.
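As a rough illustration of how a power-law spectrum can produce a power-law learning curve in the Gaussian Process limit, the following minimal sketch evaluates the bias of the mean predictor mode by mode. It is not the RG construction of this work. It assumes the standard equivalent-kernel approximation for Gaussian Process regression, in which a mode with kernel eigenvalue $\lambda_k$ is shrunk by the factor $\lambda_k / (\lambda_k + \sigma^2 / P)$, and it assumes kernel eigenvalues $\lambda_k = k^{-a}$ and target power $k^{-b}$. The exponents `a` and `b`, the noise level `noise`, the cutoff `n_modes`, and the function names are all illustrative choices that do not appear in the abstract.

```python
import numpy as np


def ek_bias(P, a=2.0, b=2.0, noise=1.0, n_modes=200_000):
    """Squared bias of the GP mean predictor in the equivalent-kernel approximation.

    Assumed (illustrative) setup: kernel eigenvalues lam_k = k**(-a) and
    target power per mode k**(-b), with observation noise variance `noise`.
    """
    k = np.arange(1, n_modes + 1, dtype=float)
    lam = k ** (-a)        # power-law kernel spectrum
    power = k ** (-b)      # power-law target power spectrum
    # Fraction of each target mode that is NOT learned with P samples.
    unlearned = (noise / P) / (lam + noise / P)
    return float(np.sum(power * unlearned ** 2))


def fitted_exponent(a=2.0, b=2.0):
    """Least-squares slope of log(bias) against log(P) over P in [1e2, 1e4]."""
    Ps = np.logspace(2, 4, 9)
    errs = np.array([ek_bias(P, a, b) for P in Ps])
    slope, _ = np.polyfit(np.log(Ps), np.log(errs), 1)
    return float(slope)


if __name__ == "__main__":
    a, b = 2.0, 2.0
    print("fitted exponent:", fitted_exponent(a, b))
    print("expected -(b - 1) / a:", -(b - 1.0) / a)
```

Under these assumptions, modes with $\lambda_k \gtrsim \sigma^2 / P$ are learned and the rest are not, so the cutoff moves as $k^* \sim P^{1/a}$ and the bias falls as $P^{-(b-1)/a}$ whenever $b - 1 < 2a$. For $a = b = 2$ the fitted slope should therefore come out close to $-1/2$.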
