On the inductive bias of infinite-depth ResNets and the bottleneck rank
Enric Boix-Adsera
TL;DR
This work analyzes the inductive bias of deep ResNets by defining the cost of representing a function as the minimum squared norm of the weights that represent it. For linear ResNets, this minimum-norm cost decomposes across singular values and, in the infinite-depth limit, interpolates between nuclear-norm minimization (large regularization) and rank minimization (small regularization); embedding and unembedding layers are crucial for enabling a bias toward low bottleneck rank. For nonlinear ResNets, the paper defines two nonlinear notions of rank for piecewise-linear functions, the Jacobian rank and the bottleneck rank, and shows that as the regularization parameter vanishes, the infinite-depth cost is bounded below by the Jacobian rank and above by the bottleneck rank. This implies a bias toward low-bottleneck structure under appropriate hyperparameters. The results connect the linear theory to the nonlinear setting and give a principled account of why ResNets, despite their skip connections, may favor compact low-rank or low-bottleneck representations, with implications for generalization and architecture design. The contributions are explicit cost formulas, rigorous bounds, and a unified view of inductive bias across linear and nonlinear deep residual architectures.
Abstract
We compute the minimum-norm weights of a deep linear ResNet, and find that the inductive bias of this architecture lies between minimizing nuclear norm and rank. This implies that, with appropriate hyperparameters, deep nonlinear ResNets have an inductive bias towards minimizing bottleneck rank.
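To make the two endpoints named in the abstract concrete, the following minimal sketch computes the nuclear norm and the rank of a matrix from its singular values. It also checks a standard fact from matrix analysis: the nuclear norm equals the smallest value of half the summed squared Frobenius norms over two-factor factorizations, attained by the balanced factorization built from the SVD. This is the plain depth-two, non-residual case and is not the paper's ResNet cost formula; the function names and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np


def spectral_summaries(A, tol=1e-10):
    """Return the nuclear norm and the numerical rank of A (hypothetical helper)."""
    s = np.linalg.svd(A, compute_uv=False)
    # Count singular values above a relative threshold (assumed tolerance).
    rank = int(np.sum(s > tol * s.max())) if s.size else 0
    return float(s.sum()), rank


def balanced_two_layer_cost(A):
    """Factor A = W2 @ W1 with balanced factors and return their norm cost.

    The cost is 0.5 * (||W1||_F^2 + ||W2||_F^2). For this balanced
    factorization it equals the nuclear norm of A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    root = np.sqrt(s)
    W2 = U * root            # scale the columns of U by sqrt(singular values)
    W1 = root[:, None] * Vt  # scale the rows of V^T by sqrt(singular values)
    cost = 0.5 * (np.linalg.norm(W1) ** 2 + np.linalg.norm(W2) ** 2)
    return W1, W2, float(cost)


# Example: a 6x5 matrix of rank 2 built from random factors.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

nuclear_norm, rank = spectral_summaries(A)
W1, W2, cost = balanced_two_layer_cost(A)
print("rank:", rank)
print("nuclear norm:", nuclear_norm)
print("balanced two-layer cost:", cost)
```

The nuclear norm sums the singular values, whereas the rank only counts the nonzero ones, so a rank-favoring bias is indifferent to the size of the retained singular values while a nuclear-norm bias is not.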
