On the inductive bias of infinite-depth ResNets and the bottleneck rank

Enric Boix-Adsera

TL;DR

This work analyzes the inductive bias of deep ResNets by framing the cost of representing a function as the minimum squared-norm of weights. For linear ResNets, the minimum-norm cost decomposes across singular values and interpolates between nuclear-norm minimization (large regularization) and rank minimization (small regularization) in the infinite-depth limit; embedding/unembedding layers crucially enable a bias toward low bottleneck rank. Extending to nonlinear ResNets, the author defines nonlinear ranks (Jacobian rank and bottleneck rank) for piecewise-linear functions and shows that the infinite-depth cost is bounded below by the Jacobian rank and above by the bottleneck rank as the regularization parameter vanishes, implying a bias toward low bottleneck structure under certain hyperparameters. The results connect linear theory with nonlinear generalization behavior, providing a principled view of why ResNets, despite skip connections, may favor compact, low-rank or low-bottleneck representations, with implications for generalization and architecture design. Overall, the paper contributes explicit cost formulas, rigorous bounds, and a unifying perspective on inductive bias across linear and nonlinear deep residual architectures.

Abstract

We compute the minimum-norm weights of a deep linear ResNet, and find that the inductive bias of this architecture lies between minimizing nuclear norm and rank. This implies that, with appropriate hyperparameters, deep nonlinear ResNets have an inductive bias towards minimizing bottleneck rank.
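
As a small illustration of the two endpoints named above, the following sketch computes the nuclear norm (sum of singular values) and the rank (number of nonzero singular values) of a matrix with NumPy. It is not the paper's cost formula; the function name `nuclear_norm_and_rank`, the tolerance `tol`, and the example matrices are assumptions made for illustration only.

```python
import numpy as np

def nuclear_norm_and_rank(A, tol=1e-10):
    """Return (nuclear norm, numerical rank) of A from its singular values.

    `tol` is an illustrative threshold for treating a singular value as zero.
    """
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    nuclear = float(s.sum())                # nuclear norm: sum of singular values
    rank = int((s > tol).sum())             # rank: count of nonzero singular values
    return nuclear, rank

# Hypothetical example: a diagonal map with singular values 3, 0.5, 0.
A = np.diag([3.0, 0.5, 0.0])
nuc_A, rank_A = nuclear_norm_and_rank(A)

# Rescaling the matrix changes the nuclear norm but leaves the rank unchanged.
B = 0.01 * A
nuc_B, rank_B = nuclear_norm_and_rank(B)
```

The nuclear norm depends on the magnitude of each singular value, whereas the rank only counts how many are nonzero, so the two penalties treat small singular values very differently.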

Paper Structure

This paper contains 9 sections, 7 theorems, 44 equations.

Key Result

Theorem 2.1

For any linear transformation $A \in \mathbb{R}^{d_{\mathrm{out}} \times d_{\mathrm{in}}}$, any parameter $\lambda \in (0,\infty)$, any depth $L \geq 1$, and any width $n \geq \operatorname{rank}(A)$, we have

Theorems & Definitions (15)

  • Theorem 2.1: Explicit formula for cost in linear case
  • Corollary 2.2: Cost of infinite-depth linear residual network
  • proof
  • Corollary 2.3: Cost of infinite-depth residual network interpolates between nuclear norm and rank
  • proof
  • Proposition 2.4: Proved in gel1950relation; see Corollary 2.4 of li1999lidskii or Theorem III.4.5 in bhatia1996matrix
  • Lemma 2.5
  • proof
  • Lemma 2.6
  • proof
  • ...and 5 more