On Generalization Bounds for Neural Networks with Low Rank Layers

Andrea Pinto, Akshay Rangamani, Tomaso Poggio

TL;DR

Maurer's chain rule for Gaussian complexity is applied to analyze how low-rank layers in deep networks can prevent the accumulation of rank and dimensionality factors that typically multiply across layers, yielding generalization bounds for rank and spectral norm constrained networks.

Abstract

While previous optimization results have suggested that deep neural networks tend to favour low-rank weight matrices, the implications of this inductive bias on generalization bounds remain underexplored. In this paper, we apply Maurer's chain rule for Gaussian complexity to analyze how low-rank layers in deep networks can prevent the accumulation of rank and dimensionality factors that typically multiply across layers. This approach yields generalization bounds for rank and spectral norm constrained networks. We compare our results to prior generalization bounds for deep networks, highlighting how deep networks with low-rank layers can achieve better generalization than those with full-rank layers. Additionally, we discuss how this framework provides new perspectives on the generalization capabilities of deep networks exhibiting neural collapse.
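To make the "rank and spectral norm constrained" layer class concrete, the following is a minimal sketch of mapping an arbitrary weight matrix into such a class with a truncated SVD. It is an illustration under stated assumptions, not the paper's construction: the function name `constrain_layer`, the parameters `rank` and `max_spectral_norm`, and the matrix sizes are all hypothetical.

```python
import numpy as np


def constrain_layer(W, rank, max_spectral_norm):
    """Map W to a matrix with rank <= `rank` and spectral norm <= `max_spectral_norm`.

    Hypothetical helper for illustration only: it keeps the top `rank`
    singular directions of W and clips their singular values.
    """
    # Thin SVD: W = U @ diag(s) @ Vt, with s sorted in descending order.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the leading `rank` singular values (rank constraint).
    s = s[:rank]
    # Clip them so the largest is at most `max_spectral_norm` (norm constraint).
    s = np.minimum(s, max_spectral_norm)
    # Reassemble the constrained matrix from the retained components.
    return (U[:, :rank] * s) @ Vt[:rank, :]


# Example with assumed sizes: a 64 x 32 layer constrained to rank 4.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
W_c = constrain_layer(W, rank=4, max_spectral_norm=1.0)

layer_rank = np.linalg.matrix_rank(W_c)
layer_norm = np.linalg.norm(W_c, 2)  # ord=2 gives the largest singular value
```

A network whose layers all pass through a map of this kind belongs to a function class described by per-layer rank and spectral norm budgets, which are the quantities the abstract names as constraints.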

Paper Structure

This paper contains 27 sections, 13 theorems, 32 equations, 1 table.

Key Result

theorem 1

Let $\mathcal{H}$ be a function class of hypotheses composed with losses mapping from $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ to $[0, 1]$. Then, for any $\delta > 0$, with probability at least $1 - \delta$ over the draw of an i.i.d. sample $S$ of size $m$, the following holds for all $g \in \

Theorems & Definitions (18)

  • theorem 1: Gaussian Complexity Generalization Bound
  • theorem 2: Vector-valued Gaussian complexity Generalization Bound
  • theorem 3: Maurer Gaussian Complexity Chain Rule
  • lemma 1: Gaussian complexity of a deep linear network
  • proof
  • lemma 2: Diameter of the deep nonlinear network function class
  • proof
  • lemma 3: Gaussian average of Lipschitz coefficients
  • proof
  • theorem 4: Gaussian complexity of deep Lipschitz neural network
  • ...and 8 more