On Rademacher Complexity-based Generalization Bounds for Deep Learning

Lan V. Truong

TL;DR

This work advances the theoretical understanding of deep-network generalization by introducing contraction lemmas for high-dimensional, vector-valued mappings and integrating them into a Rademacher-complexity framework. It derives generalization bounds for ReLU-based DNNs and for CNNs that depend on layer-wise weight norms (including the $\infty$-norm and $p$-norms) and on the Lipschitz constants of the activations, and it extends the analysis beyond ReLU activations. The paper gives explicit loss-based and margin-based generalization guarantees. MNIST experiments show that the bounds are non-vacuous for CNNs classifying small sets of classes, and the bounds compare favorably with those of Golowich et al. (2018). Overall, the paper broadens the range of activation functions covered, tightens depth-sensitive bounds, and helps explain how CNNs generalize in common image-classification tasks.

Abstract

We show that the Rademacher complexity-based framework can establish non-vacuous generalization bounds for Convolutional Neural Networks (CNNs) in the context of classifying a small set of image classes. A key technical advancement is the formulation of novel contraction lemmas for high-dimensional mappings between vector spaces, specifically designed for general Lipschitz activation functions. These lemmas extend and refine the Talagrand contraction lemma across a broader range of scenarios. Our Rademacher complexity bound provides an enhancement over the results presented by Golowich et al. for ReLU-based Deep Neural Networks (DNNs). Moreover, while previous works utilizing Rademacher complexity have primarily focused on ReLU DNNs, our results generalize to a wider class of activation functions.
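To make the baseline concrete, the sketch below evaluates the Frobenius-norm bound of Golowich et al. (2018) for a given list of weight matrices, in the form commonly cited from their Theorem 1: $\hat{\mathfrak{R}}_S \le B\,(\sqrt{2\log(2)\,d}+1)\prod_{j=1}^{d}\|W_j\|_F/\sqrt{n}$ for a depth-$d$ network with 1-Lipschitz, positive-homogeneous activations such as ReLU and inputs satisfying $\|\mathbf{x}\|\le B$. Plugging the actual layer norms in place of the per-layer norm budgets of the hypothesis class is an assumption made for illustration; the function names and the example weights are hypothetical, and the paper's own improved bound is not reproduced here.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a weight matrix stored as a list of rows."""
    return math.sqrt(sum(w * w for row in matrix for w in row))


def golowich_frobenius_bound(weights, input_norm_bound, n):
    """Golowich et al. (2018)-style bound on the empirical Rademacher complexity.

    Assumes a depth-d network with 1-Lipschitz, positive-homogeneous
    activations (e.g. ReLU), inputs with ||x|| <= input_norm_bound, and
    uses the actual layer norms ||W_j||_F as the per-layer norm budgets.
    """
    d = len(weights)
    norm_product = math.prod(frobenius_norm(w) for w in weights)
    depth_factor = math.sqrt(2.0 * math.log(2.0) * d) + 1.0
    return input_norm_bound * depth_factor * norm_product / math.sqrt(n)


# Hypothetical two-layer example: each weight matrix has Frobenius norm 1.
layers = [
    [[0.6, 0.8], [0.0, 0.0]],  # first layer, 2 x 2
    [[1.0, 0.0]],              # output layer, 1 x 2
]
print(golowich_frobenius_bound(layers, input_norm_bound=1.0, n=100))
```

For two layers of unit Frobenius norm, $B=1$ and $n=100$, the bound equals $(\sqrt{4\log 2}+1)/10 \approx 0.27$. The depth enters only through the factor $\sqrt{2\log(2)\,d}+1$, while the product of layer norms can still grow quickly with depth, which is the kind of dependence the paper aims to tighten.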

Paper Structure (32 sections, 24 theorems, 213 equations, 3 figures)

This paper contains 32 sections, 24 theorems, 213 equations, 3 figures.

Introduction
Related Papers
Contributions
Other Notations
Contraction Lemmas in High Dimensional Vector Spaces
Rademacher Complexity Bounds for Deep Neural Networks (DNNs)
General Deep Neural Network Models
Rademacher complexity bounds for ReLU-DNNs
Rademacher complexity bounds for CNNs
Some Contraction Lemmas for CNNs
Rademacher complexity bounds for CNNs
Generalization Bounds for DNNs
Loss Generalization Bounds for ReLU-DNNs
Generalization Error Bounds for CNNs
Numerical Results
...and 17 more sections

Key Result

Lemma 1

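(Ledoux and Talagrand, 1991) Let $\mathcal{H}$ be a hypothesis set of functions mapping from some set $\mathcal{X}$ to $\mathbb{R}$, and let $\psi$ be a $\mu$-Lipschitz function from $\mathbb{R}$ to $\mathbb{R}$ for some $\mu>0$. Then, for any sample $S$ of $n$ points $\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n \in \mathcal{X}$,
$$\frac{1}{n}\mathbb{E}_{\boldsymbol{\sigma}}\left[\sup_{h\in\mathcal{H}}\sum_{i=1}^{n}\sigma_i\,(\psi\circ h)(\mathbf{x}_i)\right] \le \frac{\mu}{n}\mathbb{E}_{\boldsymbol{\sigma}}\left[\sup_{h\in\mathcal{H}}\sum_{i=1}^{n}\sigma_i\,h(\mathbf{x}_i)\right],$$
where $\boldsymbol{\sigma}=(\sigma_1,\ldots,\sigma_n)$ has i.i.d. Rademacher entries; that is, $\hat{\mathfrak{R}}_S(\psi\circ\mathcal{H}) \le \mu\,\hat{\mathfrak{R}}_S(\mathcal{H})$.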
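The scalar contraction inequality behind Lemma 1, $\hat{\mathfrak{R}}_S(\psi\circ\mathcal{H}) \le \mu\,\hat{\mathfrak{R}}_S(\mathcal{H})$ with $\hat{\mathfrak{R}}_S(\mathcal{H})=\frac{1}{n}\mathbb{E}_{\boldsymbol{\sigma}}[\sup_{h\in\mathcal{H}}\sum_{i}\sigma_i h(\mathbf{x}_i)]$, holds for every hypothesis class, so it can be checked exactly on a small finite class by enumerating all $2^n$ sign vectors. The sketch below does this for $\psi(t)=\mu\tanh(t)$, which is $\mu$-Lipschitz because $|\tanh'(t)|\le 1$. The random finite class (each hypothesis represented only by its outputs on $S$) and all function names are hypothetical choices for illustration, not part of the paper.

```python
import itertools
import math
import random


def empirical_rademacher(values):
    """Exact empirical Rademacher complexity of a finite class.

    `values` is a list of hypotheses, each given by its outputs
    (h(x_1), ..., h(x_n)) on the fixed sample S. The expectation over
    sigma is computed exactly by enumerating all 2**n sign vectors.
    """
    n = len(values[0])
    total = 0.0
    for sigma in itertools.product((-1.0, 1.0), repeat=n):
        total += max(sum(s * v for s, v in zip(sigma, h)) for h in values)
    return total / (2 ** n) / n


def contraction_check(num_hyps=5, n=8, mu=2.0, seed=0):
    """Return (R_S(psi o H), mu * R_S(H)) for psi(t) = mu * tanh(t).

    H is a hypothetical finite class of Gaussian output vectors.
    """
    rng = random.Random(seed)
    hyps = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(num_hyps)]
    composed = [[mu * math.tanh(v) for v in h] for h in hyps]
    return empirical_rademacher(composed), mu * empirical_rademacher(hyps)


lhs, rhs = contraction_check()
print(f"R_S(psi o H) = {lhs:.4f} <= mu * R_S(H) = {rhs:.4f}")
```

Exact enumeration is only feasible for small $n$; with a Monte Carlo average over sampled sign vectors instead, the inequality would hold only up to sampling error.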

Figures (3)

Figure 1: CNN model with ReLU activations
Figure 2: CNN model with sigmoid activations
Figure 3: CNN model with sigmoid activations

Theorems & Definitions (31)

Lemma 1
Theorem 2
Theorem 3
Remark 4
Theorem 5
Lemma 6
Definition 7
Lemma 8
Lemma 9
Lemma 10
...and 21 more