On Rademacher Complexity-based Generalization Bounds for Deep Learning
Lan V. Truong
TL;DR
This work introduces contraction lemmas for high-dimensional, vector-valued mappings and integrates them into a Rademacher-complexity framework for analyzing deep-network generalization. It derives non-vacuous generalization bounds for CNNs and ReLU-based networks; the bounds depend on layer-wise weight norms (including the $\infty$-norm and $p$-norms) and on the Lipschitz constants of the activations, and they apply beyond ReLU activations. The paper gives explicit loss-based and margin-based generalization guarantees, reports MNIST experiments in which the bounds are non-vacuous for small sets of classes, and obtains a bound that compares favorably with that of Golowich et al. (2018). Overall, it broadens the class of covered activation functions, tightens depth-sensitive bounds, and shows that Rademacher-complexity bounds can be informative for CNNs on common image-classification tasks.
Abstract
We show that the Rademacher complexity-based framework can establish non-vacuous generalization bounds for Convolutional Neural Networks (CNNs) in the context of classifying a small set of image classes. A key technical advancement is the formulation of novel contraction lemmas for high-dimensional mappings between vector spaces, specifically designed for general Lipschitz activation functions. These lemmas extend and refine the Talagrand contraction lemma across a broader range of scenarios. Our Rademacher complexity bound provides an enhancement over the results presented by Golowich et al. for ReLU-based Deep Neural Networks (DNNs). Moreover, while previous works utilizing Rademacher complexity have primarily focused on ReLU DNNs, our results generalize to a wider class of activation functions.
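For reference, the two standard objects the abstract builds on can be written as follows. This is the textbook scalar formulation, not the paper's own notation or its new lemmas; the symbols $\mathcal{F}$ (a class of real-valued functions), $S = (x_1, \dots, x_n)$ (the sample), $\sigma_i$ (Rademacher variables), $\varphi$, and $L$ are chosen here for illustration.

$$
\hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(x_i)\right], \qquad \sigma_i \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}\{-1,+1\},
$$

$$
\hat{\mathfrak{R}}_S(\varphi \circ \mathcal{F}) \le L\, \hat{\mathfrak{R}}_S(\mathcal{F}) \quad \text{for any } L\text{-Lipschitz } \varphi : \mathbb{R} \to \mathbb{R}.
$$

The first line defines the empirical Rademacher complexity; the second is the classical Talagrand contraction lemma for real-valued function classes. The contribution described in the abstract is to extend statements of this kind to mappings between vector spaces with general Lipschitz activations.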
