A Margin-based Multiclass Generalization Bound via Geometric Complexity

Michael Munn, Benoit Dherin, Javier Gonzalvo

TL;DR

The paper addresses the question of why deep neural networks generalize well by introducing a margin-based multiclass generalization bound that scales with the geometric complexity (GC) of the network. It proves a bound under data distributions satisfying a Poincaré inequality, linking the generalization error to the margin and to the margin-normalized GC, and extends the analysis from binary to multiclass settings using a covering-number and Dudley integral approach. Empirical validation on ResNet-18 trained on CIFAR-10 and CIFAR-100 with both original and random labels shows a strong correlation between GC and excess risk, with margin normalization stabilizing GC across training. The results offer an architecture-agnostic perspective on generalization, highlight the role of data geometry, and suggest GC as a practical proxy for assessing and possibly guiding generalization in neural networks.

Abstract

There has been considerable effort to better understand the generalization capabilities of deep neural networks both as a means to unlock a theoretical understanding of their success as well as providing directions for further improvements. In this paper, we investigate margin-based multiclass generalization bounds for neural networks which rely on a recent complexity measure, the geometric complexity, developed for neural networks. We derive a new upper bound on the generalization error which scales with the margin-normalized geometric complexity of the network and which holds for a broad family of data distributions and model classes. Our generalization bound is empirically investigated for a ResNet-18 model trained with SGD on the CIFAR-10 and CIFAR-100 datasets with both original and random labels.
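
The abstract refers to the geometric complexity without defining it in this excerpt. As a rough illustration only, the sketch below assumes the definition used in earlier work on GC: the mean squared Frobenius norm of the model's input-output Jacobian over a data sample. The function names (`jacobian_fd`, `empirical_gc`) and the finite-difference Jacobian are hypothetical choices made for self-containment, not the authors' implementation; in practice one would likely use automatic differentiation.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def jacobian_fd(f: Callable[[Vector], Vector], x: Sequence[float],
                eps: float = 1e-5) -> List[Vector]:
    """Central finite-difference Jacobian of f at x, returned as k rows of length d."""
    k = len(f(list(x)))
    d = len(x)
    jac = [[0.0] * d for _ in range(k)]
    for j in range(d):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(k):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac


def empirical_gc(f: Callable[[Vector], Vector],
                 xs: Sequence[Sequence[float]],
                 eps: float = 1e-5) -> float:
    """Assumed definition: average over xs of the squared Frobenius norm of the Jacobian."""
    total = 0.0
    for x in xs:
        jac = jacobian_fd(f, x, eps)
        total += sum(v * v for row in jac for v in row)
    return total / len(xs)


# Toy model (hypothetical): a linear map f(x) = W x, whose Jacobian is W everywhere.
W = [[1.0, 2.0], [3.0, 4.0]]


def linear_model(x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


sample = [[0.0, 0.0], [1.0, -1.0], [0.5, 2.0]]
gc_value = empirical_gc(linear_model, sample)
```

For the linear toy model the Jacobian is constant, so under this assumed definition the value should match the squared Frobenius norm of `W` up to floating-point error, which makes it a convenient sanity check before applying the same routine to a real network.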

Paper Structure

This paper contains 24 sections, 7 theorems, 61 equations, and 2 figures.

Key Result

Theorem 1.1

Let $a_1, a_2$ be positive reals. Let $S = \{(x_1, y_1), \dots, (x_m, y_m)\}$ be i.i.d. input-output pairs in $\mathbb{R}^d \times \{1, \dots, k\}$, and suppose the distribution $\mu$ of the $x_i$ satisfies the Poincaré inequality with constant $\rho > 0$. Then, for any $\delta > 0$, with probability […] where $\widehat{\mathcal{R}}_{S, \gamma}(f) = m^{-1} \sum_i \mathbbm{1}_{y_i f(x_i) \leq \gamma}$ and […]
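
The empirical margin risk $\widehat{\mathcal{R}}_{S, \gamma}(f)$ in the theorem counts the fraction of training points whose margin is at most $\gamma$. The theorem statement writes the margin in the binary form $y_i f(x_i)$; the sketch below assumes the standard multiclass margin instead (score of the true class minus the largest score among the other classes), and uses 0-indexed labels rather than $\{1, \dots, k\}$. The helper names are hypothetical and this is only a minimal illustration, not the paper's code.

```python
from typing import List, Sequence


def multiclass_margin(scores: Sequence[float], label: int) -> float:
    """Score of the true class minus the best score among all other classes."""
    best_other = max(s for j, s in enumerate(scores) if j != label)
    return scores[label] - best_other


def empirical_margin_risk(all_scores: Sequence[Sequence[float]],
                          labels: Sequence[int],
                          gamma: float) -> float:
    """Fraction of examples whose margin is at most gamma."""
    at_or_below = sum(
        1 for scores, y in zip(all_scores, labels)
        if multiclass_margin(scores, y) <= gamma
    )
    return at_or_below / len(labels)


# Toy scores for three examples and three classes (made-up numbers).
toy_scores: List[List[float]] = [
    [2.0, 0.5, 0.1],  # true class 0, margin  1.5
    [0.2, 0.3, 0.1],  # true class 0, margin -0.1 (misclassified)
    [0.1, 0.0, 1.5],  # true class 2, margin  1.4
]
toy_labels = [0, 0, 2]

risk_at_zero = empirical_margin_risk(toy_scores, toy_labels, gamma=0.0)
risk_at_one = empirical_margin_risk(toy_scores, toy_labels, gamma=1.0)
```

With $\gamma = 0$ the quantity reduces to the ordinary training error; increasing $\gamma$ can only increase it, since correctly classified points with a small margin start to count as well.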

Figures (2)

  • Figure 1: Analysis of ResNet-18 (He et al., 2016) trained with SGD on CIFAR-10 with both original and random labels. The blue triangle-marked curves plot the excess risk across training epochs (on a log scale), while the green circle-marked curves track the geometric complexity ($\mathrm{GC}$), normalized so that the two curves for random labels meet. In both settings the $\mathrm{GC}$ is closely correlated with the excess risk. Furthermore, normalizing the $\mathrm{GC}$ by the margin (the square-marked curve) neutralizes its growth across epochs. Similar plots for CIFAR-100 can be found in the CIFAR experiments section of the Appendix.
  • Figure 2: Analysis of ResNet-18 (He et al., 2016) trained with SGD on CIFAR-10 (left) and CIFAR-100 (right) with both original and random labels. The triangle-marked curves plot the excess risk across training epochs (on a log scale). The circle-marked curves track the geometric complexity ($\mathrm{GC}$). The $\mathrm{GC}$ is tightly correlated with the excess risk in both settings. Normalizing the $\mathrm{GC}$ by the margin neutralizes its growth across epochs.

Theorems & Definitions (16)

  • Theorem 1.1
  • Definition 3.1
  • Definition 3.2: Empirical Geometric Complexity
  • Definition 3.3: Theoretical Geometric Complexity
  • Proposition 3.4
  • proof
  • Lemma 4.1
  • proof
  • Theorem 4.2
  • Lemma 4.3
  • ...and 6 more