A Margin-based Multiclass Generalization Bound via Geometric Complexity

Michael Munn; Benoit Dherin; Javier Gonzalvo

TL;DR

The paper addresses the question of why deep neural networks generalize well by introducing a margin-based multiclass generalization bound that scales with the geometric complexity (GC) of the network. It proves a bound under data distributions satisfying a Poincaré inequality, linking the generalization error to the margin and to the margin-normalized GC, and extends the analysis from binary to multiclass settings using a covering-number and Dudley integral approach. Empirical validation on ResNet-18 trained on CIFAR-10 and CIFAR-100 with both original and random labels shows a strong correlation between GC and excess risk, with margin normalization stabilizing GC across training. The results offer an architecture-agnostic perspective on generalization, highlight the role of data geometry, and suggest GC as a practical proxy for assessing and possibly guiding generalization in neural networks.

Abstract

There has been considerable effort to better understand the generalization capabilities of deep neural networks, both to unlock a theoretical understanding of their success and to provide directions for further improvements. In this paper, we investigate margin-based multiclass generalization bounds for neural networks that rely on a recent complexity measure developed for neural networks, the geometric complexity. We derive a new upper bound on the generalization error that scales with the margin-normalized geometric complexity of the network and holds for a broad family of data distributions and model classes. We investigate our generalization bound empirically for a ResNet-18 model trained with SGD on the CIFAR-10 and CIFAR-100 datasets with both original and random labels.
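To make the two quantities in the abstract concrete, here is a minimal NumPy sketch of how an empirical geometric complexity and per-example multiclass margins could be computed for a toy model. It assumes the usual definition of the empirical GC as the mean squared Frobenius norm of the model's input-output Jacobian over a sample. The function names (`jacobian_fd`, `empirical_gc`, `multiclass_margins`), the finite-difference Jacobian, and the toy linear model are illustrative assumptions and not the authors' code. Class labels are 0-indexed here, whereas the paper uses $\{1, \dots, k\}$.

```python
import numpy as np


def jacobian_fd(f, x, eps=1e-5):
    """Central finite-difference Jacobian of f: R^d -> R^k at x, shape (k, d)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        # Column j approximates the partial derivative of f w.r.t. x_j.
        cols.append((f(x + step) - f(x - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)


def empirical_gc(f, xs, eps=1e-5):
    """Mean squared Frobenius norm of the input-output Jacobian over xs."""
    return float(np.mean([np.sum(jacobian_fd(f, x, eps) ** 2) for x in xs]))


def multiclass_margins(f, xs, ys):
    """Score of the labelled class minus the best competing score, per example."""
    margins = []
    for x, y in zip(xs, ys):
        scores = np.asarray(f(x), dtype=float)
        others = np.delete(scores, y)  # scores of all classes except y
        margins.append(scores[y] - others.max())
    return np.array(margins)


# Toy stand-in for a network (an assumption): a linear map with 3 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)


def linear(x):
    return W @ x + b


xs = rng.normal(size=(8, 4))
# Label each point with the model's own prediction so all margins are >= 0.
ys = np.array([int(np.argmax(linear(x))) for x in xs])

gc = empirical_gc(linear, xs)
margins = multiclass_margins(linear, xs, ys)
```

For a linear map the Jacobian is `W` at every input, so `gc` should match `np.sum(W ** 2)` up to floating-point error, which gives a quick sanity check. For a real network an automatic-differentiation Jacobian would replace the finite-difference one. The exact form of the margin normalization is not given in this excerpt, so it is left out of the sketch.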

Paper Structure (24 sections, 7 theorems, 61 equations, 2 figures)

Introduction
Contributions
Related Work
Complexity measures for neural networks and double descent.
Implicit regularization and sharpness aware techniques.
Generalization bounds for neural networks.
Background and Notation
Preliminaries
Notation
The Poincaré Inequality
Poincaré and Isoperimetry
Geometric Complexity
Comparing the theoretical and empirical geometric complexity
Covering Numbers
Main Results
...and 9 more sections

Key Result

Theorem 1.1

Let $a_1, a_2$ be positive reals. Let $S = \{(x_1, y_1), \dots, (x_m, y_m)\}$ be i.i.d. input-output pairs in $\mathbb{R}^d \times \{1, \dots, k\}$, and suppose the distribution $\mu$ of the $x_i$ satisfies the Poincaré inequality with constant $\rho > 0$. Then, for any $\delta > 0$, with probability … where $\widehat{\mathcal{R}}_{S, \gamma}(f) = m^{-1} \sum_{i=1}^{m} \mathbbm{1}_{y_i f(x_i) \leq \gamma}$ and …
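For orientation, the Poincaré assumption in the theorem can be written out as follows. This excerpt does not reproduce the paper's own definition, so the form below is the standard one and the placement of the constant $\rho$ is an assumed convention (some authors put $1/\rho$ on the other side). The test function $g$ is notation introduced here for illustration.

```latex
% Poincaré inequality for a distribution \mu on \mathbb{R}^d with constant \rho > 0
% (assumed convention): the variance of any smooth test function g is controlled
% by the expected squared norm of its gradient.
\operatorname{Var}_{\mu}(g) \;\le\; \rho \, \mathbb{E}_{x \sim \mu}\!\left[ \lVert \nabla g(x) \rVert^{2} \right]
\qquad \text{for all smooth } g : \mathbb{R}^d \to \mathbb{R}.

% Example: the standard Gaussian \mu = \mathcal{N}(0, I_d) satisfies this with \rho = 1.
\operatorname{Var}_{\mathcal{N}(0, I_d)}(g) \;\le\; \mathbb{E}\!\left[ \lVert \nabla g(x) \rVert^{2} \right].
```

Under this reading, the right-hand side is a gradient-energy term of the same kind that the geometric complexity measures, which is one way to see why the assumption pairs naturally with a GC-based bound.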

Figures (2)

Figure 1: Analysis of ResNet-18 (He et al., 2016) trained with SGD on CIFAR-10 with both original and random labels. The blue triangle-marked curves plot the excess risk across training epochs (on a log scale), while the green circle-marked curves track the geometric complexity ($\mathrm{GC}$), normalized so that the two curves for random labels meet. In both settings the $\mathrm{GC}$ is closely correlated with the excess risk. Furthermore, normalizing the $\mathrm{GC}$ by the margin (the square-marked curve) neutralizes its growth across epochs. Similar plots for CIFAR-100 can be found in the Appendix.
Figure 2: Analysis of ResNet-18 (He et al., 2016) trained with SGD on CIFAR-10 (left) and CIFAR-100 (right) with both original and random labels. The triangle-marked curves plot the excess risk across training epochs (on a log scale). The circle-marked curves track the geometric complexity ($\mathrm{GC}$). In both settings the $\mathrm{GC}$ is tightly correlated with the excess risk. Normalizing the $\mathrm{GC}$ by the margin neutralizes its growth across epochs.

Theorems & Definitions (16)

Theorem 1.1
Definition 3.1
Definition 3.2: Empirical Geometric Complexity
Definition 3.3: Theoretical Geometric Complexity
Proposition 3.4
proof
Lemma 4.1
proof
Theorem 4.2
Lemma 4.3
...and 6 more
