Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces

Anastasis Kratsios; A. Martina Neuman; Gudmund Pammer

TL;DR

This work develops adaptive, geometry-aware generalization and estimation bounds for learning models implemented on digital computers, where inputs lie on finite grids and outputs are discretized. The central idea is to represent a finite metric space through a bi-Lipschitz embedding into Euclidean space $\mathbb{R}^m$, which yields tight non-asymptotic concentration bounds in the $1$-Wasserstein metric that scale with the representation dimension $m$ and the sample size $N$. The key theoretical results are an adaptive concentration bound for empirical measures on finite metric spaces and a companion adaptive generalization/estimation bound built on these concentration rates, with explicit distortion-aware constants and rates that are effectively dimension-independent for practical $N$. The method is illustrated on deep ReLU networks and kernel ridge regression on discretized domains, showing that digital computing constraints can mitigate the curse of dimensionality and yield meaningful, dimension-adaptive learning guarantees in realistic regimes.
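To make the $1$-Wasserstein concentration statement concrete, the following is a minimal sketch for the simplest case of a finite grid on the real line, where $W_1$ reduces to the integrated absolute difference of the two cumulative distribution functions. The function names `wasserstein1_on_grid` and `empirical_measure`, the 9-point grid, and the uniform data distribution are illustrative assumptions and do not come from the paper.

```python
import random
from collections import Counter
from itertools import accumulate


def wasserstein1_on_grid(grid, p, q):
    """1-Wasserstein distance between probability vectors p and q on a sorted 1D grid."""
    # In one dimension W1 is the integral of |F_p - F_q|; on a finite grid this
    # is a sum over the gaps between consecutive grid points.
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    return sum(
        abs(cdf_p[i] - cdf_q[i]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def empirical_measure(grid, samples):
    """Relative frequency of each grid point among the samples."""
    counts = Counter(samples)
    n = len(samples)
    return [counts[x] / n for x in grid]


# Hypothetical setup: a uniform distribution on a 9-point grid in [0, 1].
grid = [i / 8 for i in range(9)]
uniform = [1 / len(grid)] * len(grid)
rng = random.Random(0)  # fixed seed so repeated runs give the same samples

for n in (10, 100, 1000, 10000):
    samples = rng.choices(grid, weights=uniform, k=n)
    gap = wasserstein1_on_grid(grid, uniform, empirical_measure(grid, samples))
    print(n, gap)
```

Running the loop prints the distance between the true and empirical measures for growing $N$; this only illustrates the quantity being bounded in the one-dimensional case and is not a check of the paper's constants.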

Abstract

Machine learning models with inputs in a Euclidean space $\mathbb{R}^d$, when implemented on digital computers, generalize, and their generalization gap converges to $0$ at a rate of $c/N^{1/2}$ with respect to the sample size $N$. However, the constant $c>0$ obtained through classical methods can be large in terms of the ambient dimension $d$ and machine precision, posing a challenge when $N$ is small to realistically large. In this paper, we derive a family of generalization bounds $\{c_m/N^{1/(2\vee m)}\}_{m=1}^{\infty}$ tailored for learning models on digital computers, which adapt to both the sample size $N$ and the so-called geometric representation dimension $m$ of the discrete learning problem. Adjusting the parameter $m$ according to $N$ results in significantly tighter generalization bounds for practical sample sizes $N$, while setting $m$ small maintains the optimal dimension-free worst-case rate of $\mathcal{O}(1/N^{1/2})$. Notably, $c_{m}\in \mathcal{O}(m^{1/2})$ for learning models on discretized Euclidean domains. Furthermore, our adaptive generalization bounds are based on our new non-asymptotic result for concentration of measure in finite metric spaces, established by leveraging metric embedding arguments.
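The adaptive choice of $m$ described above can be sketched as a minimization over the family $c_m/N^{1/(2\vee m)}$. The sketch below assumes the constants $c_m$ are supplied by the user; the values in `hypothetical_constants` are invented purely for illustration (larger for small $m$, to mimic a larger embedding distortion) and are not the paper's constants.

```python
def rate_exponent(m):
    """Exponent 1 / (2 v m), where 'v' denotes the maximum."""
    return 1.0 / max(2, m)


def bound(n_samples, m, c_m):
    """Value of the m-th bound c_m / N^{1/(2 v m)}."""
    return c_m / n_samples ** rate_exponent(m)


def tightest_bound(n_samples, constants):
    """Return (m, value) for the smallest bound among the supplied dimensions.

    `constants` maps a representation dimension m to its constant c_m.
    """
    best_m = min(constants, key=lambda m: bound(n_samples, m, constants[m]))
    return best_m, bound(n_samples, best_m, constants[best_m])


# Hypothetical constants, chosen only to show the selection mechanism.
hypothetical_constants = {2: 400.0, 4: 20.0, 8: 6.0}

for n in (10**2, 10**5, 10**16):
    print(n, tightest_bound(n, hypothetical_constants))
```

With these made-up constants, a larger $m$ is selected for small $N$ and the selection falls back to $m=2$, and hence the $N^{-1/2}$ rate, once $N$ is very large, which is the qualitative behavior the abstract describes.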

Paper Structure (36 sections, 10 theorems, 111 equations, 3 figures, 2 tables)

Introduction
The Learning Problem
Concentration of Measure on Finite Metric Spaces
Summary of Contributions
Outline of Paper
Comparison with Related Work
Comparison with VC and Information Theoretic Results
Concentration of Measure on Finite Metric Spaces
The Impact of Digital Computing
Preliminary
Background and Notation
On Metric Spaces
On Lipschitz Mappings of Finite Metric Spaces
On Probability Spaces
On Distances between Probability Measures on Finite Metric Spaces
...and 21 more sections

Key Result

Proposition 1

Let $(\mathscr{X},d_{\mathscr{X}})$ be a finite metric space with $\mathrm{card}(\mathscr{X})=k$. Then for every $m\in\mathbb{N}$, there exists a bi-Lipschitz embedding $\varphi_m: \mathscr{X}\to\mathbb{R}^m$ whose distortion $\tau(\varphi_m)$ satisfies the following conditions. Suppose in addition that there exists $d\in\mathbb{N}$ such that $\mathscr{X}$ is a metric subspa…

Figures (3)

Figure 1: When the sample size $N$ is small to realistically large, our non-asymptotic risk bounds are tighter than the classical bounds (e.g. shalev2014understanding). For massive sample sizes $N$, both bounds yield the parametric rate of $\mathcal{O}(1/N^{1/2})$. See Subsection \ref{s:Discussion_PAC} for theoretical and numerical demonstrations of this phenomenon.
Figure 2: The distortion incurred when compressing a $3$-point subset of $\mathbb{R}^2$, illustrated by Figure \ref{fig:DistIllustration__NoDist}, into a $3$-point subset of the real line $\mathbb{R}$, illustrated by Figure \ref{fig:DistIllustration__Dist}, results from the necessary shrinking or stretching of distances between the points.
Figure 3: Comparison of generalization bounds on a $k$-point packing $\mathscr{X}\subset [0,1]^{100}$ consisting of $k=10^{15}$ points. The tightest generalization bound, selected from embedding dimensions $m\in [1,100]$, is plotted.
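
The distortion described in the caption of Figure 2 can be computed directly for a small example. The sketch below assumes the common definition of distortion as the largest factor by which a pairwise distance is stretched divided by the smallest such factor; the paper's $\tau(\varphi_m)$ may be normalized differently. The three points and their images on the real line are hypothetical and are not the configuration drawn in the figure.

```python
from itertools import combinations
from math import dist


def distortion(points, images):
    """Largest over smallest ratio of image distance to original distance."""
    ratios = [
        dist(images[i], images[j]) / dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    ]
    if min(ratios) == 0:
        raise ValueError("map is not injective, so it is not bi-Lipschitz")
    return max(ratios) / min(ratios)


# Hypothetical 3-point subset of R^2 and a compression of it into R.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
images = [(0.0,), (1.0,), (2.0,)]

print(distortion(points, images))
```

In this example one pair of points is pushed twice as far apart while another pair is pulled closer together, so no rescaling of the image can make the map an isometry; an isometric map would give a distortion of exactly 1.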

Theorems & Definitions (20)

Proposition 1: Euclidean Representation of Finite Metric Spaces
Theorem 1: Adaptive Concentration of Measure on Finite Metric Spaces
Remark 1
Theorem 2: Adaptive Generalization and Estimation Bounds between Finite Metric Spaces
Corollary 1: Generalization Bounds for ReLU NNs on Digital Computers
Remark 2: The significance of Corollary \ref{cor:MLPDiscretetization}
Lemma 1: Ultra-Low-Dimensional Metric Embedding
proof
Lemma 2: High-Dimensional Metric Embedding
proof
...and 10 more
