Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces

Anastasis Kratsios, A. Martina Neuman, Gudmund Pammer

TL;DR

This work develops adaptive, geometry-aware generalization and estimation bounds for learning models implemented on digital computers, where inputs live on finite grids and outputs are discretized. Central to the approach is representing finite metric spaces via bi-Lipschitz Euclidean embeddings into $\mathbb{R}^m$, allowing tight non-asymptotic concentration bounds in the $1$-Wasserstein metric that scale with the representation dimension $m$ and sample size $N$. The key theoretical results include an adaptive concentration bound for empirical measures on finite metric spaces and a companion adaptive generalization/estimation bound that uses these concentration rates to yield effects such as dimensionality independence for practical $N$, and explicit distortion-aware constants. The method is illustrated by applications to deep networks with ReLU activations and kernel ridge regression on discretized domains, showing that digital computing constraints can mitigate the curse of dimensionality and yield meaningful, dimension-adaptive learning guarantees in realistic regimes.

Abstract

Machine learning models with inputs in a Euclidean space $\mathbb{R}^d$, when implemented on digital computers, generalize, and their generalization gap converges to $0$ at a rate of $c/N^{1/2}$ with respect to the sample size $N$. However, the constant $c>0$ obtained through classical methods can be large in terms of the ambient dimension $d$ and machine precision, posing a challenge when $N$ is small to realistically large. In this paper, we derive a family of generalization bounds $\{c_m/N^{1/(2\vee m)}\}_{m=1}^{\infty}$ tailored for learning models on digital computers, which adapt to both the sample size $N$ and the so-called geometric representation dimension $m$ of the discrete learning problem. Adjusting the parameter $m$ according to $N$ results in significantly tighter generalization bounds for practical sample sizes $N$, while setting $m$ small maintains the optimal dimension-free worst-case rate of $\mathcal{O}(1/N^{1/2})$. Notably, $c_{m}\in \mathcal{O}(m^{1/2})$ for learning models on discretized Euclidean domains. Furthermore, our adaptive generalization bounds build on our new non-asymptotic result for concentration of measure in finite metric spaces, established by leveraging metric embedding arguments.
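To make the adaptive choice of $m$ concrete, the following minimal sketch evaluates the family $c_m/N^{1/(2\vee m)}$ over a supplied set of candidate dimensions and returns the tightest member. The function names and the numerical constants are hypothetical placeholders chosen for illustration only; they are not the constants $c_m$ derived in the paper.

```python
def bound(c_m: float, m: int, n: int) -> float:
    """Evaluate one member of the family: c_m / N^(1 / max(2, m))."""
    return c_m / n ** (1.0 / max(2, m))


def tightest_bound(constants: dict[int, float], n: int) -> tuple[int, float]:
    """Return the pair (m, value) that minimizes the bound over the given m.

    `constants` maps each candidate representation dimension m to a
    constant c_m. The caller supplies these values; this sketch makes
    no claim about how they are computed.
    """
    candidates = ((m, bound(c, m, n)) for m, c in constants.items())
    return min(candidates, key=lambda pair: pair[1])


# Hypothetical constants: a large constant at m = 2, a small one at m = 8.
example_constants = {2: 1000.0, 8: 4.0}

# For a modest sample size the slower rate with the smaller constant wins.
m_small_n, _ = tightest_bound(example_constants, 100)

# For a massive sample size the parametric rate N^(-1/2) takes over.
m_large_n, _ = tightest_bound(example_constants, 10**12)
```

With these placeholder values the selected dimension switches from $m=8$ at $N=100$ to $m=2$ at $N=10^{12}$, which mirrors the qualitative behaviour described in the abstract.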

Paper Structure (36 sections, 10 theorems, 111 equations, 3 figures, 2 tables)

Key Result

Proposition 1

Let $(\mathscr{X},d_{\mathscr{X}})$ be a finite metric space with $\mathrm{card}(\mathscr{X})=k$. Then for every $m\in\mathbb{N}$, there exists a bi-Lipschitz embedding $\varphi_m: \mathscr{X}\to\mathbb{R}^m$ whose distortion $\tau(\varphi_m)$ adheres to the following conditions. […] Suppose in addition that there exists $d\in\mathbb{N}$ such that $\mathscr{X}$ is a metric subspa…

Figures (3)

  • Figure 1: When the sample size $N$ is small to realistically large, our non-asymptotic risk bounds are tighter than the classical bounds (e.g. Shalev-Shwartz and Ben-David, 2014). For massive sample sizes $N$, both bounds yield the parametric rate of $\mathcal{O}(1/N^{1/2})$. See Subsection \ref{s:Discussion_PAC} for theoretical and numerical demonstrations of this phenomenon.
  • Figure 2: The distortion incurred when compressing a $3$-point subset of $\mathbb{R}^2$, illustrated by Figure \ref{fig:DistIllustration__NoDist}, into a $3$-point subset of the real line $\mathbb{R}$, intuitively illustrated by Figure \ref{fig:DistIllustration__Dist}, results from the necessary shrinking or stretching of distances between the points.
  • Figure 3: Comparison of generalization bounds on a $k$-point packing $\mathscr{X}\subset [0,1]^{100}$ consisting of $k=10^{15}$ points. The tightest generalization bound, selected from embedding dimensions $m\in [1,100]$, is plotted.
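The distortion described in the caption of Figure 2 can be computed directly for a small example. The sketch below assumes the standard definition of bi-Lipschitz distortion (the largest stretch factor divided by the smallest, taken over all pairs of points); the paper's $\tau(\varphi_m)$ may be normalized differently. The three points and their images on the line are hypothetical and are not the ones drawn in the figure.

```python
import itertools
import math


def distortion(points, images):
    """Distortion of the map sending points[i] to images[i].

    Computed as (largest distance ratio) / (smallest distance ratio),
    i.e. the product of the Lipschitz constants of the map and of its
    inverse. Assumes the points are distinct and the map is injective,
    so no distance is zero.
    """
    ratios = [
        math.dist(fx, fy) / math.dist(x, y)
        for (x, fx), (y, fy) in itertools.combinations(zip(points, images), 2)
    ]
    return max(ratios) / min(ratios)


# Hypothetical 3-point subset of R^2 (a right-angled triangle).
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# A hypothetical placement of the same three points on the real line.
on_the_line = [(0.0,), (1.0,), (-1.0,)]

# The two unit-length legs are preserved, but the hypotenuse of length
# sqrt(2) is stretched to length 2, so the distortion is sqrt(2).
tau = distortion(triangle, on_the_line)
```

An isometric embedding has distortion $1$; any value above $1$ quantifies the shrinking or stretching that a lower-dimensional representation forces on the pairwise distances.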

Theorems & Definitions (20)

  • Proposition 1: Euclidean Representation of Finite Metric Spaces
  • Theorem 1: Adaptive Concentration of Measure on Finite Metric Spaces
  • Remark 1
  • Theorem 2: Adaptive Generalization and Estimation Bounds between Finite Metric Spaces
  • Corollary 1: Generalization Bounds for ReLU NNs on Digital Computers
  • Remark 2: The significance of Corollary \ref{cor:MLPDiscretetization}
  • Lemma 1: Ultra-Low-Dimensional Metric Embedding
  • proof
  • Lemma 2: High-Dimensional Metric Embedding
  • proof
  • ...and 10 more