Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces
Anastasis Kratsios, A. Martina Neuman, Gudmund Pammer
TL;DR
This work develops adaptive, geometry-aware generalization and estimation bounds for learning models implemented on digital computers, where inputs live on finite grids and outputs are discretized. Central to the approach is representing finite metric spaces via bi-Lipschitz embeddings into Euclidean space $\mathbb{R}^m$, which allows tight non-asymptotic concentration bounds in the $1$-Wasserstein metric that scale with the representation dimension $m$ and the sample size $N$. The key theoretical results are an adaptive concentration bound for empirical measures on finite metric spaces and a companion adaptive generalization/estimation bound that uses these concentration rates to obtain independence from the ambient dimension for practical $N$, with explicit distortion-aware constants. The method is illustrated by applications to deep networks with ReLU activations and to kernel ridge regression on discretized domains, showing that digital computing constraints can mitigate the curse of dimensionality and yield meaningful, dimension-adaptive learning guarantees in realistic regimes.
Abstract
Machine learning models with inputs in a Euclidean space $\mathbb{R}^d$, when implemented on digital computers, generalize, and their generalization gap converges to $0$ at a rate of $c/N^{1/2}$ with respect to the sample size $N$. However, the constant $c>0$ obtained through classical methods can be large in terms of the ambient dimension $d$ and the machine precision, which is a problem when $N$ ranges from small to realistically large. In this paper, we derive a family of generalization bounds $\{c_m/N^{1/(2\vee m)}\}_{m=1}^{\infty}$ tailored for learning models on digital computers, which adapt to both the sample size $N$ and the so-called geometric representation dimension $m$ of the discrete learning problem. Adjusting the parameter $m$ according to $N$ results in significantly tighter generalization bounds for practical sample sizes $N$, while setting $m$ small maintains the optimal dimension-free worst-case rate of $\mathcal{O}(1/N^{1/2})$. Notably, $c_{m}\in \mathcal{O}(m^{1/2})$ for learning models on discretized Euclidean domains. Furthermore, our adaptive generalization bounds build on our new non-asymptotic result for concentration of measure in finite metric spaces, which we establish using metric embedding arguments.
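To make the role of $m$ concrete, the following is a minimal sketch of how one might pick $m$ for a given $N$ by evaluating each member of the family $c_m/N^{1/(2\vee m)}$ and keeping the smallest value. The function names `bound` and `best_m` and the constants in `constants` are hypothetical placeholders chosen only for illustration; they are not values from the paper, whose constants depend on the distortion of the embedding.

```python
def bound(c_m, m, n):
    """Evaluate c_m / N^(1 / max(2, m)) for one member of the family."""
    return c_m / n ** (1.0 / max(2, m))


def best_m(constants, n):
    """Return (m, value) minimizing the bound over the supplied constants.

    `constants` maps a representation dimension m to a constant c_m.
    Ties are resolved in favor of the first m encountered.
    """
    candidates = ((m, bound(c, m, n)) for m, c in constants.items())
    return min(candidates, key=lambda pair: pair[1])


# Hypothetical constants (assumption, not from the paper): a large constant
# paired with the fast N^(-1/2) rate, and smaller constants paired with
# slower rates at larger m.
constants = {1: 1000.0, 2: 1000.0, 4: 50.0, 8: 5.0}

for n in (10**2, 10**4, 10**8, 10**12):
    m, value = best_m(constants, n)
    print(f"N = {n:.0e}: best m = {m}, bound = {value:.4g}")
```

With these placeholder constants, a larger $m$ gives the smaller bound at moderate $N$, while small $m$ takes over once $N$ is large enough for the $N^{-1/2}$ rate to dominate the constant.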
