Concentration of Non-Isotropic Random Tensors with Applications to Learning and Empirical Risk Minimization
Mathieu Even, Laurent Massoulié
TL;DR
The paper tackles the dimensionality bottleneck in learning with non-isotropic data by introducing an effective dimension $d_{\rm eff}(r)$ and proving non-asymptotic concentration bounds based on the metric entropy of ellipsoids. Through a chaining framework, it derives uniform bounds for symmetric random tensors of rank 1 and connects these to ERM, Hessian concentration, and randomized smoothing, all while exploiting non-isotropic structure. The results yield improved statistical preconditioning and smoothing performance when data exhibit low effective dimension, with infinite-dimensional extensions via spectral dimension. Practically, these insights enable more efficient optimization and learning in high-dimensional, anisotropic settings. The work provides a coherent pathway from rigorous probabilistic bounds to concrete algorithmic improvements across preconditioning and smoothing in ERM-like problems.
Abstract
Dimension is an inherent bottleneck to some modern learning tasks, where optimization methods suffer from the size of the data. In this paper, we study non-isotropic distributions of data and develop tools that aim at reducing these dimensional costs by a dependency on an effective dimension rather than the ambient one. Based on non-asymptotic estimates of the metric entropy of ellipsoids (which prove to generalize to infinite dimensions) and on a chaining argument, our uniform concentration bounds involve an effective dimension instead of the global dimension, improving over existing results. We show the importance of taking advantage of non-isotropic properties in learning problems with the following applications: i) we improve state-of-the-art results in statistical preconditioning for communication-efficient distributed optimization, ii) we introduce a non-isotropic randomized smoothing for non-smooth optimization. Both applications cover a class of functions that encompasses empirical risk minimization (ERM) for linear models.
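To give a rough sense of why an effective dimension can be much smaller than the ambient one, the sketch below computes a standard proxy, the effective rank $\mathrm{tr}(\Sigma)/\|\Sigma\|_{\rm op}$ of a covariance matrix $\Sigma$. This proxy is an assumption made for illustration only: the paper's own $d_{\rm eff}(r)$ is not defined in this section and may differ. The function name `effective_rank` and the $1/i^2$ spectrum are likewise hypothetical.

```python
import numpy as np


def effective_rank(cov: np.ndarray) -> float:
    """Return tr(cov) / ||cov||_op for a symmetric PSD matrix.

    Illustrative proxy only; not necessarily the paper's d_eff(r).
    """
    eigvals = np.linalg.eigvalsh(cov)  # real eigenvalues of a symmetric matrix
    return float(eigvals.sum() / eigvals.max())


d = 100  # ambient dimension

# Hypothetical anisotropic spectrum: the i-th eigenvalue decays as 1 / i^2.
spectrum = 1.0 / np.arange(1, d + 1) ** 2
cov_aniso = np.diag(spectrum)

# Isotropic baseline: all directions carry the same variance.
cov_iso = np.eye(d)

r_aniso = effective_rank(cov_aniso)
r_iso = effective_rank(cov_iso)

print(f"ambient dimension:           {d}")
print(f"effective rank (isotropic):  {r_iso:.2f}")
print(f"effective rank (1/i^2 decay): {r_aniso:.2f}")
```

In the isotropic case the proxy equals the ambient dimension, while under fast spectral decay it stays bounded as $d$ grows, which is the regime where bounds depending on an effective dimension can be expected to help.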
