Gaussian random field approximation via Stein's method with applications to wide random neural networks
Krishnakumar Balasubramanian, Larry Goldstein, Nathan Ross, Adil Salim
TL;DR
This work uses Stein's method to develop quantitative Gaussian random-field approximations for fields indexed by the sphere, measuring the error in the Wasserstein distance $W_1$ with respect to the sup-norm. The authors introduce a novel Laplacian-based smoothing whose associated Gaussian process has an explicit Cameron–Martin space. This lets a bound in a smoother metric be transferred to the $W_1$ distance, and it extends earlier smoothing techniques, which were limited to one-dimensional interval index sets, to sphere-indexed fields. They derive a master bound on the distance between a general random field $F$ and a Gaussian field $H$ indexed by $\mathcal{S}^n$. They then apply it to wide deep neural networks with Lipschitz activations, obtaining explicit depth- and width-dependent $W_1$ bounds at the random-field level. They further obtain improved rates when the activation has three bounded derivatives, compare their results with related work, and outline future directions, including improved rates and extensions to heavier-tailed weight distributions and other manifolds.
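The Laplacian-power smoothing can be illustrated on the simplest sphere, the circle $\mathcal{S}^1$. The following is a minimal NumPy sketch, not the authors' construction. It assumes a covariance of the form $\sum_k (1+k^2)^{-s}\,\phi_k(x)\phi_k(y)$ built from the Laplace–Beltrami eigenfunctions $\phi_k$ of the circle (eigenvalue $-k^2$ at frequency $k$), with a hypothetical smoothness exponent `s` and truncation level `kmax`. The function names `laplacian_power_cov`, `sample_field` and `cameron_martin_norm_sq` are also hypothetical. Under this assumption the Cameron–Martin norm is explicit: each frequency-$k$ coefficient is weighted by $(1+k^2)^{s}$.

```python
import numpy as np

def laplacian_power_cov(theta, s=2.0, kmax=64):
    """Covariance K(x, y) = sum_k (1 + k^2)^(-s) phi_k(x) phi_k(y) on S^1 (assumed form).

    phi_k runs over the orthonormal basis 1, sqrt(2) cos(k t), sqrt(2) sin(k t)
    (normalized arc length); the Laplacian eigenvalue at frequency k is -k^2.
    The series is truncated at kmax (a hypothetical numerical choice).
    """
    diff = theta[:, None] - theta[None, :]
    cov = np.ones_like(diff)  # k = 0: constant mode with weight (1 + 0)^(-s) = 1
    for k in range(1, kmax + 1):
        # The cos and sin modes at frequency k combine into 2 cos(k (x - y)).
        cov += 2.0 * (1.0 + k**2) ** (-s) * np.cos(k * diff)
    return cov

def sample_field(theta, s=2.0, kmax=64, rng=None):
    """One path of the centered Gaussian field with covariance laplacian_power_cov."""
    rng = np.random.default_rng() if rng is None else rng
    path = rng.standard_normal() * np.ones_like(theta)  # constant mode
    for k in range(1, kmax + 1):
        scale = np.sqrt(2.0) * (1.0 + k**2) ** (-s / 2)
        path += scale * (rng.standard_normal() * np.cos(k * theta)
                         + rng.standard_normal() * np.sin(k * theta))
    return path

def cameron_martin_norm_sq(c0, a, b, s=2.0):
    """Squared Cameron-Martin norm of f = c0 + sum_k sqrt(2) (a_k cos(k t) + b_k sin(k t))."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    k = np.arange(1, len(a) + 1)
    # The eigenvalue of K at frequency k is (1 + k^2)^(-s); the norm divides by it.
    return c0**2 + np.sum((1.0 + k**2) ** s * (a**2 + b**2))

theta = np.linspace(0.0, 2 * np.pi, 50, endpoint=False)
K = laplacian_power_cov(theta, s=2.0)
path = sample_field(theta, s=2.0, rng=np.random.default_rng(0))
print(cameron_martin_norm_sq(0.0, [1.0], [0.0], s=2.0))  # (1 + 1)^2 = 4.0
```

Increasing `s` damps high frequencies faster, so sample paths are smoother and the Cameron–Martin norm penalizes high-frequency functions more heavily.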
Abstract
Using Stein's method, we derive upper bounds on the Wasserstein distance ($W_1$), with respect to the $\sup$-norm, between any continuous $\mathbb{R}^d$-valued random field indexed by the $n$-sphere and a Gaussian random field. We develop a novel Gaussian smoothing technique that allows us to transfer a bound in a smoother metric to the $W_1$ distance. The smoothing is based on covariance functions constructed from powers of Laplacian operators, designed so that the associated Gaussian process has a tractable Cameron–Martin space, or reproducing kernel Hilbert space. This feature enables us to move beyond the one-dimensional interval index sets previously considered in the literature. Specializing our general result, we obtain the first bounds, at the random field level, on the Gaussian random field approximation of wide random neural networks of any depth with Lipschitz activation functions. Our bounds are expressed explicitly in terms of the widths of the network and the moments of the random weights. We also obtain tighter bounds when the activation function has three bounded derivatives.
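The network setting can likewise be explored numerically. The following is a hypothetical one-hidden-layer example on $\mathcal{S}^1$. It assumes standard normal weights, no biases, a $1/\sqrt{\text{width}}$ output scaling and the 1-Lipschitz ReLU activation; none of these choices is taken from the paper, and all function names are illustrative. Under these assumptions the covariance of the network equals the arc-cosine kernel $(\sin\alpha + (\pi-\alpha)\cos\alpha)/(2\pi)$ at every width, where $\alpha$ is the angle between the two inputs. The excess kurtosis at a single input, by contrast, is exactly $15/\text{width}$, so the departure from Gaussianity sits in higher moments and fades as the width grows. This checks one- and two-point statistics only; it does not estimate the sup-norm $W_1$ distance that the paper bounds.

```python
import numpy as np

def relu(z):
    # A 1-Lipschitz activation.
    return np.maximum(z, 0.0)

def random_network_on_circle(theta, width, rng):
    """One draw of f(x) = width**-0.5 * sum_j v_j * relu(<w_j, x>), x = (cos t, sin t).

    Hypothetical setup: w_j ~ N(0, I_2), v_j ~ N(0, 1), no biases.
    """
    x = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # points on S^1, shape (n, 2)
    w = rng.standard_normal((width, 2))                   # hidden-layer weights
    v = rng.standard_normal(width)                        # output weights
    return relu(x @ w.T) @ v / np.sqrt(width)

def limiting_cov(theta):
    """Arc-cosine kernel E[relu(<w, x>) relu(<w, y>)] for unit x, y and w ~ N(0, I_2)."""
    ang = np.abs(theta[:, None] - theta[None, :]) % (2 * np.pi)
    ang = np.minimum(ang, 2 * np.pi - ang)  # angle between the inputs, in [0, pi]
    return (np.sin(ang) + (np.pi - ang) * np.cos(ang)) / (2 * np.pi)

def excess_kurtosis(samples):
    """Sample excess kurtosis; it is 0 for a Gaussian."""
    c = samples - samples.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
draws = np.array([random_network_on_circle(theta, 50, rng) for _ in range(5000)])
empirical_cov = draws.T @ draws / len(draws)
# Small: under these assumptions only Monte Carlo error separates the two.
print(np.max(np.abs(empirical_cov - limiting_cov(theta))))
```

Repeating the kurtosis estimate at a fixed input for increasing widths should show values close to $15/\text{width}$, up to Monte Carlo error.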
