Sparsifying Suprema of Gaussian Processes
Anindya De, Shivam Nadimpalli, Ryan O'Donnell, Rocco A. Servedio
TL;DR
This work proves a dimension-free sparsification theorem for the suprema of centered Gaussian processes: for any bounded $T\subset\mathbb{R}^n$, there exist a small subset $S\subseteq T$ and shifts $\{c_s\}_{s\in S}$ such that $\sup_{t\in T} \boldsymbol{X}_t$ is approximated in $L^1$ by $\sup_{s\in S}(\boldsymbol{X}_s + c_s)$, with error scaling with $\varepsilon$ and the Gaussian width $w(T)$, and with $|S| = O_\varepsilon(1)$, independent of both $|T|$ and $n$. The construction uses Talagrand's majorizing measures to build a hierarchical clustering of $T$, which yields a multi-scale, sparse representation of the supremum. Two major applications follow: (i) a Junta Theorem for norms on Gaussian space, showing that any norm can be approximated by a norm depending only on the projection onto a small number of directions; (ii) sparsification of intersections of halfspaces of bounded width, yielding dimension-free agnostic learning and tolerant testing results for convex sets. The results connect geometric complexity, Gaussian width, and majorizing measures to algorithmic tasks, and they establish both upper and lower bounds on sparsifier size, clarifying the role of centering and sparsity in high-dimensional Gaussian analysis.
Abstract
We give a dimension-independent sparsification result for suprema of centered Gaussian processes: Let $T$ be any (possibly infinite) bounded set of vectors in $\mathbb{R}^n$, and let $\{\boldsymbol{X}_t := t \cdot \boldsymbol{g}\}_{t\in T}$ be the canonical Gaussian process on $T$, where $\boldsymbol{g}\sim N(0, I_n)$. We show that there is an $O_\varepsilon(1)$-size subset $S \subseteq T$ and a set of real values $\{c_s\}_{s \in S}$ such that the random variable $\sup_{s \in S} \{\boldsymbol{X}_s + c_s\}$ is an $\varepsilon$-approximator (in $L^1$) of the random variable $\sup_{t \in T} \boldsymbol{X}_t$. Notably, the size of the sparsifier $S$ is completely independent of both $|T|$ and the ambient dimension $n$.
We give two applications of this sparsification theorem:
- A "Junta Theorem" for Norms: We show that given any norm $\nu(x)$ on $\mathbb{R}^n$, there is another norm $\psi(x)$ depending only on the projection of $x$ onto $O_\varepsilon(1)$ directions, for which $\psi(\boldsymbol{g})$ is a multiplicative $(1 \pm \varepsilon)$-approximation of $\nu(\boldsymbol{g})$ with probability $1-\varepsilon$ for $\boldsymbol{g} \sim N(0,I_n)$.
- Sparsification of Convex Sets: We show that any intersection of (possibly infinitely many) halfspaces in $\mathbb{R}^n$ that are at distance $r$ from the origin is $\varepsilon$-close (under $N(0,I_n)$) to an intersection of only $O_{r,\varepsilon}(1)$ halfspaces. This yields new polynomial-time agnostic learning and tolerant property testing algorithms for intersections of halfspaces.
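To make the approximated quantity concrete, the following is a minimal numerical sketch, assuming a finite $T$ stored as the rows of a matrix. It estimates the $L^1$ distance $\mathbf{E}\,|\sup_{t\in T}\boldsymbol{X}_t - \sup_{s\in S}(\boldsymbol{X}_s + c_s)|$ by Monte Carlo for a given candidate subset and shifts. The function name `l1_sparsification_error`, its parameters, and the choice of subset in the usage lines are hypothetical illustrations; this does not implement the majorizing-measure construction, it only evaluates how well a supplied $(S, \{c_s\})$ performs.

```python
import numpy as np


def l1_sparsification_error(T, subset_idx, shifts, num_samples=20000, seed=0):
    """Monte Carlo estimate of E|sup_{t in T} X_t - sup_{s in S} (X_s + c_s)|.

    T          : array of shape (m, n), rows are the vectors t (finite T assumed)
    subset_idx : indices of the rows of T forming the candidate subset S
    shifts     : real shifts c_s, one per index in subset_idx
    """
    rng = np.random.default_rng(seed)
    T = np.asarray(T, dtype=float)
    # Each row of g is an independent draw from N(0, I_n).
    g = rng.standard_normal((num_samples, T.shape[1]))
    # X[i, j] = t_j . g_i, the canonical Gaussian process evaluated at t_j.
    X = g @ T.T
    sup_T = X.max(axis=1)
    sup_S = (X[:, subset_idx] + np.asarray(shifts, dtype=float)).max(axis=1)
    return float(np.mean(np.abs(sup_T - sup_S)))


# Usage with an arbitrary (not optimized) subset and zero shifts.
demo_rng = np.random.default_rng(1)
demo_T = demo_rng.standard_normal((200, 50))
demo_T /= np.linalg.norm(demo_T, axis=1, keepdims=True)  # unit vectors
demo_subset = list(range(10))
demo_error = l1_sparsification_error(demo_T, demo_subset, np.zeros(10))
print(f"estimated L1 error for the naive subset: {demo_error:.3f}")
```

A naive subset with zero shifts will typically underestimate the supremum; the shifts $c_s$ in the theorem are what allow a small $S$ to compensate for the discarded points.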
