Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks
Andrea Montanari, Kangjie Zhou
TL;DR
The paper analyzes the high-dimensional behavior of low-dimensional projections of Gaussian data under proportional asymptotics $n/d\to\alpha$, introducing the feasibility set $\mathscr{F}_{m,\alpha}$ of attainable projection distributions. It develops sharp outer bounds (Wasserstein and KL-Wasserstein) and constructive inner bounds for unsupervised projection, plus information-dimension constraints, revealing fundamental limits on how far projected distributions can deviate from Gaussianity. It then extends the framework to supervised learning with $\mathscr{F}^{\varphi}_{m,\alpha}$, deriving analogous bounds and applying them to bound interpolation thresholds for two-layer neural networks and to characterize the margin distribution in max-margin classification. The results connect projection pursuit, ICA, and linear and two-layer models, giving quantitative limits on unsupervised and supervised dimensionality reduction in high dimensions. In particular, they bound how many data points a two-layer network with $m$ hidden neurons can interpolate.
Abstract
Given a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$ and, for each such projection, the empirical distribution of the projected points. What does this collection of probability distributions look like when $n,d$ grow large? We consider this question under the null model in which the points are i.i.d. standard Gaussian vectors, focusing on the asymptotic regime in which $n,d\to\infty$, with $n/d\to\alpha\in (0,\infty)$, while $m$ is fixed. Denoting by $\mathscr{F}_{m,\alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m,\alpha}$. In particular, we characterize the Wasserstein radius of $\mathscr{F}_{m,\alpha}$ up to constant multiplicative factors, and determine it exactly for $m=1$. We also prove sharp bounds in terms of Kullback-Leibler divergence and Rényi information dimension. This question has applications to unsupervised learning methods, such as projection pursuit and independent component analysis. We introduce a version of the same problem that is relevant for supervised learning, and prove a sharp Wasserstein radius bound. As an application, we establish an upper bound on the interpolation threshold of two-layer neural networks with $m$ hidden neurons.
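As a rough numerical illustration of the setting for $m=1$ (a sketch that is not taken from the paper), the snippet below draws $n$ i.i.d. standard Gaussian points in $\mathbb{R}^d$ and compares the empirical second moment of the projection along a direction chosen independently of the data with that along the variance-maximizing direction. The function name `projected_second_moments`, the sizes `n` and `d`, and the choice of second moment as the statistic are assumptions made for illustration. The reference value $(1+1/\sqrt{\alpha})^2$ is the standard random-matrix limit for the largest eigenvalue of the sample covariance, not a result quoted from the abstract.

```python
import numpy as np


def projected_second_moments(n=1000, d=500, seed=0):
    """Second moment of 1-D projections of n Gaussian points in R^d.

    Returns (random-direction value, best-direction value, asymptotic edge).
    Illustrative only; names and sizes are hypothetical.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))  # rows are i.i.d. N(0, I_d) points

    # Direction drawn independently of the data: projections are exactly N(0, 1),
    # so the empirical second moment should be close to 1.
    w_rand = rng.standard_normal(d)
    w_rand /= np.linalg.norm(w_rand)
    m2_rand = float(np.mean((X @ w_rand) ** 2))

    # Data-dependent direction: the top right singular vector of X maximizes
    # the empirical second moment over all unit vectors.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    w_top = vt[0]
    m2_top = float(np.mean((X @ w_top) ** 2))

    # Standard large-n, large-d limit of the top sample-covariance eigenvalue
    # when n/d -> alpha (assumption: used here only as a reference value).
    alpha = n / d
    m2_limit = float((1.0 + 1.0 / np.sqrt(alpha)) ** 2)
    return m2_rand, m2_top, m2_limit


if __name__ == "__main__":
    m2_rand, m2_top, m2_limit = projected_second_moments()
    print(f"random direction:  {m2_rand:.3f}")
    print(f"best direction:    {m2_top:.3f}")
    print(f"asymptotic edge:   {m2_limit:.3f}")
```

The gap between the two empirical values shows, for one statistic, that a projection chosen after seeing the data can look markedly non-Gaussian even under the null model, and that the size of the deviation is controlled by $\alpha$.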
