Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks

Andrea Montanari; Kangjie Zhou

TL;DR

The paper analyzes the high‑dimensional behavior of low‑dimensional projections of Gaussian data under proportional asymptotics $n/d\to\alpha$, introducing the feasibility set $\mathscr{F}_{m,\alpha}$ of attainable projection distributions. It develops sharp outer bounds (Wasserstein and KL‑Wasserstein) and constructive inner bounds for unsupervised projection, plus information‑dimension constraints, revealing fundamental limits on how far projected distributions can deviate from Gaussianity. It then extends the framework to supervised learning with $\mathscr{F}^{\varphi}_{m,\alpha}$, deriving analogous bounds and applying them to bound interpolation thresholds for two‑layer neural networks and to characterize the margin distribution in max‑margin classification. The results connect projection pursuit, ICA, and linear/2‑layer models, providing precise quantitative limits that underpin unsupervised and supervised dimensionality reduction in high dimensions. The work yields practical implications for understanding the interpolation capabilities of neural networks and the role of low‑dimensional projections in learning systems.

Abstract

Given a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$ and, for each such projection, the empirical distribution of the projected points. What does this collection of probability distributions look like when $n,d$ grow large? We consider this question under the null model in which the points are i.i.d. standard Gaussian vectors, focusing on the asymptotic regime in which $n,d\to\infty$, with $n/d\to\alpha\in (0,\infty)$, while $m$ is fixed. Denoting by $\mathscr{F}_{m,\alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m,\alpha}$. In particular, we characterize the Wasserstein radius of $\mathscr{F}_{m,\alpha}$ up to constant multiplicative factors, and determine it exactly for $m=1$. We also prove sharp bounds in terms of Kullback-Leibler divergence and Rényi information dimension. The previous question has applications to unsupervised learning methods, such as projection pursuit and independent component analysis. We introduce a version of the same problem that is relevant for supervised learning, and prove a sharp Wasserstein radius bound. As an application, we establish an upper bound on the interpolation threshold of two-layer neural networks with $m$ hidden neurons.
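
The setup described in the abstract can be written out as follows. This is a sketch of the objects involved rather than the paper's formal definition: the symbols $x_i$, $W$, and $\hat{P}_{n,W}$ are notation assumed here for illustration, and the precise mode of convergence is left to the paper.

```latex
% Data: x_1, ..., x_n i.i.d. N(0, I_d); projection: W in R^{d x m} with orthonormal columns.
\[
  \hat{P}_{n,W} \;=\; \frac{1}{n}\sum_{i=1}^{n} \delta_{W^{\top} x_i},
  \qquad W \in \mathbb{R}^{d\times m},\quad W^{\top}W = I_m .
\]
% The collection of interest is the set of all such empirical distributions:
\[
  \bigl\{\, \hat{P}_{n,W} \;:\; W^{\top}W = I_m \,\bigr\}.
\]
% Informally, F_{m,alpha} collects the distributions P on R^m that can be
% approached by \hat{P}_{n,W_n} for some (data-dependent) sequence W_n,
% as n, d -> infinity with n/d -> alpha and m fixed.
```

For a fixed, data-independent $W$ the projected points are i.i.d. $\mathsf{N}(0, I_m)$, so any departure from Gaussianity comes from letting $W$ depend on the data.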

Paper Structure (34 sections, 31 theorems, 367 equations, 1 figure)

Introduction and main results
A null model for unsupervised learning
A null model for supervised learning
Outer bounds: Unsupervised learning
Wasserstein radius for $m=1$
KL-Wasserstein outer bound for general $m$
Information dimension bound
Application to the negative spherical perceptron
Inner bounds: Unsupervised learning
Inner bound for $m = 1$
Inner bound for general $m>1$
Main results: Supervised learning
Wasserstein outer bound for $m = 1$
KL-Wasserstein outer bound for general $m$
Interpolation threshold for two-layers neural network
...and 19 more sections

Key Result

Theorem 2.1

Consider the case $m = 1$. Then for any $\alpha\in(0, \infty)$, we have

$$\sup_{P\in\mathscr{F}_{1,\alpha}} W_2\bigl(P, \mathsf{N}(0,1)\bigr) = \frac{1}{\sqrt{\alpha}}.$$

Figures (1)

Figure 1: A cartoon of the $W_2$ geometry of the feasibility set $\mathscr{F}_{1, \alpha}$ (blue shaded area). The outer $W_2$ radius of $\mathscr{F}_{1, \alpha}$ (with respect to the center $\mathsf{N} (0, 1)$) is equal to $1 / \sqrt{\alpha}$, but the inner radius is zero. Namely, for any $\varepsilon > 0$ and any $\alpha > 1$, the $W_2$ ball of radius $\varepsilon$ centered at $\mathsf{N} (0, 1)$ is not contained in $\mathscr{F}_{1, \alpha}$.
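
The outer radius $1/\sqrt{\alpha}$ quoted in the caption can be probed numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: it draws Gaussian data, projects it onto (a) a data-independent random direction and (b) the top right singular vector of the data matrix, and estimates the $W_2$ distance of each projected empirical distribution to $\mathsf{N}(0,1)$ using the one-dimensional quantile formula. The function names, sample sizes, and the choice of the top singular direction are all assumptions made here.

```python
from statistics import NormalDist

import numpy as np


def w2_to_standard_normal(samples):
    """Approximate W2 between the empirical law of `samples` and N(0, 1).

    In one dimension W2 is the L2 distance between quantile functions;
    here sorted samples are compared with Gaussian quantiles evaluated
    at the midpoints (i + 0.5) / n, i = 0, ..., n - 1.
    """
    z = np.sort(np.asarray(samples, dtype=float))
    n = z.size
    inv_cdf = NormalDist().inv_cdf
    q = np.array([inv_cdf((i + 0.5) / n) for i in range(n)])
    return float(np.sqrt(np.mean((z - q) ** 2)))


def projection_w2(n=1200, d=300, seed=0):
    """Return (W2 for a random direction, W2 for the top singular direction)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))  # rows are i.i.d. N(0, I_d) points

    # (a) Unit direction drawn independently of the data.
    w_rand = rng.standard_normal(d)
    w_rand /= np.linalg.norm(w_rand)

    # (b) Data-dependent direction: top right singular vector of X.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    w_top = vt[0]

    return w2_to_standard_normal(X @ w_rand), w2_to_standard_normal(X @ w_top)


if __name__ == "__main__":
    n, d = 1200, 300
    w2_rand, w2_top = projection_w2(n, d)
    print(f"alpha = {n / d:.1f}, 1/sqrt(alpha) = {1 / np.sqrt(n / d):.3f}")
    print(f"random direction:       W2 ~ {w2_rand:.3f}")
    print(f"top singular direction: W2 ~ {w2_top:.3f}")
```

With $n/d = 4$, the random direction should give a value near zero, while the data-dependent direction should give a value of the same order as $1/\sqrt{\alpha} = 0.5$, since the projected points are then roughly Gaussian with inflated variance. Finite-sample values fluctuate and are only indicative of the asymptotic statement.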

Theorems & Definitions (63)

Theorem 2.1: Wasserstein radius for $m=1$
Remark 2.1
Theorem 2.2: KL-Wasserstein outer bound
Theorem 2.3
Remark 2.2
Definition 2.1: Information dimension (Rényi, 1959)
Theorem 2.4
Remark 2.3
Theorem 2.5
proof
...and 53 more
