Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks

Andrea Montanari, Kangjie Zhou

TL;DR

The paper analyzes the high‑dimensional behavior of low‑dimensional projections of Gaussian data under proportional asymptotics $n/d\to\alpha$, introducing the feasibility set $\mathscr{F}_{m,\alpha}$ of attainable projection distributions. It develops sharp outer bounds (Wasserstein and KL‑Wasserstein) and constructive inner bounds for unsupervised projection, plus information‑dimension constraints, revealing fundamental limits on how far projected distributions can deviate from Gaussianity. It then extends the framework to supervised learning with $\mathscr{F}^{\varphi}_{m,\alpha}$, deriving analogous bounds and applying them to bound interpolation thresholds for two‑layer neural networks and to characterize the margin distribution in max‑margin classification. The results connect projection pursuit, ICA, and linear/2‑layer models, providing precise quantitative limits that underpin unsupervised and supervised dimensionality reduction in high dimensions. The work yields practical implications for understanding the interpolation capabilities of neural networks and the role of low‑dimensional projections in learning systems.

Abstract

Given a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$ and, for each such projection, the empirical distribution of the projected points. What does this collection of probability distributions look like when $n,d$ grow large? We consider this question under the null model in which the points are i.i.d. standard Gaussian vectors, focusing on the asymptotic regime in which $n,d\to\infty$, with $n/d\to\alpha\in (0,\infty)$, while $m$ is fixed. Denoting by $\mathscr{F}_{m,\alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m,\alpha}$. In particular, we characterize the Wasserstein radius of $\mathscr{F}_{m,\alpha}$ up to constant multiplicative factors, and determine it exactly for $m=1$. We also prove sharp bounds in terms of Kullback-Leibler divergence and Rényi information dimension. The previous question has applications to unsupervised learning methods, such as projection pursuit and independent component analysis. We introduce a version of the same problem that is relevant for supervised learning, and prove a sharp Wasserstein radius bound. As an application, we establish an upper bound on the interpolation threshold of two-layer neural networks with $m$ hidden neurons.

Paper Structure

This paper contains 34 sections, 31 theorems, 367 equations, and 1 figure.

Key Result

Theorem 2.1

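Consider the case $m = 1$. Then for any $\alpha\in(0, \infty)$, the outer $W_2$ radius of $\mathscr{F}_{1,\alpha}$ with respect to the center $\mathsf{N}(0,1)$ equals $1/\sqrt{\alpha}$:

$$\sup_{P\in\mathscr{F}_{1,\alpha}} W_2\big(P, \mathsf{N}(0,1)\big) = \frac{1}{\sqrt{\alpha}}.$$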

Figures (1)

  • Figure 1: A cartoon of the $W_2$ geometry of the feasibility set $\mathscr{F}_{1, \alpha}$ (blue shaded area). The outer $W_2$ radius of $\mathscr{F}_{1, \alpha}$ (with respect to the center $\mathsf{N} (0, 1)$) is equal to $1 / \sqrt{\alpha}$, but the inner radius is zero. Namely, for any $\varepsilon > 0$ and any $\alpha > 1$, the $W_2$ ball of radius $\varepsilon$ centered at $\mathsf{N} (0, 1)$ is not contained in $\mathscr{F}_{1, \alpha}$.
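
The outer radius $1/\sqrt{\alpha}$ quoted in the caption of Figure 1 can be probed numerically at finite size. The sketch below is an illustration only and is not taken from the paper. It draws $n$ standard Gaussian points in $\mathbb{R}^d$ and compares two one-dimensional projections: one onto a random unit direction, and one onto the direction of the first data point. The second choice is an assumption made for this example, not necessarily the paper's construction. It places one projected point at distance $\|x_1\| \approx \sqrt{d}$, which contributes roughly $d/n = 1/\alpha$ to $W_2^2$. The names `w2_to_standard_normal`, `project`, `unit`, and `demo` are hypothetical helpers. The distance to $\mathsf{N}(0,1)$ is computed with the one-dimensional quantile coupling.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def w2_to_standard_normal(points):
    """W2 distance between the empirical law of `points` and N(0, 1).

    In one dimension the optimal coupling matches quantiles, so the i-th
    smallest point is paired with the i-th quantile bin of N(0, 1).
    Expanding the square gives
        W2^2 = mean(x^2) + 1 - 2 * sum_i x_(i) * (phi(z_i) - phi(z_{i+1})),
    where z_i = Phi^{-1}(i / n), z_0 = -inf, z_n = +inf.
    """
    xs = sorted(points)
    n = len(xs)
    # Gaussian density at the bin edges; it vanishes at -inf and +inf.
    phi = (
        [0.0]
        + [STD_NORMAL.pdf(STD_NORMAL.inv_cdf(i / n)) for i in range(1, n)]
        + [0.0]
    )
    cross = sum(x * (phi[i] - phi[i + 1]) for i, x in enumerate(xs))
    second_moment = sum(x * x for x in xs) / n
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, second_moment + 1.0 - 2.0 * cross))


def unit(v):
    """Rescale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def project(data, direction):
    """Inner product of every data point with `direction`."""
    return [sum(xi * wi for xi, wi in zip(x, direction)) for x in data]


def demo(n=400, d=200, seed=0):
    """Compare a random direction with the direction of the first point."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    random_dir = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
    aligned_dir = unit(data[0])  # illustrative choice, see lead-in
    alpha = n / d
    return {
        "alpha": alpha,
        "radius": 1.0 / math.sqrt(alpha),  # asymptotic value from Figure 1
        "w2_random": w2_to_standard_normal(project(data, random_dir)),
        "w2_aligned": w2_to_standard_normal(project(data, aligned_dir)),
    }


if __name__ == "__main__":
    print(demo())
```

At these sizes the aligned direction should sit well above the random one but visibly below $1/\sqrt{\alpha}$. The outlier at $\sqrt{d}$ is matched to the top Gaussian quantile, which grows only like $\sqrt{2\log n}$, so the finite-size gap closes slowly as $n$ and $d$ increase.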

Theorems & Definitions (63)

  • Theorem 2.1: Wasserstein radius for $m=1$
  • Remark 2.1
  • Theorem 2.2: KL-Wasserstein outer bound
  • Theorem 2.3
  • Remark 2.2
  • Definition 2.1: Information dimension (Rényi, 1959)
  • Theorem 2.4
  • Remark 2.3
  • Theorem 2.5
  • proof
  • ...and 53 more