Computational-Statistical Gaps in Gaussian Single-Index Models

Alex Damian; Loucas Pillaud-Vivien; Jason D. Lee; Joan Bruna

TL;DR

This work studies Gaussian single-index models with a planted one-dimensional structure and identifies a generative exponent $k^\star(\mathsf{P})$ that governs the computational difficulty of recovering the hidden direction $w^\star$. The authors prove tight lower bounds in both the Statistical Query (SQ) and Low-Degree Polynomial (LDP) frameworks, showing that efficient algorithms in these classes require $n \gtrsim d^{k^\star/2}$ samples, while a partial-trace estimator achieves a matching upper bound. This establishes a sharp computational-to-statistical gap whenever $k^\star(\mathsf{P}) > 2$. They further show that for every $k$ there exist smooth link functions with $k^\star(\mathsf{P}) = k$, and give an information-theoretic upper bound of $n = \tilde{O}(d/(\lambda_k^2 \varepsilon^2))$ for recovery, indicating that the gap is intrinsic to the problem class rather than an artifact of a particular method. The paper also connects these results to NGCA, Tensor PCA, and CLWE, and discusses extensions to unknown distributions $\mathsf{P}$ and to multi-index settings.

Abstract

Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation. As such, they encompass a broad class of statistical inference tasks and provide a rich template to study statistical and computational trade-offs in the high-dimensional regime. While the information-theoretic sample complexity to recover the hidden direction is linear in the dimension $d$, we show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples, where $k^\star$ is a "generative" exponent associated with the model that we explicitly characterize. Moreover, we show that this sample complexity is also sufficient, by establishing matching upper bounds using a partial-trace algorithm. Therefore, our results provide evidence of a sharp computational-to-statistical gap (under both the SQ and LDP classes) whenever $k^\star>2$. To complete the study, we provide examples of smooth and Lipschitz deterministic target functions with arbitrarily large generative exponents $k^\star$.
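The partial-trace algorithm is easiest to picture in the case $k^\star = 2$, for instance the multiplicative-noise model $Y = Z\xi$ discussed in Figure 2. On our reading, the order-2 Hermite tensor $xx^\top - I$ is already a $d \times d$ matrix there, so no trace needs to be taken and the estimator reduces to a spectral method: form $M_n = \frac{1}{n}\sum_i \zeta(y_i)(x_i x_i^\top - I)$ and take its top eigenvector. The following is a minimal sketch under stated assumptions, not the paper's general algorithm: the function name `spectral_estimate`, the label transformation $\zeta(y) = |y|$ (a simplification of the interpolating $\zeta_2$ described for Figure 2), and the problem sizes are our own illustrative choices. The estimator for larger $k^\star$, which contracts higher-order Hermite tensors, is not reproduced.

```python
import numpy as np


def spectral_estimate(x, zeta_y):
    """Top eigenvector of M = (1/n) * sum_i zeta(y_i) * (x_i x_i^T - I).

    Hypothetical helper for the k* = 2 case only.
    x: (n, d) Gaussian inputs; zeta_y: (n,) transformed labels zeta(y_i).
    """
    n, d = x.shape
    # (x.T * zeta_y) scales column i of x.T by zeta_y[i], so the product
    # equals sum_i zeta_y[i] * x_i x_i^T; subtract the matching identity term.
    m = (x.T * zeta_y) @ x / n - np.mean(zeta_y) * np.eye(d)
    _, eigvecs = np.linalg.eigh(m)  # eigenvalues returned in ascending order
    return eigvecs[:, -1]           # eigenvector of the largest eigenvalue


rng = np.random.default_rng(0)
d, n = 50, 20_000
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
x = rng.standard_normal((n, d))
y = (x @ w_star) * rng.standard_normal(n)  # Y = Z * xi, so E[Y | Z] = 0

# Assumed label transformation zeta(y) = |y|, centered to reduce variance.
w_abs = spectral_estimate(x, np.abs(y) - np.abs(y).mean())
# Identity transformation zeta(y) = y: E[M_n] = 0 for this model.
w_id = spectral_estimate(x, y)
print(abs(w_abs @ w_star), abs(w_id @ w_star))
```

Because $\mathbb{E}[Y \mid Z] = 0$ in this model, the untransformed labels yield a matrix with zero expectation and `w_id` is essentially a random direction, whereas $|y|$ has a nonzero correlation with $Z^2 - 1$ and recovers $w^\star$. This mirrors why the label transformation, not the raw link, determines the difficulty.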

Paper Structure

This paper contains 50 sections, 61 theorems, 237 equations, 5 figures, and 2 algorithms.

Introduction
Problem Setup
Background and Related Work
Summary of Main Results
Notations
Hermite Polynomials
Acknowledgements
The Generative Exponent
Computational Lower Bounds
From CSQ to SQ lower bounds
SQ Framework for Single-Index Models
The Low Degree Polynomial Method
Discussion
Relationship between SQ and LD lower bounds
NGCA aka 'Gaussian Pancakes'
...and 35 more sections

Key Result

Theorem 1.2

Given $\mathsf{P} \in \mathcal{G}$ and $n$ i.i.d. samples from $\mathbb{P}_{w^\star, \mathsf{P}}$, there exists an explicit exponent $k^\star = k^\star(\mathsf{P}) < \infty$ such that no polynomial-time SQ algorithm can recover $w^\star$ unless $n \gtrsim d^{k^\star/2}$.
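A short worked comparison shows why the gap appears exactly when $k^\star > 2$. It assumes, as stated in the abstract, that the information-theoretic rate is linear in $d$ and that the SQ/LDP threshold $d^{k^\star/2}$ is matched by the partial-trace algorithm. Since no method can succeed with fewer than order-$d$ samples, we take the computational rate to be the larger of the two. Constants and logarithmic factors are dropped, and the numerical instances are illustrative only.

```latex
n_{\mathrm{IT}} \asymp d,
\qquad
n_{\mathrm{comp}} \asymp \max\left(d,\; d^{k^\star/2}\right),
\qquad
\frac{n_{\mathrm{comp}}}{n_{\mathrm{IT}}} \asymp \max\left(1,\; d^{k^\star/2 - 1}\right).
% k^\star \le 2:            ratio \asymp 1 (no gap)
% k^\star = 3, d = 10^4:    ratio \asymp d^{1/2} = 10^2
% k^\star = 4, d = 10^3:    n_comp \asymp d^2 = 10^6 versus n_IT \asymp 10^3
```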

Figures (5)

Figure 1: Visualization of the joint density $\mathsf{P}$ of $(Z,Y)$ for the additive noise model, multiplicative noise model, and mixture of distributions model. The heatmap shows the density of $\mathsf{P}$ and the plots to the left of and below the heatmap show the densities of the marginals $\mathsf{P}_y$ and $\mathsf{P}_z$ respectively.
Figure 2: We plot three examples of a joint distribution $\mathsf{P}$ of $(Z,Y)$, the witness function $\zeta_{k^\star}(y)$, and the joint distribution of $(Z,\zeta_{k^\star}(Y))$. In the first example, $Y = c_1 x + \sin(c_2 Z)$ for constants $c_1,c_2$ such that $\beta_1 = 0$. The transformation $\zeta_1$ zeros out the bulk and amplifies the caps of the curve in order to lower the information exponent from ${l^\star}(\mathsf{P}) = 2$ to ${l^\star}((\mathrm{Id} \otimes \zeta_1)_\#\mathsf{P}) = 1$. As a result, ${k^\star}(\mathsf{P}) = 1$. In the second example, the model $Y = Z \xi$ has multiplicative Gaussian noise $\xi$ and $\mathbb{E}[Y|Z] = 0$, so this model has ${l^\star}(\mathsf{P}) = \infty$. The transformation $\zeta_2$ interpolates between $\sqrt{|Y|}$ for $|Y| \approx 0$ and $|Y|$ for $|Y|$ farther from $0$. The transformed distribution (right column) now has ${l^\star}((\mathrm{Id} \otimes \zeta_2)_\# \mathsf{P}) = 2$, so ${k^\star}(\mathsf{P}) = 2$. The third example is the distribution used in mondelli2018fundamental as an example where ${k^\star} > 2$. In this case, we verify that ${k^\star} = {l^\star} = 4$, so the tight sample complexity for the single-index model corresponding to this choice of $\mathsf{P}$ is $n \gtrsim d^2$.
Figure 3: For every $k$, $Y=h_k(Z)$ has generative exponent $1$ if $k$ is odd and $2$ if $k$ is even. In particular, the difficulty of learning the single index model defined by $\mathsf{P} = (\mathrm{Id} \otimes h_k)_\# \gamma_1$ does not grow with $k$.
Figure 4: We empirically verify our observation in Lemma \ref{lem:partial_trace_even} that $M_n$ satisfies Gaussian universality for ${k^\star} > 2$. We take ${k^\star} = 4$ and compute $M_n$ for $d = 4096$ and varying $\delta = \frac{n}{d^2}$. The target function is $Y = \zeta_4(Z^2 e^{-Z^2})$ (see Figure \ref{fig:examples_zeta}). Here $\delta^\star = \frac{\mathbb{E}[Y^2]}{\beta_k^2 k(k-1)!!}$ is the predicted BBP threshold (baik2005phase). The dots represent the medians over 10 random seeds of each quantity (eigenvalues and eigenvector correlation), and the error bars represent the standard deviation over these 10 trials.
Figure 5: Explicit constructions of $\sigma$ with different prescribed generative exponents. These were generated by numerically integrating the ODE in \eqref{eq:keep_k_ode}.
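The examples in Figures 2 and 3 can be checked numerically. If $k^\star$ is the smallest information exponent reachable by transforming the labels (as Figure 2 describes), then since $\mathbb{E}[\zeta(Y) h_k(Z)] = \mathbb{E}[\zeta(Y)\,\mathbb{E}[h_k(Z) \mid Y]]$, some $\zeta$ makes this nonzero exactly when $\mathbb{E}_Y\big[\mathbb{E}[h_k(Z) \mid Y]^2\big] > 0$. The sketch below estimates that quantity by binning $Y$ into quantile bins. It is a rough Monte Carlo diagnostic under our own assumptions: the helper names, the bin count, the tolerance, and the cap `k_max` are hypothetical choices, and a finite-sample threshold can only suggest, not certify, a value of $k^\star$.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def normalized_hermite(z, k):
    """h_k(z) = He_k(z) / sqrt(k!), orthonormal under Z ~ N(0, 1)."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return hermeval(z, coeffs) / math.sqrt(math.factorial(k))


def cond_hermite_energy(z, y, k, n_bins=50):
    """Binned Monte Carlo estimate of E_Y[ E[h_k(Z) | Y]^2 ] (hypothetical helper).

    E[h_k(Z) | Y] is approximated by the mean of h_k(Z) inside each quantile
    bin of Y. Within-bin sampling noise adds a positive bias of order n_bins / n.
    """
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.searchsorted(edges[1:-1], y, side="right")  # indices 0..n_bins-1
    hk = normalized_hermite(z, k)
    counts = np.bincount(bins, minlength=n_bins)
    sums = np.bincount(bins, weights=hk, minlength=n_bins)
    nonempty = counts > 0
    cond_means = sums[nonempty] / counts[nonempty]
    # sum_b (n_b / n) * (bin mean)^2
    return float(np.sum(counts[nonempty] * cond_means**2) / len(y))


def generative_exponent(z, y, k_max=6, tol=0.02, n_bins=50):
    """Smallest k <= k_max whose estimated energy exceeds tol (None if none)."""
    for k in range(1, k_max + 1):
        if cond_hermite_energy(z, y, k, n_bins) > tol:
            return k
    return None


rng = np.random.default_rng(0)
n = 200_000
z = rng.standard_normal(n)
xi = rng.standard_normal(n)
print(generative_exponent(z, z * xi))  # Figure 2, second example: Y = Z * xi
print(generative_exponent(z, hermeval(z, [0, 0, 0, 1])))  # Y = He_3(Z), as in Figure 3
```

For $Y = Z\xi$ the symmetry $(Z,\xi) \mapsto (-Z,-\xi)$ leaves $Y$ unchanged, so $\mathbb{E}[Z \mid Y] = 0$ and the first nonzero energy appears at $k = 2$, consistent with Figure 2. The tolerance must sit well above the $n_{\text{bins}}/n$ bias floor and below the smallest genuine energy, which is why it is a tuning assumption rather than a principled constant.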

Theorems & Definitions (123)

Definition 1.1: Gaussian Single-Index Model
Theorem 1.2: SQ lower bound, informal version of \ref{thm:sq_lower_bound}
Theorem 1.3: Low-degree method detection lower bound, informal version of \ref{thm:low_degree}
Theorem 1.4: informal version of \ref{thm:optimal_sq_alg} and \ref{coro:partialtrace}
Theorem 1.5: informal version of \ref{thm:smooth_link} and \ref{thm:additive_noise_link}
Definition 2.1: Information Exponent revisited
Lemma 2.1: Mutual Information Decomposition
Definition 2.2: Generative Exponent
Remark 2.3
Proposition 2.3: A Variational Representation
...and 113 more
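Figure 3 contrasts the generative exponent with the information exponent $l^\star$. For a deterministic link $Y = \sigma(Z)$ we assume the standard definition $l^\star(\sigma) = \min\{k \ge 1 : \mathbb{E}[\sigma(Z) h_k(Z)] \ne 0\}$ with $h_k = \mathrm{He}_k/\sqrt{k!}$. Definition 2.1 revisits this for general $\mathsf{P}$, and that version is not reproduced here. The minimal sketch below computes $l^\star$ by Gauss-Hermite quadrature. The helper names, node count, and tolerance are illustrative assumptions, and `np.tanh` is our own example link.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coefficient(sigma, k, n_nodes=60):
    """beta_k = E[sigma(Z) h_k(Z)] for Z ~ N(0, 1), h_k = He_k / sqrt(k!).

    hermegauss integrates against exp(-z^2 / 2); dividing by the sum of the
    weights (sqrt(2*pi)) turns the rule into a Gaussian expectation. Exact
    for polynomial sigma when deg(sigma) + k <= 2 * n_nodes - 1.
    """
    nodes, weights = hermegauss(n_nodes)
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    hk = hermeval(nodes, coeffs) / math.sqrt(math.factorial(k))
    return float(np.sum(weights * sigma(nodes) * hk) / np.sum(weights))


def information_exponent(sigma, k_max=10, tol=1e-8):
    """Smallest k <= k_max with a nonzero Hermite coefficient (None if none)."""
    for k in range(1, k_max + 1):
        if abs(hermite_coefficient(sigma, k)) > tol:
            return k
    return None


def he4(z):
    """Link Y = He_4(Z), a rescaling of h_4 from Figure 3."""
    return hermeval(z, [0, 0, 0, 0, 1])


print(information_exponent(np.tanh), information_exponent(he4))
```

By Figure 3, the same link $\mathrm{He}_4$ has generative exponent $2$, so its information exponent of $4$ overstates how hard the model is for SQ and low-degree methods; the label transformations illustrated in Figure 2 account for the difference.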
