Computational-Statistical Gaps in Gaussian Single-Index Models
Alex Damian, Loucas Pillaud-Vivien, Jason D. Lee, Joan Bruna
TL;DR
This work studies Gaussian single-index models, in which labels depend on the input only through a hidden direction w*, and identifies a generative exponent k*(P) that governs the computational difficulty of recovering w*. The authors prove lower bounds in both the Statistical Query (SQ) and Low-Degree Polynomial (LDP) frameworks showing that efficient algorithms in these classes require n on the order of d^{k*/2} samples, and they show that a partial-trace estimator attains a matching upper bound. Because recovery is information-theoretically possible with n linear in d (on the order of d/(λ_k^2 ε^2)), this gives evidence of a sharp computational-to-statistical gap whenever k*(P) > 2, one that is intrinsic to the problem class rather than a limitation of a specific method. They further show that for every k there exist smooth link functions with k*(P) = k. The paper also connects these results to Non-Gaussian Component Analysis (NGCA), Tensor PCA, and Continuous Learning With Errors (CLWE), and discusses extensions to unknown distributions P and to multi-index settings.
Abstract
Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation. As such, they encompass a broad class of statistical inference tasks, and provide a rich template to study statistical and computational trade-offs in the high-dimensional regime. While the information-theoretic sample complexity to recover the hidden direction is linear in the dimension $d$, we show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples, where $k^\star$ is a "generative" exponent associated with the model that we explicitly characterize. Moreover, we show that this sample complexity is also sufficient, by establishing matching upper bounds using a partial-trace algorithm. Therefore, our results provide evidence of a sharp computational-to-statistical gap (under both the SQ and LDP classes) whenever $k^\star>2$. To complete the study, we provide examples of smooth and Lipschitz deterministic target functions with arbitrarily large generative exponents $k^\star$.
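To make the setup concrete, the following is a minimal sketch, not the paper's code, of drawing data from a Gaussian single-index model and comparing the two sample-size scales quoted above. The function names, the `tanh` link, and the additive Gaussian noise are illustrative assumptions. The generative exponent of the placeholder link is not computed here, and constants and logarithmic factors are dropped from both scalings.

```python
import numpy as np


def sample_single_index(n, d, link, noise_std=0.0, seed=0):
    """Draw n samples (x, y) with x ~ N(0, I_d) and y = link(<w*, x>) + noise.

    Hypothetical helper: the additive-noise form is only one instance of the
    "potentially non-deterministic transformation" allowed by the model.
    """
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)      # hidden unit-norm direction
    X = rng.standard_normal((n, d))       # isotropic Gaussian inputs
    z = X @ w_star                        # one-dimensional projection
    y = link(z) + noise_std * rng.standard_normal(n)
    return X, y, w_star


def sample_size_scalings(d, k_star):
    """Return the two scalings from the abstract, ignoring constants and logs."""
    return {
        "information_theoretic": float(d),           # linear in d
        "sq_ldp_efficient": float(d) ** (k_star / 2),  # d^{k*/2}
    }


# Placeholder link; its generative exponent is NOT claimed here.
X, y, w_star = sample_single_index(n=500, d=20, link=np.tanh, noise_std=0.1)

# Example arithmetic: with d = 100 and an assumed k* = 4, the efficient-algorithm
# scale is d^2 while the information-theoretic scale is d.
scales = sample_size_scalings(d=100, k_star=4)
gap_factor = scales["sq_ldp_efficient"] / scales["information_theoretic"]
```

Under these assumptions the ratio between the two scales is $d^{k^\star/2 - 1}$, which exceeds 1 only when $k^\star > 2$, matching the regime in which the abstract reports a gap.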
