Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination
Ilias Diakonikolas, Chao Gao, Daniel M. Kane, John Lafferty, Ankit Pensia
TL;DR
This work gives formal evidence of an information-computation tradeoff for noiseless linear regression with Gaussian covariates under oblivious contamination in the responses. By embedding the problem into a testing task and constructing a careful contamination model using a discrete Gaussian distribution, the authors reduce to a conditional Non-Gaussian Component Analysis (NGCA) framework and apply Gaussian Fourier analysis to prove Statistical Query (SQ) lower bounds. The main contribution is a formal lower bound showing that any efficient SQ algorithm requires VSTAT complexity at least $\tilde{\Omega}(\sqrt{d}/\alpha^2)$, so the quadratic dependence on $1/\alpha$ cannot be avoided by SQ methods. The results point to intrinsic computational barriers in robust, high-dimensional regression under oblivious contamination, while leaving open the precise dependence on the dimension $d$ in the sample complexity of efficient algorithms and inviting further exploration of the low-degree polynomial regime.
Abstract
We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d. samples from a distribution $(x, y)$ on $\mathbb{R}^d \times \mathbb{R}$ with $x \sim \mathcal{N}(0,\mathbf{I}_d)$ and $y = x^\top \beta + z$, where $z$ is drawn independently of $x$ from an unknown distribution $E$. Moreover, $z$ satisfies $\mathbb{P}_E[z = 0] = \alpha > 0$. The goal is to accurately recover the regressor $\beta$ to small $\ell_2$-error. Ignoring computational considerations, this problem is known to be solvable using $O(d/\alpha)$ samples. On the other hand, the best known polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples. Here we provide formal evidence that the quadratic dependence in $1/\alpha$ is inherent for efficient algorithms. Specifically, we show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
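The data model in the abstract can be made concrete with a short simulation. The following is a minimal sketch in Python with NumPy, not code from the paper. The function name `sample_contaminated_regression`, the parameter `outlier_scale`, and the choice of a wide Gaussian for the nonzero part of $E$ are all illustrative assumptions; the model itself allows $E$ to be any distribution with $\mathbb{P}_E[z = 0] = \alpha$.

```python
import numpy as np


def sample_contaminated_regression(n, beta, alpha, rng, outlier_scale=10.0):
    """Draw n samples (x, y) with x ~ N(0, I_d) and y = x^T beta + z.

    z is drawn independently of x and equals 0 with probability alpha.
    ASSUMPTION: the nonzero part of z is N(0, outlier_scale^2); this is
    only one hypothetical choice of the unknown distribution E.
    """
    d = beta.shape[0]
    x = rng.standard_normal((n, d))        # Gaussian covariates
    clean = rng.random(n) < alpha          # True where z = 0
    z = np.where(clean, 0.0, outlier_scale * rng.standard_normal(n))
    y = x @ beta + z                       # oblivious: z does not depend on x
    return x, y, clean


rng = np.random.default_rng(0)
d, n, alpha = 5, 20000, 0.2
beta = rng.standard_normal(d)
x, y, clean = sample_contaminated_regression(n, beta, alpha, rng)

# Fraction of samples whose response is exactly x^T beta (close to alpha).
clean_fraction = clean.mean()
```

In this sketch the mask `clean` is returned only for inspection. A learner in the setting of the abstract sees just `x` and `y` and does not know which responses are uncontaminated.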
