Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination

Ilias Diakonikolas; Chao Gao; Daniel M. Kane; John Lafferty; Ankit Pensia

TL;DR

This work gives formal evidence of an information-computation tradeoff for noiseless linear regression with Gaussian covariates under oblivious contamination of the responses. The authors embed the problem into a testing task, construct the contamination from a discrete Gaussian distribution, reduce to a conditional Non-Gaussian Component Analysis (NGCA) framework, and apply Gaussian Fourier analysis to prove Statistical Query (SQ) lower bounds. The main result is that any efficient SQ algorithm requires VSTAT complexity at least $\tilde{\Omega}(\sqrt{d}/\alpha^2)$, so the quadratic dependence on $1/\alpha$ cannot be avoided by efficient SQ methods. The result points to an intrinsic computational barrier in high-dimensional regression when only an $\alpha$ fraction of the responses is uncontaminated. It leaves open the precise dependence on the dimension $d$ of the sample complexity of efficient algorithms, and invites further study of the low-degree polynomial regime.

Abstract

We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d. samples from a distribution $(x, y)$ on $\mathbb{R}^d \times \mathbb{R}$ with $x \sim \mathcal{N}(0, \mathbf{I}_d)$ and $y = x^\top \beta + z$, where $z$ is drawn independently of $x$ from an unknown distribution $E$. Moreover, $z$ satisfies $\mathbb{P}_E[z = 0] = \alpha > 0$. The goal is to accurately recover the regressor $\beta$ to small $\ell_2$-error. Ignoring computational considerations, this problem is known to be solvable using $O(d/\alpha)$ samples. On the other hand, the best known polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples. Here we provide formal evidence that the quadratic dependence in $1/\alpha$ is inherent for efficient algorithms. Specifically, we show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
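
To make the data model in the abstract concrete, the following is a minimal sketch of how samples $(x, y)$ could be simulated. It is an illustration only and not code from the paper: the function names, the parameter values, and in particular the choice of a wide Gaussian for the non-zero part of $E$ are assumptions made here. The model itself allows any $E$ with $\mathbb{P}_E[z = 0] = \alpha$.

```python
import random


def sample_contaminated_regression(n, beta, alpha, noise_scale=10.0, seed=0):
    """Draw n samples (x, y) with x ~ N(0, I_d) and y = <x, beta> + z.

    z is independent of x and equals 0 with probability alpha.
    The non-zero branch (a wide Gaussian) is an arbitrary illustrative
    choice for the unknown distribution E, not the paper's construction.
    """
    rng = random.Random(seed)
    d = len(beta)
    samples = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Oblivious contamination: z is drawn without looking at x.
        z = 0.0 if rng.random() < alpha else rng.gauss(0.0, noise_scale)
        y = sum(xi * bi for xi, bi in zip(x, beta)) + z
        samples.append((x, y))
    return samples


def clean_fraction(samples, beta, tol=1e-9):
    """Fraction of samples whose response equals <x, beta> up to tol."""
    hits = sum(
        1
        for x, y in samples
        if abs(y - sum(xi * bi for xi, bi in zip(x, beta))) <= tol
    )
    return hits / len(samples)


# Hypothetical example values: d = 3, alpha = 0.1.
beta = [1.0, -2.0, 0.5]
data = sample_contaminated_regression(n=2000, beta=beta, alpha=0.1)
frac = clean_fraction(data, beta)
```

Here `frac` estimates $\alpha$, the fraction of responses that satisfy $y = x^\top \beta$ exactly. Those exact responses are the only signal a learner can rely on, since the remaining responses are shifted by arbitrary noise that does not depend on $x$.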
