Sparse Linear Regression when Noises and Covariates are Heavy-Tailed and Contaminated by Outliers

Takeyuki Sasai; Hironori Fujisawa

TL;DR

This work tackles robust sparse linear regression under heavy-tailed covariates and noise, including adversarial outliers. It combines a covariate thresholding step with an $\ell_1$-penalized Huber regression to yield tractable estimators with non-asymptotic guarantees, under a finite-kurtosis assumption and a restricted eigenvalue condition. The paper then extends to contaminated settings by introducing ROBUST-SPARSE-ESTIMATION II, which uses a robust pre-processing via COMPUTE-WEIGHT (a sparse-PCA–style SDP) and weighted Huber regression to achieve error bounds that scale as $\sqrt{s\log(d/\delta)/n} + \sqrt{o/n}$, with explicit rates in terms of problem constants. The results advance the understanding of high-dimensional robust estimation by delivering sharp, provable guarantees under heavy tails and outliers, while preserving computational tractability. When compared to prior work, the methods explicitly address finite kurtosis covariates and outlier contamination, offering a clear framework for robust sparse recovery in challenging data regimes with practical implications for high-dimensional statistics and econometrics.
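The two-step idea described above (threshold the covariates, then fit an $\ell_1$-penalized Huber regression) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names `threshold_covariates` and `l1_huber_regression`, the entrywise clipping rule, the parameter names `tau`, `lam` and `delta`, the loss scaling, and the proximal-gradient (ISTA) solver are all hypothetical choices made here for illustration.

```python
import numpy as np

def threshold_covariates(X, tau):
    """Clip every entry of X to [-tau, tau] (assumed form of the thresholding step)."""
    return np.clip(X, -tau, tau)

def huber_grad(r, delta):
    """Derivative of the Huber loss: r where |r| <= delta, delta * sign(r) elsewhere."""
    return np.clip(r, -delta, delta)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_huber_regression(X, y, lam, delta, n_iter=500):
    """Minimize (1/n) * sum_i huber_delta(y_i - x_i^T beta) + lam * ||beta||_1 by ISTA."""
    n, d = X.shape
    # The Huber loss has curvature at most 1, so ||X||_2^2 / n bounds the
    # Lipschitz constant of the gradient of the smooth part.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / max(lipschitz, 1e-12)
    beta = np.zeros(d)
    for _ in range(n_iter):
        residual = y - X @ beta
        grad = -X.T @ huber_grad(residual, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Synthetic heavy-tailed data (illustrative values only).
rng = np.random.default_rng(0)
n, d = 200, 50
beta_star = np.zeros(d)
beta_star[:3] = [2.0, -1.5, 1.0]             # s = 3 nonzero coefficients
X = rng.standard_t(df=5, size=(n, d))        # t(5) has finite kurtosis
noise = 0.5 * rng.standard_t(df=3, size=n)   # heavy-tailed noise
y = X @ beta_star + noise

X_thr = threshold_covariates(X, tau=5.0)
beta_hat = l1_huber_regression(X_thr, y, lam=0.1, delta=1.345)
print("estimation error:", np.linalg.norm(beta_hat - beta_star))
```

Here `lam` plays the role of the sparsity penalty and `delta` the Huber threshold; how these relate to the paper's $\lambda_s$, $\lambda_o$ and $\tau_\mathbf{x}$ is not specified in this summary, so the values above are arbitrary.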

Abstract

We investigate the problem of estimating the coefficients of a linear regression model under a sparsity assumption when the covariates and noise are sampled from heavy-tailed distributions. We also consider the setting in which the covariates and noise are not only heavy-tailed but also contaminated by outliers. Our estimators can be computed efficiently and achieve sharp error bounds.
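For orientation, the setting of the abstract can be written in the standard sparse-regression form below. This is a sketch under assumptions: the symbols $\boldsymbol{\beta}^*$, $\xi_i$, $\tilde{\mathbf{x}}_i$ and $H$ follow common usage rather than the paper's text, and the way $\lambda_o$ enters the loss is an assumed scaling that may differ from the paper's. Only the names $s$, $d$, $n$, $\lambda_o$ and $\lambda_s$ are taken from this page.

```latex
% Sparse linear model (standard form; notation assumed for illustration)
y_i = \mathbf{x}_i^\top \boldsymbol{\beta}^* + \xi_i, \quad i = 1, \dots, n,
\qquad \boldsymbol{\beta}^* \in \mathbb{R}^d, \quad \|\boldsymbol{\beta}^*\|_0 \le s .

% Huber loss with unit threshold (standard definition)
H(t) =
\begin{cases}
  t^2 / 2,   & |t| \le 1, \\
  |t| - 1/2, & |t| > 1 .
\end{cases}

% Generic l1-penalized Huber objective on thresholded covariates
% (the scaling by \lambda_o is an assumption)
\hat{\boldsymbol{\beta}} \in \operatorname*{argmin}_{\boldsymbol{\beta} \in \mathbb{R}^d}
\sum_{i=1}^{n} H\!\left( \frac{y_i - \tilde{\mathbf{x}}_i^\top \boldsymbol{\beta}}{\lambda_o} \right)
+ \lambda_s \|\boldsymbol{\beta}\|_1 .
```

The Huber loss is quadratic for small residuals and linear for large ones, which limits the influence of heavy-tailed noise on the fit, while the $\ell_1$ penalty encourages a sparse estimate.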

Paper Structure (47 sections, 16 theorems, 154 equations, 5 algorithms)


Introduction
Method, result and related work
Method
THRESHOLDING
PENALIZED-HUBER-REGRESSION
Result
Related work
Case of contamination
Method
COMPUTE-WEIGHT
Result
Related work
Key propositions for Theorems \ref{t:main:no} and \ref{t:main}
Auxiliary lemmas
Preparation of proof of Proposition \ref{p:main}
...and 32 more sections

Key Result

Theorem 2.1

Suppose that Assumption a:1 holds. Suppose that the parameters $\tau_\mathbf{x}$, $\lambda_o$ and $\lambda_s$ satisfy where $c_s\geq 16$, and $r_\Sigma$, $r_1$ and $r_2$ satisfy where $c_{r_1} = c_r(1+c_{\mathrm{RE}})/\kappa$, $c_{r_2} = c_r(1+c_{\mathrm{RE}})/\kappa_\mathrm{l}$ and $c_r\geq 6$. Assume that $r_\Sigma\leq 1$ and Then, with probability at least $1-2\delta$, the output of ROBUST-SP

Theorems & Definitions (29)

Definition 1.1: Finite kurtosis distribution
Definition 2.1: Restricted eigenvalue condition of the covariance matrix
Theorem 2.1
Remark 2.1
Remark 2.2
Proposition 3.1
Proposition 3.2
Theorem 3.1
Remark 3.1
Remark 3.2
...and 19 more
