Sparse Linear Regression when Noises and Covariates are Heavy-Tailed and Contaminated by Outliers
Takeyuki Sasai, Hironori Fujisawa
TL;DR
This work tackles robust sparse linear regression when both covariates and noise are heavy-tailed and the sample may be contaminated by adversarial outliers. It combines a covariate thresholding step with $\ell_1$-penalized Huber regression to yield tractable estimators with non-asymptotic guarantees under a finite-kurtosis assumption and a restricted eigenvalue condition. For the contaminated setting, the paper introduces ROBUST-SPARSE-ESTIMATION II, which pre-processes the data with COMPUTE-WEIGHT (a sparse-PCA-style SDP) and then runs a weighted Huber regression, achieving error bounds that scale as $\sqrt{s\log(d/\delta)/n} + \sqrt{o/n}$, with explicit dependence on the problem constants. Compared with prior work, the methods explicitly handle covariates with only finite kurtosis together with outlier contamination while remaining computationally tractable, offering a framework for robust sparse recovery relevant to high-dimensional statistics and econometrics.
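The outlier-free pipeline summarized above (covariate thresholding followed by $\ell_1$-penalized Huber regression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The coordinate-wise clipping rule, the proximal-gradient (ISTA) solver, and every name and tuning value (`tau`, `lam`, `huber_c`, `n_iter`) are assumptions chosen for illustration. The COMPUTE-WEIGHT SDP and the weighted variant used for contaminated data are not shown.

```python
import numpy as np

def threshold_covariates(X, tau):
    """Clip every entry of X to [-tau, tau] (assumed coordinate-wise rule)."""
    return np.clip(X, -tau, tau)

def huber_grad(r, huber_c):
    """Derivative of the Huber loss at residual r: r clipped to [-c, c]."""
    return np.clip(r, -huber_c, huber_c)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_huber_regression(X, y, lam, huber_c, n_iter=500):
    """Minimize (1/n) * sum Huber(y_i - x_i^T beta) + lam * ||beta||_1 by ISTA."""
    n, d = X.shape
    # The Huber loss has curvature at most 1, so ||X||_2^2 / n bounds the
    # Lipschitz constant of the gradient of the smooth part.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(d)
    for _ in range(n_iter):
        residual = y - X @ beta
        grad = -X.T @ huber_grad(residual, huber_c) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Synthetic demo: heavy-tailed covariates and noise, sparse true coefficients.
rng = np.random.default_rng(0)
n, d = 400, 50
beta_true = np.zeros(d)
beta_true[:3] = [2.0, -1.5, 1.0]
X = rng.standard_t(df=5, size=(n, d))   # finite kurtosis, heavy tails
noise = rng.standard_t(df=3, size=n)    # heavy-tailed noise
y = X @ beta_true + noise

X_thr = threshold_covariates(X, tau=5.0)  # illustrative threshold
beta_hat = l1_huber_regression(X_thr, y, lam=0.1, huber_c=1.5)
print("l2 estimation error:", np.linalg.norm(beta_hat - beta_true))
```

The tuning values here are arbitrary. The paper's guarantees depend on specific choices of the threshold, the penalty level, and the Huber parameter, which this sketch does not attempt to reproduce.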
Abstract
We investigate the problem of estimating the coefficients of linear regression under a sparsity assumption when the covariates and noise are sampled from heavy-tailed distributions. We also consider the setting where the covariates and noise are not only heavy-tailed but also contaminated by outliers. Our estimators can be computed efficiently and exhibit sharp error bounds.
