Sparse Linear Regression when Noises and Covariates are Heavy-Tailed and Contaminated by Outliers

Takeyuki Sasai, Hironori Fujisawa

TL;DR

This work tackles robust sparse linear regression under heavy-tailed covariates and noise, including adversarial outliers. It combines a covariate thresholding step with an $\ell_1$-penalized Huber regression to yield tractable estimators with non-asymptotic guarantees, under a finite-kurtosis assumption and a restricted eigenvalue condition. The paper then extends to contaminated settings by introducing ROBUST-SPARSE-ESTIMATION II, which uses robust pre-processing via COMPUTE-WEIGHT (a sparse-PCA–style SDP) and weighted Huber regression to achieve error bounds that scale as $\sqrt{s\log(d/\delta)/n} + \sqrt{o/n}$, with explicit rates in terms of problem constants. The results deliver sharp, provable guarantees for high-dimensional robust estimation under heavy tails and outliers while preserving computational tractability. Compared with prior work, the methods explicitly address finite-kurtosis covariates and outlier contamination, giving a framework for robust sparse recovery with implications for high-dimensional statistics and econometrics.

Abstract

We investigate the problem of estimating the coefficients of linear regression under a sparsity assumption when covariates and noises are sampled from heavy-tailed distributions. Additionally, we consider the situation where covariates and noises are not only sampled from heavy-tailed distributions but also contaminated by outliers. Our estimators can be computed efficiently and exhibit sharp error bounds.
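
The two-step pipeline outlined in the TL;DR (threshold the covariates, then fit an $\ell_1$-penalized Huber regression) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's algorithm: the thresholding is taken to be elementwise clipping at a level `tau`, the objective is the generic $\frac{1}{n}\sum_i H_\Delta(y_i - x_i^\top\beta) + \lambda\|\beta\|_1$, the solver is plain proximal gradient descent, and the function names and tuning values (`tau`, `lam`, `delta`) are hypothetical, not the theoretically prescribed choices.

```python
import numpy as np

def threshold_covariates(X, tau):
    # Assumed form of the thresholding step: clip every entry to [-tau, tau].
    return np.clip(X, -tau, tau)

def huber_grad(r, delta):
    # Derivative of the Huber loss: r where |r| <= delta, delta*sign(r) elsewhere.
    return np.clip(r, -delta, delta)

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_huber_regression(X, y, lam, delta, n_iter=500):
    # Proximal gradient descent on (1/n) * sum_i Huber(y_i - x_i^T beta) + lam * ||beta||_1.
    n, d = X.shape
    # The Huber loss has curvature at most 1, so ||X||_2^2 / n bounds the
    # Lipschitz constant of the gradient of the smooth part.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(d)
    for _ in range(n_iter):
        residual = y - X @ beta
        grad = -X.T @ huber_grad(residual, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Synthetic heavy-tailed data (illustrative values only).
rng = np.random.default_rng(0)
n, d = 400, 50
beta_true = np.zeros(d)
beta_true[:3] = [3.0, -2.0, 1.5]             # sparse ground truth, s = 3
X = rng.standard_t(df=5, size=(n, d))        # heavy-tailed covariates, finite kurtosis
noise = rng.standard_t(df=2.5, size=n)       # heavy-tailed noise
y = X @ beta_true + noise

tau = 10.0                                   # hypothetical thresholding level
X_thr = threshold_covariates(X, tau)
beta_hat = l1_huber_regression(X_thr, y, lam=0.1, delta=1.0)
print("estimation error:", np.linalg.norm(beta_hat - beta_true))
```

The clipping bounds the influence of extreme covariate values, while the bounded Huber gradient limits the influence of extreme residuals; the soft-thresholding step is what produces sparse iterates. The contaminated setting described in the TL;DR additionally requires the weighting step, which this sketch does not attempt.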

Paper Structure (47 sections, 16 theorems, 154 equations, 5 algorithms)

Key Result

Theorem 2.1

Suppose that Assumption a:1 holds. Suppose that the parameters $\tau_\mathbf{x}$, $\lambda_o$ and $\lambda_s$ satisfy […] where $c_s\geq 16$, and $r_\Sigma$, $r_1$ and $r_2$ satisfy […] where $c_{r_1} = c_r(1+c_{\mathrm{RE}})/\kappa$, $c_{r_2} = c_r(1+c_{\mathrm{RE}})/\kappa_\mathrm{l}$ and $c_r\geq 6$. Assume that $r_\Sigma\leq 1$ and […]. Then, with probability at least $1-2\delta$, the output of ROBUST-SP…

Theorems & Definitions (29)

  • Definition 1.1: Finite kurtosis distribution
  • Definition 2.1: Restricted eigenvalue condition of the covariance matrix
  • Theorem 2.1
  • Remark 2.1
  • Remark 2.2
  • Proposition 3.1
  • Proposition 3.2
  • Theorem 3.1
  • Remark 3.1
  • Remark 3.2
  • ...and 19 more