Statistical Inference for High-Dimensional Robust Linear Regression Models via Recursive Online-Score Estimation

Dian Zheng, Lingzhou Xue

TL;DR

The paper tackles inference for high-dimensional robust linear regression under nonconvex penalized M-estimation. It extends recursive online-score estimation (ROSE) to robust settings by combining a data-driven active set, an initial nonconvex estimator, and online estimating equations to produce valid confidence intervals for a low-dimensional coefficient. Key contributions include a nonconvex landscape analysis with estimation bounds, a computationally efficient composite gradient algorithm, and a proof of asymptotic normality that justifies the confidence-interval construction. These are supported by simulations under contamination and heavy tails and by an application to riboflavin data. The result is a scalable toolkit for robust high-dimensional inference that remains reliable when standard convex methods falter under heavy-tailed noise or outliers.
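To give a feel for what a composite gradient algorithm for penalized M-estimation looks like, here is a minimal sketch. It is not the paper's algorithm: it assumes the convex Huber loss and an l1 penalty as stand-ins for whatever loss and penalty the authors use, and the function names, the step-size rule, and the default `delta=1.345` are illustrative choices only.

```python
import numpy as np


def huber_psi(residual, delta=1.345):
    """Derivative of the Huber loss with respect to the residual."""
    return np.clip(residual, -delta, delta)


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def composite_gradient(X, y, lam, n_iter=500, delta=1.345):
    """Proximal (composite) gradient descent for
    (1/n) * sum_i huber(y_i - x_i' beta) + lam * ||beta||_1.

    Assumption: Huber loss and l1 penalty are stand-ins chosen for
    illustration, not the loss or penalty from the paper.
    """
    n, p = X.shape
    # The Hessian of the Huber loss is bounded by X'X / n, so
    # 1 / lambda_max(X'X / n) is a safe constant step size.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Gradient of the smooth (loss) part at the current iterate.
        grad = -X.T @ huber_psi(y - X @ beta, delta) / n
        # Gradient step on the loss, then proximal step on the penalty.
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    beta0 = np.zeros(p)
    beta0[:2] = [2.0, -1.5]
    # Heavy-tailed noise (Student t with 3 degrees of freedom).
    y = X @ beta0 + 0.5 * rng.standard_t(df=3, size=n)
    beta_hat = composite_gradient(X, y, lam=0.1)
    print(np.round(beta_hat[:5], 2))
```

Each iteration takes a gradient step on the smooth loss and then applies the penalty's proximal map, which is what makes the method "composite". With a nonconvex penalty, the soft-thresholding step would be replaced by that penalty's own proximal map.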

Abstract

This paper introduces a novel framework for estimation and inference in penalized M-estimators applied to robust high-dimensional linear regression models. Traditional methods for high-dimensional statistical inference, which predominantly rely on convex likelihood-based approaches, struggle to address the nonconvexity inherent in penalized M-estimation with nonconvex objective functions. Our proposed method extends the recursive online score estimation (ROSE) framework of Shi et al. (2021) to robust high-dimensional settings by developing a recursive score equation based on penalized M-estimation, explicitly addressing nonconvexity. We establish the statistical consistency and asymptotic normality of the resulting estimator, providing a rigorous foundation for valid inference in robust high-dimensional regression. The effectiveness of our method is demonstrated through simulation studies and a real-world application, showcasing its superior performance compared to existing approaches.

Paper Structure

This paper contains 12 sections, 3 theorems, 23 equations, 3 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Under Assumption 1, further assume $\left\|\boldsymbol{\beta}_0\right\|_0 \leq s_0$ and $\left\|\boldsymbol{\beta}_0\right\|_2 \leq r / 2$. Then there exist constants $C_n, C_\lambda, C_s$ and $\varepsilon_0$ depending on $\left(L, C_g, r, \tau^2, \underline{\gamma}, \delta\right)$ and the function

Figures (3)

  • Figure 1: Overview of the ROSE method
  • Figure 2: The Correlogram of the detected genes
  • Figure 3

Theorems & Definitions (3)

  • Theorem 1
  • Proposition 1
  • Theorem 2