Support estimation in high-dimensional heteroscedastic mean regression
Philipp Hermann, Hajo Holzmann
TL;DR
This work addresses support estimation in high-dimensional linear mean regression under random design with heteroscedastic, heavy-tailed errors by proposing a weighted adaptive LASSO estimator built on a smooth pseudo Huber loss. The authors prove sign-consistency and optimal $\ell_\infty$-convergence rates under mild moment conditions, using a primal-dual witness analysis to handle the potential mismatch between the support of the robustified target parameter and the support of the true one. The theory covers two routes depending on the accuracy available for the initial estimator ($\ell_\infty$ bounds versus $\ell_2/\ell_1$ bounds) and prescribes the scalings $\alpha_n \asymp (\log p / n)^{1/2}$ and $\lambda_n \asymp (\log p)/n$, with a beta-min condition ensuring exact support recovery. Empirical results, including simulations with heavy tails and heteroscedasticity and a riboflavin real-data example, show the proposed method is robust and competitive, especially when combined with knockoffs for false discovery rate control.
Abstract
A current strand of research in high-dimensional statistics deals with robustifying the available methodology with respect to deviations from the pervasive light-tail assumptions. In this paper, we consider a linear mean regression model with random design and potentially heteroscedastic, heavy-tailed errors, and investigate support estimation in this framework. We use a strictly convex, smooth variant of the Huber loss function, with a tuning parameter that depends on the parameters of the problem, together with the adaptive LASSO penalty for computational efficiency. For the resulting estimator, we show sign-consistency and optimal rates of convergence in the $\ell_\infty$ norm, as in the homoscedastic, light-tailed setting. In our analysis, we have to deal with the issue that the support of the target parameter in the linear mean regression model and the support of its robustified version may differ substantially, even for small values of the tuning parameter of the Huber loss function. Simulations illustrate the favorable numerical performance of the proposed methodology.
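To make the construction concrete, the following is a minimal Python sketch of a two-stage weighted $\ell_1$-penalized pseudo Huber regression. It is not the authors' implementation, and several details are assumptions introduced for illustration: the loss parameterization $\ell_\alpha(r) = 2\alpha^{-2}\big(\sqrt{1+\alpha^2 r^2}-1\big)$, the objective normalization $n^{-1}\sum_i \ell_\alpha(y_i - x_i^\top\beta) + \lambda\sum_k w_k|\beta_k|$, the weights $w_k = 1/(|\hat\beta^{\mathrm{init}}_k| + 10^{-8})$, the choice of an unweighted fit as initial estimator, the proximal gradient solver, and all constants in the tuning parameters. The function names `pseudo_huber`, `pseudo_huber_grad`, `soft_threshold` and `weighted_l1_pseudo_huber` are hypothetical.

```python
import numpy as np


def pseudo_huber(r, alpha):
    # Assumed parameterization: (2 / alpha^2) * (sqrt(1 + alpha^2 r^2) - 1),
    # which approaches the squared loss r^2 as alpha -> 0.
    return (2.0 / alpha**2) * (np.sqrt(1.0 + (alpha * r) ** 2) - 1.0)


def pseudo_huber_grad(r, alpha):
    # Derivative of the loss in r; its absolute value is bounded by 2 / alpha.
    return 2.0 * r / np.sqrt(1.0 + (alpha * r) ** 2)


def soft_threshold(z, t):
    # Proximal operator of t * |.|, applied coordinate-wise.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_l1_pseudo_huber(X, y, alpha, lam, weights, n_iter=500):
    # Proximal gradient descent on
    #   (1/n) * sum_i pseudo_huber(y_i - x_i' beta) + lam * sum_k weights_k * |beta_k|.
    n, p = X.shape
    # The second derivative of the loss is at most 2, so the gradient of the
    # smooth part is Lipschitz with constant at most 2 * ||X||_2^2 / n.
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        residual = y - X @ beta
        grad = -X.T @ pseudo_huber_grad(residual, alpha) / n
        beta = soft_threshold(beta - step * grad, step * lam * weights)
    return beta


# Illustrative data: sparse signal, heavy-tailed heteroscedastic errors.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -2.0, 2.0]
noise = rng.standard_t(df=3, size=n) * (1.0 + 0.5 * np.abs(X[:, 0]))
y = X @ beta_true + noise

# Rates follow the stated scalings; the constants (here 1) are arbitrary.
alpha = np.sqrt(np.log(p) / n)
lam_init = np.sqrt(np.log(p) / n)  # assumed rate for the initial fit
beta_init = weighted_l1_pseudo_huber(X, y, alpha, lam_init, np.ones(p))

# Adaptive stage: coordinates with small initial estimates get large weights.
weights = 1.0 / (np.abs(beta_init) + 1e-8)
lam = np.log(p) / n
beta_hat = weighted_l1_pseudo_huber(X, y, alpha, lam, weights)
print("estimated support:", np.flatnonzero(beta_hat))
```

Because the pseudo Huber loss is smooth with a bounded second derivative, a fixed step size suffices here; in practice the constants in `alpha` and `lam` would need to be tuned, for example by cross-validation.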
