Derivatives and residual distribution of regularized M-estimators with application to adaptive tuning
Pierre C Bellec, Yiwei Shen
TL;DR
This work develops a framework for robust, regularized M-estimators in linear models with Gaussian design. It derives differentiability properties with respect to both the responses and the design, and characterizes the residual distribution in high-dimensional regimes. It introduces a data-driven adaptive tuning criterion that serves as a proxy for the out-of-sample error without requiring knowledge of the noise distribution or of the design covariance, using observable quantities such as the estimated degrees of freedom and a residual-based matrix $\boldsymbol{V}$. A stochastic representation links the residuals to a Gaussian term whose magnitude reflects the estimator’s out-of-sample error, and the analysis reveals new connections between the derivatives of the estimator and its effective degrees of freedom. The paper specializes to the Huber loss with Elastic-Net penalty, provides practical active-set expressions, and validates the theory via simulations with heavy-tailed noise and anisotropic designs. It also shows how the strong convexity assumption can be relaxed using Lipschitz extensions, broadening applicability to the non-smooth penalties typical in high-dimensional robust regression.
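For concreteness, the Huber loss with Elastic-Net penalty can be written as below. This is a sketch under assumed conventions: the $1/n$ scaling and the names $\delta$ (Huber threshold), $\lambda$ ($\ell_1$ weight) and $\mu$ ($\ell_2$ weight) are our choices and may differ from the paper's normalization. Here $y\in\mathbb{R}^n$ and $X\in\mathbb{R}^{n\times p}$ has rows $x_i^\top$. The derivative $\psi_\delta$ is 1-Lipschitz, so this loss is gradient-Lipschitz in the sense used in the abstract.

```latex
\rho_\delta(u) =
\begin{cases}
  \tfrac12 u^2, & |u| \le \delta,\\[2pt]
  \delta |u| - \tfrac12 \delta^2, & |u| > \delta,
\end{cases}
\qquad
\psi_\delta(u) = \rho_\delta'(u) = \max\{-\delta, \min\{u, \delta\}\},

\hat\beta(y,X) \in \operatorname*{arg\,min}_{b \in \mathbb{R}^p}
  \frac1n \sum_{i=1}^n \rho_\delta\bigl(y_i - x_i^\top b\bigr)
  + \lambda \|b\|_1 + \frac{\mu}{2}\|b\|_2^2,
\qquad
r_i = y_i - x_i^\top \hat\beta .
```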
Abstract
This paper studies M-estimators with a gradient-Lipschitz loss function, regularized with a convex penalty, in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty when the noise distribution has heavy tails. Our main contributions are threefold. (i) We provide general formulae for the derivatives of regularized M-estimators $\hat\beta(y,X)$, where differentiation is taken with respect to both $y$ and $X$; this reveals a simple differentiability structure shared by all convex regularized M-estimators. (ii) Using these derivatives, we characterize the distribution of the residual $r_i = y_i - x_i^\top\hat\beta$ in the intermediate high-dimensional regime where dimension and sample size are of the same order. (iii) Motivated by the distribution of the residuals, we propose a novel adaptive criterion to select the tuning parameters of regularized M-estimators. The criterion approximates the out-of-sample error up to an additive constant independent of the estimator, so that minimizing the criterion provides a proxy for minimizing the out-of-sample error. The proposed adaptive criterion requires knowledge of neither the noise distribution nor the covariance of the design. Simulated data confirm the theoretical findings, regarding both the distribution of the residuals and the success of the criterion as a proxy for the out-of-sample error. Finally, our results reveal new relationships between the derivatives of $\hat\beta(y,X)$ and the effective degrees of freedom of the M-estimator, which are of independent interest.
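As an illustration of the practical example above, the following is a minimal sketch, not the authors' code, of fitting a Huber + Elastic-Net M-estimator on a Gaussian design with heavy-tailed noise and forming the residuals $r_i = y_i - x_i^\top\hat\beta$. Everything here is an assumption of the sketch: the objective normalization (average Huber loss plus $\lambda\|b\|_1 + \tfrac{\mu}{2}\|b\|_2^2$), the choice of proximal gradient descent (ISTA) as solver, and the hypothetical helper names `huber_psi`, `soft_threshold` and `huber_elastic_net`. It does not implement the paper's derivative formulas or its adaptive criterion, whose exact form is not given here.

```python
import numpy as np


def huber_psi(u, delta):
    # Derivative of the Huber loss: identity on [-delta, delta], clipped outside.
    # It is 1-Lipschitz, so the loss is gradient-Lipschitz.
    return np.clip(u, -delta, delta)


def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (coordinate-wise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def huber_elastic_net(X, y, lam, mu, delta=1.0, n_iter=20000, tol=1e-10):
    """Hypothetical ISTA solver for
    (1/n) * sum_i huber_delta(y_i - x_i^T b) + lam * ||b||_1 + (mu/2) * ||b||_2^2.
    """
    n, p = X.shape
    # Lipschitz constant of the smooth part's gradient: ||X||_op^2 / n + mu.
    L = np.linalg.norm(X, 2) ** 2 / n + mu
    step = 1.0 / L
    b = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ b
        # Gradient of the smooth part (Huber data term + ridge term).
        grad = -X.T @ huber_psi(r, delta) / n + mu * b
        b_new = soft_threshold(b - step * grad, step * lam)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b


# Simulated setting: isotropic Gaussian design, n and p of the same order,
# heavy-tailed Student-t noise with 2 degrees of freedom.
rng = np.random.default_rng(0)
n, p = 400, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 1.0
y = X @ beta + rng.standard_t(2, size=n)

beta_hat = huber_elastic_net(X, y, lam=0.1, mu=0.01, delta=1.0)
r = y - X @ beta_hat  # residuals r_i = y_i - x_i^T beta_hat
```

ISTA is used only because it handles the non-smooth $\ell_1$ term with a closed-form proximal step; any convex solver reaching the same minimizer would serve equally well for studying the residuals.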
