Error estimation and adaptive tuning for unregularized robust M-estimator

Pierre C. Bellec; Takuya Koriyama

TL;DR

This paper develops a principled framework for error estimation and adaptive tuning in unregularized robust M-estimation under high-dimensional proportional asymptotics. The authors introduce an observable out-of-sample risk proxy $\hat{R}$ and prove that it is consistent for the true risk $R$. This enables data-driven loss selection and scale tuning without knowledge of the noise distribution or the design parameters. The core technique combines Ridge smoothing with a careful differentiability analysis to transfer results from smoothed estimators to their unregularized counterparts, yielding rigorous guarantees and an optimal grid-based tuning procedure. Numerical experiments validate the risk estimator and the adaptive tuning method for losses including the Huber loss. They also illustrate robustness to the covariate distribution and to the noise scale, highlighting practical implications for robust high-dimensional regression.

Abstract

We consider unregularized robust M-estimators for linear models under Gaussian design and heavy-tailed noise, in the proportional asymptotics regime where the sample size $n$ and the number of features $p$ both increase such that $p/n \to \gamma \in (0,1)$. An estimator of the out-of-sample error of a robust M-estimator is analyzed and proved to be consistent for a large family of loss functions that includes the Huber loss. As an application of this result, we propose an adaptive tuning procedure for the scale parameter $\lambda>0$ of a given loss function $\rho$. Choosing $\hat\lambda$ in a given interval $I$ to minimize the out-of-sample error estimate of the M-estimator constructed with loss $\rho_\lambda(\cdot) = \lambda^2 \rho(\cdot/\lambda)$ leads to the optimal out-of-sample error over $I$. The proof relies on a smoothing argument. The unregularized M-estimation objective function is perturbed, or smoothed, with a Ridge penalty that vanishes as $n\to+\infty$, and the unregularized M-estimator of interest is shown to inherit properties of its smoothed version.
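For concreteness, here is the scaling $\rho_\lambda(\cdot)=\lambda^2\rho(\cdot/\lambda)$ worked out for the Huber loss, together with the Ridge-smoothed objective described above. This is a sketch under stated assumptions. We take $\rho$ to be the Huber loss with unit threshold; that normalization is our choice. We write the smoothed objective with a penalty $\frac{\mu}{2}\|\bm b\|^2$; the exact normalization and the rate at which $\mu=\mu_n$ vanishes are not given in this summary.

$$
\rho(x)=\begin{cases} x^2/2, & |x|\le 1,\\ |x|-1/2, & |x|>1,\end{cases}
\qquad
\rho_\lambda(x)=\lambda^2\rho(x/\lambda)=\begin{cases} x^2/2, & |x|\le \lambda,\\ \lambda|x|-\lambda^2/2, & |x|>\lambda.\end{cases}
$$

Differentiating gives $\psi_\lambda(x)=\rho_\lambda'(x)=\lambda\,\psi(x/\lambda)=\max\{-\lambda,\min\{x,\lambda\}\}$. The scale $\lambda$ therefore acts as the Huber threshold, while the curvature on the quadratic region stays equal to $1$. The unregularized estimator and its smoothed version are

$$
\hat{\bm\beta}\in\arg\min_{\bm b\in\mathbb{R}^p}\sum_{i=1}^n\rho_\lambda(y_i-\bm x_i^\top\bm b),
\qquad
\hat{\bm\beta}_\mu=\arg\min_{\bm b\in\mathbb{R}^p}\Big\{\sum_{i=1}^n\rho_\lambda(y_i-\bm x_i^\top\bm b)+\frac{\mu}{2}\|\bm b\|^2\Big\},
\quad \mu=\mu_n\to 0 .
$$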

Paper Structure

This paper contains 30 sections, 24 theorems, 158 equations, and 9 figures.

Introduction
Results at a glance
Related work
Precise analysis of a perturbed M-estimator
Organization
Notation
Estimation of the out-of-sample error
Adaptive tuning of scale parameters
Numerical simulations
Outline of the proof
Differentiability of M-estimators
Consistency of the risk estimate for the smoothed M-estimator
Back to the original M-estimator
Existence and uniqueness of solutions to the nonlinear system \eqref{eq:nonlinear}
Proof of Theorem \ref{th:ofs}
...and 15 more sections

Key Result

Theorem 1

Assume that $(\rho, F_\epsilon)$ satisfy Assumptions \ref{as:loss} and \ref{as:noise}. Let $\psi=\rho':\mathbb{R}\to\mathbb{R}$ be the derivative of the loss, and let $\bm V$ be the Jacobian matrix $(\partial/\partial \bm{y})\psi(\bm{y}-\bm X\hat{\bm \beta})\in\mathbb{R}^{n\times n}$, where $\psi$ acts componentwise. Then, as $n, p\to\infty$ with $p/n\to \gamma$ …
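The Jacobian $\bm V$ can be computed in closed form for the Huber loss. Differentiating the first-order condition $\bm X^\top\psi(\bm y-\bm X\hat{\bm\beta})=\bm 0$ with respect to $\bm y$ gives $\bm V=\bm D-\bm D\bm X(\bm X^\top\bm D\bm X)^{-1}\bm X^\top\bm D$ with $\bm D=\mathrm{diag}(\psi'(r_i))$. For Huber, $\psi'\in\{0,1\}$, so $\mathrm{tr}\,\bm V$ equals the number of residuals in the quadratic region minus $p$; this requires more than $p$ such residuals. The theorem statement above is truncated, so the form of the estimate used below is an assumption and is not quoted from the paper. We assume $\hat R=p\|\psi\|^2/(\mathrm{tr}\,\bm V)^2$, which reduces to the classical $\sigma^2p/(n-p)$ for least squares. We also take the out-of-sample error to be $R=\|\hat{\bm\beta}-\bm\beta\|^2$ under isotropic design. The IRLS solver and all function names are hypothetical illustrations.

```python
import numpy as np


def huber_psi(r, lam):
    """psi_lam = rho_lam' for the Huber loss with threshold lam: clip to [-lam, lam]."""
    return np.clip(r, -lam, lam)


def fit_huber(X, y, lam, n_iter=500, tol=1e-10):
    """Unregularized Huber M-estimator, computed by iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares starting point
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.minimum(1.0, lam / np.maximum(np.abs(r), 1e-12))  # IRLS weights psi(r) / r
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        done = np.linalg.norm(beta_new - beta) <= tol * (1.0 + np.linalg.norm(beta))
        beta = beta_new
        if done:
            break
    return beta


def trace_jacobian(X, dpsi):
    """tr V for V = D - D X (X^T D X)^{-1} X^T D with D = diag(psi'(r))."""
    XtD = X.T * dpsi
    return dpsi.sum() - np.trace(np.linalg.solve(XtD @ X, (X.T * dpsi**2) @ X))


def risk_estimate(X, y, beta_hat, lam):
    """Assumed form of the estimate: p * ||psi||^2 / (tr V)^2."""
    p = X.shape[1]
    r = y - X @ beta_hat
    dpsi = (np.abs(r) <= lam).astype(float)  # psi'(r) for the Huber loss
    tr_v = trace_jacobian(X, dpsi)
    return p * np.sum(huber_psi(r, lam) ** 2) / tr_v**2, tr_v


# Isotropic Gaussian design and t(df=2) noise, with p/n = 0.3 as in the paper's figures.
rng = np.random.default_rng(0)
n, p, lam = 1500, 450, 2.0
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_true + rng.standard_t(df=2, size=n)

beta_hat = fit_huber(X, y, lam)
R_hat, tr_v = risk_estimate(X, y, beta_hat, lam)
R = np.sum((beta_hat - beta_true) ** 2)  # out-of-sample error when Sigma = I (assumed definition)
print(f"R = {R:.4f}, R_hat = {R_hat:.4f}, tr V = {tr_v:.1f}")
```

The usage draws one dataset at a smaller size than the paper's $n=4000$, $p=1200$ so that it runs quickly. The printed $R$ and $\hat R$ should be close but not identical, since consistency is an asymptotic statement.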

Figures (9)

Figure 1: Panel \ref{subfig:risk_estimate_intro} plots the out-of-sample error $R$ and its estimator $\hat{R}$ for the Huber loss across values of the scale parameter $\lambda>0$. Panel \ref{subfig:adaptive_tuning_intro} plots the oracle out-of-sample error $R(\lambda_{\text{opt}})$ and the out-of-sample error $R(\hat{\lambda})$ as the noise scale changes, where $\hat{\lambda}$ minimizes the estimator $\hat{R}$ over a finite grid $I$. See Section \ref{sec:numeric} for details.
Figure 2: Example of a loss satisfying Assumption \ref{as:loss}
Figure 3: Plot of the out-of-sample error $R(\lambda)$ and the estimator $\hat{R}(\lambda)$ over 100 repetitions, with $n=4000$, $p=1200$, for the Huber loss and different values of the scale parameter $\lambda$. The noise distribution is $\text{t-dist}(\text{df}=2)$. $\alpha^2(\lambda)$ is the solution to the nonlinear system \eqref{eq:nonlinear}.
Figure 4: Adaptive tuning as the noise scale $\sigma$ varies, with $F_\epsilon = \sigma \cdot \text{t-dist}(\text{df}=2)$. Here, $I=[1,10]$, $n=4000$, $p=1200$, and $I_N$ is the uniform grid in log-scale of length $101$. $R(\hat{\lambda}_N)$ is the out-of-sample error with $\lambda$ selected by $\hat{R}$, $\min_{\lambda\in I_N} R(\lambda)$ is the optimal out-of-sample error over $I_N$, and $\min_{\lambda\in I}\alpha^2(\lambda)$ is the theoretically optimal risk limit. Results are over $100$ repetitions.
Figure 5: Plot of the out-of-sample error $R(\lambda)$ and the estimator $\hat{R}(\lambda)$ over 100 repetitions, with $n=4000$, $p=1200$, for the Huber loss and different values of the scale parameter $\lambda$. The noise distribution is $3 \lceil \text{t-dist} (\text{df}=2) \rceil$. $\alpha^2(\lambda)$ is the theoretical limit given by the nonlinear system \eqref{eq:intro_nonlinear}.
...and 4 more figures
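The tuning procedure in Figure 4 can be sketched as follows: fit the M-estimator for every $\lambda$ on a log-uniform grid of $I=[1,10]$, then return the grid point that minimizes $\hat R(\lambda)$. This is a minimal sketch, not the authors' code. It rests on several assumptions: isotropic Gaussian design, the Huber family $\rho_\lambda$, an IRLS solver warm-started along the grid, and an assumed estimate of the form $\hat R=p\|\psi\|^2/(\mathrm{tr}\,\bm V)^2$ with $\mathrm{tr}\,\bm V=\#\{i:|r_i|\le\lambda\}-p$ for Huber. All function names are hypothetical.

```python
import numpy as np


def huber_psi(r, lam):
    """psi_lam(r) = clip(r, -lam, lam), the derivative of rho_lam(x) = lam^2 rho(x / lam)."""
    return np.clip(r, -lam, lam)


def fit_huber(X, y, lam, beta0, n_iter=500, tol=1e-10):
    """Unregularized Huber M-estimator by IRLS, warm-started at beta0."""
    beta = beta0.copy()
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.minimum(1.0, lam / np.maximum(np.abs(r), 1e-12))  # IRLS weights psi(r) / r
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        done = np.linalg.norm(beta_new - beta) <= tol * (1.0 + np.linalg.norm(beta))
        beta = beta_new
        if done:
            break
    return beta


def risk_estimate(X, y, beta_hat, lam):
    """Assumed estimate p * ||psi||^2 / (tr V)^2, with tr V = #{i: |r_i| <= lam} - p for Huber."""
    p = X.shape[1]
    r = y - X @ beta_hat
    tr_v = np.sum(np.abs(r) <= lam) - p
    if tr_v <= 0:  # too few residuals in the quadratic region: no usable estimate
        return np.inf
    return p * np.sum(huber_psi(r, lam) ** 2) / tr_v**2


def adaptive_tuning(X, y, lam_min=1.0, lam_max=10.0, grid_size=101):
    """Minimize the risk estimate over a log-uniform grid of [lam_min, lam_max]."""
    grid = np.geomspace(lam_min, lam_max, grid_size)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    betas, r_hat = [], np.empty(grid_size)
    for k, lam in enumerate(grid):
        beta = fit_huber(X, y, lam, beta)  # warm start from the previous grid point
        betas.append(beta)
        r_hat[k] = risk_estimate(X, y, beta, lam)
    k_best = int(np.argmin(r_hat))
    return grid[k_best], grid, r_hat, betas


# Smaller than the paper's n = 4000, p = 1200 so that the example runs quickly.
rng = np.random.default_rng(1)
n, p = 600, 180
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_true + 3.0 * rng.standard_t(df=2, size=n)  # noise scale sigma = 3

lam_hat, grid, r_hat, betas = adaptive_tuning(X, y, grid_size=21)
true_err = np.array([np.sum((b - beta_true) ** 2) for b in betas])
print(f"lam_hat = {lam_hat:.3f}, R(lam_hat) = {true_err[np.argmin(r_hat)]:.4f}, "
      f"min over grid = {true_err.min():.4f}")
```

Warm-starting each fit from the previous grid point is a convenience that keeps the 21-point (or 101-point) sweep cheap. It does not change the estimator, because each Huber objective is convex.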

Theorems & Definitions (27)

Remark 1
Theorem 1
Proposition 1
Corollary 1
Proposition 2
Theorem 2
Remark 2
Theorem 3
Proposition 3: Proposition 4.1 in bellec2023out
Theorem 4: Theorem 1 in bellec2022derivatives
...and 17 more
