Error estimation and adaptive tuning for unregularized robust M-estimator

Pierre C. Bellec, Takuya Koriyama

TL;DR

This paper develops a principled framework for error estimation and adaptive tuning in unregularized robust M-estimation under high-dimensional proportional asymptotics. By introducing an observable out-of-sample risk proxy $\hat{R}$ and proving its consistency with the true risk $R$, the authors enable data-driven loss selection and scale tuning without knowing the noise or design parameters. The core technique combines Ridge smoothing with a careful differentiable analysis to bridge unregularized estimators and their smoothed counterparts, yielding rigorous guarantees and an optimal grid-based tuning procedure. Numerical experiments validate the risk estimator and the adaptive tuning method for losses including the Huber loss, and illustrate robustness to covariate distributions and noise scales, highlighting practical implications for robust high-dimensional regression.

Abstract

We consider unregularized robust M-estimators for linear models under Gaussian design and heavy-tailed noise, in the proportional asymptotics regime where the sample size $n$ and the number of features $p$ are both increasing such that $p/n \to \gamma\in (0,1)$. An estimator of the out-of-sample error of a robust M-estimator is analyzed and proved to be consistent for a large family of loss functions that includes the Huber loss. As an application of this result, we propose an adaptive tuning procedure for the scale parameter $\lambda>0$ of a given loss function $\rho$: choosing $\hat\lambda$ in a given interval $I$ that minimizes the out-of-sample error estimate of the M-estimator constructed with loss $\rho_\lambda(\cdot) = \lambda^2 \rho(\cdot/\lambda)$ leads to the optimal out-of-sample error over $I$. The proof relies on a smoothing argument: the unregularized M-estimation objective function is perturbed, or smoothed, with a Ridge penalty that vanishes as $n\to+\infty$, and we show that the unregularized M-estimator of interest inherits properties of its smoothed version.
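To make the scaling $\rho_\lambda(\cdot) = \lambda^2 \rho(\cdot/\lambda)$ concrete, the following minimal sketch applies it to the Huber loss. It assumes the common normalization in which the base loss $\rho$ is quadratic on $[-1,1]$ and linear outside; the abstract does not spell this normalization out, and the function names are illustrative. Under that assumption, $\rho_\lambda$ is the Huber loss with threshold $\lambda$, and its derivative is $\psi_\lambda(x) = \lambda\,\psi(x/\lambda)$, i.e. $x$ clipped to $[-\lambda, \lambda]$.

```python
def huber(x: float) -> float:
    """Base Huber loss rho with threshold 1 (assumed normalization)."""
    ax = abs(x)
    return 0.5 * x * x if ax <= 1.0 else ax - 0.5


def huber_psi(x: float) -> float:
    """Derivative psi = rho' of the base loss: x clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def scaled_loss(x: float, lam: float, rho=huber) -> float:
    """Scaled loss rho_lam(x) = lam^2 * rho(x / lam), for lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam * lam * rho(x / lam)


def scaled_psi(x: float, lam: float, psi=huber_psi) -> float:
    """Derivative of the scaled loss: psi_lam(x) = lam * psi(x / lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam * psi(x / lam)


if __name__ == "__main__":
    lam = 2.0
    for x in (0.5, 2.0, 5.0):
        print(x, scaled_loss(x, lam), scaled_psi(x, lam))
```

With this normalization, a small $\lambda$ makes the loss behave like a multiple of the absolute loss, while a large $\lambda$ makes it behave like the square loss on most residuals, which is why the choice of $\lambda$ matters under heavy-tailed noise.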

Paper Structure (30 sections, 24 theorems, 158 equations, 9 figures)

Key Result

Theorem 1

Assume that $(\rho, F_\epsilon)$ satisfy as:loss and as:noise. Let $\psi=\rho':\mathbb{R}\to\mathbb{R}$ be the derivative of the loss, and $\bm V$ be the Jacobian matrix $(\partial/\partial \bm{y})\psi(\bm{y}-\bm X\hat{\bm \beta})\in\mathbb{R}^{n\times n}$. Then, as $n, p\to\infty$ with $p/n\to \gam
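For intuition about the Jacobian $\bm V$ appearing in the statement, here is a derivation sketch under assumptions that are not part of the excerpt above: $\psi$ is differentiable at every residual $y_i - \bm x_i^\top\hat{\bm\beta}$, and $\bm X^\top \bm D \bm X$ is invertible, where $\bm D = \mathrm{diag}\big(\psi'(y_i - \bm x_i^\top\hat{\bm\beta})\big)_{i=1,\dots,n}$ is notation introduced here for illustration. The unregularized M-estimator satisfies the first-order condition

$$\bm X^\top \psi(\bm y - \bm X\hat{\bm\beta}) = \bm 0 .$$

Differentiating with respect to $\bm y$ gives $\bm X^\top \bm D\,\big(\bm I_n - \bm X\,(\partial/\partial\bm y)\hat{\bm\beta}\big) = \bm 0$, hence

$$\frac{\partial \hat{\bm\beta}}{\partial \bm y} = (\bm X^\top \bm D \bm X)^{-1}\bm X^\top \bm D,
\qquad
\bm V = \bm D\Big(\bm I_n - \bm X \frac{\partial \hat{\bm\beta}}{\partial \bm y}\Big) = \bm D - \bm D \bm X (\bm X^\top \bm D \bm X)^{-1} \bm X^\top \bm D .$$

Taking the trace and using the cyclic property,

$$\mathrm{tr}[\bm V] = \mathrm{tr}[\bm D] - \mathrm{tr}\big[(\bm X^\top \bm D \bm X)^{-1}\bm X^\top \bm D \bm X\big] = \mathrm{tr}[\bm D] - p .$$

For the Huber loss with threshold $\lambda$, $\psi_\lambda'(r) = \mathbf{1}\{|r| < \lambda\}$ away from the kinks at $\pm\lambda$, so under these assumptions $\mathrm{tr}[\bm D]$ simply counts the residuals that fall strictly inside $(-\lambda, \lambda)$.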

Figures (9)

  • Figure 1: Panel \ref{subfig:risk_estimate_intro} plots the out-of-sample error $R$ and its estimator $\hat{R}$ for the Huber loss across values of the scale parameter $\lambda>0$. Panel \ref{subfig:adaptive_tuning_intro} plots the oracle out-of-sample error $R(\lambda_{\text{opt}})$ and the out-of-sample error $R(\hat{\lambda})$, where $\hat{\lambda}$ is the minimizer of the estimator $\hat{R}$ over a finite grid $I$, as the scale of the noise changes. See \ref{sec:numeric} for details.
  • Figure 2: Example of a loss satisfying \ref{as:loss}
  • Figure 3: Plot of the out-of-sample error $R(\lambda)$ and estimator $\hat{R}(\lambda)$ over 100 repetitions, with $n=4000$, $p=1200$, for the Huber loss across values of the scale parameter $\lambda$. The noise distribution is $\text{t-dist}(\text{df}=2)$. $\alpha^2(\lambda)$ is the solution to the nonlinear system \ref{eq:nonlinear}.
  • Figure 4: Adaptive tuning as the noise scale $\sigma$ varies, with $F_\epsilon = \sigma \cdot \text{t-dist}(\text{df}=2)$. Here, $I=[1,10]$, $n=4000$, $p=1200$, and $I_N$ is the uniform grid in log-scale of length $101$. ${R}(\hat{\lambda}_N)$ is the out-of-sample error with $\lambda$ selected by $\hat{R}$, $\min_{\lambda\in I_N} R(\lambda)$ is the optimal out-of-sample error over $I_N$, and $\min_{\lambda\in I}\alpha^2(\lambda)$ is the theoretically optimal risk limit. Results are reported over $100$ repetitions.
  • Figure 5: Plot of the out-of-sample error $R(\lambda)$ and estimator $\hat{R}(\lambda)$ over 100 repetitions, with $n=4000$, $p=1200$, for the Huber loss across values of the scale parameter $\lambda$. The noise distribution is $3 \lceil \text{t-dist} (\text{df}=2) \rceil$. $\alpha^2(\lambda)$ is the theoretical limit given by the nonlinear system \ref{eq:intro_nonlinear}.
  • ...and 4 more figures
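
The grid-based selection described in the Figure 4 caption (a log-uniform grid $I_N$ of length $101$ on $I=[1,10]$, with $\hat{\lambda}_N$ chosen as the grid minimizer of $\hat{R}$) can be sketched as follows. This is a hypothetical helper, not the authors' code: the formula for $\hat{R}$ is not reproduced in this summary, so the sketch takes it as a user-supplied callable `risk_estimate`, and the function names are illustrative.

```python
import math
from typing import Callable, List, Tuple


def log_uniform_grid(lo: float, hi: float, num: int) -> List[float]:
    """Grid of `num` points on [lo, hi], equally spaced on the log scale."""
    if not (0 < lo < hi) or num < 2:
        raise ValueError("need 0 < lo < hi and num >= 2")
    step = (math.log(hi) - math.log(lo)) / (num - 1)
    return [math.exp(math.log(lo) + k * step) for k in range(num)]


def select_scale(
    grid: List[float],
    risk_estimate: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (lam_hat, estimated risk) minimizing risk_estimate over the grid.

    `risk_estimate` stands in for the observable estimator R_hat(lam); in an
    actual experiment it would refit the M-estimator with loss rho_lam and
    evaluate the estimator on the data.
    """
    values = [risk_estimate(lam) for lam in grid]
    k = min(range(len(grid)), key=values.__getitem__)
    return grid[k], values[k]


if __name__ == "__main__":
    grid = log_uniform_grid(1.0, 10.0, 101)
    # Toy stand-in for R_hat, used only to exercise the selection logic.
    toy_risk = lambda lam: (math.log(lam) - math.log(3.0)) ** 2
    lam_hat, value = select_scale(grid, toy_risk)
    print(lam_hat, value)
```

Because the selection rule only evaluates the estimate on the grid, the selected $\hat{\lambda}_N$ can be no closer to a minimizer over $I$ than the grid spacing allows; the log-scale spacing keeps the relative resolution constant across $I$.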

Theorems & Definitions (27)

  • Remark 1
  • Theorem 1
  • Proposition 1
  • Corollary 1
  • Proposition 2
  • Theorem 2
  • Remark 2
  • Theorem 3
  • Proposition 3: Proposition 4.1 in bellec2023out
  • Theorem 4: Theorem 1 in bellec2022derivatives
  • ...and 17 more