Robust variance-regularized risk minimization with concomitant scaling
Matthew J. Holland
TL;DR
This work tackles learning under heavy-tailed losses by optimizing a mean–standard-deviation (mean–SD) objective rather than the mean loss alone. It introduces a gradient-friendly procedure called Modified Sun–Huber, built by extending robust one-dimensional mean estimation to the mean–SD setting via a joint scale-location criterion over $(h,a,b)$. The author relates the proposed objective to the mean–SD objective at the population level, derives finite-sample concentration results, and proposes a practical algorithm with a simple schedule for $(\alpha,\beta)$ (e.g., $\beta=\beta_0/\sqrt{n}$ and $\alpha(\beta)=\beta$). Empirical results on simulated and real datasets show that Modified Sun–Huber often matches or outperforms CVaR and DRO baselines in mean–SD performance, while remaining simple to integrate into standard gradient-based pipelines. Overall, the approach offers a scalable, robust alternative for risk-sensitive learning in the presence of heavy-tailed losses.
Abstract
Under losses which are potentially heavy-tailed, we consider the task of minimizing sums of the loss mean and standard deviation, without trying to accurately estimate the variance. By modifying a technique for variance-free robust mean estimation to fit our problem setting, we derive a simple learning procedure which can be easily combined with standard gradient-based solvers to be used in traditional machine learning workflows. Empirically, we verify that our proposed approach, despite its simplicity, performs as well as or better than even the best-performing candidates derived from alternative criteria such as CVaR or DRO risks on a variety of datasets.
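To make the target quantity concrete, the following is a minimal sketch of a naive plug-in estimate of the mean plus standard deviation of a sample of losses, together with the $\beta=\beta_0/\sqrt{n}$ schedule mentioned in the TL;DR. It is not the Modified Sun–Huber procedure: the function names `mean_sd_objective` and `beta_schedule`, the `weight` parameter, and the default `beta0=1.0` are assumptions introduced here for illustration only. The plug-in estimate relies directly on the empirical variance, which is exactly the dependence the paper seeks to avoid under heavy tails.

```python
import numpy as np


def mean_sd_objective(losses, weight=1.0):
    """Naive plug-in estimate of mean + weight * standard deviation.

    Hypothetical helper for illustration; it uses the empirical
    (population-style, ddof=0) standard deviation of the losses.
    """
    losses = np.asarray(losses, dtype=float)
    return float(losses.mean() + weight * losses.std())


def beta_schedule(n, beta0=1.0):
    """Schedule beta = beta0 / sqrt(n), as quoted in the TL;DR.

    The default beta0 is an assumption, not a value from the paper.
    """
    return float(beta0 / np.sqrt(n))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Student-t losses with few degrees of freedom have heavy tails,
    # so the plug-in estimate can vary considerably between samples.
    for trial in range(3):
        sample = rng.standard_t(df=2.1, size=1000) ** 2
        print(trial, mean_sd_objective(sample), beta_schedule(sample.size))
```

Running the script a few times with different seeds gives a rough sense of how unstable the plug-in estimate can be on heavy-tailed data, which is the setting that motivates a more robust criterion.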
