Trimmed sample means for robust uniform mean estimation and regression
Roberto I. Oliveira, Lucas Resende
TL;DR
This work advances robust statistical estimation by applying trimmed means to three problems: uniform mean estimation, vector mean estimation under general norms, and regression with quadratic loss. The authors develop trimmed-mean-based estimators and prove nonasymptotic, minimax-optimal guarantees under adversarial contamination and heavy tails, extending beyond median-of-means (MoM) methods and classical robust statistics. They provide both theory (master theorems and localized radii) and practical algorithms (plug-in and cross-validated trimming) for robust regression, accompanied by extensive experiments showing competitive performance against ordinary least squares (OLS), MoM, and Huber-type methods. Overall, the results establish trimmed means as a principled, versatile tool for robust finite-sample estimation in high-dimensional and contaminated settings, with broad applicability to learning and inference tasks.
Abstract
It is well known that trimmed sample means are robust against heavy tails and data contamination. This paper analyzes the performance of trimmed means and related methods in two novel contexts. The first consists of estimating expectations of functions in a given family, with uniform error bounds; this is closely related to the problem of estimating the mean of a random vector under a general norm. The second is regression with quadratic loss. In both cases, trimmed-mean-based estimators are the first to obtain optimal dependence on the (adversarial) contamination level. Moreover, they match or improve upon the state of the art in terms of heavy tails. Experiments with synthetic data show that a natural "trimmed mean linear regression" method often performs better than both ordinary least squares and alternative methods based on median-of-means.
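For readers unfamiliar with the basic object, the following is a minimal sketch of a one-dimensional symmetric trimmed mean, assuming NumPy is available. The function name `trimmed_mean`, the trimming parameter `k`, and the synthetic data are illustrative assumptions, not taken from the paper; the estimators studied in the paper (uniform over function classes, and for regression) are more involved than this scalar version.

```python
import numpy as np

def trimmed_mean(sample, k):
    """Average the sample after discarding the k smallest and k largest values.

    Hypothetical helper for illustration; `k` is the per-side trimming count.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    if k < 0 or 2 * k >= n:
        raise ValueError("need 0 <= k < n/2")
    return float(x[k:n - k].mean())

# Synthetic illustration: heavy-tailed data (Student t, 3 degrees of freedom,
# true mean 0) with 5% of the points replaced by gross outliers.
rng = np.random.default_rng(0)
clean = rng.standard_t(df=3, size=200)
contaminated = clean.copy()
contaminated[:10] = 1e6

plain = float(contaminated.mean())          # dominated by the outliers
robust = trimmed_mean(contaminated, k=15)   # trims more points than were corrupted
```

In this sketch the trimming count is chosen larger than the number of corrupted points, so all outliers are discarded before averaging; how to choose the trimming level in general is a question the paper addresses and this example does not.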
