Winsorized mean estimation with heavy tails and adversarial contamination
Anders Bredahl Kock, David Preinerstorfer
TL;DR
This work studies robust univariate mean estimation under heavy tails and adversarial contamination using a winsorized mean estimator that avoids data splitting and winsorizes substantially fewer observations than existing approaches. It delivers finite-sample, high-probability bounds that explicitly quantify the dependence on the contamination level $\eta$ and the moment order $m$, with a parametric-like rate when $m\ge 2$. The paper further provides adaptive procedures based on Lepski's method to handle unknown $\eta_{\min}$, achieving near-optimal dependence on contamination at a practical computational cost. It also extends the results to relaxed moment assumptions $m\in(1,\infty)$ via alternative estimators and concentration tools, covering a wider class of heavy-tailed distributions. Together, the results give implementable robust mean estimators with concrete finite-sample guarantees in contaminated, heavy-tailed settings.
Abstract
Finite-sample upper bounds on the estimation error of a winsorized mean estimator of the population mean in the presence of heavy tails and adversarial contamination are established. In comparison to existing results, the winsorized mean estimator we study avoids a sample splitting device and winsorizes substantially fewer observations, which improves its applicability and practical performance.
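For readers unfamiliar with winsorization, the following is a minimal sketch of a generic winsorized mean computed on the full sample, with no splitting. The function name `winsorized_mean` and the parameter `k` (the number of observations clipped in each tail) are illustrative assumptions; the abstract does not state how the paper chooses the cut-offs from $\eta$, $m$, and the confidence level, so that choice is left to the caller here.

```python
import numpy as np

def winsorized_mean(x, k):
    """Generic winsorized mean (illustrative sketch, not the paper's exact tuning).

    The k smallest observations are raised to the (k+1)-th smallest value,
    the k largest are lowered to the (k+1)-th largest value, and the
    clipped sample is averaged. `k` is a hypothetical tuning parameter.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if k < 0 or 2 * k >= n:
        raise ValueError("need 0 <= k and 2*k < n")
    lower = x[k]          # (k+1)-th smallest order statistic
    upper = x[n - 1 - k]  # (k+1)-th largest order statistic
    return float(np.clip(x, lower, upper).mean())

# One gross outlier in a sample of five observations.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
plain = winsorized_mean(sample, 0)    # k = 0 reduces to the ordinary sample mean
robust = winsorized_mean(sample, 1)   # clip one observation in each tail
```

With `k = 0` the function returns the ordinary sample mean, which the outlier pulls far from the bulk of the data; with `k = 1` the outlier is replaced by the second-largest value before averaging. Unlike a trimmed mean, the clipped observations are kept in the average rather than discarded.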
