Finite-sample properties of the trimmed mean
Roberto I. Oliveira, Paulo Orenstein, Zoraida F. Rico
TL;DR
The paper analyzes the finite-sample behavior of the $k$-trimmed mean as an estimator of the mean of i.i.d. data, both under finite variance and under weaker moment assumptions. Under finite variance, the trimmed mean concentrates around the mean with sub-Gaussian tails. Under slightly stronger moment assumptions, its left and right tails admit a precise ratio-type Gaussian approximation, which supports confidence intervals even at very small tail probabilities. The paper also proves that the trimmed mean is minimax-optimal up to constants under adversarial contamination, quantifying the trade-off between random fluctuations and contamination. The methods hinge on viewing the trimmed mean as an average of data that are conditionally i.i.d. given the trimming endpoints, which enables Bernstein-type and self-normalized CLT techniques. Together, the results give finite-sample guarantees with sharp constants for trimmed-mean estimators in light-tailed, heavy-tailed, and contaminated settings.
Abstract
The trimmed mean of $n$ scalar random variables from a distribution $P$ is the variant of the standard sample mean where the $k$ smallest and $k$ largest values in the sample are discarded for some parameter $k$. In this paper, we look at the finite-sample properties of the trimmed mean as an estimator for the mean of $P$. Assuming finite variance, we prove that the trimmed mean is ``sub-Gaussian'' in the sense of achieving Gaussian-type concentration around the mean. Under slightly stronger assumptions, we show the left and right tails of the trimmed mean satisfy a strong ratio-type approximation by the corresponding Gaussian tail, even for very small probabilities of the order $e^{-n^c}$ for some $c>0$. In the more challenging setting of weaker moment assumptions and adversarial sample contamination, we prove that the trimmed mean is minimax-optimal up to constants.
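To make the definition concrete, here is a minimal Python sketch of the estimator described above: sort the sample, discard the $k$ smallest and $k$ largest values, and average what remains. The function name `trimmed_mean`, the requirement $2k < n$, and the normalization by $n - 2k$ are assumptions made for illustration; the abstract does not fix these details.

```python
def trimmed_mean(sample, k):
    """Average of the sample after dropping the k smallest and k largest values.

    Assumption (not stated in the abstract): the kept values are averaged
    with weight 1 / (n - 2k), which requires 0 <= k and 2k < n.
    """
    n = len(sample)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k and 2k < n")
    ordered = sorted(sample)      # order statistics of the sample
    kept = ordered[k:n - k]       # discard k values from each end
    return sum(kept) / len(kept)  # len(kept) == n - 2k


# Hypothetical data: one gross outlier shifts the sample mean but not the trimmed mean.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
plain = sum(data) / len(data)     # 22.0
robust = trimmed_mean(data, 1)    # 3.0
```

With $k = 0$ the function reduces to the ordinary sample mean. Larger $k$ discards more potentially corrupted points at the cost of using fewer observations, which is the trade-off between random fluctuations and contamination that the paper quantifies.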
