Finite-sample properties of the trimmed mean

Roberto I. Oliveira; Paulo Orenstein; Zoraida F. Rico

TL;DR

The paper analyzes the finite-sample behavior of the $k$-trimmed mean as an estimator of the mean of i.i.d. data, both under finite variance and under weaker moment assumptions. It shows that the trimmed mean concentrates around the mean with sub-Gaussian tails and, under slightly stronger assumptions, admits a precise Gaussian approximation, with matching confidence intervals, that remains accurate deep in the tails. It also proves that the trimmed mean is minimax-optimal up to constants under adversarial contamination, quantifying the trade-off between random fluctuations and contamination. The methods hinge on viewing the trimmed mean as an average of data that are conditionally i.i.d. given the trimming endpoints, which enables Bernstein-type and self-normalized CLT techniques. Together, the results give finite-sample guarantees with sharp constants for trimmed-mean estimators in light-tailed, heavy-tailed, and contaminated settings.

Abstract

The trimmed mean of $n$ scalar random variables from a distribution $P$ is the variant of the standard sample mean where the $k$ smallest and $k$ largest values in the sample are discarded for some parameter $k$. In this paper, we look at the finite-sample properties of the trimmed mean as an estimator for the mean of $P$. Assuming finite variance, we prove that the trimmed mean is ``sub-Gaussian'' in the sense of achieving Gaussian-type concentration around the mean. Under slightly stronger assumptions, we show the left and right tails of the trimmed mean satisfy a strong ratio-type approximation by the corresponding Gaussian tail, even for very small probabilities of the order $e^{-n^c}$ for some $c>0$. In the more challenging setting of weaker moment assumptions and adversarial sample contamination, we prove that the trimmed mean is minimax-optimal up to constants.
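The definition in the abstract can be made concrete with a minimal sketch. The function name `trimmed_mean` and the normalization by $n-2k$ (the number of retained points) are assumptions made for illustration; the abstract only specifies that the $k$ smallest and $k$ largest values are discarded.

```python
def trimmed_mean(sample, k):
    """Average of `sample` after discarding its k smallest and k largest values.

    Assumption: the retained n - 2k points are averaged with equal weight.
    """
    n = len(sample)
    if k < 0 or 2 * k >= n:
        raise ValueError("need 0 <= k < n/2 so that at least one point remains")
    kept = sorted(sample)[k:n - k]  # drop k order statistics from each end
    return sum(kept) / len(kept)


# One large outlier dominates the plain mean but is discarded by trimming.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
plain = sum(data) / len(data)    # 22.0
trimmed = trimmed_mean(data, 1)  # (2 + 3 + 4) / 3 = 3.0
```

With $k=0$ the function reduces to the ordinary sample mean, so $k$ controls how much of each tail is sacrificed for robustness.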

Paper Structure (41 sections, 26 theorems, 195 equations, 1 figure)

Introduction
Sub-Gaussian properties.
Precise Gaussian approximation and confidence intervals
Heavier tails and contamination
Technical and conceptual contributions
Additional background
Background on the trimmed mean.
Finite-sample bounds, sub-Gaussian estimators and related topics
Adversarial contamination
Organization
Preliminaries
General notation
Probability notation and facts.
Concentration and Gaussian approximation for i.i.d. sums
Trimmed means: first steps
...and 26 more sections

Key Result

Theorem 1.1.1

Consider i.i.d. random variables $X_1,\dots,X_n$ with a well-defined mean $\mu$ and variance $\sigma^2<+\infty$. Take $0<x\leq \sqrt{{n}/{(\sqrt{2}+1)^2}-2}$ and consider the trimmed mean $\overline{X}_{n,k}$ with trimming parameter $k(x):=\lceil x^2/2\rceil$. Then:

Figures (1)

Figure 1: Violin plot for the three estimators under $t$ distributions with different parameters.

Theorems & Definitions (54)

Theorem 1.1.1: Proof in § \ref{subsub:proof:allsubgaussian}
Theorem 1.1.2: Proof in § \ref{subsub:proof:sharpersubgaussian}
Theorem 1.1.3: Proof in § \ref{subsub:proof:multiplesubgaussian}
Theorem 1.2.1: Proof in § \ref{sub:proof:preciseconfidence}
Corollary 1.2.2: Proof omitted
Remark 1.2.3
Theorem 1.3.1: Proof in § \ref{sec:proof:minimaxcontaminated}
Remark 1.3.2
Remark 1.3.3
Proposition 2.2.1
...and 44 more
