Winsorized mean estimation with heavy tails and adversarial contamination

Anders Bredahl Kock, David Preinerstorfer

TL;DR

This work tackles robust univariate mean estimation under heavy tails and adversarial contamination by introducing a Winsorized-mean estimator that avoids data-splitting and uses minimal Winsorization. It delivers finite-sample, high-probability bounds that explicitly quantify the dependence on the contamination level $\eta$ and the moment order $m$, with a parametric-like rate when $m\ge 2$. The paper further provides adaptive procedures based on Lepski's method to handle unknown $\eta_{\min}$, achieving near-optimal dependence on contamination at a practical computational cost. Additionally, it extends the results to relaxed moment assumptions $m\in(1,\infty)$ via alternative estimators and concentration tools, broadening applicability to a wider class of heavy-tailed distributions. Overall, the results offer principled, implementable robust mean estimation techniques with concrete finite-sample guarantees in contaminated, heavy-tailed environments.

Abstract

Finite-sample upper bounds on the estimation error of a winsorized mean estimator of the population mean in the presence of heavy tails and adversarial contamination are established. In comparison to existing results, the winsorized mean estimator we study avoids a sample splitting device and winsorizes substantially fewer observations, which improves its applicability and practical performance.
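To make the basic operation concrete, here is a minimal sketch of a winsorized mean: observations below a lower order statistic or above an upper one are clipped to those order statistics, and the clipped sample is averaged, with no sample splitting. The function name `winsorized_mean` and the parameter `eps` are hypothetical. In particular, `eps` is a user-supplied winsorization fraction standing in for the paper's $\varepsilon_c(\eta)$, whose definition is not reproduced here, so this illustrates the general technique rather than the authors' exact estimator.

```python
import math


def winsorized_mean(xs, eps):
    """Average of xs after clipping to the sample's order statistics.

    Assumption (not taken from the paper): eps in [0, 0.5) is the fraction
    winsorized in each tail, and floor(eps * n) observations per tail are
    replaced by the nearest retained order statistic.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("xs must be non-empty")
    if not 0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 0.5)")

    s = sorted(xs)
    k = math.floor(eps * n)        # observations winsorized per tail
    lo, hi = s[k], s[n - 1 - k]    # clipping thresholds (order statistics)

    # Clip every observation into [lo, hi], then take the ordinary mean.
    return sum(min(max(x, lo), hi) for x in xs) / n


# A single gross outlier dominates the plain mean but not the winsorized one.
sample = [1, 2, 3, 4, 1000]
print(sum(sample) / len(sample))     # plain mean: 202.0
print(winsorized_mean(sample, 0.2))  # clips to [2, 4]: 3.0
```

Setting `eps = 0` recovers the ordinary sample mean. Larger values trade some bias for protection against extreme or contaminated observations.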

Paper Structure

This paper contains 10 sections, 13 theorems, 130 equations.

Key Result

Theorem 3.1

Fix $c\in(1,\sqrt{1.5})$, $n\in\mathbb{N}$, $\delta\in(0,1)$, and let Assumption ass:setting be satisfied with $m\in[2,\infty)$. If $\varepsilon_c(\eta)\in(0,1/2)$ with $\varepsilon_c(\eta)$ as defined in eq:epsfam, it holds with probability at least $1-\delta$ that [display equation not preserved in this extract]. In particular, for $m=2$ it holds with probability at least $1-\delta$ that [display equation not preserved in this extract].

Theorems & Definitions (29)

  • Remark 3.1
  • Theorem 3.1
  • Theorem 4.1
  • Remark 4.1
  • Remark 4.2
  • Theorem 5.1
  • Theorem B.1: Bernstein's inequality
  • Lemma B.2
  • proof
  • Lemma B.3
  • ...and 19 more