Trimmed sample means for robust uniform mean estimation and regression

Roberto I. Oliveira, Lucas Resende

TL;DR

This work studies robust statistical estimation with trimmed means across three problems: uniform mean estimation, vector mean estimation under general norms, and regression with quadratic loss. The authors develop trimmed-mean-based estimators and prove nonasymptotic, minimax-optimal guarantees under adversarial contamination and heavy tails, going beyond median-of-means (MoM) methods and classical robust statistics. They provide both theory (master theorems and localized radii) and practical algorithms (plug-in and cross-validated trimming) for robust regression, accompanied by experiments showing competitive performance against OLS, MoM, and Huber-type methods. Overall, the results establish trimmed means as a principled tool for robust, finite-sample estimation in high-dimensional and contaminated settings.

Abstract

It is well-known that trimmed sample means are robust against heavy tails and data contamination. This paper analyzes the performance of trimmed means and related methods in two novel contexts. The first one consists of estimating expectations of functions in a given family, with uniform error bounds; this is closely related to the problem of estimating the mean of a random vector under a general norm. The second problem considered is that of regression with quadratic loss. In both cases, trimmed-mean-based estimators are the first to obtain optimal dependence on the (adversarial) contamination level. Moreover, they also match or improve upon the state of the art in terms of heavy tails. Experiments with synthetic data show that a natural "trimmed mean linear regression" method often performs better than both ordinary least squares and alternative methods based on median-of-means.
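
To make the basic object concrete, here is a minimal sketch of a one-dimensional symmetric trimmed mean: sort the sample, discard the `k` smallest and `k` largest values, and average the rest. The function name `trimmed_mean`, the trimming parameter `k`, and the synthetic data are illustrative assumptions; this is not the paper's uniform or regression estimator, only the scalar building block the abstract refers to.

```python
import numpy as np


def trimmed_mean(x, k):
    """Symmetric trimmed mean: drop the k smallest and k largest values.

    `k` is a hypothetical trimming parameter chosen by the user; how it
    should scale with contamination and confidence is not shown here.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if k < 0 or 2 * k >= n:
        raise ValueError("need 0 <= k < n / 2")
    return x[k:n - k].mean()


# Illustration on synthetic data (assumed setup, not from the paper):
# heavy-tailed Student-t samples with true mean 0, then 2% of the points
# replaced by a huge outlier to mimic adversarial contamination.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=3, size=1000)
contaminated = sample.copy()
contaminated[:20] = 1e6

plain = contaminated.mean()                  # dominated by the outliers
robust = trimmed_mean(contaminated, k=50)    # outliers fall in the trimmed tail
print(f"sample mean: {plain:.1f}, trimmed mean: {robust:.3f}")
```

With `k` at least as large as the number of corrupted points, every outlier lands in the discarded tail, which is why the trimmed mean stays near the true mean while the plain sample mean does not.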

Paper Structure

This paper contains 48 sections, 12 theorems, 200 equations, 5 figures, 1 table, 2 algorithms.

Key Result

Theorem 2.1

Assume that a measure $P$ over $({\bf X}, \mathcal{X})$ is 1-compatible with a family of measurable functions $\mathcal{G}$ from ${\bf X}$ to $\mathbb{R}$.

Figures (5)

  • Figure 1: Experiments varying the contamination proportion using Setup B, left panel with $p=0.05$ and right panel with $p=0.09$.
  • Figure 2: Comparison of the two cross-validation strategies, minimum loss (min. loss) and maximum slope (max. slope), across all experimental results. Bands represent the 5th and 95th percentiles, while the solid line displays the median.
  • Figure 3: Heatmap of the $L^2$ error $\| \widehat{\beta}_n - \beta^\star \|$. Each row is a different combination of $(\alpha, \lambda)$ and homoscedasticity/heteroscedasticity. The columns vary the method and the contamination level $\varepsilon$.
  • Figure 4: Percentage of seeds over which each method is the top performer. Each row is a different combination of $(\alpha, \lambda)$ and homoscedasticity/heteroscedasticity. The columns vary the method and the contamination level $\varepsilon$. The bluer the entry, the higher the percentage of times that the specific method was the top performer for that choice of $(\alpha,\lambda,\varepsilon)$.
  • Figure 5: Parameter selected via cross-validation for different choices of $(\alpha, \lambda)$. Recall that TM and MoM use the max. slope strategy and Huber regression uses the min. loss strategy. Homoscedastic and heteroscedastic cases are displayed together, as this distinction does not seem to impact the parameter selection.

Theorems & Definitions (35)

  • Theorem 2.1
  • Theorem 3.1: Proof in § \ref{sub:proof:uniformTM}
  • Remark 1
  • Remark 2: Is the complexity term optimal?
  • Theorem 4.1
  • Remark 3
  • Remark 4
  • Example 1: Linear regression with independent errors
  • Remark 5: Critical radii in linear regression with independent errors
  • Remark 6: Moment conditions and linear regression
  • ...and 25 more