
Do we need to estimate the variance in robust mean estimation?

Qiang Sun

Abstract

In this paper, we propose self-tuned robust estimators for the mean of heavy-tailed distributions, by which we mean distributions with only finite variances. Our approach introduces a new loss function that depends on both the mean parameter and a robustification parameter. By jointly optimizing the empirical loss function with respect to both parameters, the robustification parameter estimator can automatically adapt to the unknown data variance, and thus the self-tuned mean estimator can achieve optimal finite-sample performance. Our method outperforms previous approaches in terms of both computational and asymptotic efficiency. Specifically, it does not require cross-validation or Lepski's method to tune the robustification parameter, and the variance of our estimator achieves the Cramér-Rao lower bound. Project source code is available at https://github.com/statsle/automean.

Paper Structure

This paper contains 48 sections, 29 theorems, 250 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Theorem 2.1

Take $\tau= \sigma\sqrt n/z$ with $z=\sqrt{\log (1/\delta)}$, and assume $n$ is sufficiently large. Then, for any $0<\delta< 1$, with probability at least $1-\delta$, we have
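To make the tuning rule $\tau = \sigma\sqrt{n}/z$ with $z=\sqrt{\log(1/\delta)}$ concrete, the following is a minimal sketch, not the paper's self-tuned estimator. It assumes that $\sigma$ is known and that the estimator in question minimizes the standard pseudo-Huber loss $\tau^2\big(\sqrt{1+(x-\mu)^2/\tau^2}-1\big)$ in $\mu$; both are assumptions made for illustration. The function name `pseudo_huber_mean` and the iteratively reweighted averaging scheme are hypothetical choices, not taken from the paper or its repository.

```python
import math


def pseudo_huber_mean(xs, sigma, delta, iters=200, tol=1e-12):
    """Pseudo-Huber mean with tau = sigma * sqrt(n) / z, z = sqrt(log(1/delta)).

    Assumption: sigma is known. The self-tuned method described in the
    abstract is designed to avoid exactly this requirement.
    """
    n = len(xs)
    z = math.sqrt(math.log(1.0 / delta))
    tau = sigma * math.sqrt(n) / z

    # Start from a median-like point so outliers do not dominate the start.
    mu = sorted(xs)[n // 2]

    for _ in range(iters):
        # Setting the derivative of the pseudo-Huber loss to zero gives
        # sum_i w_i * (x_i - mu) = 0 with w_i = 1 / sqrt(1 + ((x_i - mu) / tau)^2),
        # which is solved here by repeated weighted averaging.
        weights = [1.0 / math.sqrt(1.0 + ((x - mu) / tau) ** 2) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, tau


# Four clean points and one gross outlier: the sample mean is 20.
data = [0.0, 0.0, 0.0, 0.0, 100.0]
estimate, tau = pseudo_huber_mean(data, sigma=1.0, delta=0.05)
print(estimate, tau)
```

A smaller $\delta$ increases $z$ and therefore shrinks $\tau$, which downweights large residuals more aggressively; a larger $n$ increases $\tau$, so the estimator moves toward the sample mean.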

Figures (5)

  • Figure 1: Comparing our self-tuned estimator with the MoM estimator in terms of adaptivity.
  • Figure 2: The $\alpha$-quantile of the estimation error (estimation error, $y$-axis) versus $\alpha$ (quantile level, $x$-axis) for our estimator, the sample mean estimator, the MoM estimator, and the trimmed mean estimator.
  • Figure 3: The empirical 99%-quantile of the estimation error (estimation error, $y$-axis) versus a distribution parameter (parameter, $x$-axis) for our estimator, the sample mean estimator, the MoM estimator, and the trimmed mean estimator. The distribution parameter is $\sigma$ for the normal distribution and $q$ for the skewed generalized $t$ distribution.
  • Figure 4: The $\alpha$-quantile of the estimation error (estimation error, $y$-axis) versus $\alpha$ (quantile level, $x$-axis) for our estimator, cross-validation, and Lepski's method.
  • Figure 5: The empirical 99%-quantile of the estimation error (estimation error, $y$-axis) versus a distribution parameter (parameter, $x$-axis) for our estimator, cross-validation, and Lepski's method.

Theorems & Definitions (30)

  • Theorem 2.1: Informal result
  • Definition 2.2: Penalized pseudo-Huber loss
  • Theorem 2.3: Self-tuning property of $v_*$
  • Proposition 2.4: Joint convexity
  • Theorem 3.1
  • Lemma 3.2
  • Corollary 3.3
  • Theorem 3.4: Self-tuning property
  • Theorem 3.5: Self-tuned mean estimators
  • Theorem 4.1: Theorem 2 by lugosi2019mean
  • ...and 20 more