On the Optimality of the Median-of-Means Estimator under Adversarial Contamination

Xabier de Juan, Santiago Mazuelas

TL;DR

This work characterizes the optimality of the Median-of-Means (MoM) estimator under adversarial contamination across multiple distribution classes. It proves minimax optimality for the class of finite-variance distributions and for the class of infinite-variance distributions with finite absolute $(1+r)$-th moment, and gives upper bounds for MoM in the sub-exponential and sub-Gaussian regimes under appropriate choices of the number of blocks. The results also identify a lower bound that rules out improving on a $\sqrt{\alpha}$ bias for certain general distributions, and show that MoM performs well for symmetric distributions but is sub-optimal for light-tailed distributions. Overall, the paper describes when MoM is most effective under contamination and clarifies its limitations compared to other robust estimators.

Abstract

The Median-of-Means (MoM) is a robust estimator widely used in machine learning that is known to be (minimax) optimal in scenarios where samples are i.i.d. In harsher scenarios, samples are contaminated by an adversary that can inspect and modify the data. Previous work has theoretically shown the suitability of the MoM estimator in certain contaminated settings. However, the (minimax) optimality of MoM and its limitations under adversarial contamination remain unknown beyond the Gaussian case. In this paper, we present upper and lower bounds for the error of MoM under adversarial contamination for multiple classes of distributions. In particular, we show that MoM is (minimax) optimal in the class of distributions with finite variance, as well as in the class of distributions with infinite variance and finite absolute $(1+r)$-th moment. We also provide lower bounds for MoM's error that match the order of the presented upper bounds, and show that MoM is sub-optimal for light-tailed distributions.

Paper Structure

This paper contains 24 sections, 12 theorems, 96 equations, 2 figures, 2 tables.

Key Result

Theorem 3.1

Let $\widehat{\mu}_{\mathrm{MoM}}$ be the MoM estimator with $k$ blocks of size $m=\lfloor n/k\rfloor$ evaluated at $n$ $\alpha$-contaminated samples. If the number of blocks satisfies $2\alpha n < k \leq n$, then for all $\delta > 2\exp(-2k(1/2-\alpha m)^2)$ holds with probability at least $1-\delta$.
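The block construction in the theorem can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions: blocks are taken as contiguous slices, samples left over when $k$ does not divide $n$ are discarded, and the adversary is a toy one that overwrites $\lfloor \alpha n \rfloor$ samples with a fixed value, placing one in each of as many blocks as possible. The names `median_of_means`, `contaminate`, and `stride` are hypothetical helpers introduced only for this example.

```python
import math
import random
import statistics


def median_of_means(samples, k):
    """Median of the means of k disjoint blocks of size m = floor(n / k).

    Assumption: blocks are contiguous slices and leftover samples are dropped.
    """
    n = len(samples)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    m = n // k
    block_means = [sum(samples[j * m:(j + 1) * m]) / m for j in range(k)]
    return statistics.median(block_means)


def contaminate(samples, alpha, value, stride=1):
    """Toy adversary (an assumption, not the paper's Definition 2.1).

    Overwrites floor(alpha * n) samples with `value`, one every `stride`
    positions, so that each corrupted sample can land in a different block.
    """
    corrupted = list(samples)
    n_bad = math.floor(alpha * len(corrupted))
    if n_bad * stride > len(corrupted):
        raise ValueError("stride too large for this contamination level")
    for i in range(n_bad):
        corrupted[i * stride] = value
    return corrupted


rng = random.Random(0)
n, alpha = 1000, 0.05
k = 200                      # satisfies 2 * alpha * n = 100 < k <= n
m = n // k                   # block size, here 5

clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
dirty = contaminate(clean, alpha, value=1e6, stride=m)

mom_estimate = median_of_means(dirty, k)   # stays near the true mean 0
mean_estimate = sum(dirty) / n             # dragged far away by the outliers
print(mom_estimate, mean_estimate)
```

In this setup the adversary corrupts 50 of the 200 blocks, which is fewer than half, so the median is taken among clean block means. This is the role played by the condition $2\alpha n < k$ in the theorem.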

Figures (2)

  • Figure 1: Empirical errors align with the theoretical bounds presented over multiple classes of distributions.
  • Figure 2: For any $k=\lceil 4\alpha^{i} n\rceil$, the error is at least $\mathcal{O}(\alpha^{2/3})$ for a sub-Gaussian distribution $\mathrm{p}\in\mathcal{P}_{\mathrm{SG}}$.

Theorems & Definitions (36)

  • Definition 2.1: Adversarial contamination
  • Definition 3.1
  • Theorem 3.1
  • Proof: Sketch of proof
  • Theorem 3.2
  • Proof: Sketch of proof
  • Theorem 3.3
  • Proof: Sketch of proof
  • Theorem 3.4
  • Proof: Sketch of proof
  • ...and 26 more