Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination

Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Thanasis Pittas

TL;DR

This work addresses robust high-dimensional mean estimation under mean-shift contamination, where an $\alpha<1/2$ fraction of the samples have adversarially shifted means. It introduces a computationally efficient algorithm that combines dimension reduction via a carefully constructed reweighted second-moment matrix with a final low-dimensional brute-force refinement, achieving $\epsilon$-accuracy with high probability and near-optimal sample complexity $n = \tilde{O}(d/\epsilon^{2+o(1)} + 2^{O(1/\epsilon^2)})$ in time $\mathrm{poly}(n,d)$. A key contribution is a rigorous analysis showing that iterative dimension reduction concentrates the signal in a low-dimensional subspace while controlling error, enabling a polynomial-time solution where prior multivariate mean-shift estimators ran in time exponential in $d$. The results demonstrate that mean-shift contamination admits computationally efficient robust inference in high dimensions, including adaptivity to unknown $\alpha$, and advance the understanding of structured noise models that lie between fully adversarial and random regimes.

Abstract

We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. In this contamination model, we are given a set of points in $\mathbb{R}^d$ generated i.i.d. via the following process. For a parameter $\alpha<1/2$, the $i$-th sample $x_i$ is obtained as follows: with probability $1-\alpha$, $x_i$ is drawn from $\mathcal{N}(\mu, I)$, where $\mu\in \mathbb{R}^d$ is the target mean; and with probability $\alpha$, $x_i$ is drawn from $\mathcal{N}(z_i, I)$, where $z_i$ is unknown and potentially arbitrary. Prior work characterized the information-theoretic limits of this task. Specifically, it was shown that, in contrast to Huber contamination, in the presence of mean-shift contamination consistent estimation is possible. On the other hand, all known robust estimators in the mean-shift model have running times exponential in the dimension. Here we give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination that can tolerate a constant fraction of outliers. In particular, our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy. Conceptually, our result contributes to a growing body of work that studies inference with respect to natural noise models lying in between fully adversarial and random settings.
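
The sampling process described in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator; it only simulates the contamination model. The function name `sample_mean_shift` and the callback `shift_fn` are hypothetical names introduced here, and the adversary shown (all outliers centered at one far-away point) is just one illustrative choice among the arbitrary centers $z_i$ the model allows.

```python
import numpy as np

def sample_mean_shift(n, mu, alpha, shift_fn, rng):
    """Draw n points from the mean-shift contamination model.

    Each point is N(mu, I) with probability 1 - alpha, and N(z_i, I)
    with probability alpha, where z_i = shift_fn(i) is a hypothetical
    stand-in for the adversary's choice of center.
    """
    mu = np.asarray(mu, dtype=float)
    d = mu.shape[0]
    is_outlier = rng.random(n) < alpha          # which samples are shifted
    centers = np.tile(mu, (n, 1))               # start with every center at mu
    for i in np.flatnonzero(is_outlier):
        centers[i] = shift_fn(i)                # adversarial center z_i
    x = centers + rng.standard_normal((n, d))   # add identity-covariance noise
    return x, is_outlier

rng = np.random.default_rng(0)
d, n, alpha = 5, 20000, 0.3
mu = np.zeros(d)
# Illustrative adversary (an assumption): every outlier is centered at the same point.
x, is_outlier = sample_mean_shift(n, mu, alpha, lambda i: np.full(d, 10.0), rng)

# The plain empirical mean is pulled toward the outliers, while the
# (unobservable in practice) inlier-only mean stays close to mu.
naive_error = np.linalg.norm(x.mean(axis=0) - mu)
inlier_error = np.linalg.norm(x[~is_outlier].mean(axis=0) - mu)
print(f"empirical mean error:   {naive_error:.2f}")
print(f"inlier-only mean error: {inlier_error:.2f}")
```

The gap between the two errors shows why a robust estimator is needed: the learner does not observe `is_outlier`, so it cannot simply discard the shifted samples.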

Paper Structure

This paper contains 49 sections, 13 theorems, 58 equations, and 1 algorithm.

Key Result

Theorem 1.2

(Main Algorithmic Result) Let $d \in \mathbb Z_+$ denote the dimension, $\mu \in \mathbb R^d$ be an unknown mean vector, $\epsilon \in (0,1)$ be an accuracy parameter, and $\alpha\le 0.49$ be a contamination parameter. There exists an algorithm that takes as input $\epsilon$, draws $n = \tilde{O}(d/

Theorems & Definitions (36)

  • Definition 1.1: Mean-Shift Contamination Model
  • Theorem 1.2
  • Proposition 2.0: Inefficient Estimator
  • Lemma 2.1
  • Definition 2.2: ($\eta,\beta$)-concentrated set
  • Definition 2.3: ($\eta,\beta$)-positive definite
  • Definition 2.4: $(\eta,\beta)$-good set
  • Lemma 2.4
  • Lemma 2.4
  • Proof: Proof Sketch of \ref{lem:sample_complexity}
  • ...and 26 more