Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination
Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Thanasis Pittas
TL;DR
This work addresses robust high-dimensional mean estimation under mean-shift contamination, where an α<1/2 fraction of the samples are drawn from Gaussians whose means have been arbitrarily shifted. It introduces a computationally efficient algorithm that combines dimension reduction, via a carefully constructed reweighted second-moment matrix, with a final brute-force refinement in the resulting low-dimensional subspace. The algorithm achieves ε-accuracy with high probability, with near-optimal sample complexity $n = \tilde{O}(d/\epsilon^{2+o(1)} + 2^{O(1/\epsilon^2)})$ and running time $\mathrm{poly}(n,d)$. A key contribution is a rigorous analysis showing that iterative dimension reduction concentrates the signal in a low-dimensional subspace while controlling the error, which yields a polynomial-time solution where prior multivariate mean-shift estimators had running times exponential in $d$. The results show that mean-shift contamination admits computationally efficient robust inference in high dimensions, including adaptivity to unknown α, and advance the understanding of structured noise models that lie between fully adversarial and random regimes.
Abstract
We study the algorithmic problem of robust mean estimation of an identity-covariance Gaussian in the presence of mean-shift contamination. In this contamination model, we are given a set of points in $\mathbb{R}^d$ generated i.i.d. via the following process. For a parameter $\alpha<1/2$, the $i$-th sample $x_i$ is obtained as follows: with probability $1-\alpha$, $x_i$ is drawn from $\mathcal{N}(\mu, I)$, where $\mu\in \mathbb{R}^d$ is the target mean; and with probability $\alpha$, $x_i$ is drawn from $\mathcal{N}(z_i, I)$, where $z_i$ is unknown and potentially arbitrary. Prior work characterized the information-theoretic limits of this task. Specifically, it was shown that, in contrast to Huber contamination, in the presence of mean-shift contamination consistent estimation is possible. On the other hand, all known robust estimators in the mean-shift model have running times exponential in the dimension. Here we give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination that can tolerate a constant fraction of outliers. In particular, our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy. Conceptually, our result contributes to a growing body of work that studies inference with respect to natural noise models lying in between fully adversarial and random settings.
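To make the contamination model concrete, the following is a minimal Python sketch of the sampling process described in the abstract. It is only an illustration of the data model, not the paper's estimator. The function name `sample_mean_shift`, the callback `shift_fn`, and the particular choice of shifted centers (all placed at $3e_1$) are assumptions introduced here for demonstration; in the model itself the centers $z_i$ may be arbitrary.

```python
import numpy as np

def sample_mean_shift(n, mu, alpha, shift_fn, rng):
    """Draw n points from the mean-shift contamination model.

    With probability 1 - alpha a point is N(mu, I); with probability
    alpha it is N(z_i, I), where the centers z_i come from shift_fn
    (a hypothetical stand-in for the unknown, arbitrary shifts).
    """
    mu = np.asarray(mu, dtype=float)
    d = mu.shape[0]
    # Each sample is contaminated independently with probability alpha.
    is_outlier = rng.random(n) < alpha
    # Centers: mu for clean samples, z_i for contaminated ones.
    centers = np.tile(mu, (n, 1))
    k = int(is_outlier.sum())
    if k > 0:
        centers[is_outlier] = shift_fn(k, d)
    # Every sample, clean or not, carries identity-covariance Gaussian noise.
    x = centers + rng.standard_normal((n, d))
    return x, is_outlier

def shift_to_e1(k, d):
    # Illustrative assumption: every shifted center sits at 3 * e_1.
    z = np.zeros((k, d))
    z[:, 0] = 3.0
    return z

rng = np.random.default_rng(0)
d, n, alpha = 5, 20000, 0.2
mu = np.zeros(d)
x, is_outlier = sample_mean_shift(n, mu, alpha, shift_to_e1, rng)

# The plain empirical mean is biased toward the shifted centers
# (by roughly alpha * 3 in the first coordinate for this choice).
naive_mean = x.mean(axis=0)
# The mean of the clean samples alone, shown only for comparison;
# an estimator never sees the is_outlier labels.
clean_mean = x[~is_outlier].mean(axis=0)
print("naive mean:", np.round(naive_mean, 3))
print("clean mean:", np.round(clean_mean, 3))
```

With this particular choice of shifts the naive empirical mean is pulled away from $\mu$ along the first coordinate, which is the bias a robust estimator in this model has to remove without access to the contamination labels.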
