Robust High-Dimensional Mean Estimation With Low Data Size, an Empirical Study

Cullen Anderson; Jeff M. Phillips

TL;DR

This work addresses robust mean estimation in high dimensions when the data size is small, a regime that existing theory does not cover because it typically requires $n\asymp d$ samples or more. It conducts an extensive empirical comparison across many estimators and introduces practical adaptations (notably QUE_low_n with an eigenvalue-threshold refinement) to handle $d\ge n$ scenarios. The study shows that, for Gaussian-like inliers, QUE_low_n nearly matches the error of the mean of the inliers alone and often surpasses other robust methods, while on real-world embeddings it performs reliably with early halting; subtractive corruption remains particularly challenging. Overall, the paper highlights the practical value of robust mean estimation under limited data, provides actionable algorithmic adjustments, and motivates further theoretical and empirical exploration beyond Gaussian assumptions.

Abstract

Robust statistics aims to compute quantities that represent data even when a fraction of it may be arbitrarily corrupted. The most essential statistic is the mean, and in recent years there has been a flurry of theoretical advances in efficiently estimating the mean of corrupted high-dimensional data. While several proposed algorithms achieve near-optimal error, they all require a data size that is large as a function of the dimension. In this paper, we perform extensive experiments over various mean estimation techniques in settings where the data size might not meet this requirement because of the high dimension.
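To make the setting concrete, the following is a minimal sketch of the corruption model in the $d\ge n$ regime, assuming standard Gaussian inliers with true mean zero and a small fraction of points replaced by a far-away cluster. It compares the sample mean with the coordinate-wise median, a simple robust baseline; it is not QUE_low_n or any other method from the paper. The helper name `corrupted_sample` and the parameters `eps` and `shift` are hypothetical choices made for illustration.

```python
import numpy as np


def corrupted_sample(n, d, eps, rng, shift=10.0):
    """Draw n points in d dimensions; an eps fraction is replaced by outliers.

    Assumption for illustration: inliers are N(0, I), so the true mean is the
    zero vector, and every outlier sits at the point (shift, ..., shift).
    """
    n_out = int(eps * n)
    inliers = rng.standard_normal((n - n_out, d))
    outliers = np.full((n_out, d), shift)
    return np.vstack([inliers, outliers])


rng = np.random.default_rng(0)
# Low data size relative to dimension: n = 50 points in d = 200 dimensions.
X = corrupted_sample(n=50, d=200, eps=0.1, rng=rng)

# The true mean is zero, so the norm of each estimate is its error.
err_mean = np.linalg.norm(X.mean(axis=0))
err_median = np.linalg.norm(np.median(X, axis=0))

print(f"sample mean error:            {err_mean:.2f}")
print(f"coordinate-wise median error: {err_median:.2f}")
```

In this sketch the outliers pull every coordinate of the sample mean toward `shift`, while the coordinate-wise median moves far less. The median's error still grows with the dimension, which is one reason more refined high-dimensional estimators are of interest.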
