High-dimensional Gaussian and bootstrap approximations for robust means
Anders Bredahl Kock, David Preinerstorfer
TL;DR
The paper develops Gaussian and bootstrap approximations for high-dimensional robust means when the dimension $d$ grows rapidly with the sample size $n$. It introduces data-driven winsorized and trimmed means that adapt to the unknown number of finite moments $m>2$, remain robust to adversarial contamination, and admit Gaussian and bootstrap inference via a robust covariance estimator. The key contributions are a high-dimensional Gaussian approximation for winsorized means that allows $d$ to grow exponentially in $n$, a covariance estimator corrected to be positive semidefinite (PSD) with contamination-robust guarantees, and bootstrap consistency results, including normalized and trimmed variants. Together, these results provide adaptive inference tools for high-dimensional mean estimation under heavy tails and contamination, applicable to multivariate settings and hypothesis testing.
Abstract
Recent years have witnessed much progress on Gaussian and bootstrap approximations to the distribution of sums of independent random vectors with dimension $d$ large relative to the sample size $n$. However, for any number of moments $m>2$ that the summands may possess, there exist distributions such that these approximations break down if $d$ grows faster than the polynomial barrier $n^{\frac{m}{2}-1}$. In this paper, we establish Gaussian and bootstrap approximations to the distributions of winsorized and trimmed means that allow $d$ to grow at an exponential rate in $n$ as long as $m>2$ moments exist. The approximations remain valid under some amount of adversarial contamination. Our implementations of the winsorized and trimmed means do not require knowledge of $m$. As a consequence, the performance of the approximation guarantees ``adapts'' to $m$.
