Improved performance guarantees for Tukey's median
Stanislav Minsker, Yinan Shen
TL;DR
The paper studies Tukey's depth and the empirical Tukey's median for multivariate data drawn from elliptically symmetric distributions, and shows that both the estimation accuracy and the diameters of depth regions are governed by the effective rank $r(\Sigma)$ rather than the ambient dimension $d$. By leveraging affine equivariance and strong approximation of empirical depth processes, it derives nonparametric bounds for the Euclidean error of Tukey's median and sharp rates for the diameter of empirical depth regions, including an $O(n^{-3/4}\sqrt{\log n})$ rate for bivariate normal data. It also analyzes robustness under adversarial contamination, showing that the error remains controlled when the contamination level $\varepsilon$ is at most of order $n^{-1/2}$, with bounds that scale with $\sqrt{r(\Sigma)}$ rather than $\sqrt{d}$. The technical tools, such as strong approximations by Brownian bridges for empirical processes indexed by halfspaces, have implications for affine-equivariant estimators beyond Tukey's median. Overall, the work advances the understanding of depth-based multivariate ordering under noise and contamination, and highlights the role of intrinsic dimensionality in high-dimensional robust statistics.
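To make the role of the effective rank concrete, here is a minimal Python sketch. It assumes the commonly used definition $r(\Sigma) = \mathrm{tr}(\Sigma)/\|\Sigma\|$, with $\|\Sigma\|$ the largest eigenvalue of a positive semidefinite matrix $\Sigma$; the summary above does not spell this definition out, so treat it as an assumption. The function name `effective_rank` and the example spectra are hypothetical and chosen only for illustration.

```python
def effective_rank(eigenvalues):
    """Effective rank trace / largest eigenvalue of a PSD matrix.

    Assumption: the matrix is described by its (nonnegative) eigenvalues,
    and the effective rank is defined as tr(Sigma) / ||Sigma||_op.
    """
    eigenvalues = list(eigenvalues)
    if not eigenvalues or min(eigenvalues) < 0:
        raise ValueError("expected a non-empty list of nonnegative eigenvalues")
    largest = max(eigenvalues)
    if largest == 0:
        raise ValueError("the zero matrix has no well-defined effective rank")
    return sum(eigenvalues) / largest


# Isotropic case in dimension 100: the effective rank equals the dimension.
isotropic = effective_rank([1.0] * 100)

# One dominant direction plus 99 weak ones: the effective rank stays close to 2,
# even though the ambient dimension is still 100.
spiked = effective_rank([1.0] + [0.01] * 99)

print(isotropic, spiked)
```

In the second example the ambient dimension is 100 while the effective rank is roughly 2, which is the anisotropic regime where bounds in terms of $\sqrt{r(\Sigma)}$ are much smaller than bounds in terms of $\sqrt{d}$.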
Abstract
Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with John Tukey, is among the most popular. Tukey's depth has found applications in robust statistics, graph theory, and the study of elections and social choice. We present improved performance guarantees for empirical Tukey's median, a deepest point associated with a given sample, when the data-generating distribution is elliptically symmetric and possibly anisotropic. Some of our results remain valid in the wider class of affine equivariant estimators. As a corollary of our bounds, we show that the typical diameter of the set of all empirical Tukey's medians scales like $o(n^{-1/2})$ where $n$ is the sample size. Moreover, when the data follow the bivariate normal distribution, we prove that with high probability, the diameter is of order $O(n^{-3/4}\log^{1/2}(n))$. On the technical side, we show how affine equivariance can be leveraged to improve concentration bounds; moreover, we develop sharp strong approximation results for empirical processes indexed by halfspaces that could be of independent interest.
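As an illustration of the objects discussed in the abstract, the following Python sketch approximates the empirical Tukey (halfspace) depth of a point in the plane and picks the deepest points from a finite candidate list. It is a sketch under stated assumptions, not the authors' procedure: the depth of $z$ is taken to be the smallest fraction of sample points in a closed halfspace whose boundary passes through $z$, the minimum is taken over a finite grid of directions (so the returned value can only overestimate the true depth), and the search for a median is restricted to user-supplied candidates rather than all of $\mathbb{R}^2$. The names `tukey_depth_2d`, `deepest_points`, and `num_directions` are hypothetical.

```python
import math


def tukey_depth_2d(z, sample, num_directions=360):
    """Approximate empirical halfspace depth of the point z in the plane.

    For each direction u on a grid of unit vectors, count the sample points x
    with <x - z, u> >= 0, and return the smallest fraction found.
    A finite grid yields an upper bound on the exact depth.
    """
    n = len(sample)
    best = n
    for k in range(num_directions):
        theta = 2.0 * math.pi * k / num_directions
        ux, uy = math.cos(theta), math.sin(theta)
        count = sum(
            1 for (x, y) in sample
            if (x - z[0]) * ux + (y - z[1]) * uy >= 0.0
        )
        best = min(best, count)
    return best / n


def deepest_points(candidates, sample, num_directions=360):
    """Return the candidates of maximal approximate depth, and that depth."""
    depths = [tukey_depth_2d(c, sample, num_directions) for c in candidates]
    top = max(depths)
    return [c for c, d in zip(candidates, depths) if d == top], top


# Four points placed symmetrically around the origin.
sample = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(tukey_depth_2d((0.0, 0.0), sample))   # the center is deep
print(tukey_depth_2d((10.0, 0.0), sample))  # a far-away point has depth 0
print(deepest_points([(0.0, 0.0), (10.0, 0.0)], sample))
```

The set of maximizers need not be a single point, which is why the abstract speaks of the diameter of the set of all empirical Tukey's medians.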
