Improved performance guarantees for Tukey's median

Stanislav Minsker; Yinan Shen

TL;DR

The paper investigates Tukey's depth and the empirical Tukey's median in multivariate settings with elliptically symmetric distributions, establishing that estimation accuracy and depth-region diameters depend on the effective rank $r(\Sigma)$ rather than the ambient dimension. By leveraging affine equivariance and strong approximation of empirical depth processes, it derives nonparametric bounds for the Euclidean error of Tukey's median and sharp diameter rates for empirical depth regions, including an $O(n^{-3/4}\sqrt{\log n})$ rate in dimension two. It also analyzes robustness under adversarial contamination, showing that the error remains controlled when the contamination level $\varepsilon$ is at most of order $n^{-1/2}$, with bounds that scale with $\sqrt{r(\Sigma)}$ rather than $\sqrt{d}$. The work provides technical tools, such as strong approximations of halfspace-indexed empirical processes by Brownian bridges, with implications for affine-equivariant estimators beyond Tukey's median. Overall, it advances the understanding of depth-based multivariate ordering under noise and contamination, highlighting the role of intrinsic dimensionality in high-dimensional robust statistics.
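To make the role of $r(\Sigma)$ concrete, here is a minimal sketch that computes the effective rank of a covariance matrix. It assumes the usual definition $r(\Sigma) = \mathrm{tr}(\Sigma)/\|\Sigma\|$, with $\|\Sigma\|$ the largest eigenvalue; the summary above does not state the definition, and the function name `effective_rank` is illustrative only.

```python
import numpy as np


def effective_rank(sigma):
    """Effective rank tr(Sigma) / ||Sigma|| of a symmetric PSD matrix.

    Assumption: this is the standard definition; the summary above
    does not spell it out.
    """
    sigma = np.asarray(sigma, dtype=float)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order,
    # so the last entry is the operator norm of a PSD matrix.
    largest_eigenvalue = np.linalg.eigvalsh(sigma)[-1]
    return float(np.trace(sigma) / largest_eigenvalue)


# Isotropic case: the effective rank equals the ambient dimension.
isotropic = np.eye(5)
# Anisotropic case: one dominant direction gives an effective rank close to 1.
anisotropic = np.diag([4.0, 1.0, 1.0])

r_iso = effective_rank(isotropic)
r_aniso = effective_rank(anisotropic)
```

For the isotropic matrix the effective rank coincides with $d$, while strongly anisotropic covariances can have $r(\Sigma)$ far below $d$, which is the regime where bounds in terms of $\sqrt{r(\Sigma)}$ improve on bounds in terms of $\sqrt{d}$.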

Abstract

Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with John Tukey, is among the most popular. Tukey's depth has found applications in robust statistics, graph theory, and the study of elections and social choice. We present improved performance guarantees for empirical Tukey's median, a deepest point associated with a given sample, when the data-generating distribution is elliptically symmetric and possibly anisotropic. Some of our results remain valid in the wider class of affine equivariant estimators. As a corollary of our bounds, we show that the typical diameter of the set of all empirical Tukey's medians scales like $o(n^{-1/2})$ where $n$ is the sample size. Moreover, when the data follow the bivariate normal distribution, we prove that with high probability, the diameter is of order $O(n^{-3/4}\log^{1/2}(n))$. On the technical side, we show how affine equivariance can be leveraged to improve concentration bounds; moreover, we develop sharp strong approximation results for empirical processes indexed by halfspaces that could be of independent interest.
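As an illustration of the depth notion discussed above, the following sketch approximates the empirical Tukey (halfspace) depth of a point in the plane: the smallest fraction of sample points contained in a closed halfspace whose boundary passes through that point. It is not the authors' procedure; it scans a finite grid of directions, so the returned value can only overestimate the exact depth, and the name `tukey_depth_2d` and the parameter `n_dirs` are hypothetical.

```python
import numpy as np


def tukey_depth_2d(z, X, n_dirs=360):
    """Approximate empirical halfspace depth of z with respect to the rows of X.

    For each direction u on a grid of the unit circle, count the sample points
    x with <x - z, u> >= 0, then take the minimum over directions. Because only
    finitely many directions are scanned, this is an upper bound on the exact
    depth.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    directions = np.column_stack([np.cos(angles), np.sin(angles)])
    # projections[i, j] = <X[i] - z, directions[j]>
    projections = (X - z) @ directions.T
    counts = (projections >= 0.0).sum(axis=0)
    return float(counts.min() / X.shape[0])


# Four points placed symmetrically around the origin.
sample = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

depth_center = tukey_depth_2d([0.0, 0.0], sample)
depth_far = tukey_depth_2d([10.0, 0.0], sample)
```

An empirical Tukey's median is any point maximizing this depth over $z$; in the toy sample the origin is a deepest point, whereas a point far outside the data cloud has depth zero.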

Paper Structure (12 sections, 16 theorems, 130 equations, 5 figures)

Introduction
Notation
Main results
Contamination-free framework
Performance guarantees in the adversarial contamination framework
Proof of Theorem \ref{th:diameter}
Lower bounds for the diameter of the empirical depth regions
Discussion
Proof of Theorem \ref{th:affine}
Proof of Lemma \ref{lemma:rate}
Proof of Lemma \ref{lemma:density:bound}
Auxiliary results

Key Result

Theorem 2.1

Let $X_1,\ldots,X_n$ be i.i.d. copies of $X\sim \mathcal{E}(\mu,\Sigma,F)$ where $\Sigma$ is non-singular. Let $\widetilde{\mu}_n$ be any affine equivariant estimator of $\mu$. Suppose that the inequality holds with probability at least $1-p(t)$. Then with probability at least $1-p(t) - e^{-t}$.

Figures (5)

Figure 1: Depth contours: lighter-colored regions correspond to higher depth. The empirical Tukey's median is marked with a circle and the sample mean with a cross.
Figure 2: Empirical depth contours approach their population limit. The blue circle denotes Tukey's median.
Figure 3: Contaminated sample with contamination proportion $\varepsilon=0.1$. The circle denotes Tukey's median and the cross denotes the mean.
Figure 4: A concave function with a unique maximizer, and its upper level set.
Figure 5: Subdifferential of $W(\widehat{z})$.

Theorems & Definitions (30)

Theorem 2.1
Corollary 2.1: Tukey's median
proof
Remark 2.1
Corollary 2.2: Stahel-Donoho estimator
proof
Remark 2.2
Lemma 2.1
proof
Theorem 2.2
...and 20 more
