Improved performance guarantees for Tukey's median

Stanislav Minsker, Yinan Shen

TL;DR

The paper investigates Tukey's depth and empirical Tukey's median in multivariate settings with elliptically symmetric distributions, establishing that estimation accuracy and depth-region diameters depend on the effective rank $r(\Sigma)$ rather than the ambient dimension. By leveraging affine equivariance and strong approximation of empirical depth processes, it derives nonparametric bounds for the Euclidean error of Tukey's median and sharp diameter rates for empirical depth regions, including an $O(n^{-3/4}\sqrt{\log n})$ rate in dimension two. It also analyzes robustness under adversarial contamination, showing the error remains controlled when the contamination level $\varepsilon$ is at most on the order of $n^{-1/2}$, with the bounds scaling with $\sqrt{r(\Sigma)}$ rather than $\sqrt{d}$. The work provides technical tools, such as strong approximations for halfspace-indexed empirical processes via Brownian bridges, with implications for affine-equivariant estimators beyond Tukey's median. Overall, it advances understanding of depth-based multivariate ordering under realistic noise and contamination, highlighting the importance of intrinsic dimensionality in high-dimensional robust statistics.
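
The TL;DR contrasts the effective rank $r(\Sigma)$ with the ambient dimension $d$, but this summary does not define $r(\Sigma)$. A common definition, assumed here, is $r(\Sigma) = \operatorname{tr}(\Sigma)/\|\Sigma\|$, with $\|\Sigma\|$ the operator norm. The following Python sketch (the helper name is hypothetical) computes it for an isotropic covariance and for one with a fast-decaying spectrum, to show how much smaller than $d$ it can be.

```python
import numpy as np


def effective_rank(sigma: np.ndarray) -> float:
    """Effective rank r(Sigma) = trace(Sigma) / ||Sigma||_op (assumed definition)."""
    eigvals = np.linalg.eigvalsh(sigma)  # symmetric PSD input -> real eigenvalues
    return float(eigvals.sum() / eigvals.max())


d = 1000
iso = np.eye(d)                                   # isotropic: r(Sigma) equals d
decay = np.diag(1.0 / np.arange(1, d + 1) ** 2)   # eigenvalues 1/k^2, k = 1..d

print(round(effective_rank(iso), 2))    # 1000.0
print(round(effective_rank(decay), 2))  # 1.64, the partial sum of 1/k^2
```

In the anisotropic case the effective rank stays bounded as $d$ grows, which is the regime where bounds scaling with $\sqrt{r(\Sigma)}$ instead of $\sqrt{d}$ make the largest difference.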

Abstract

Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with John Tukey, is among the most popular. Tukey's depth has found applications in robust statistics, graph theory, and the study of elections and social choice. We present improved performance guarantees for empirical Tukey's median, a deepest point associated with a given sample, when the data-generating distribution is elliptically symmetric and possibly anisotropic. Some of our results remain valid in the wider class of affine equivariant estimators. As a corollary of our bounds, we show that the typical diameter of the set of all empirical Tukey's medians scales like $o(n^{-1/2})$ where $n$ is the sample size. Moreover, when the data follow the bivariate normal distribution, we prove that with high probability, the diameter is of order $O(n^{-3/4}\log^{1/2}(n))$. On the technical side, we show how affine equivariance can be leveraged to improve concentration bounds; moreover, we develop sharp strong approximation results for empirical processes indexed by halfspaces that could be of independent interest.
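
Empirical Tukey's median is a deepest point for the empirical halfspace depth. Using the standard definition (assumed here, since this summary does not state it), $D_n(z) = \min_{\|u\|=1} \frac{1}{n}\#\{i : \langle u, X_i - z\rangle \ge 0\}$. The Python sketch below approximates both quantities in dimension two under simplifying assumptions: it checks only a finite set of directions, so it returns an upper bound on the exact depth, and it searches for a deepest point on a grid over the coordinate-wise interquartile box. The function names are hypothetical, and this is not the authors' code.

```python
import numpy as np


def approx_tukey_depth(z, X, n_dirs=180):
    """Approximate empirical Tukey (halfspace) depth of point z w.r.t. a 2D sample X.

    Only n_dirs evenly spaced unit directions are checked, so the returned value
    is an upper bound on the exact depth (a minimum over fewer directions).
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    U = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (n_dirs, 2) unit vectors
    proj = (X - z) @ U.T                                      # (n, n_dirs): <u, X_i - z>
    # Fraction of sample points in each closed halfspace {x : <u, x - z> >= 0}.
    frac = np.mean(proj >= 0.0, axis=0)
    return float(frac.min())


def approx_tukey_median(X, grid_size=31, n_dirs=180):
    """Grid-search approximation of a deepest point (a Tukey median) in 2D.

    Heuristic assumption: a deepest point lies inside the box spanned by the
    coordinate-wise quartiles, which is reasonable for roughly symmetric data.
    """
    lo = np.percentile(X, 25, axis=0)
    hi = np.percentile(X, 75, axis=0)
    best_point, best_depth = None, -1.0
    for x in np.linspace(lo[0], hi[0], grid_size):
        for y in np.linspace(lo[1], hi[1], grid_size):
            z = np.array([x, y])
            depth = approx_tukey_depth(z, X, n_dirs)
            if depth > best_depth:  # keep the first grid point with the largest depth
                best_point, best_depth = z, depth
    return best_point, best_depth


# Usage: an anisotropic bivariate normal sample centered at the origin.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[4.0, 1.5], [1.5, 1.0]], size=300)
point, depth = approx_tukey_median(X)
print("approximate Tukey median:", point, "approximate depth:", depth)
```

Because the set of deepest points can contain more than one point (its diameter is the quantity bounded in the abstract), the grid search simply returns the first grid point attaining the largest approximate depth.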

Paper Structure (12 sections, 16 theorems, 130 equations, 5 figures)

Key Result

Theorem 2.1

Let $X_1,\ldots,X_n$ be i.i.d. copies of $X\sim \mathcal{E}(\mu,\Sigma,F)$ where $\Sigma$ is non-singular. Let $\widetilde{\mu}_n$ be any affine equivariant estimator of $\mu$. Suppose that the inequality holds with probability at least $1-p(t)$. Then with probability at least $1-p(t) - e^{-t}$.
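
Theorem 2.1 applies to any affine equivariant estimator. Under the standard definition, assumed here, this means $\widetilde{\mu}_n(AX_1+b,\ldots,AX_n+b) = A\,\widetilde{\mu}_n(X_1,\ldots,X_n)+b$ for every non-singular $A$ and vector $b$; Tukey's median (Corollary 2.1) and the Stahel-Donoho estimator (Corollary 2.2) are the examples listed. Roughly speaking, the property lets a guarantee for standardized data be transported to a general $\Sigma$. The Python sketch below tests the property numerically for one sample and one transformation (the helper names are hypothetical); a passing check is evidence, not a proof.

```python
import numpy as np


def is_affine_equivariant_at(estimator, X, A, b):
    """Check T(A X_i + b) == A T(X) + b for a sample X stored as rows (n x d)."""
    lhs = estimator(X @ A.T + b)   # estimator applied to the transformed sample
    rhs = A @ estimator(X) + b     # transformed estimate of the original sample
    return bool(np.allclose(lhs, rhs))


def sample_mean(X):
    return X.mean(axis=0)


def coordinatewise_median(X):
    return np.median(X, axis=0)


rng = np.random.default_rng(1)
X = rng.standard_normal((101, 2))
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
A = rotation @ np.diag([3.0, 0.5])  # non-singular: rotation times anisotropic scaling
b = np.array([1.0, -2.0])

print(is_affine_equivariant_at(sample_mean, X, A, b))            # True
print(is_affine_equivariant_at(coordinatewise_median, X, A, b))  # fails under rotation
```

The coordinate-wise median is equivariant only under coordinate-wise rescalings, shifts, and permutations, so a rotation is enough to expose the difference from an affine equivariant estimator such as the mean.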

Figures (5)

  • Figure 1: Depth contours: lighter-colored regions correspond to higher depth. Empirical Tukey's median is marked with a circle and the sample mean with a cross.
  • Figure 2: Empirical depth contours approach their population limit. The blue circle denotes Tukey's median.
  • Figure 3: Contaminated sample with contamination proportion $\varepsilon=0.1$. The circle marks Tukey's median and the cross marks the mean.
  • Figure 4: A concave function with a unique maximizer, and its upper level set.
  • Figure 5: Subdifferential of $W(\widehat{z})$.

Theorems & Definitions (30)

  • Theorem 2.1
  • Corollary 2.1: Tukey's median
  • proof
  • Remark 2.1
  • Corollary 2.2: Stahel-Donoho estimator
  • proof
  • Remark 2.2
  • Lemma 2.1
  • proof
  • Theorem 2.2
  • ...and 20 more