On the distance between mean and geometric median in high dimensions

Richard Schwank; Mathias Drton

TL;DR

This work analyzes the distance between the geometric median $\mathbf{m}_p$ and the mean $\boldsymbol{\mu}$ of a $p$-dimensional distribution whose components are $M$-dependent. It proves that the two centers move closer together as the dimension grows, with Euclidean distance $\|\mathbf{m}_p - \boldsymbol{\mu}\| = O(p^{-1/2})$. It also provides a first-order componentwise expansion $m_{p,i} = \mu_i - \frac{1}{2\bar{\sigma}^2 p} \sum_{j\in\mathcal{E}(i)} \mathbb{E}[(X_i - \mu_i)(X_j - \mu_j)^2] + o(1/p)$, where $\mathcal{E}(i)$ is the set of indices neighboring $i$. The results are stated for the distribution class $\mathcal{D}(M,q,c,\sigma_{\min},C)$. The expansion additionally requires $q>3$ and a finite limit of the average variance $\bar{\sigma}^2$. Simulations with exponential, MA(2), and Pareto data confirm the rates but show slower convergence under heavy tails; the median is approximated efficiently with the online Robbins-Monro algorithm. These findings shed light on the bias-robustness trade-off in high dimensions and connect to classical asymptotic equivalences in the isotropic Gaussian setting.
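The TL;DR states that the median is approximated with the online Robbins-Monro algorithm. Below is a minimal sketch of one common form of that idea, which is not necessarily the authors' implementation. The recursion is $\mathbf{m}_n = \mathbf{m}_{n-1} + \gamma_n (X_n - \mathbf{m}_{n-1})/\|X_n - \mathbf{m}_{n-1}\|$, with Polyak-Ruppert averaging of the iterates.

Several choices are assumptions made for illustration:

- the step schedule $\gamma_n = c\,n^{-\alpha}$, with $c = \sqrt{p}$ and $\alpha = 0.66$;
- the hypothetical helper names;
- the i.i.d. Exp(1) test distribution.

For independent components, and assuming $\mathcal{E}(i) = \{i\}$ in that case, Exp(1) has $\bar{\sigma}^2 = 1$ and $\mathbb{E}[(X_i - 1)^3] = 2$. The expansion therefore predicts $m_{p,i} \approx 1 - 1/p$ rather than the mean $1$.

```python
import math
import random


def averaged_robbins_monro_median(samples, step_scale, alpha=0.66):
    """One-pass (online) estimate of the geometric median of a data stream.

    Hypothetical helper. Each step moves the iterate a distance
    gamma_n = step_scale * n**(-alpha) along the unit vector (x - m)/||x - m||,
    a stochastic descent direction for the objective m -> E||X - m||.
    Returns the running average of all iterates (Polyak-Ruppert averaging).
    """
    stream = iter(samples)
    m = list(next(stream))  # initialize at the first observation
    m_bar = list(m)
    for n, x in enumerate(stream, start=1):
        diff = [xi - mi for xi, mi in zip(x, m)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:  # direction undefined if the sample equals the iterate
            gamma = step_scale * n ** (-alpha)
            m = [mi + gamma * d / norm for mi, d in zip(m, diff)]
        # Running mean of the n + 1 iterates m_0, ..., m_n.
        m_bar = [b + (mi - b) / (n + 1) for b, mi in zip(m_bar, m)]
    return m_bar


def exponential_stream(p, n, rng):
    """Hypothetical helper: n vectors with p i.i.d. Exp(1) components."""
    for _ in range(n):
        yield [rng.expovariate(1.0) for _ in range(p)]


p, n = 100, 20_000
rng = random.Random(0)
# Step scale sqrt(p) is a heuristic: ||X - m|| is of order sqrt(p) here.
m_hat = averaged_robbins_monro_median(
    exponential_stream(p, n, rng), step_scale=math.sqrt(p)
)
avg = sum(m_hat) / p
# Exp(1): mean 1, variance 1, third central moment 2, so the first-order
# expansion (with E(i) = {i} for independent components) gives 1 - 1/p.
print(f"mean of median coordinates: {avg:.4f}")
print(f"first-order prediction:     {1 - 1 / p:.4f}")
print(f"component mean:             {1.0:.4f}")
```

Scaling the step by $\sqrt{p}$ keeps the per-coordinate step roughly independent of the dimension, because $\|X - \mathbf{m}_p\|$ is of order $\sqrt{p}\,\bar{\sigma}$. Averaging over coordinates reduces Monte Carlo noise. The agreement with $1 - 1/p$ is still only approximate, because of the $o(1/p)$ remainder and the finite sample size.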

Abstract

The geometric median, a notion of center for multivariate distributions, has gained recent attention in robust statistics and machine learning. Although it is conceptually distinct from the mean (i.e., the expectation), we show that the two are very close in high dimensions when the dependence between the distribution's components is suitably controlled. Concretely, we establish an upper bound on their distance that vanishes asymptotically as the dimension grows. We also derive a first-order expansion of the geometric median's components that matches this rate. Simulations illustrate and confirm our results.
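The following heuristic first-order calculation, a sketch and not the authors' proof, shows where the $1/p$ correction and the $O(p^{-1/2})$ rate come from. Throughout, $\mathbf{m}_p = \arg\min_{\mathbf{m}} \mathbb{E}\|X - \mathbf{m}\|$ is the geometric median of a $p$-dimensional random vector $X$ with mean $\boldsymbol{\mu}$, and $\bar{\sigma}^2 = p^{-1}\sum_i \operatorname{Var}(X_i)$. The calculation makes two loose assumptions:

- $\mathbf{m}_p$ may be replaced by $\boldsymbol{\mu}$ inside the norm;
- the relative fluctuation $\Delta$ of $\|X - \boldsymbol{\mu}\|^2$ is small enough to linearize.

```latex
% First-order condition for the geometric median:
\mathbb{E}\!\left[\frac{X - \mathbf{m}_p}{\|X - \mathbf{m}_p\|}\right] = \mathbf{0}.
% Write \|X - \mu\|^2 = p\bar{\sigma}^2 (1 + \Delta) and use (1+\Delta)^{-1/2} \approx 1 - \Delta/2:
\frac{1}{\|X - \mathbf{m}_p\|} \approx \frac{1}{\sqrt{p}\,\bar{\sigma}}\Bigl(1 - \frac{\Delta}{2}\Bigr),
\qquad
\Delta = \frac{\sum_{j=1}^{p} (X_j - \mu_j)^2 - p\bar{\sigma}^2}{p\bar{\sigma}^2}.
% Component i, using E[X_i - \mu_i] = 0 and E[\Delta] = 0:
0 \approx (\mu_i - m_{p,i}) - \frac{1}{2}\,\mathbb{E}\bigl[(X_i - \mu_i)\,\Delta\bigr]
  = (\mu_i - m_{p,i}) - \frac{1}{2\bar{\sigma}^2 p} \sum_{j=1}^{p}
    \mathbb{E}\bigl[(X_i - \mu_i)(X_j - \mu_j)^2\bigr].
% Under M-dependence, the j-th term vanishes whenever X_j is independent of X_i,
% leaving only j \in \mathcal{E}(i). Each component therefore moves by O(1/p), so
\|\mathbf{m}_p - \boldsymbol{\mu}\|^2 = \sum_{i=1}^{p} O(p^{-2}) = O(p^{-1}).
```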




The paper contains 2 figures and 29 theorems and definitions.