On the distance between mean and geometric median in high dimensions
Richard Schwank, Mathias Drton
TL;DR
This work analyzes the distance between the geometric median $\mathbf{m}_p$ and the mean $\boldsymbol{\mu}$ of a $p$-dimensional distribution whose components are $M$-dependent. It proves that the two contract toward each other as the dimension grows: the Euclidean distance $\|\mathbf{m}_p - \boldsymbol{\mu}\|$ decays as $O(p^{-1/2})$. It also provides a first-order, componentwise expansion $m_{p,i} = \mu_i - \frac{1}{2\bar{\sigma}^2 p} \sum_{j\in\mathcal{E}(i)} \mathbb{E}[(X_i - \mu_i)(X_j - \mu_j)^2] + o(1/p)$, where $\mathcal{E}(i)$ is the set of indices neighboring $i$. The results are stated for the distribution class $\mathcal{D}(M,q,c,\sigma_{\min},C)$; the expansion additionally requires $q>3$ and a finite limit of the average variance $\bar{\sigma}^2$. Simulations with exponential, MA(2), and Pareto data confirm the rates and show slower convergence under heavy tails. In these simulations the median is approximated efficiently with an online Robbins-Monro algorithm. The findings illuminate the bias-robustness trade-off in high dimensions and connect to classical asymptotic equivalences for isotropic Gaussian settings.
Abstract
The geometric median, a notion of center for multivariate distributions, has gained recent attention in robust statistics and machine learning. Although it is conceptually distinct from the mean (i.e., the expectation), we demonstrate that the two are close in high dimensions when the dependence between the components of the distribution is suitably controlled. Concretely, we find an upper bound on their distance that vanishes as the dimension grows, and we derive a rate-matching first-order expansion of the components of the geometric median. Simulations illustrate and confirm our results.
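
As a rough numerical illustration of the first-order expansion, the sketch below approximates the geometric median of a vector with independent Exp(1) components using an averaged Robbins-Monro (stochastic gradient) iteration. It is not the authors' simulation code. The function name `averaged_robbins_monro_median`, the step-size exponent `alpha`, the `burn_in` length, the $\sqrt{p}$ step scale, and the values of `p` and `n_steps` are all assumptions made for this example. For independent components the cross terms $\mathbb{E}[(X_i - \mu_i)(X_j - \mu_j)^2]$ with $j \neq i$ vanish, so, assuming $\mathcal{E}(i)$ contains $i$ itself, the expansion reduces to $m_{p,i} \approx \mu_i - \mathbb{E}[(X_i-\mu_i)^3]/(2\bar{\sigma}^2 p) = 1 - 1/p$, because Exp(1) has mean 1, variance 1, and third central moment 2.

```python
import numpy as np


def averaged_robbins_monro_median(sample, p, n_steps, alpha=0.75, burn_in=1000):
    """Approximate the geometric median with an averaged Robbins-Monro iteration.

    sample() must return one observation as an array of shape (p,).
    alpha, burn_in and the sqrt(p) step scale are illustrative choices.
    """
    m = np.zeros(p)        # current iterate
    avg = np.zeros(p)      # running mean of the iterates after burn-in
    scale = np.sqrt(p)     # heuristic: ||X - m|| is of order sqrt(p)
    for t in range(1, n_steps + 1):
        diff = sample() - m
        norm = np.linalg.norm(diff)
        if norm > 0.0:
            # Stochastic gradient step on the objective m -> E||X - m||.
            m = m + scale * t ** (-alpha) * diff / norm
        if t > burn_in:
            # Incremental update of the average of post-burn-in iterates.
            avg += (m - avg) / (t - burn_in)
    return avg


p, n_steps = 100, 40_000
rng = np.random.default_rng(0)
m_hat = averaged_robbins_monro_median(lambda: rng.exponential(size=p), p, n_steps)

# Every component has mean 1, so m_hat - 1 estimates the componentwise shift
# of the median away from the mean; the coordinates are exchangeable here,
# so averaging over them reduces the Monte Carlo noise.
shift = float(np.mean(m_hat - 1.0))
print(f"average componentwise shift: {shift:.4f}")
print(f"first-order prediction -1/p: {-1.0 / p:.4f}")
```

The comparison uses the shift averaged over coordinates rather than the full distance $\|\mathbf{m}_p - \boldsymbol{\mu}\|$ because, at this number of draws, the Monte Carlo error of the whole estimated vector is of a similar order as the $O(p^{-1/2})$ distance itself. Agreement with $-1/p$ should only be expected up to higher-order terms in $1/p$ and simulation noise.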
