Performance of the empirical median for location estimation in heteroscedastic settings
Sirine Louati
TL;DR
This work analyzes the empirical median as a robust estimator of a common location parameter $\theta$ when independent observations $X_i$ have symmetric distributions around $\theta$ but unknown and heterogeneous scales $\sigma_i$. The authors derive non-asymptotic, matching upper and lower bounds on the estimation error $|\widehat{\mathrm{med}}-\theta|$, expressing the rate in terms of $\sum_{i=j+1}^{n} \sigma_i^{-1}$ and a cutoff $j$ that depends on the confidence level $\delta$. In the Gaussian special case, the lower bound confirms the rate is tight up to a numerical constant, and corollaries provide explicit constants for practical use. The results establish the empirical median as a principled, parameter-free benchmark in heteroscedastic location estimation, while also clarifying its robustness and intrinsic limits relative to variance-aware estimators and minimax bounds.
Abstract
We investigate the performance of the empirical median for location estimation in heteroscedastic settings. Specifically, we consider independent symmetric real-valued random variables that share a common but unknown location parameter while having different and unknown scale parameters. Estimation under heteroscedasticity arises naturally in many practical situations and has recently attracted considerable attention. In this work, we analyze the empirical median as an estimator of the common location parameter and derive matching non-asymptotic upper and lower bounds on its estimation error. These results fully characterize the behavior of the empirical median in heteroscedastic settings and clarify both its robustness and its intrinsic limitations. They offer a precise understanding of its performance in settings where data quality varies across sources.
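
As an informal illustration of the setting, the following Monte Carlo sketch compares the empirical median with the sample mean and with an inverse-variance weighted mean in the Gaussian special case. It is not taken from the paper: the function name `simulate_errors`, the log-spaced scale profile, the number of trials, and the choice of comparison estimators are all assumptions made for this example. The weighted mean uses the true $\sigma_i$, so it serves only as a variance-aware oracle reference, not as a practical competitor.

```python
import random
import statistics


def simulate_errors(sigmas, theta=0.0, n_trials=500, seed=0):
    """Monte Carlo mean absolute error of three estimators of theta.

    Each trial draws X_i ~ N(theta, sigma_i^2) independently (Gaussian case).
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Oracle weights: require the true scales, which are unknown in practice.
    weights = [1.0 / s ** 2 for s in sigmas]
    w_total = sum(weights)
    err = {"median": 0.0, "mean": 0.0, "oracle_weighted": 0.0}
    for _ in range(n_trials):
        xs = [rng.gauss(theta, s) for s in sigmas]
        # Empirical median: uses no knowledge of the scales.
        err["median"] += abs(statistics.median(xs) - theta)
        # Sample mean: also scale-free, but dominated by the noisiest points.
        err["mean"] += abs(statistics.fmean(xs) - theta)
        # Inverse-variance weighted mean with the true sigma_i (oracle).
        weighted = sum(w * x for w, x in zip(weights, xs)) / w_total
        err["oracle_weighted"] += abs(weighted - theta)
    return {name: total / n_trials for name, total in err.items()}


# Hypothetical scale profile: n scales log-spaced between 0.1 and 100.
n = 200
sigmas = [10 ** (-1 + 3 * i / (n - 1)) for i in range(n)]
errors = simulate_errors(sigmas)
for name, value in errors.items():
    print(f"{name}: {value:.4f}")
```

Under a strongly heterogeneous profile like this one, the median would be expected to sit between the two references: well ahead of the sample mean, which the largest scales dominate, and behind the oracle, which exploits scale information the median never sees. The simulation only illustrates this qualitative gap for one configuration and says nothing about the constants in the bounds.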
