Large sample analysis of the median heuristic
Damien Garreau, Wittawat Jitkrittum, Motonobu Kanagawa
TL;DR
The paper addresses the lack of theoretical understanding of the median heuristic for bandwidth selection in RBF kernels, focusing on kernel two-sample tests based on MMD. It develops a large-sample framework showing that the empirical median of pairwise squared distances converges to the median of a target mixture distribution and is asymptotically normal under mild conditions. The result is derived through a CLT for non-identically distributed U-statistics. The authors provide exact expressions for the target distribution, prove a CLT for the empirical CDF of pairwise distances, and establish the asymptotic normality of the median itself. Empirically, they compare the median heuristic against power-maximization criteria on Gaussian benchmarks: the median heuristic performs comparably in mean-shift settings but may be suboptimal in variance-shift settings. The work gives a principled justification for the median heuristic in certain regimes and highlights its limitations, indicating when alternative bandwidth selection strategies should be preferred in kernel-based tests and analyses.
Abstract
In kernel methods, the median heuristic has been widely used as a way of setting the bandwidth of RBF kernels. While its empirical performance makes it a safe choice under many circumstances, there is little theoretical understanding of why this is the case. Our aim in this paper is to advance our understanding of the median heuristic by focusing on the setting of kernel two-sample testing. We collect new findings that may be of interest to both theoreticians and practitioners. On the theoretical side, we provide a convergence analysis that shows the asymptotic normality of the bandwidth chosen by the median heuristic in the setting of kernel two-sample testing. We also conduct systematic empirical investigations in simple settings, comparing the performance obtained with bandwidths chosen by the median heuristic against that obtained with bandwidths chosen by maximizing test power.
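To make the object of study concrete, the following is a minimal Python sketch of the median heuristic in a two-sample setting. It is not taken from the paper. It assumes one common convention: the squared bandwidth is the median of the squared pairwise distances over the pooled sample, and the kernel is k(x, y) = exp(-||x - y||^2 / (2 h^2)). Other conventions differ by constant factors. The function names (`median_heuristic_sq`, `mmd2_biased`) and the Gaussian mean-shift data are hypothetical and chosen for illustration only.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between the rows of a and the rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def median_heuristic_sq(z):
    """Median of ||z_i - z_j||^2 over all pairs i < j (assumed convention)."""
    d2 = pairwise_sq_dists(z, z)
    upper = np.triu_indices(z.shape[0], k=1)  # strict upper triangle: i < j
    return np.median(d2[upper])

def rbf_kernel(a, b, h_sq):
    """RBF kernel matrix with squared bandwidth h_sq (assumed parametrization)."""
    return np.exp(-pairwise_sq_dists(a, b) / (2.0 * h_sq))

def mmd2_biased(x, y, h_sq):
    """Biased (V-statistic) estimate of the squared MMD between x and y."""
    kxx = rbf_kernel(x, x, h_sq)
    kyy = rbf_kernel(y, y, h_sq)
    kxy = rbf_kernel(x, y, h_sq)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

# Illustrative mean-shift example with synthetic Gaussian data.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 5))
y = rng.normal(0.5, 1.0, size=(200, 5))

h_sq = median_heuristic_sq(np.vstack([x, y]))  # bandwidth from the pooled sample
stat = mmd2_biased(x, y, h_sq)
print(f"squared bandwidth: {h_sq:.3f}, biased MMD^2 estimate: {stat:.5f}")
```

The sketch materializes all pairwise differences, which costs O(n^2 d) memory; for large samples, a subsample of pairs is a common workaround. A power-maximizing alternative would instead select the bandwidth by optimizing an estimate of test power, which this sketch does not attempt.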
