Variance-Aware Estimation of Kernel Mean Embedding
Geoffrey Wolfer, Pierre Alquier
TL;DR
The paper develops variance-aware convergence bounds for kernel mean embeddings (KMEs), with the error measured in the maximum mean discrepancy (MMD). It shows that the deviation of the empirical KME, $\|\widehat{\mu}_{\mathbb{P}} - \mu_{\mathbb{P}}\|_{\mathcal{H}_k}$, is bounded by a leading term $\sqrt{2 v_k(\mathbb{P}) \frac{\log(2/\delta)}{n}}$ that scales with the RKHS variance $v_k(\mathbb{P})$, plus lower-order terms, where $n$ is the sample size and $\delta$ the confidence parameter. For translation-invariant kernels, the bound can be made data-driven by replacing $v_k(\mathbb{P})$ with an empirical proxy $\widehat{v}_k$, which yields empirical-variance Bernstein bounds. The authors extend these results to time-dependent data ($\phi$- and $\beta$-mixing sequences) and apply them to hypothesis testing (goodness-of-fit and two-sample tests) and to robust parametric estimation under Huber contamination, with explicit bounds in the Gaussian location setting and a link function that transfers the MMD error to the parameter space. Overall, the work provides finite-sample, distribution-agnostic improvements to MMD-based inference, with faster rates in favorable variance regimes and a principled treatment of dependent data. The results combine kernel methods with empirical Bernstein techniques to obtain provably tighter confidence bounds and test procedures for KMEs.
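To make the role of the variance term concrete, the following Python sketch computes a plug-in variance proxy for a Gaussian kernel and evaluates the leading term $\sqrt{2 v \log(2/\delta)/n}$ quoted above. Several pieces are assumptions not stated in this summary: the definition $v_k(\mathbb{P}) = \psi(0) - \mathbb{E}\,k(X, X')$ for a translation-invariant kernel with $k(x, x) = \psi(0)$, the U-statistic used as the proxy (which may differ from the paper's exact $\widehat{v}_k$), and the function names `gaussian_gram`, `empirical_rkhs_variance`, and `leading_term`. The lower-order terms of the bound are omitted, so the printed value is illustrative only and is not a valid confidence radius.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n scalar observations
    sq_norms = np.sum(x**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def empirical_rkhs_variance(gram, psi0=1.0):
    """Assumed proxy: psi(0) minus the mean of the off-diagonal Gram entries.

    psi0 is k(x, x), which equals 1 for the Gaussian kernel above.
    """
    n = gram.shape[0]
    off_diag_mean = (gram.sum() - np.trace(gram)) / (n * (n - 1))
    return psi0 - off_diag_mean


def leading_term(variance, n, delta):
    """Leading term sqrt(2 * v * log(2 / delta) / n); lower-order terms omitted."""
    return np.sqrt(2.0 * variance * np.log(2.0 / delta) / n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, delta = 500, 0.05
    # A concentrated sample has small RKHS variance, a spread-out one does not.
    for scale in (0.1, 3.0):
        sample = rng.normal(scale=scale, size=(n, 2))
        v_hat = empirical_rkhs_variance(gaussian_gram(sample))
        print(f"scale={scale}: v_hat={v_hat:.4f}, "
              f"leading term={leading_term(v_hat, n, delta):.4f}")
```

Running the script on the two samples shows the qualitative effect the summary describes: when the data are concentrated relative to the kernel bandwidth, the proxy is small and the leading term shrinks accordingly.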
Abstract
An important feature of kernel mean embeddings (KME) is that the rate of convergence of the empirical KME to the true distribution KME can be bounded independently of the dimension of the space, properties of the distribution, and smoothness features of the kernel. We show how to speed up convergence by leveraging variance information in the reproducing kernel Hilbert space. Furthermore, we show that even when such information is a priori unknown, we can efficiently estimate it from the data, recovering the desideratum of a distribution-agnostic bound that enjoys acceleration in fortuitous settings. We further extend our results from independent data to stationary mixing sequences and illustrate our methods in the context of hypothesis testing and robust parametric estimation.
