A Practical Introduction to Kernel Discrepancies: MMD, HSIC & KSD
Antonin Schrab
TL;DR
This article offers a practical, kernel-based toolkit for comparing distributions and testing independence using MMD, HSIC, and KSD. It lays out the kernel and RKHS foundations (kernel mean embeddings, cross-covariance operators, and properties such as characteristicity and universality) and explains how MMD and HSIC relate to each other and to KSD, together with their V- and U-statistic estimators. It then surveys a range of computationally efficient estimators (V-statistics, U-statistics, and incomplete statistics such as L, D, B, X, and R) and introduces adaptive kernel pooling (mean, max, and fuse pooling) to address kernel selection in practice. It also gives guidelines on bandwidth calibration, kernel collections, and fusion strategies for two-sample, independence, and goodness-of-fit testing. Overall, the article presents a coherent framework for kernel discrepancies that balances statistical guarantees with computational efficiency and adaptivity.
Abstract
This article provides a practical introduction to kernel discrepancies, focusing on the Maximum Mean Discrepancy (MMD), the Hilbert-Schmidt Independence Criterion (HSIC), and the Kernel Stein Discrepancy (KSD). Various estimators for these discrepancies are presented, including the commonly used V-statistics and U-statistics, as well as several forms of the more computationally efficient incomplete U-statistics. The importance of the choice of kernel bandwidth is stressed, showing how it affects the behaviour of the discrepancy estimators. Adaptive estimators are introduced, which combine multiple estimators with various kernels, addressing the problem of kernel selection.
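
To make the V-statistic and U-statistic distinction concrete, the following is a minimal sketch of the two standard estimators of the squared MMD with a Gaussian kernel. It is an illustration under stated assumptions, not the article's own implementation: the function names (`gaussian_kernel`, `mmd_v_statistic`, `mmd_u_statistic`), the Gaussian kernel, and the toy data are all chosen here for the example, and the U-statistic shown is the common two-sample form that drops the diagonal terms, which may differ in detail from the variants the article defines.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a (m x d) and rows of b (n x d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_v_statistic(x, y, bandwidth):
    """Biased (V-statistic) estimate of MMD^2: keeps the diagonal terms."""
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


def mmd_u_statistic(x, y, bandwidth):
    """Unbiased (U-statistic) estimate of MMD^2: drops the i == j terms."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()


if __name__ == "__main__":
    # Toy data (assumed for illustration): two Gaussians with shifted means.
    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.0, size=(200, 2))
    y = rng.normal(loc=0.5, size=(200, 2))
    # The estimate depends strongly on the bandwidth, as the abstract stresses.
    for bandwidth in (0.1, 1.0, 10.0):
        v = mmd_v_statistic(x, y, bandwidth)
        u = mmd_u_statistic(x, y, bandwidth)
        print(f"bandwidth={bandwidth}: V={v:.4f}, U={u:.4f}")
```

Both estimators compute the full kernel matrices and therefore cost quadratic time in the sample size; the incomplete U-statistics mentioned in the abstract reduce this cost by summing over only a subset of the index pairs.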
