A Practical Introduction to Kernel Discrepancies: MMD, HSIC & KSD

Antonin Schrab

TL;DR

The article provides a practical, kernel-based toolkit for comparing distributions and testing independence through MMD, HSIC, and KSD, detailing both exact and efficient estimators. It formalizes the kernel- and RKHS-based foundations (kernel mean embeddings, cross-covariance operators, and properties like characteristicity and universality) and explains how MMD and HSIC relate to each other and to KSD, including their respective V- and U-statistic estimators. The work then surveys a wide range of computationally efficient estimators (V-, U-, and incomplete statistics such as L, D, B, X, and R) and introduces adaptive kernel pooling (including mean, max, and fuse pooling) to address kernel selection in practice. It provides guidelines for bandwidth calibration, kernel collection, and fusion strategies to enable robust, scalable two-sample, independence, and goodness-of-fit testing in real-world settings. Overall, the paper offers a coherent, actionable framework for kernel discrepancies that balances statistical guarantees with computational efficiency and adaptivity for practical data analysis.

Abstract

This article provides a practical introduction to kernel discrepancies, focusing on the Maximum Mean Discrepancy (MMD), the Hilbert-Schmidt Independence Criterion (HSIC), and the Kernel Stein Discrepancy (KSD). Various estimators for these discrepancies are presented, including the commonly used V-statistics and U-statistics, as well as several forms of the more computationally efficient incomplete U-statistics. The importance of the choice of kernel bandwidth is stressed, showing how it affects the behaviour of the discrepancy estimation. Adaptive estimators are introduced, which combine multiple estimators with various kernels, addressing the problem of kernel selection.

Paper Structure

This paper contains 47 sections, 108 equations, 7 figures.

Figures (7)

  • Figure 1: V-statistic. Visualisation of the core kernel matrix entries $h(X_i,X_j)$ considered (in blue) and ignored (in white) in the sum for the V-statistic computation with $n=10$.
  • Figure 2: U-statistic. Visualisation of the core kernel matrix entries $h(X_i,X_j)$ considered (in blue) and ignored (in white) in the sum for the U-statistic computation with $n=10$.
  • Figure 3: L-statistic. Visualisation of the core kernel matrix entries $h(X_i,X_j)$ considered (in blue) and ignored (in white) in the sum for the L-statistic computation with $n=10$.
  • Figure 4: D-statistic. Visualisation of the core kernel matrix entries $h(X_i,X_j)$ considered (in blue) and ignored (in white) in the sum for the D-statistic computation with $n=10$. (Left) $r=1$. (Centre) $r=2$. (Right) $r=3$.
  • Figure 5: B-statistic. Visualisation of the core kernel matrix entries $h(X_i,X_j)$ considered (in blue) and ignored (in white) in the sum for the B-statistic computation with $n=10$. (Left) $b=2$, $n_1=n_2=5$. (Centre) $b=2$, $n_1=3$, $n_2=5$. (Right) $b=3$, $n_1=4$, $n_2=4$, $n_3=2$.
  • ...and 2 more figures
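
The captions for Figures 1 and 2 describe which entries of the core kernel matrix enter the V- and U-statistic sums. The sketch below illustrates that difference, and it rests on assumptions not stated on this page. It uses the standard definitions, in which the V-statistic averages all $n^2$ entries $h(X_i,X_j)$ and the U-statistic averages only the $n(n-1)$ off-diagonal entries. It also assumes, for concreteness, the usual MMD core for two samples of equal size with a Gaussian kernel. The function names and the `bandwidth` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd_core_matrix(X, Y, bandwidth=1.0):
    """Assumed MMD core for equal sample sizes:
    h(i, j) = k(X_i, X_j) + k(Y_i, Y_j) - k(X_i, Y_j) - k(X_j, Y_i)."""
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return Kxx + Kyy - Kxy - Kxy.T

def v_statistic(H):
    """Average over all n^2 entries of the core matrix (diagonal included)."""
    return H.mean()

def u_statistic(H):
    """Average over the n(n-1) off-diagonal entries of the core matrix."""
    n = H.shape[0]
    return (H.sum() - np.trace(H)) / (n * (n - 1))

# Illustrative data with n = 10, matching the sample size used in the figures.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
Y = rng.normal(loc=1.0, size=(10, 2))
H = mmd_core_matrix(X, Y, bandwidth=1.0)
print("V-statistic:", v_statistic(H))
print("U-statistic:", u_statistic(H))
```

Both estimators here cost quadratic time in $n$ because the full core matrix is formed. The incomplete statistics listed in the remaining figures reduce this cost by summing over fewer entries.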