Kernel Quantile Embeddings and Associated Probability Metrics

Masha Naslidnyk, Siu Lun Chau, François-Xavier Briol, Krikamol Muandet

TL;DR

This work introduces kernel quantile embeddings (KQEs) and kernel quantile discrepancies (KQDs) to represent and compare probability distributions in RKHSs beyond the traditional kernel mean embedding. By using directional quantiles in the RKHS, the authors establish quantile-characteristic kernels under milder conditions than mean-characteristic ones, and define e-KQD and sup-KQD as distance measures with near-linear estimators. They show connections to sliced Wasserstein and Sinkhorn divergences, and provide a Gaussian-measure-based estimator for e-KQD with favorable computational costs. Empirical results on two-sample testing demonstrate competitive power with MMD-based methods while offering improved scalability, especially in high-dimensional settings. Overall, KQEs offer a flexible, efficient framework for distributional comparisons with potential across hypothesis testing and related distributional learning tasks.

Abstract

Embedding probability distributions into reproducing kernel Hilbert spaces (RKHS) has enabled powerful nonparametric methods such as the maximum mean discrepancy (MMD), a statistical distance with strong theoretical and computational properties. At its core, the MMD relies on kernel mean embeddings to represent distributions as mean functions in RKHS. However, it remains unclear if the mean function is the only meaningful RKHS representation. Inspired by generalised quantiles, we introduce the notion of kernel quantile embeddings (KQEs). We then use KQEs to construct a family of distances that: (i) are probability metrics under weaker kernel conditions than MMD; (ii) recover a kernelised form of the sliced Wasserstein distance; and (iii) can be efficiently estimated with near-linear cost. Through hypothesis testing, we show that these distances offer a competitive alternative to MMD and its fast approximations.
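The idea behind points (ii) and (iii) can be illustrated with a minimal Python sketch: evaluate both samples along random unit-norm directions in the RKHS, sort the resulting one-dimensional values, and compare matching order statistics. This is a simplified illustration, not the authors' estimator. The function names, the Gaussian kernel, the way directions are drawn (Gaussian weights on anchor points taken from the pooled sample), and all default parameters are assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def sliced_quantile_discrepancy(x, y, n_directions=50, n_anchors=20, p=2, seed=0):
    """Illustrative quantile-based discrepancy between equal-size samples x and y.

    Assumption of this sketch: each direction is u = sum_j w_j k(z_j, .),
    rescaled to unit RKHS norm, with Gaussian weights w and anchor points z
    drawn from the pooled sample.
    """
    assert x.shape == y.shape
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    total = 0.0
    for _ in range(n_directions):
        z = pooled[rng.choice(len(pooled), size=n_anchors, replace=False)]
        w = rng.standard_normal(n_anchors)
        # RKHS norm of the unnormalised direction: sqrt(w^T K_zz w).
        norm = np.sqrt(w @ gaussian_kernel(z, z) @ w)
        # One-dimensional projections u(x_i) and u(y_i).
        ux = gaussian_kernel(x, z) @ w / norm
        uy = gaussian_kernel(y, z) @ w / norm
        # Sorting aligns empirical quantiles; this is the O(n log n) step.
        total += np.mean(np.abs(np.sort(ux) - np.sort(uy)) ** p)
    return (total / n_directions) ** (1.0 / p)
```

Calling `sliced_quantile_discrepancy(x, y)` on two arrays of shape `(n, d)` returns a non-negative scalar. Apart from the kernel evaluations against a small set of anchors, the dominant cost per direction is the sort, which is what makes a near-linear cost in `n` plausible.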

Paper Structure

This paper contains 52 sections, 16 theorems, 107 equations, 8 figures, 1 algorithm.

Key Result

Theorem 1

Under the paper's assumptions on the input space and on the kernel, the kernel $k$ is quantile-characteristic, meaning the mapping $P \mapsto \{\rho_P^{\alpha,u} : \alpha \in [0, 1], u \in S_\mathcal{H}\}$ is injective.

Figures (8)

  • Figure 1: Illustration of bivariate quantiles. Left: Bivariate distribution $P$. Center: Density of the projection of $P$ onto direction $u$ on the unit circle, with $\phi_u(x)=\langle u, x \rangle$. Right: Different quantiles for all possible directions $u$.
  • Figure 2: Illustration of the impact of the slicing direction on KQEs. For $X \sim P$, the KQE $\rho_P^{\alpha, u}(x) \coloneqq \rho_{u \#P}^{\alpha} u(x)$ is obtained from the $\alpha^\text{th}$ quantile of $u(X)$. These quantiles can vary significantly with the slicing direction.
  • Figure 3: Experimental results comparing our proposed methods with baseline approaches. Methods represented by dotted lines exhibit quadratic complexity for a single computation of the test statistic, while the remaining methods achieve near-linear or linear computational efficiency. A higher rejection rate indicates better performance in distinguishing between distributions. Overall, quadratic-time quantile-based estimators perform comparably to quadratic-time MMD estimators, while near-linear time quantile-based estimators often outperform their MMD-based counterparts.
  • Figure 4: Comparing the time (in seconds) required to complete the CIFAR-10 vs. CIFAR-10.1 experiment, plotted on a logarithmic scale. A shorter time indicates a faster algorithm. These results align with our complexity analysis.
  • Figure 5: Type I control results for our experiment on CIFAR-10 vs. CIFAR-10.1. All methods control their Type I error around or below the specified rate of $0.05$, confirming that the tests in the main text are valid testing procedures.
  • ...and 3 more figures
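
The construction described for Figure 1 can be sketched in a few lines of Python: project the sample onto a unit direction $u$ via $\phi_u(x)=\langle u, x \rangle$, take the $\alpha$-quantile of the projected values, and repeat over directions on the unit circle. The function name, the example distribution, and the choice to place the scalar quantile back along $u$ are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np


def directional_quantile(samples, u, alpha):
    """Alpha-quantile of the projections <u, x>, returned as a point along u."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)          # make sure u lies on the unit circle
    projections = samples @ u          # phi_u(x) = <u, x> for every sample
    return np.quantile(projections, alpha) * u


# Hypothetical bivariate sample standing in for the distribution P.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal([0.0, 0.0], [[2.0, 0.8], [0.8, 1.0]], size=2000)

# Eight equally spaced directions on the unit circle.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# One quantile point per direction, here for alpha = 0.9.
contour = np.array([directional_quantile(samples, u, 0.9) for u in directions])
```

Each row of `contour` is the quantile point for one direction. Changing `alpha` or using a finer grid of angles traces out different quantile curves, which is the dependence on the slicing direction that Figure 2 illustrates for KQEs.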

Theorems & Definitions (28)

  • Theorem 1: Cramér-Wold Theorem in RKHS
  • Theorem 2
  • Theorem 3: Finite-Sample Consistency for Empirical KQEs
  • Theorem 4: KQDs as Probability Metrics
  • Theorem 5: Finite-Sample Consistency for Empirical KQDs
  • Proposition 1: Sampling from a Gaussian measure
  • Proposition 2: Centered $\text{e-KQD}_2$
  • proof
  • Definition 1: Vakhania et al. (1987), Section IV.2.1
  • Lemma 1
  • ...and 18 more