Generalized Kernel Thinning

Raaz Dwivedi, Lester Mackey

TL;DR

Generalized Kernel Thinning develops four improvements to kernel thinning (KT), yielding dimension-free, per-function error bounds and improved MMD guarantees across a broad class of kernels and distributions. It introduces target KT, power KT, and KT+ (KT applied to a sum of the target and power kernels), establishing both per-function and MMD guarantees, with theoretical support from covering-number analyses and an MMD interpolation result. The authors provide proofs and empirical validation up to $d=100$, showing substantial reductions in integration error and distributional mismatch compared to i.i.d. sampling, including for non-smooth kernels such as Laplace and Matérn. This work broadens the applicability of kernel thinning to high-dimensional probabilistic inference and offers open-source tools for reproducible experiments.

Abstract

The kernel thinning (KT) algorithm of Dwivedi and Mackey (2021) compresses a probability distribution more effectively than independent sampling by targeting a reproducing kernel Hilbert space (RKHS) and leveraging a less smooth square-root kernel. Here we provide four improvements. First, we show that KT applied directly to the target RKHS yields tighter, dimension-free guarantees for any kernel, any distribution, and any fixed function in the RKHS. Second, we show that, for analytic kernels like Gaussian, inverse multiquadric, and sinc, target KT admits maximum mean discrepancy (MMD) guarantees comparable to or better than those of square-root KT without making explicit use of a square-root kernel. Third, we prove that KT with a fractional power kernel yields better-than-Monte-Carlo MMD guarantees for non-smooth kernels, like Laplace and Matérn, that do not have square-roots. Fourth, we establish that KT applied to a sum of the target and power kernels (a procedure we call KT+) simultaneously inherits the improved MMD guarantees of power KT and the tighter individual function guarantees of target KT. In our experiments with target KT and KT+, we witness significant improvements in integration error even in $100$ dimensions and when compressing challenging differential equation posteriors.
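To make the MMD quantity in the abstract concrete, the following is a minimal sketch of how one might measure the maximum mean discrepancy between an input sample and a smaller coreset under a Gaussian kernel. It is not the authors' KT implementation: the coreset here is a uniformly random subsample (the baseline that KT is compared against), and the function names, the bandwidth `sigma`, the sample size, and the $\sqrt{n}$ coreset size are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Kernel matrix with entries exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd(x, y, sigma=1.0):
    """MMD between the empirical distributions of the rows of x and y."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return float(np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0)))


rng = np.random.default_rng(0)
n, d = 1024, 2                      # illustrative input size and dimension
s_in = rng.standard_normal((n, d))  # stand-in for the input point sequence

m = int(np.sqrt(n))                 # assumed coreset size sqrt(n)
# Baseline coreset: a uniformly random subsample, NOT kernel thinning.
baseline_coreset = s_in[rng.choice(n, size=m, replace=False)]

print("MMD(input, random subsample):", mmd(s_in, baseline_coreset))
```

A thinning procedure such as KT would replace the random subsample with a more carefully selected coreset of the same size; the same `mmd` helper could then be used to compare the two.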

Paper Structure

This paper contains 28 sections, 9 theorems, 61 equations, 6 figures, 3 tables.

Key Result

Theorem 1

Consider \ref{['algo:ktsplit']} with oblivious $\mathcal{S}_{\textup{in}}$ and $(\delta_i)_{i=1}^{n/2}$ and $\delta^\star \triangleq \min_i\delta_i$. (Throughout, oblivious indicates that a sequence is generated independently of any randomness in KT.) If ${\frac{n}{2^{m}}}\!\in\! \mathbb{N}$, then, with probability at least $p_{\textup{sg}} \!\triangleq\! 1\!-\delta'-\!\sum_{j=1}^{m}\frac{2^{j\!-

Figures (6)

  • Figure 1: Generalized kernel thinning (KT) vs i.i.d. sampling for an 8-component mixture of Gaussians target $\mathbb{P}$. For kernels $\mathbf{k}$ without fast-decaying square-roots, KT+ offers visible and quantifiable improvements over i.i.d. sampling. For Gaussian $\mathbf{k}$, target KT closely mimics root KT.
  • Figure 2: MMD and single-function integration error for Gaussian $\mathbf{k}$ and standard Gaussian $\mathbb{P}$ in $\mathbb{R}^d$. Without using a square-root kernel, target KT matches the MMD performance of root KT and improves upon i.i.d. MMD and single-function integration error, even in $d=100$ dimensions.
  • Figure 3: Kernel thinning+ (KT+) vs. standard MCMC thinning (ST). For kernels without fast-decaying square-roots, KT+ improves MMD and integration error decay rates in each posterior inference task.
  • Figure 4: Generalized kernel thinning (KT) and i.i.d. coresets for various kernels $\mathbf{k}$ (in parentheses) and an 8-component mixture of Gaussians target $\mathbb{P}$ with equidensity contours underlaid. These plots are independent replicates of \ref{['fig:mog_scatter']}. See \ref{['sec:experiments']} for more details.
  • Figure 5: Kernel thinning versus i.i.d. sampling. For mixture of Gaussians $\mathbb{P}$ with $M \in \{4, 6\}$ components and the kernel choices of \ref{['sec:experiments']}, target KT with Gauss $\mathbf{k}$ provides $\mathop{\mathrm{MMD}}\nolimits_{\mathbf{k}}(\mathbb{P}, \mathbb{P}_{\textup{out}})$ error comparable to that of root KT, and both provide an $n^{-\frac{1}{2}}$ decay rate, improving significantly over the $n^{-\frac{1}{4}}$ decay rate of i.i.d. sampling. For the other kernels, KT+ provides a decay rate close to $n^{-\frac{1}{2}}$ for IMQ and B-spline $\mathbf{k}$, and $n^{-0.35}$ for Laplace $\mathbf{k}$. See \ref{['sec:experiments']} for further discussion.
  • ...and 1 more figure
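
The decay rates quoted in the Figure 5 caption can be read as follows. This is a back-of-the-envelope sketch that ignores constants and logarithmic factors and assumes, as in the original kernel thinning setup, that $n$ input points are compressed to a coreset of $m = \sqrt{n}$ points; the coreset size is not stated in this excerpt.

$$
\text{i.i.d.:}\quad \mathrm{MMD} \sim m^{-\frac{1}{2}} = n^{-\frac{1}{4}},
\qquad
\text{KT:}\quad \mathrm{MMD} \sim n^{-\frac{1}{2}} = m^{-1}.
$$

For example, with $n = 65536$ and $m = 256$, the i.i.d. rate gives $n^{-\frac{1}{4}} = 1/16 \approx 0.063$, the KT rate gives $n^{-\frac{1}{2}} = 1/256 \approx 0.0039$, and the Laplace rate of $n^{-0.35}$ gives approximately $0.021$, which sits between the two.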

Theorems & Definitions (12)

  • Theorem 1: Single function guarantees for \ref{['algo:ktsplit']}
  • Remark 1: Guarantees for known and oblivious stopping times
  • Corollary 1: Guarantees for functions outside of $\mathcal{H}_{\textrm{split}}$
  • Definition 1: $\mathbf{k}$ covering number
  • Theorem 2: MMD guarantee for target KT
  • Definition 2: $\alpha$-power kernel
  • Theorem 3: MMD guarantee for power KT
  • Theorem 4: Single function & MMD guarantees for KT+
  • Proposition 1: An interpolation result for MMD
  • Lemma 1: Shifting property of the generalized Fourier transform
  • ...and 2 more