Generalized Kernel Thinning
Raaz Dwivedi, Lester Mackey
TL;DR
This paper develops four improvements to kernel thinning (KT), yielding dimension-free error bounds for individual functions and stronger maximum mean discrepancy (MMD) guarantees across a broad class of kernels and distributions. It introduces target KT, power KT, and KT+ (KT applied to a sum of the target and power kernels), with theoretical support from covering-number analyses and MMD interpolation. The authors provide proofs and empirical validation in up to $d=100$ dimensions, showing substantial reductions in integration error and distributional mismatch compared to i.i.d. sampling, including for challenging non-smooth kernels. This work broadens the applicability of kernel thinning to high-dimensional probabilistic inference and offers open-source tools for reproducible experiments.
Abstract
The kernel thinning (KT) algorithm of Dwivedi and Mackey (2021) compresses a probability distribution more effectively than independent sampling by targeting a reproducing kernel Hilbert space (RKHS) and leveraging a less smooth square-root kernel. Here we provide four improvements. First, we show that KT applied directly to the target RKHS yields tighter, dimension-free guarantees for any kernel, any distribution, and any fixed function in the RKHS. Second, we show that, for analytic kernels like Gaussian, inverse multiquadric, and sinc, target KT admits maximum mean discrepancy (MMD) guarantees comparable to or better than those of square-root KT without making explicit use of a square-root kernel. Third, we prove that KT with a fractional power kernel yields better-than-Monte-Carlo MMD guarantees for non-smooth kernels, like Laplace and Matérn, that do not have square-roots. Fourth, we establish that KT applied to a sum of the target and power kernels (a procedure we call KT+) simultaneously inherits the improved MMD guarantees of power KT and the tighter individual function guarantees of target KT. In our experiments with target KT and KT+, we witness significant improvements in integration error even in $100$ dimensions and when compressing challenging differential equation posteriors.
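To make the two error measures in the abstract concrete, the following minimal sketch computes the MMD and a single-function integration error between an input sample and a smaller subset. It does not implement kernel thinning: the subset is a uniform random subsample, which plays the role of the independent-sampling baseline that KT is compared against. The Gaussian kernel bandwidth, the sample sizes, the test point `x0`, and all function names are illustrative assumptions rather than choices taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd(x, y, bandwidth=1.0):
    """MMD between the empirical distributions of x and y under the Gaussian kernel."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(k_xx + k_yy - 2.0 * k_xy, 0.0)))


rng = np.random.default_rng(0)
n, d = 1024, 2                       # assumed input size and dimension
points = rng.standard_normal((n, d))

# Baseline: keep sqrt(n) points chosen uniformly at random (NOT kernel thinning).
m = int(np.sqrt(n))
idx = rng.choice(n, size=m, replace=False)
subsample = points[idx]

# Worst-case error over the unit ball of the RKHS.
mmd_subsample = mmd(points, subsample)
mmd_self = mmd(points, points)       # sanity check: a sample matches itself

# Error for one fixed RKHS function f = k(x0, .), with x0 an arbitrary test point.
x0 = np.zeros(d)
f_values = gaussian_kernel(points, x0[None, :])[:, 0]
single_function_error = abs(f_values.mean() - f_values[idx].mean())

print(f"MMD(input, random subsample) = {mmd_subsample:.4f}")
print(f"|P_in f - Q_out f| for f = k(x0, .) = {single_function_error:.4f}")
```

Replacing `subsample` with the output of a thinning routine would let the same two quantities be compared against this baseline; the MMD bounds the error uniformly over the RKHS unit ball, whereas the single-function error tracks one fixed integrand.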
