Kernel Thinning

Raaz Dwivedi, Lester Mackey

TL;DR

Kernel thinning provides a practical method to compress a distribution while preserving RKHS integration accuracy, achieving an $n^{1/2}$-sized coreset with near-minimax MMD guarantees. The approach links $L^\infty$ errors of a square-root kernel to the MMD error of the target kernel through kernel halving and a self-balancing Hilbert walk, yielding $O(n^2)$-time algorithms for near-optimal coresets. It delivers explicit non-asymptotic MMD bounds for Gaussian, Matérn, and B-spline kernels and demonstrates superior efficiency over i.i.d. sampling and standard thinning in high-dimensional vignettes. Overall, kernel thinning broadens distribution compression by combining online halving with RKHS-aware error controls, enabling scalable, kernel-aware approximations across a wide range of target distributions and kernels.

Abstract

We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. The maximum discrepancy in integration error is $O_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $O_d(n^{-1/2} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. Moreover, the same construction delivers near-optimal $L^\infty$ coresets in $O(n^2)$ time. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
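To make the comparison in the abstract concrete, the sketch below measures the maximum mean discrepancy (MMD) between an $n$-point sample and two $\sqrt{n}$-point baseline compressions of it: standard thinning and uniform subsampling. It does not implement kernel thinning itself. The Gaussian kernel, the bandwidth, the standard normal input, and all function names (`gaussian_kernel`, `mmd`, `standard_thin`, `uniform_subsample`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Kernel matrix with entries exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd(x, y, bandwidth=1.0):
    """MMD between the empirical distributions on the rows of x and of y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(k_xx + k_yy - 2.0 * k_xy, 0.0)))


def standard_thin(points, m):
    """Baseline: keep every (n // m)-th point, as in standard MCMC thinning."""
    step = len(points) // m
    return points[step - 1 :: step][:m]


def uniform_subsample(points, m, rng):
    """Baseline: keep m input points chosen uniformly without replacement."""
    return points[rng.choice(len(points), size=m, replace=False)]


rng = np.random.default_rng(0)
n, d = 1024, 2
m = int(np.sqrt(n))  # coreset size sqrt(n)

points = rng.standard_normal((n, d))  # assumed n-point approximation to P
thinned = standard_thin(points, m)
subsampled = uniform_subsample(points, m, rng)

print("MMD(input, standard thinning):  ", mmd(points, thinned))
print("MMD(input, uniform subsampling):", mmd(points, subsampled))
```

Under these assumptions, a kernel thinning coreset of the same size would be evaluated with the same `mmd` function, which is how the baselines and the proposed method can be placed on one scale.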

Paper Structure

This paper contains 6 sections, 2 theorems, 3 equations, 1 table.

Key Result

Proposition 1

Consider a homogeneous $\phi$-irreducible geometrically ergodic Markov chain [gallegosherrada2023equivalences] with initial state $x_0$, subsequent iterates $\mathcal{S}_{\infty}$, and stationary distribution $\mathbb{P}$. If $\mathbf{k}_{\star}$ satisfies the bounded-measurable assumption (asmp:bounded_measurable), then there exists

Theorems & Definitions (7)

  • Definition 1: Maximum mean discrepancy [JMLR:v13:gretton12a]
  • Definition 2: MMD coreset
  • Proposition 1: MMD guarantee for MCMC
  • Definition 4: Input radius growth rates
  • Proposition 2: Almost sure radius growth
  • Definition 5: Square-root kernel
  • Definition 6: Shift invariance and spectral density
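
For orientation, the block below writes out standard forms of three of the objects named in this list: the maximum mean discrepancy, the square-root kernel, and shift invariance. These are the usual textbook statements, given here as a hedged sketch; the paper's own definitions are not reproduced in this listing and may differ in notation or in the conditions they impose. The symbols $\mathbb{Q}$, $\mathbf{k}_{\mathrm{rt}}$, and $\kappa$ are assumed names.

```latex
% Maximum mean discrepancy: worst-case integration error over the unit ball
% of the RKHS of a kernel k.
\mathrm{MMD}_{\mathbf{k}}(\mathbb{P}, \mathbb{Q})
  = \sup_{\|f\|_{\mathbf{k}} \le 1}
    \left| \mathbb{P} f - \mathbb{Q} f \right|,
\qquad
\mathbb{P} f = \int f(x)\, d\mathbb{P}(x).

% Square-root kernel: k_rt is a square root of the target kernel k_star if
\mathbf{k}_{\star}(x, y)
  = \int_{\mathbb{R}^d} \mathbf{k}_{\mathrm{rt}}(x, z)\,
      \mathbf{k}_{\mathrm{rt}}(y, z)\, dz.

% Shift invariance: the kernel depends on its arguments only through
% their difference.
\mathbf{k}(x, y) = \kappa(x - y)
\quad \text{for some function } \kappa : \mathbb{R}^d \to \mathbb{R}.
```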