Kernel Thinning

Raaz Dwivedi; Lester Mackey

TL;DR

Kernel thinning is a practical method for compressing a distribution while preserving RKHS integration accuracy: it reduces an $n$-point sample to a $\sqrt{n}$-point coreset with near-minimax MMD guarantees. The approach links $L^\infty$ errors of a square-root kernel to the MMD error of the target kernel through kernel halving and a self-balancing Hilbert walk, yielding $O(n^2)$-time algorithms for near-optimal coresets. It delivers explicit non-asymptotic MMD bounds for Gaussian, Matérn, and B-spline kernels and outperforms i.i.d. sampling and standard thinning in vignettes spanning dimensions $d=2$ through $100$. Overall, kernel thinning combines online halving with RKHS-aware error control, enabling scalable approximations across a wide range of target distributions and kernels.
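For reference, the MMD mentioned above is the worst-case integration error over the unit ball of the RKHS. The display below is the standard textbook definition, stated under the assumption that $\mathbb{Q}$ denotes the compressed approximation and $\|\cdot\|_{\mathbf{k}}$ the RKHS norm of a kernel $\mathbf{k}$ (neither symbol is fixed by the summary itself):

$$
\mathrm{MMD}_{\mathbf{k}}(\mathbb{P}, \mathbb{Q})
= \sup_{\|f\|_{\mathbf{k}} \le 1} \left| \mathbb{P} f - \mathbb{Q} f \right|,
\qquad
\mathrm{MMD}_{\mathbf{k}}^2(\mathbb{P}, \mathbb{Q})
= \mathbb{E}\,\mathbf{k}(X, X') + \mathbb{E}\,\mathbf{k}(Y, Y') - 2\,\mathbb{E}\,\mathbf{k}(X, Y),
$$

where $X, X' \sim \mathbb{P}$ and $Y, Y' \sim \mathbb{Q}$ are independent draws. For two empirical measures the expectations become finite double sums, so the MMD between an $n$-point sample and its coreset can be evaluated exactly.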

Abstract

We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. The maximum discrepancy in integration error is $O_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $O_d(n^{-1/2} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. Moreover, the same construction delivers near-optimal $L^\infty$ coresets in $O(n^2)$ time. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
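To make the error metric concrete, the sketch below computes the maximum mean discrepancy between an $n$-point sample and a $\sqrt{n}$-point subset obtained by standard thinning (keeping every $\sqrt{n}$-th point), which is one of the baselines named in the abstract. It does not implement kernel thinning itself. The Gaussian kernel bandwidth, the standard normal target, and the helper names `gaussian_kernel`, `mmd`, and `standard_thin` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2)); bandwidth is an assumed default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd(xs, ys, kernel=gaussian_kernel):
    """MMD between the empirical measures of xs and ys, by direct double sums."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0))


def standard_thin(points, m):
    """Baseline only: keep every (len(points) // m)-th point, m points in total."""
    step = len(points) // m
    return points[step - 1::step][:m]


rng = random.Random(0)
n, d = 256, 2
# Illustrative input: n i.i.d. draws from a standard normal in d dimensions.
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

m = math.isqrt(n)  # coreset size sqrt(n)
coreset = standard_thin(points, m)
err = mmd(points, coreset)
print(f"n={n}, coreset size={m}, MMD(sample, thinned)={err:.4f}")
```

The same `mmd` function could be used to score any other $\sqrt{n}$-point subset of `points`, which is how a compression scheme would be compared against this baseline. The direct double sum costs $O(n^2)$ kernel evaluations, so it is only practical for moderate $n$.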
