Stationary MMD Points for Cubature

Zonghao Chen, Toni Karvonen, Heishiro Kanagawa, François-Xavier Briol, Chris J. Oates

TL;DR

This work studies how to approximate a target distribution $\mu$ by $n$ particles, using the maximum mean discrepancy (MMD) as the guiding criterion. It introduces stationary MMD points, a relaxed and computable alternative to global MMD minimisers, and proves a surprising super-convergence property: for functions $f$ in the RKHS $\mathcal{H}$, the cubature error at stationary MMD points satisfies $\left|\frac{1}{n}\sum_{i=1}^n f(\bm{x}_i) - \int f \,\mathrm{d}\mu\right| = o(\operatorname{MMD}(\mu, \mu_n))$, where $\mu_n$ is the empirical measure of the points, and the cubature is exact on $\mathcal{F}_n = \mathrm{span}(\{1\} \cup \mathcal{G}_{\mathcal{X}_n})$. To compute these points in practice, the paper develops a noisy MMD gradient-flow scheme that evolves particles toward stationarity, together with a non-asymptotic, finite-particle error bound whose rate balances optimisation and estimation errors. Empirically, stationary MMD points outperform several baselines on mixtures of Gaussians and OpenML datasets, and the experiments illustrate the effect of kernel choice and of the gradient-flow dynamics. Together, the super-convergence theory and the finite-particle guarantees support stationary MMD points as a practical tool for kernel-based cubature and coreset construction.

Abstract

Approximation of a target probability distribution using a finite set of points is a problem of fundamental importance, arising in cubature, data compression, and optimisation. Several authors have proposed to select points by minimising a maximum mean discrepancy (MMD), but the non-convexity of this objective precludes global minimisation in general. Instead, we consider \emph{stationary} points of the MMD which, in contrast to points globally minimising the MMD, can be accurately computed. Our main theoretical contribution is the (perhaps surprising) result that, for integrands in the associated reproducing kernel Hilbert space, the cubature error of stationary MMD points vanishes \emph{faster} than the MMD. Motivated by this \emph{super-convergence} property, we consider discretised gradient flows as a practical strategy for computing stationary points of the MMD, presenting a refined convergence analysis that establishes a novel non-asymptotic finite-particle error bound, which may be of independent interest.
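As a rough illustration of the discretised gradient flow mentioned in the abstract, the sketch below evolves particles in one dimension. Everything specific in it is an assumption made for illustration and is not taken from the paper: the target $\mu = N(0, 1)$, the Gaussian kernel with lengthscale `ELL`, the step size, the geometric noise schedule, and the noise-injection form (the witness-function gradient is evaluated at perturbed particles, in the style of earlier MMD gradient-flow work). The Gaussian target is chosen only because the kernel mean embedding then has a closed form.

```python
import numpy as np

ELL = 1.0  # kernel lengthscale (illustrative choice, not from the paper)

def kernel(x, y):
    """Gaussian kernel matrix k(x_i, y_j) for 1-D arrays x and y."""
    d = x[:, None] - y[None, :]
    return np.exp(-d**2 / (2.0 * ELL**2))

def kernel_mean(x):
    """Closed form of the integral of k(x, y) against mu = N(0, 1)."""
    s2 = ELL**2 + 1.0
    return ELL / np.sqrt(s2) * np.exp(-x**2 / (2.0 * s2))

def mmd_squared(x):
    """Squared MMD between mu = N(0, 1) and the empirical measure of x."""
    const = ELL / np.sqrt(ELL**2 + 2.0)  # double integral of k against mu x mu
    return kernel(x, x).mean() - 2.0 * kernel_mean(x).mean() + const

def witness_grad(z, x):
    """Gradient at z of the witness (1/n) sum_j k(., x_j) - kernel_mean(.)."""
    d = z[:, None] - x[None, :]
    grad_k = (-d / ELL**2) * np.exp(-d**2 / (2.0 * ELL**2))
    grad_mean = -z / (ELL**2 + 1.0) * kernel_mean(z)
    return grad_k.mean(axis=1) - grad_mean

def noisy_mmd_flow(x0, steps=2000, step_size=0.3, beta0=0.5, decay=0.99, seed=0):
    """Discretised MMD gradient flow with decaying noise injection (a sketch)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for t in range(steps):
        beta = beta0 * decay**t                      # hypothetical noise schedule
        z = x + beta * rng.standard_normal(x.shape)  # perturbed evaluation points
        x = x - step_size * witness_grad(z, x)       # move against witness gradient
    return x

# Deliberately poor initialisation, far from the target's mass.
x0 = np.random.default_rng(1).normal(3.0, 0.5, size=10)
x_final = noisy_mmd_flow(x0)
print("MMD^2 before:", mmd_squared(x0))
print("MMD^2 after: ", mmd_squared(x_final))
print("max |witness gradient| at the final particles:",
      np.abs(witness_grad(x_final, x_final)).max())
```

The last printed quantity is proportional to the gradient of the squared MMD with respect to each particle, so a value near zero indicates that the particles are close to stationary.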

Paper Structure

This paper contains 32 sections, 14 theorems, 67 equations, 5 figures.

Key Result

Proposition 3.1

Suppose the kernel $k$ satisfies the assumption labelled asst:easy_assumption, and let $\{\bm{x}_i\}_{i=1}^n$ be stationary MMD points. Then cubature using $\{\bm{x}_i\}_{i=1}^n$ is exact on $\mathcal{F}_n \coloneqq \mathrm{span}(\{1\}\cup \mathcal{G}_{\mathcal{X}_n})$. That is, $\frac{1}{n}\sum_{i=1}^n f(\bm{x}_i) = \int f \,\mathrm{d}\mu$ for all $f \in \mathcal{F}_{n}$.
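The following derivation sketches why exactness of this kind is plausible. It rests on assumptions not stated in this summary: that $k$ is symmetric and differentiable, that differentiation and integration can be interchanged, that the points are stationary in the sense that the gradient of the squared MMD vanishes at each $\bm{x}_i$, and that $\mathcal{G}_{\mathcal{X}_n}$ consists of the partial derivatives $\partial_\ell k(\bm{x}_i, \cdot)$ of the kernel at the points. The symbol $\nabla_1$ (gradient in the first argument) is notation introduced here.

```latex
% Squared MMD between \mu and the empirical measure \mu_n of the points
\operatorname{MMD}^2(\mu,\mu_n)
  = \frac{1}{n^2}\sum_{i,j=1}^n k(\bm{x}_i,\bm{x}_j)
  - \frac{2}{n}\sum_{i=1}^n \int k(\bm{x}_i,\bm{y})\,\mathrm{d}\mu(\bm{y})
  + \iint k(\bm{y},\bm{y}')\,\mathrm{d}\mu(\bm{y})\,\mathrm{d}\mu(\bm{y}')

% Gradient with respect to the i-th point (uses symmetry of k)
\nabla_{\bm{x}_i}\operatorname{MMD}^2(\mu,\mu_n)
  = \frac{2}{n}\left[\frac{1}{n}\sum_{j=1}^n \nabla_1 k(\bm{x}_i,\bm{x}_j)
  - \int \nabla_1 k(\bm{x}_i,\bm{y})\,\mathrm{d}\mu(\bm{y})\right]

% Setting the gradient to zero, componentwise in \ell = 1,\dots,d:
\frac{1}{n}\sum_{j=1}^n g(\bm{x}_j) = \int g\,\mathrm{d}\mu
  \quad\text{for } g = \partial_\ell k(\bm{x}_i,\cdot),\ i = 1,\dots,n

% Constants are integrated exactly because the weights 1/n sum to one
\frac{1}{n}\sum_{j=1}^n 1 = 1 = \int 1\,\mathrm{d}\mu
```

By linearity, exactness on these generators extends to their span, which under the stated assumptions is $\mathcal{F}_n$.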

Figures (5)

  • Figure 1: Quantisation of a mixture of $4$ distinct cross-shaped uniform distributions with (Left) $20$ i.i.d. samples and (Right) $20$ stationary MMD points computed via a discretised gradient flow, simulated for a sufficient length of time $T$ (also shown are the intermediate times $T = 0$ and $T = 20$). Stationary MMD points correspond in this case to a local minimum of the MMD, as there are $4$ samples in the first cross and $6$ samples in the second cross; these points can be explicitly computed. Minimum MMD points (not shown) would (presumably) assign $5$ points to each cross, but such points cannot be computed in general.
  • Figure 2: Exact cubature of a function in $\mathcal{F}_n$ with stationary MMD points computed by MMD gradient flow, verifying Proposition 3.1. The convergence plateaus at around $10^{-15}$ due to numerical precision.
  • Figure 3: Comparison of stationary MMD points with baseline methods. Top row: mixture of Gaussians. Bottom left: House8L dataset. Bottom right: Elevators dataset. All results are averaged over 20 independent runs with different random seeds; shaded regions indicate the 25% to 75% quantiles.
  • Figure 4: Empirical verification of the noise-scale assumption (asst:noise_scale) on the noise injection level $\beta_t$ used in all our experiments. The red line represents the left-hand side of the noise-level condition (eq:noise_level_condition) and the black line represents its right-hand side.
  • Figure 5: Ablation study of our stationary MMD points and all baselines using a Matérn-$\frac{3}{2}$ kernel on the House8L and Elevators datasets. The function used for cubature is $f_1(\bm{x}) =\exp(-0.5 \|\bm{x}\|^2)$.
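For readers who want to experiment with the quantities named in the Figure 5 caption, here is a minimal sketch of a Matérn-$\frac{3}{2}$ kernel, the test function $f_1$, and the resulting cubature error and MMD against an empirical target. The data are synthetic Gaussian draws standing in for a real dataset, the dimension and lengthscale are arbitrary, and the 20 points are a random subsample rather than stationary MMD points; all helper names are hypothetical.

```python
import numpy as np

def matern32(x, y, lengthscale=1.0):
    """Matern-3/2 kernel matrix between rows of x (n, d) and rows of y (m, d)."""
    r = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    s = np.sqrt(3.0) * r / lengthscale
    return (1.0 + s) * np.exp(-s)

def f1(x):
    """Test integrand f_1(x) = exp(-0.5 * ||x||^2), applied row-wise."""
    return np.exp(-0.5 * np.sum(x**2, axis=-1))

def cubature_error(points, data, f=f1):
    """Absolute gap between the equal-weight cubature and the data average of f."""
    return abs(f(points).mean() - f(data).mean())

def mmd_empirical(points, data, lengthscale=1.0):
    """MMD between the empirical measures of `points` and `data`."""
    kxx = matern32(points, points, lengthscale).mean()
    kxy = matern32(points, data, lengthscale).mean()
    kyy = matern32(data, data, lengthscale).mean()
    return np.sqrt(max(kxx - 2.0 * kxy + kyy, 0.0))

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 8))  # synthetic stand-in for a dataset
points = data[rng.choice(500, size=20, replace=False)]  # random subsample baseline

print("cubature error for f_1:", cubature_error(points, data))
print("MMD to the data:       ", mmd_empirical(points, data))
```

Replacing `points` with the output of a point-selection method allows the same two numbers to be compared across methods.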

Theorems & Definitions (33)

  • Proposition 3.1: Cubature exactness
  • Remark 3.2: Example of $\mathcal{G}_{\mathcal{X}_n}$
  • Proposition 3.3: Asymptotic approximation capacity of $\mathcal{F}_n$
  • Proof: Proof of Proposition 3.3
  • Theorem 3.4: Super-convergence of stationary MMD points
  • Proof: Proof of Theorem 3.4
  • Remark 3.5: Super-convergence
  • Theorem 4.1
  • Corollary 4.2
  • Remark 4.3: Comparison with existing convergence results in Arbel et al. (2019)
  • ...and 23 more