Weighted quantization using MMD: From mean field to mean shift via gradient flows

Ayoub Belhadji, Daniel Sharp, Youssef Marzouk

TL;DR

This work addresses the problem of approximating a target distribution $\pi$ by a weighted $M$-point Dirac mixture chosen to minimize the maximum mean discrepancy (MMD) to $\pi$. It introduces a Wasserstein–Fisher–Rao (WFR) gradient flow and its practical discretization as an interacting-particle system (IPS). It also introduces a fixed-point scheme called mean shift interacting particles (MSIP), which extends mean shift and acts as a preconditioned gradient descent for MMD minimization. By unifying gradient flows, mean shift, and kernel-based quantization, the authors derive robust, scalable algorithms that perform well in high-dimensional and multi-modal settings, as demonstrated on Gaussian mixtures and MNIST. The proposed MSIP and WFR-IPS show improved robustness to initialization and deliver near-optimal MMD quantizations, with potential implications for efficient kernel quadrature and mode-seeking in complex distributions.
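The following minimal sketch makes the objective concrete. It computes the squared MMD between a weighted Dirac mixture $\sum_i w_i \delta_{y_i}$ and a target. It assumes, hypothetically, that $\pi$ is represented by an empirical sample $x_1, \dots, x_N$ and that the kernel is the squared-exponential (Gaussian) kernel used in the paper's experiments. The helper names `gauss_kernel` and `mmd_squared` are illustrative and do not come from the paper.

```python
import math


def gauss_kernel(x, y, sigma=1.0):
    """Squared-exponential kernel exp(-|x - y|^2 / (2 sigma^2)) on tuples of floats."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mmd_squared(nodes, weights, samples, sigma=1.0):
    """Squared MMD between sum_i w_i delta_{y_i} and the empirical measure of `samples`.

    MMD^2 = sum_{i,j} w_i w_j k(y_i, y_j)
            - 2 sum_i w_i (1/N) sum_n k(y_i, x_n)
            + (1/N^2) sum_{n,m} k(x_n, x_m)
    """
    n = len(samples)
    # Interaction between the weighted nodes themselves.
    self_term = sum(wi * wj * gauss_kernel(yi, yj, sigma)
                    for wi, yi in zip(weights, nodes)
                    for wj, yj in zip(weights, nodes))
    # Cross term between the nodes and the target sample.
    cross_term = sum(wi * gauss_kernel(yi, x, sigma)
                     for wi, yi in zip(weights, nodes)
                     for x in samples) / n
    # Target-only term; constant with respect to nodes and weights.
    target_term = sum(gauss_kernel(x, xp, sigma)
                      for x in samples for xp in samples) / n ** 2
    return self_term - 2.0 * cross_term + target_term
```

Placing equal weights on the sample points themselves gives zero up to rounding. A single node at the origin, `mmd_squared([(0.0,)], [1.0], [(-1.0,), (1.0,)])`, gives a strictly positive value.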

Abstract

Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a weighted mixture of Dirac measures that best approximates the target distribution. While much existing work relies on the Wasserstein distance to quantify approximation errors, maximum mean discrepancy (MMD) has received comparatively less attention, especially when allowing for variable particle weights. We argue that a Wasserstein-Fisher-Rao gradient flow is well-suited for designing quantizations optimal under MMD. We show that a system of interacting particles satisfying a set of ODEs discretizes this flow. We further derive a new fixed-point algorithm called mean shift interacting particles (MSIP). We show that MSIP extends the classical mean shift algorithm, widely used for identifying modes in kernel density estimators. Moreover, we show that MSIP can be interpreted as preconditioned gradient descent and that it acts as a relaxation of Lloyd's algorithm for clustering. Our unification of gradient flows, mean shift, and MMD-optimal quantization yields algorithms that are more robust than state-of-the-art methods, as demonstrated via high-dimensional and multi-modal numerical experiments.
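For reference, the classical mean shift iteration that MSIP extends can be sketched as follows. Each point is repeatedly replaced by the kernel-weighted average of the samples, which climbs toward a mode of the Gaussian kernel density estimate. This is the textbook algorithm, not MSIP itself. The function name, the Gaussian kernel, and the stopping rule are assumptions made for illustration.

```python
import math


def gauss_kernel(x, y, sigma):
    """Squared-exponential kernel on tuples of floats."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mean_shift(start, samples, sigma=1.0, tol=1e-10, max_iter=1000):
    """Classical mean shift: move x to the kernel-weighted mean of the samples until it stops moving."""
    x = tuple(start)
    for _ in range(max_iter):
        weights = [gauss_kernel(x, s, sigma) for s in samples]
        total = sum(weights)
        if total == 0.0:
            # All kernel weights underflowed; x is too far from the data to move.
            return x
        new_x = tuple(sum(w * s[d] for w, s in zip(weights, samples)) / total
                      for d in range(len(x)))
        if math.dist(x, new_x) < tol:
            return new_x
        x = new_x
    return x
```

Started at `(2.0,)` on the sample `[(-3.0,), (-2.9,), (3.0,), (3.1,)]`, the iteration settles near the right-hand mode at about 3.05. A single run like this finds one mode from one starting point, so quantizing a whole distribution requires coupling many such particles, which is the kind of interaction MSIP adds.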

Paper Structure

This paper contains 44 sections, 8 theorems, 106 equations, 11 figures, 3 tables, 2 algorithms.

Key Result

Proposition 3.1

Define the system of ordinary differential equations eq:WFR_particle_equation, where $i \in [M]$ indexes the particles, $\alpha > 0$, and $v_{0}$ is defined in eq:mke_def. If $(\mu_{t})_{t \geq 0}$ solves eq:WFR_particle_equation, then it satisfies eq:WFR_for_MMD in the weak sense.
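The following minimal explicit-Euler sketch shows a WFR-type particle flow for $F(\mu) = \tfrac12 \mathrm{MMD}^2(\mu, \pi)$ with $\mu = \sum_i w_i \delta_{y_i}$. It assumes a Gaussian kernel, an empirical target sample, and the generic WFR form. Let $g(y) = \sum_j w_j k(y, y_j) - \tfrac1N \sum_n k(y, x_n)$ be the first variation of $F$. Then positions follow $\dot y_i = -\nabla g(y_i)$ and weights follow $\dot w_i = -\alpha\, w_i \big(g(y_i) - \sum_j w_j g(y_j)\big)$. This generic sketch may differ from the authors' eq:WFR_particle_equation, for instance in how $v_0$ enters. All function names are hypothetical.

```python
import math


def kernel(x, y, sigma):
    """Squared-exponential kernel on tuples of floats."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def grad_kernel(x, y, sigma):
    """Gradient of the Gaussian kernel with respect to its first argument."""
    c = -kernel(x, y, sigma) / sigma ** 2
    return [c * (a - b) for a, b in zip(x, y)]


def mmd_squared(nodes, weights, samples, sigma=1.0):
    n = len(samples)
    s = sum(wi * wj * kernel(yi, yj, sigma)
            for wi, yi in zip(weights, nodes) for wj, yj in zip(weights, nodes))
    c = sum(wi * kernel(yi, x, sigma) for wi, yi in zip(weights, nodes) for x in samples) / n
    t = sum(kernel(x, xp, sigma) for x in samples for xp in samples) / n ** 2
    return s - 2.0 * c + t


def first_variation(y, nodes, weights, samples, sigma):
    """g(y): the mixture's kernel mean embedding minus the target's, evaluated at y."""
    mix = sum(w * kernel(y, yj, sigma) for w, yj in zip(weights, nodes))
    target = sum(kernel(y, x, sigma) for x in samples) / len(samples)
    return mix - target


def grad_first_variation(y, nodes, weights, samples, sigma):
    """Gradient of g at y."""
    grad = [0.0] * len(y)
    for w, yj in zip(weights, nodes):
        grad = [g + w * d for g, d in zip(grad, grad_kernel(y, yj, sigma))]
    n = len(samples)
    for x in samples:
        grad = [g - d / n for g, d in zip(grad, grad_kernel(y, x, sigma))]
    return grad


def wfr_euler_step(nodes, weights, samples, sigma=1.0, step=0.1, alpha=1.0):
    """One explicit Euler step of a generic WFR particle flow for (1/2) MMD^2."""
    # Evaluate everything at the current state so all particles update simultaneously.
    g = [first_variation(y, nodes, weights, samples, sigma) for y in nodes]
    grads = [grad_first_variation(y, nodes, weights, samples, sigma) for y in nodes]
    g_bar = sum(w * gi for w, gi in zip(weights, g))
    # Wasserstein part: transport each particle down the gradient of g.
    new_nodes = [tuple(a - step * d for a, d in zip(y, gr)) for y, gr in zip(nodes, grads)]
    # Fisher-Rao part: reweight; subtracting g_bar keeps the weights summing to one.
    new_weights = [w - step * alpha * w * (gi - g_bar) for w, gi in zip(weights, g)]
    return new_nodes, new_weights
```

Subtracting the weighted mean $\sum_j w_j g(y_j)$ makes the weight updates sum to zero, so the iterate stays a probability measure. In continuous time, $F$ decreases at rate $\sum_i w_i |\nabla g(y_i)|^2 + \alpha \sum_i w_i (g(y_i) - \bar g)^2$. A small `step` should therefore lower the squared MMD at each iteration.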

Figures (11)

  • Figure 1: Comparison of quantization algorithms on a joker distribution, $M=10$. All algorithms are initialized identically on the top-right mode. The size of each marker indicates the relative particle weight.
  • Figure 2: Comparison of different quantization algorithms on a GMM. (Left): dimension $d=2$, $L_0 = M = 3$. (Right): dimension $d=100$, $L_0=5$, $M=10$. We use a squared-exponential kernel with bandwidth $\sigma=5$ for all kernel-based algorithms and tune the hyperparameters of each algorithm.
  • Figure 3: Comparing quantizations of MNIST
  • Figure 4: First five univariate and pairwise marginals of the $100$-dimensional distribution used in sec:synthetic_basic_numerics
  • Figure 5: Trajectories of four algorithms started from two different initializations (yellow and red symbols). Each marker shows one iteration of a particle, and the lines show the particle paths. White markers are the final particle positions.
  • ...and 6 more figures

Theorems & Definitions (10)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Theorem 3.4
  • Proposition 3.5
  • Proposition 3.6
  • Corollary 3.7
  • Lemma C.1
  • proof
  • proof: Proof of prop:mmd_inimization_using_msip