Path-Guided Particle-based Sampling

Mingzhou Fan, Ruida Zhou, Chao Tian, Xiaoning Qian

TL;DR

This work tackles multimodal posterior inference in Bayesian settings by introducing Path-Guided Particle-based Sampling (PGPS), which steers particles along a partition-free density path from an initial distribution to the target using a neural network to learn the guiding vector field. The core novelty is the Log-weighted Shrinkage (LwS) density path, which enables efficient mode discovery and coverage of the target distribution, coupled with a PDE-inspired training objective for the vector field. The authors prove a Wasserstein-distance bound between PGPS outputs and the target that decomposes into approximation and discretization errors, and demonstrate improved mode search, weight recovery, and calibration over SVGD, Langevin dynamics, and related methods in synthetic Gaussian mixtures and Bayesian neural network tasks, including UCI benchmarks and noisy MNIST. They also discuss a training-free variant based on Langevin steps and outline future directions for density-path design and convergence analysis. Overall, PGPS provides a principled, path-guided alternative to traditional gradient-flow samplers for efficient, multimodal Bayesian inference with practical gains in accuracy and uncertainty calibration.

Abstract

Particle-based Bayesian inference methods that sample from a partition-free target (posterior) distribution, e.g., Stein variational gradient descent (SVGD), have attracted significant attention. We propose a path-guided particle-based sampling (PGPS) method based on a novel Log-weighted Shrinkage (LwS) density path linking an initial distribution to the target distribution. We propose to use a neural network to learn a vector field motivated by the Fokker-Planck equation of the designed density path. Particles, initialized from the initial distribution, evolve according to the ordinary differential equation defined by the vector field. The distribution of these particles is guided along a density path from the initial distribution to the target distribution. The proposed LwS density path allows for an efficient search of the modes of the target distribution where canonical methods fail. We theoretically analyze the Wasserstein distance between the distribution of the PGPS-generated samples and the target distribution, which arises from approximation and discretization errors. Practically, the proposed PGPS-LwS method demonstrates higher Bayesian inference accuracy and better calibration ability than baselines such as SVGD and Langevin dynamics in experiments on both synthetic and real-world Bayesian learning tasks.
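The particle-evolution step described in the abstract can be illustrated with a minimal sketch. It assumes a closed-form vector field for a one-dimensional Gaussian-to-Gaussian path in place of the learned neural network; the function names, the toy path, and the forward-Euler discretization are hypothetical choices made for illustration and are not the LwS path or the authors' implementation.

```python
import random


def gaussian_path_field(x, t, mean=3.0, scale=0.5):
    """Hypothetical stand-in for the learned vector field (an assumption).

    Transports N(0, 1) at t=0 to N(mean, scale**2) at t=1 along the
    toy path N(t * mean, (1 + t * (scale - 1))**2).
    """
    sigma_t = 1.0 + t * (scale - 1.0)
    return mean + (scale - 1.0) / sigma_t * (x - t * mean)


def evolve_particles(particles, field, n_steps=100):
    """Forward-Euler integration of dx/dt = field(x, t) from t=0 to t=1."""
    h = 1.0 / n_steps
    xs = list(particles)
    for k in range(n_steps):
        t = k * h
        xs = [x + h * field(x, t) for x in xs]
    return xs


random.seed(0)
# Particles drawn from the initial distribution N(0, 1).
initial = [random.gauss(0.0, 1.0) for _ in range(1000)]
final = evolve_particles(initial, gaussian_path_field)
```

For this particular affine field each particle ends at approximately `mean + scale * x0`, so the final particles follow the toy target. With a learned field, both the approximation error of the network and the discretization error of the integrator would contribute, which is the decomposition the abstract refers to.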

Paper Structure

This paper contains 34 sections, 8 theorems, 41 equations, 5 figures, 4 tables, 3 algorithms.

Key Result

Proposition 3.1

For a given partition-free density path $\{\hat{p}_t\}$, the gradient flow guided by the vector field $\boldsymbol{\phi}_t(\bm{x})$ and following the continuity equation satisfies a condition expressed through the quantity $r(\bm{x}, \boldsymbol{\phi}_t) = \frac{\partial \ln\hat{p}_t(\bm{x})}{\partial t} + (\nabla \ln\hat{p}_t(\bm{x}) + \nabla) \cdot \boldsymbol{\phi}_t(\bm{x})$.
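The role of $r$ in Proposition 3.1 can be sketched with a short derivation. It assumes the normalized density is $p_t = \hat{p}_t / Z_t$ with $Z_t = \int \hat{p}_t(\bm{x})\,d\bm{x}$ (the symbol $Z_t$ is introduced here for illustration) and that the continuity equation takes its standard form $\partial_t p_t = -\nabla \cdot (p_t \boldsymbol{\phi}_t)$. Under these assumptions, which may differ from the exact form used in the paper:

$$
\begin{aligned}
\frac{\partial \ln p_t(\bm{x})}{\partial t}
  &= -\frac{\nabla \cdot \big(p_t(\bm{x})\,\boldsymbol{\phi}_t(\bm{x})\big)}{p_t(\bm{x})}
   = -\big(\nabla \ln p_t(\bm{x}) + \nabla\big) \cdot \boldsymbol{\phi}_t(\bm{x}), \\
\frac{\partial \ln p_t(\bm{x})}{\partial t}
  &= \frac{\partial \ln \hat{p}_t(\bm{x})}{\partial t} - \frac{d \ln Z_t}{dt},
  \qquad \nabla \ln p_t(\bm{x}) = \nabla \ln \hat{p}_t(\bm{x}), \\
\Longrightarrow\quad r(\bm{x}, \boldsymbol{\phi}_t)
  &= \frac{d \ln Z_t}{dt}
   = \mathbb{E}_{p_t}\!\left[\frac{\partial \ln \hat{p}_t(\bm{x})}{\partial t}\right].
\end{aligned}
$$

In this sketch $r(\bm{x}, \boldsymbol{\phi}_t)$ does not depend on $\bm{x}$, so a condition of this kind can be checked from the unnormalized path $\hat{p}_t$ alone, without computing $Z_t$.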

Figures (5)

  • Figure 1: An illustration of the effectiveness of PGPS over LD in handling mode-missing.
  • Figure 2: Different Log-weighted Shrinkage paths from the initial (left) to the target (right) distribution under different hyperparameters: (A) $\alpha=0, \beta=1$ (blue); (B) $\alpha=1, \beta=0.5$ (orange); (C) $\alpha=0.2, \beta=0.5$ (green).
  • Figure 3: Performance of different methods: (a, c) $\text{score}_1$ and $\text{score}_2$, which indicate mode-capture ability, with the true score shown by the red dashed line; (b, d) KDE-estimated probability distributions for different methods. The letter following PGPS indicates the hyperparameter setting: (A) $\alpha=0$, $\beta = 1$, $\text{steps} = 0$; (B) $\alpha=1$, $\beta = 0.8$, $\text{steps} = 0$; (C) $\alpha=0$, $\beta = 1$, $\text{steps} = 10$; (D) $\alpha=1$, $\beta = 0.8$, $\text{steps} = 10$, where '$\text{steps}$' is the number of Langevin Adjustment steps performed. We report the performance of PGPS with $\psi\in\{0.5, 0.1, 0.05, 0.01\}$.
  • Figure 4: The weight mismatch error. The letter following PGPS indicates the hyperparameter setting: (A) $\alpha=0$, $\beta = 1$, $\text{steps} = 0$; (B) $\alpha=0$, $\beta = 0.5$, $\text{steps} = 0$; (C) $\alpha=0$, $\beta = 1$, $\text{steps} = 100$; (D) $\alpha=0$, $\beta = 0.5$, $\text{steps} = 100$, where '$\text{steps}$' is the number of Langevin Adjustment steps.
  • Figure 5: Particles for tf-PGPS following the LwS path with different hyperparameters, discretized with a constant time step of $0.01$ and $30$ LD steps for each intermediate distribution. At the same computational cost, the hyperparameter choice influences sample quality. The no-shrinkage setup $(\alpha=0, \beta=1)$ performs worst, while the choices that incorporate shrinkage capture the mode on the right much better.
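The training-free procedure in the Figure 5 caption (a constant time step of $0.01$ with $30$ LD steps per intermediate distribution) can be sketched roughly as follows. The LwS path formula is not given in this listing, so a plain geometric interpolation between the initial and target log-densities stands in for it. That interpolation, the one-dimensional Gaussian target, the Langevin step size, and all function names are assumptions made for illustration.

```python
import math
import random


def grad_log_initial(x):
    """Score of the initial distribution N(0, 1)."""
    return -x


def grad_log_target(x, mean=2.0, std=0.5):
    """Score of a toy target N(mean, std**2); no normalizer is needed."""
    return -(x - mean) / std**2


def grad_log_path(x, t):
    """Assumed geometric interpolation: (1 - t) * log p_0 + t * log p_target."""
    return (1.0 - t) * grad_log_initial(x) + t * grad_log_target(x)


def training_free_path_sampler(particles, dt=0.01, ld_steps=30,
                               step_size=0.01, rng=random):
    """Run unadjusted Langevin steps on each intermediate distribution."""
    xs = list(particles)
    n_levels = round(1.0 / dt)
    noise_scale = math.sqrt(2.0 * step_size)
    for k in range(1, n_levels + 1):
        t = k * dt
        for _ in range(ld_steps):
            xs = [x + step_size * grad_log_path(x, t)
                  + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs


random.seed(0)
initial = [random.gauss(0.0, 1.0) for _ in range(200)]
samples = training_free_path_sampler(initial)
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
```

On this unimodal toy target the particles end up concentrated around the target mean. The comparison in Figure 5 concerns multimodal targets, where the choice of path is reported to matter, and this sketch does not reproduce that.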

Theorems & Definitions (12)

  • Proposition 3.1
  • Theorem 4.2
  • Proposition 4.4
  • Proposition 4.1
  • proof
  • Lemma 4.2: Proposition 3 of albergo2023building
  • Lemma 4.3
  • proof
  • Theorem 4.4
  • proof
  • ...and 2 more