Accelerating Particle-based Energetic Variational Inference

Xuelian Bao, Lulu Kang, Chun Liu, Yiwei Wang

TL;DR

This work targets efficient particle-based variational inference (ParVI) by accelerating the Energetic Variational Inference (EVI) framework. It introduces ImEQ, an implicit scheme that applies energy quadratization to only part of the objective, reducing the frequency of inter-particle term evaluations while preserving energy stability in a modified energy. The method decomposes the objective into $F=G+H$ with $G$ bounded below and quadratizes $G$ via an auxiliary variable $r=q(z)$, yielding a coupled, well-conditioned update involving a positive-definite operator $\boldsymbol{B}^n$ and a single inner optimization per time step. Numerical experiments across toy distributions, Bayesian logistic regression, and Bayesian neural networks demonstrate that ImEQ achieves comparable accuracy to EVI-Im with substantially lower CPU time, and generally outperforms AEGD and AdaGrad in robustness and efficiency. The results suggest ImEQ as a practical, extensible approach for accelerating gradient-based sampling in high-dimensional, interacting-particle settings.
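
To make the quadratization step concrete, here is a minimal sketch under an assumed choice of $q$, which this summary does not specify. Taking $q(z)=\sqrt{G(z)+C}$ with a constant $C$ chosen so that $G+C>0$ (possible because $G$ is bounded below), the auxiliary variable turns $G$ into a quadratic function of $r$:

```latex
r = q(z) = \sqrt{G(z) + C}, \qquad
G(z) = r^2 - C, \qquad
\nabla G(z) = 2\, q(z)\, \nabla q(z) = 2\, r\, \nabla q(z).
```

Under this assumption, a scheme that evolves $r$ alongside $z$ would track a modified energy of the form $r^2 - C + H(z)$, which coincides with $F$ only while $r = q(z)$ holds exactly. This is offered as an illustration of the general technique, not as the paper's exact formulation.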

Abstract

In this work, we propose a novel particle-based variational inference (ParVI) method that accelerates EVI-Im. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, our approach efficiently drives particles towards the target distribution. EVI-Im, which is derived using a "discretize-then-variational" approach, employs the implicit Euler method to solve variational-preserving particle dynamics that minimize the KL divergence. In contrast, the proposed algorithm avoids repeated evaluation of inter-particle interaction terms, significantly reducing computational cost. The framework is also extensible to other gradient-based sampling techniques. Through several numerical experiments, we demonstrate that our method outperforms existing ParVI approaches in efficiency, robustness, and accuracy.

Paper Structure

This paper contains 10 sections, 2 theorems, 46 equations, 5 figures, 2 tables.

Key Result

Proposition 2.1

The numerical scheme SAV satisfies the following energy stability: with $\tilde{F}^n = (r^{n})^2$.
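
The scheme that the proposition refers to is not reproduced in this summary. As a rough illustration of what energy stability in $(r^n)^2$ means, the sketch below implements a generic scalar-auxiliary-variable (SAV) gradient descent step of the kind used in AEGD-type methods, with $r = \sqrt{f + c}$. The function names, the toy quadratic objective, and the constant `c` are assumptions made for illustration, not the paper's algorithm.

```python
import numpy as np

def sav_descent(f, grad_f, x0, lr=0.1, c=1.0, n_steps=50):
    """Generic SAV-type gradient descent (illustrative sketch).

    Tracks an auxiliary scalar r that starts at sqrt(f(x0) + c) and
    records the modified energy r**2 after every step.
    """
    x = np.asarray(x0, dtype=float).copy()
    r = np.sqrt(f(x) + c)          # auxiliary variable, r^0 = q(x^0)
    energies = [r ** 2]
    for _ in range(n_steps):
        # v is the gradient of q(x) = sqrt(f(x) + c) at the current point
        v = grad_f(x) / (2.0 * np.sqrt(f(x) + c))
        # Linearly implicit update of r: the denominator is >= 1,
        # so r can never increase, whatever the step size
        r = r / (1.0 + 2.0 * lr * np.dot(v, v))
        # Position update uses the freshly updated r
        x = x - 2.0 * lr * r * v
        energies.append(r ** 2)
    return x, np.array(energies)

# Toy objective (hypothetical): f(x) = 0.5 * |x|^2, bounded below by 0
def f(x):
    return 0.5 * np.dot(x, x)

def grad_f(x):
    return x

x0 = np.array([3.0, -2.0])
x_final, energies = sav_descent(f, grad_f, x0, lr=0.5)
```

In this sketch, `energies` is non-increasing for any positive `lr`, because each step divides `r` by a factor of at least one. That is the sense in which such schemes are called unconditionally energy stable with respect to the modified energy, which is not the same as the original objective.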

Figures (5)

  • Figure 1: "Double-banana" (a), "Star" (b) and "Eight-component" (c) cases: particles obtained by the ImEQ method after 200 iterations (left); plot of MMD$^2$ (middle) and KL divergence (right) with respect to CPU time for different methods. For AdaGrad and EVI-Im methods, ${\rm lr}=0.1$ in all cases. In the case of ImEQ method, ${\rm lr}=0.01$ for "Double-banana" and "Star" cases, while ${\rm lr}=0.1$ for "Eight-component" case. For AEGD method, ${\rm lr}=0.001$ for "Double-banana" case, ${\rm lr}=0.01$ for "Star" case and ${\rm lr}=0.1$ for "Eight-component" case.
  • Figure 2: (a): Particles obtained by the ImEQ method after 200 iterations with ${\rm lr}=0.1$ (top) and the AEGD method after 2000 iterations with ${\rm lr}=0.1$ (bottom). (b): KL divergence with respect to CPU time for different learning rates for ImEQ. (c): KL divergence with respect to CPU time for different learning rates for AEGD.
  • Figure 3: "Star" case with the initial distribution set as a Gaussian distribution with a nonzero mean. (a)-(b): Particles obtained by AdaGrad and AEGD at iterations 500, 1000, 2000, and 5000 (from left to right). (c)-(d): Particles obtained by EVI-Im and ImEQ at iterations 20, 100, 200, and 500 (from left to right). (e)-(f): Plots of MMD$^2$ and KL divergence as functions of CPU time for different methods.
  • Figure 4: Training log-likelihood and test accuracy on the "Diabetes" (a), "Image" (b) and "Covertype" (c) datasets returned by different methods.
  • Figure 5: Boxplot of RMSE (left) and predictive log-likelihood (right) for different datasets: (a) "Yacht Hydrodynamics", (b) "Boston Housing", and (c) "Concrete Data".

Theorems & Definitions (10)

  • Proposition 2.1
  • Proof 1
  • Remark 2.2
  • Remark 3.1
  • Proposition 3.2
  • Proof 2
  • Remark 3.3
  • Remark 3.4
  • Remark 3.5
  • Remark 4.1