Stein Boltzmann Sampling: A Variational Approach for Global Optimization

Gaëtan Serré; Argyris Kalogeratos; Nicolas Vayatis

TL;DR

The paper introduces Stein Boltzmann Sampling (SBS), a flow-based method for global optimization of continuous Sobolev functions. SBS uses Stein Variational Gradient Descent (SVGD) to move an initially uniform set of particles toward a Boltzmann distribution (BD) target $m^{(\kappa)}(x) \propto e^{-\kappa f(x)}$ on a compact domain $\Omega$. By extending SVGD theory to BD targets on $\Omega$, the authors prove weak convergence of the particle flow to the target distribution and show that SBS converges asymptotically to the global minimum when $\kappa$, the number of particles $N$, and the step size $\varepsilon$ are taken to their appropriate limits; they also relate the Kernelized Stein Discrepancy (KSD) to the optimization objective. They present two practical variants: SBS-PF, which improves budget efficiency, and SBS-HYBRID, which combines SBS with other optimization methods. Empirical results on standard benchmark functions show that the SBS variants outperform many state-of-the-art methods on average and offer favorable accuracy-budget trade-offs, with SBS-PF achieving substantial budget reductions and SBS-HYBRID often delivering the best performance in practice.

Abstract

In this paper, we present a flow-based method for global optimization of continuous Sobolev functions, called Stein Boltzmann Sampling (SBS). SBS uniformly initializes a number of particles representing candidate solutions, then uses the Stein Variational Gradient Descent (SVGD) algorithm to sequentially and deterministically move those particles so that they approximate a target distribution whose mass is concentrated around promising areas of the domain of the optimized function. The target is chosen to be a properly parametrized Boltzmann distribution. For the purpose of global optimization, we adapt the generic SVGD theoretical framework so that it can address more general target distributions over a compact subset of $\mathbb{R}^d$, and we prove SBS's asymptotic convergence. In addition to the main SBS algorithm, we present two variants: SBS-PF, which includes a particle filtering strategy, and SBS-HYBRID, which uses SBS or SBS-PF as a continuation after other particle- or distribution-based optimization methods. A detailed comparison with state-of-the-art methods on benchmark functions demonstrates that SBS and its variants are highly competitive, while the combination of the two variants provides the best trade-off between accuracy and computational cost.
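The procedure described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the RBF kernel, the fixed bandwidth `sigma`, the clipping step used to keep particles inside the box domain, the default hyperparameter values, and the names `rbf_kernel` and `sbs_sketch` are all assumptions introduced here. It uses the standard SVGD particle update with the Boltzmann score $\nabla \log m^{(\kappa)} = -\kappa \nabla f$.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Return the RBF kernel matrix K and, for each particle x_i,
    the repulsion term sum_j grad_{x_j} k(x_j, x_i)."""
    diff = X[:, None, :] - X[None, :, :]           # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)           # shape (N, N)
    K = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # grad_{x_j} k(x_j, x_i) = k(x_j, x_i) * (x_i - x_j) / sigma^2
    repulsion = np.sum(K[:, :, None] * diff, axis=1) / sigma ** 2
    return K, repulsion

def sbs_sketch(grad_f, lower, upper, n_particles=50, n_iter=500,
               kappa=10.0, sigma=0.5, step=0.01, seed=0):
    """Move uniformly initialized particles with SVGD toward exp(-kappa * f)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Uniform initialization over the box [lower, upper].
    X = rng.uniform(lower, upper, size=(n_particles, lower.size))
    for _ in range(n_iter):
        score = -kappa * grad_f(X)                 # gradient of log target
        K, repulsion = rbf_kernel(X, sigma)
        phi = (K @ score + repulsion) / n_particles
        # Assumption: clipping keeps particles in the domain; the paper's
        # treatment of the boundary may differ.
        X = np.clip(X + step * phi, lower, upper)
    return X
```

For example, with the hypothetical test function $f(x) = \lVert x \rVert_2^2$, calling `sbs_sketch(lambda X: 2.0 * X, [-3.0, -3.0], [3.0, 3.0])` returns 50 particles that have drifted from the uniform initialization toward the origin. The kernel term pulls particles toward low values of $f$, while the repulsion term keeps them from collapsing onto a single point.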

Paper Structure (26 sections, 12 theorems, 90 equations, 5 figures, 3 tables, 2 algorithms)


Introduction
Stein Boltzmann Sampling
Theory of SBS
SBS variants
Choice of hyperparameters
Experimental evaluation
Discussion
Conclusion
Theoretical foundations
Boltzmann distribution
Stein Variational Gradient Descent
Definitions
Stein discrepancy
Kernelized Stein Discrepancy
Proofs
...and 11 more sections

Key Result

Theorem 3.1

Let $\mu, \pi \in \mathcal{P}_2(\Omega)$. Let $(T_t)_{0 \leq t} : \Omega \to \Omega$ be a locally Lipschitz family of diffeomorphisms, representing the trajectories associated with the vector field $\phi^\star_{\mu_t}$ (see eq:svgd-update), such that $T_0 = I_d$. Let $\mu_t = {T_t}_\#\mu$. Then, $\m
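For orientation, the vector field $\phi^\star_{\mu}$ can be written out in the standard SVGD formulation. The expression below is that generic form, given here as an assumption about what eq:svgd-update contains; the paper's own equation may differ in notation or in how it handles the boundary of $\Omega$. The kernel $k$ is the reproducing kernel of the underlying RKHS.

$$
\phi^\star_{\mu}(x) = \mathbb{E}_{y \sim \mu}\left[ k(y, x)\, \nabla_y \log \pi(y) + \nabla_y k(y, x) \right]
$$

When the target is the Boltzmann distribution $\pi = m^{(\kappa)} \propto e^{-\kappa f}$, the normalizing constant vanishes under the gradient of the logarithm, so that

$$
\nabla_y \log m^{(\kappa)}(y) = -\kappa\, \nabla f(y),
$$

which means the update only requires gradients of $f$, never the normalization of $m^{(\kappa)}$.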

Figures (5)

Figure 1: Illustration of the flow of measures and the trajectories of particles over the iterations. The color gradient represents the value of the 2D Ackley function, from blue (low) to red (high). The trajectories trace the discretized flow of measures. a) SBS: the particles are initialized uniformly at random over the domain and are then updated by taking a small step in the direction induced by the SVGD forces. b) SBS-PF, the variant with particle filtering: the particles are initialized and updated as before, but the less promising ones are rapidly removed and not replaced. This is visible in the fewer persisting trajectories in areas where the function takes high values. This strategy significantly reduces the budget while achieving comparable performance.
Figure 2: a) The density of the Boltzmann distribution $m^{(\kappa)}$ (Definition 2.1) (blue lines) becomes uniform over the set of minimizers $X^*$ of the function $f$ to optimize (black lines) as its parameter $\kappa$ tends to infinity. b) In this example, the volume of the set $X^*$ is much smaller than the volume of the local minimizers in the flat region, and the value of the function at the local minimizers is close to its value at the global ones. Setting $\kappa$ to $100$ does not suffice to concentrate the majority of the mass of $m^{(\kappa)}$ around the global minimizers.
Figure 3: Illustration of the vector field induced by the SVGD differential equation (eq:svgd-diff-eq) in a discrete-time setting where $\pi$ is the BD. a) The optimized function $x \mapsto \sin \lVert x\rVert_2$ and the two manifolds on which it is minimized (dashed gray lines). b) The initial particles (not shown) start being attracted toward the two ring-shaped manifolds. c) After some SVGD iterations, the forces in the vector field are stronger and the particles concentrate around those minimizing regions.
Figure 4: Illustration of the exploration/exploitation trade-off in SBS for different values of $\sigma$. In black, the function $x \mapsto \cos(5x) + x/5 + 1$; in grey, the distribution of the particles; in blue, the BD $m^{(\kappa)}$. When $\sigma$ is too small, the particles are uniformly distributed over $X^*$. When $\sigma$ is too large, they are uniformly distributed over the whole domain $\Omega$.
Figure 5: Insights into the compared algorithms: a) shows the low impact of $\kappa$ on the performance of SBS. b) shows that the running time of BayesOpt and AdaLIPO grows exponentially and is significantly higher than that of CMA-ES or SBS. In both (a) and (b), the left plot is for the Himmelblau function and the right plot is for the Levy function.
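The concentration effect described in the caption of Figure 2 can be checked numerically on a grid. The sketch below uses the one-dimensional function from Figure 4; the domain $[-2, 2]$, the grid resolution, the neighborhood radius, and the names `boltzmann_density` and `mass_near_min` are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def boltzmann_density(f_vals, kappa, dx):
    """Grid approximation of m^(kappa), proportional to exp(-kappa * f)."""
    # Subtracting the minimum avoids overflow and leaves the density unchanged.
    w = np.exp(-kappa * (f_vals - f_vals.min()))
    return w / (w.sum() * dx)

# Assumed domain [-2, 2]; the function is the one shown in Figure 4.
x = np.linspace(-2.0, 2.0, 4001)
dx = x[1] - x[0]
f_vals = np.cos(5.0 * x) + x / 5.0 + 1.0
x_star = x[np.argmin(f_vals)]          # grid estimate of the global minimizer

def mass_near_min(kappa, radius=0.2):
    """Probability mass of m^(kappa) within `radius` of the global minimizer."""
    m = boltzmann_density(f_vals, kappa, dx)
    near = np.abs(x - x_star) <= radius
    return float(np.sum(m[near]) * dx)

masses = [mass_near_min(kappa) for kappa in (1.0, 10.0, 100.0)]
```

On this example the mass near the global minimizer increases with $\kappa$. How large $\kappa$ must be depends on the function: as the caption of Figure 2 notes, a flat region of near-optimal local minimizers can retain most of the mass even at $\kappa = 100$.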

Theorems & Definitions (30)

Definition 2.1: Continuous Boltzmann distribution
Theorem 3.1: Weak convergence of SVGD
Lemma 3.2: KSD valid discrepancy
Lemma 3.3: Unique fixed point
Theorem 3.4: SBS asymptotic convergence
Definition A.2: Stein class of measures [Liu2016Kernel]
Lemma A.3: Stein identity [Stein1972]
Definition A.4: Product RKHS [Liu2016]
Definition A.5: Kernelized Stein Discrepancy [Liu2016Kernel]
Theorem A.6: Steepest trajectory [Liu2016Kernel]
...and 20 more
