Variational Representations of Annealing Paths: Bregman Information under Monotonic Embedding

Rob Brekelmans; Frank Nielsen

TL;DR

This work generalizes the barycenter property of Bregman divergences to quasi-arithmetic means under monotone embeddings, enabling variational interpretations of annealing paths in MCMC. By embedding densities through a monotone representation $\rho$, the authors show that intermediate densities along an annealing path minimize a rho-tau Bregman divergence to endpoints, yielding a unifying view of $q$-paths and geometric paths within deformed exponential families. The framework connects to a wide range of divergences (KL, Jensen-Shannon, Amari's $\alpha$-divergence, Beta, and $(\alpha,\beta)$ families) and expresses $q$-paths as Bregman barycenters with a parametric interpretation via a one-parameter deformed exponential family. This perspective offers new variational characterizations of annealing in unnormalized-density settings and suggests broader applications to variational inference, prediction losses, and reinforcement learning, with future work on adaptive path design and $\rho$-deformed convex dualities.
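As a rough illustration of the $q$-paths mentioned above, the sketch below builds an intermediate unnormalized density as a quasi-arithmetic mean under the deformed logarithm, assuming the standard convention $\ln_q(u) = (u^{1-q} - 1)/(1-q)$ for the embedding $\rho$. The function names (`ln_q`, `exp_q`, `q_path`) and the two Gaussian-shaped endpoints are hypothetical choices for this example, not taken from the paper.

```python
import numpy as np

def ln_q(u, q):
    """Deformed logarithm, used here as the embedding rho; equals np.log at q = 1."""
    if np.isclose(q, 1.0):
        return np.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(v, q):
    """Inverse of ln_q on its range; equals np.exp at q = 1."""
    if np.isclose(q, 1.0):
        return np.exp(v)
    base = np.maximum(1.0 + (1.0 - q) * v, 0.0)  # clip to keep the power well defined
    return base ** (1.0 / (1.0 - q))

def q_path(pi0, pi1, beta, q):
    """Quasi-arithmetic mean of two unnormalized densities with mixing weight beta."""
    return exp_q((1.0 - beta) * ln_q(pi0, q) + beta * ln_q(pi1, q), q)

# Hypothetical endpoints: two unnormalized Gaussian-shaped densities on a small grid.
x = np.linspace(-4.0, 4.0, 9)
pi0 = np.exp(-0.5 * x**2)
pi1 = np.exp(-0.5 * (x - 1.0) ** 2 / 0.25)

geometric = q_path(pi0, pi1, beta=0.3, q=1.0)  # q = 1: geometric path
mixture = q_path(pi0, pi1, beta=0.3, q=0.0)    # q = 0: arithmetic mixture
```

Under this convention, $q = 1$ recovers the geometric path $\tilde{\pi}_0^{1-\beta}\tilde{\pi}_1^{\beta}$ and $q = 0$ recovers the arithmetic mixture, so a single embedding parameter interpolates between the two familiar constructions.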

Abstract

Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest. Prior works have constructed annealing paths using quasi-arithmetic means, and interpreted the resulting intermediate densities as minimizing an expected divergence to the endpoints. To analyze these variational representations of annealing paths, we extend known results showing that the arithmetic mean over arguments minimizes the expected Bregman divergence to a single representative point. In particular, we obtain an analogous result for quasi-arithmetic means, when the inputs to the Bregman divergence are transformed under a monotonic embedding function. Our analysis highlights the interplay between quasi-arithmetic means, parametric families, and divergence functionals using the rho-tau representational Bregman divergence framework, and associates common divergence functionals with intermediate densities along an annealing path.
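The barycenter property described in the abstract can be checked numerically on scalars. The sketch below is a minimal example, assuming the embedding $\rho = \log$ and the convex generator $f = \exp$ (both chosen only for illustration), and compares a brute-force grid minimizer of the expected Bregman divergence, taken over the second argument, against the quasi-arithmetic mean $\rho^{-1}\big(\sum_i w_i \rho(u_i)\big)$. All names and the input values are hypothetical.

```python
import numpy as np

def bregman(f, grad_f, a, b):
    """Scalar Bregman divergence D_f[a : b] = f(a) - f(b) - f'(b) (a - b)."""
    return f(a) - f(b) - grad_f(b) * (a - b)

def expected_divergence(mu, inputs, weights, rho, f, grad_f):
    """Weighted divergence from each embedded input to the embedded candidate mu."""
    return np.sum(weights * bregman(f, grad_f, rho(inputs), rho(mu)))

# Hypothetical positive inputs and normalized weights.
inputs = np.array([0.5, 2.0, 4.0])
weights = np.array([0.2, 0.5, 0.3])

# Assumed embedding and generator for this example.
rho, rho_inv = np.log, np.exp
f, grad_f = np.exp, np.exp

# Quasi-arithmetic mean: average in the rho-representation, then map back.
quasi_mean = rho_inv(np.sum(weights * rho(inputs)))

# Brute-force minimization over the second argument on a grid with step 0.001.
grid = np.linspace(0.1, 5.0, 4901)
values = np.array(
    [expected_divergence(m, inputs, weights, rho, f, grad_f) for m in grid]
)
grid_minimizer = grid[np.argmin(values)]

print(quasi_mean, grid_minimizer)
```

With $\rho = \log$ the quasi-arithmetic mean is the weighted geometric mean, and the grid minimizer should agree with it up to the grid spacing. Replacing `rho` with the identity and `f` with a quadratic gives the classical case, where the minimizer is the weighted arithmetic mean.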

Paper Structure (15 sections, 59 equations, 2 figures, 1 table, 1 algorithm)


Introduction
Bregman Divergence under Monotonic Embedding
Quasi-Arithmetic Means
Rho-Tau Bregman Divergence
Main Result
Annealing Paths and Divergence Minimization
$q$-Paths from a Divergence Minimization Perspective
Rho-Tau Bregman Divergence
Rho-Tau Bregman Information
Parametric Interpretation using $q$-Exponential Family with Parameter $\beta$
Conclusion and Discussion
Annealed Importance Sampling
Proof of Theorem 1
Interpretations of \ref{thm:breg_info}(iii)
Rho-Tau Bregman Information with Vector-Valued Inputs

Figures (2)

Figure 1: Illustration of our main results. The choice of a monotonic representation function $\rho$, along with either a convex function $f$ or a second monotonic function $\tau$, specifies a Bregman divergence for $\beta = \{0,1\}$. The quasi-arithmetic path ${\tilde{\pi}}_{\beta}^{(\rho)}$ with mixing weight $\beta$ minimizes the expected Bregman divergence $D_{\Psi_f}[\rho({\tilde{\pi}}_a):\rho({\tilde{\pi}}_b)]$ to the endpoints $\{ {\tilde{\pi}}_0, {\tilde{\pi}}_1 \}$, where optimization is performed over the second argument and $\Psi_f[\rho({\tilde{\pi}})] \coloneqq \int f(\rho({\tilde{\pi}}({x}))) d{x}$. The (scaled) value of this objective is called the Bregman Information $\mathcal{I}_{f,\rho}^{(\beta)}$, and associates to each intermediate density ${\tilde{\pi}}_{\beta}^{(\rho)}$ a divergence $D_{f,\rho}^{(\beta)}[{\tilde{\pi}}_0:{\tilde{\pi}}_1] = \frac{1}{\beta(1-\beta)} \mathcal{I}_{f,\rho}^{(\beta)}[\bm{{\tilde{\pi}}}, \bm{\beta}]$ (Zhang, 2004), which compares ${\tilde{\pi}}_0$ and ${\tilde{\pi}}_1$ (\ref{example:zhang_div}). Finally, the quasi-arithmetic path in the $\rho$-representation is a geodesic with respect to the affine connection induced by the Bregman divergence $D_{\Psi_f}[\rho({\tilde{\pi}}_a):\rho({\tilde{\pi}}_b)]$ for any $\Psi_f$ (\ref{thm:geodesic}), which is also true for the $\tau$-representation, dual divergence, and dual connection.
Figure: Annealed Importance Sampling
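The variational claim in the Figure 1 caption can be written out in symbols. The block below is a sketch rather than a quotation of the paper's theorem: it assumes the standard Bregman form for the functional $\Psi_f$, a differentiable strictly convex $f$, and the weights $(1-\beta)$ and $\beta$ on the endpoints $\tilde{\pi}_0$ and $\tilde{\pi}_1$; the candidate density $\tilde{\mu}$ is notation introduced here.

```latex
\begin{align}
% Bregman divergence in the rho-representation, generated by \Psi_f
D_{\Psi_f}[\rho(\tilde{\pi}_a):\rho(\tilde{\pi}_b)]
  &= \int \Big( f(\rho(\tilde{\pi}_a)) - f(\rho(\tilde{\pi}_b))
     - f'(\rho(\tilde{\pi}_b)) \big[ \rho(\tilde{\pi}_a) - \rho(\tilde{\pi}_b) \big] \Big) \, dx \\
% Expected divergence to the endpoints, minimized over the second argument
\tilde{\pi}_{\beta}^{(\rho)}
  &= \arg\min_{\tilde{\mu}} \; (1-\beta) \, D_{\Psi_f}[\rho(\tilde{\pi}_0):\rho(\tilde{\mu})]
     + \beta \, D_{\Psi_f}[\rho(\tilde{\pi}_1):\rho(\tilde{\mu})] \\
% Pointwise first-order condition in \rho(\tilde{\mu}(x))
0 &= - f''(\rho(\tilde{\mu})) \Big[ (1-\beta) \rho(\tilde{\pi}_0) + \beta \rho(\tilde{\pi}_1) - \rho(\tilde{\mu}) \Big] \\
% Solution: a quasi-arithmetic mean of the endpoints
\rho\big(\tilde{\pi}_{\beta}^{(\rho)}(x)\big)
  &= (1-\beta) \, \rho(\tilde{\pi}_0(x)) + \beta \, \rho(\tilde{\pi}_1(x))
\end{align}
```

Since $f'' > 0$ under the strict convexity assumption, the first-order condition holds only when the bracket vanishes, so the minimizer does not depend on the particular choice of $f$. This matches the caption's remark that the path property holds for any $\Psi_f$.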

Theorems & Definitions (1)

proof
