Variational Representations of Annealing Paths: Bregman Information under Monotonic Embedding

Rob Brekelmans, Frank Nielsen

TL;DR

This work generalizes the barycenter property of Bregman divergences to quasi-arithmetic means under monotone embeddings, enabling variational interpretations of annealing paths in MCMC. By embedding densities through a monotone representation $\rho$, the authors show that intermediate densities along an annealing path minimize a rho-tau Bregman divergence to endpoints, yielding a unifying view of $q$-paths and geometric paths within deformed exponential families. The framework connects to a wide range of divergences (KL, Jensen-Shannon, Amari's $\alpha$-divergence, Beta, and $(\alpha,\beta)$ families) and expresses $q$-paths as Bregman barycenters with a parametric interpretation via a one-parameter deformed exponential family. This perspective offers new variational characterizations of annealing in unnormalized-density settings and suggests broader applications to variational inference, prediction losses, and reinforcement learning, with future work on adaptive path design and $\rho$-deformed convex dualities.
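
For concreteness, the paths named in this summary can be written out explicitly. The following is a sketch based on the standard definitions of the quasi-arithmetic mean and the deformed logarithm $\ln_q$, which the summary itself does not state; the endpoint notation ${\tilde{\pi}}_0, {\tilde{\pi}}_1$ and mixing weight $\beta$ follow the figure caption further down.

$$
{\tilde{\pi}}_{\beta}^{(\rho)}(x) = \rho^{-1}\Big( (1-\beta)\, \rho\big({\tilde{\pi}}_0(x)\big) + \beta\, \rho\big({\tilde{\pi}}_1(x)\big) \Big),
\qquad
\ln_q(u) = \frac{u^{1-q} - 1}{1-q}.
$$

Choosing $\rho = \ln_q$ gives the $q$-path, and letting $q \to 1$ (so that $\ln_q \to \ln$) recovers the geometric path:

$$
{\tilde{\pi}}_{\beta}^{(q)}(x) = \Big[ (1-\beta)\, {\tilde{\pi}}_0(x)^{1-q} + \beta\, {\tilde{\pi}}_1(x)^{1-q} \Big]^{\frac{1}{1-q}}
\;\xrightarrow{\; q \to 1 \;}\;
{\tilde{\pi}}_0(x)^{1-\beta}\, {\tilde{\pi}}_1(x)^{\beta}.
$$

At $q = 0$ the same formula reduces to the arithmetic mixture $(1-\beta)\,{\tilde{\pi}}_0 + \beta\,{\tilde{\pi}}_1$.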

Abstract

Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest. Prior works have constructed annealing paths using quasi-arithmetic means, and interpreted the resulting intermediate densities as minimizing an expected divergence to the endpoints. To analyze these variational representations of annealing paths, we extend known results showing that the arithmetic mean over arguments minimizes the expected Bregman divergence to a single representative point. In particular, we obtain an analogous result for quasi-arithmetic means, when the inputs to the Bregman divergence are transformed under a monotonic embedding function. Our analysis highlights the interplay between quasi-arithmetic means, parametric families, and divergence functionals using the rho-tau representational Bregman divergence framework, and associates common divergence functionals with intermediate densities along an annealing path.
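
The central claim of the abstract, that a quasi-arithmetic mean minimizes an expected Bregman divergence once the inputs are passed through a monotonic embedding, can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the grid, the two unnormalized Gaussian-shaped endpoints, the choice $f = \exp$ as the convex generator, and all function names are assumptions made for this example. It uses $\rho = \ln_q$ and compares the objective at the quasi-arithmetic path against randomly perturbed candidates.

```python
import numpy as np

def ln_q(u, q):
    """Deformed logarithm; reduces to the natural log at q = 1."""
    if np.isclose(q, 1.0):
        return np.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(v, q):
    """Inverse of ln_q on its range."""
    if np.isclose(q, 1.0):
        return np.exp(v)
    return np.maximum(1.0 + (1.0 - q) * v, 0.0) ** (1.0 / (1.0 - q))

def quasi_arithmetic_path(pi0, pi1, beta, q):
    """Pointwise quasi-arithmetic mean of pi0 and pi1 under rho = ln_q."""
    return exp_q((1.0 - beta) * ln_q(pi0, q) + beta * ln_q(pi1, q), q)

def bregman(u, v, f=np.exp, f_prime=np.exp):
    """Separable Bregman divergence: sum of f(u) - f(v) - f'(v) (u - v).

    f = exp is an assumed generator chosen for illustration.
    """
    return np.sum(f(u) - f(v) - f_prime(v) * (u - v))

def expected_divergence(pi, pi0, pi1, beta, q):
    """Expected divergence from the embedded endpoints to embedded pi
    (pi sits in the second argument, as in the variational problem)."""
    r = ln_q(pi, q)
    return ((1.0 - beta) * bregman(ln_q(pi0, q), r)
            + beta * bregman(ln_q(pi1, q), r))

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
pi0 = np.exp(-0.5 * x ** 2)                        # unnormalized initial density
pi1 = 2.0 * np.exp(-0.5 * (x - 1.0) ** 2 / 0.25)   # unnormalized target density
beta = 0.3

results = {}
for q in (0.0, 0.5, 1.0):
    path = quasi_arithmetic_path(pi0, pi1, beta, q)
    best = expected_divergence(path, pi0, pi1, beta, q)
    # Multiplicative noise keeps every perturbed candidate strictly positive.
    perturbed = [
        expected_divergence(path * np.exp(0.05 * rng.standard_normal(x.size)),
                            pi0, pi1, beta, q)
        for _ in range(200)
    ]
    results[q] = (best, min(perturbed))
    print(f"q={q}: objective at path={best:.6f}, "
          f"best perturbed={min(perturbed):.6f}")
```

With $q = 1$ and $f = \exp$, the divergence between embedded densities is the unnormalized KL divergence with the candidate in the first slot, so this case corresponds to the familiar statement that the geometric path minimizes a weighted sum of KL divergences to the endpoints. A random search like this only gives evidence, not a proof, that the path is the minimizer.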

Paper Structure

This paper contains 15 sections, 59 equations, 2 figures, 1 table, 1 algorithm.

Figures (2)

  • Figure 1: Illustration of our main results. The choice of a monotonic representation function $\rho$, along with either a convex function $f$ or a second monotonic function $\tau$, specifies a Bregman divergence for $\beta = \{0,1\}$. The quasi-arithmetic path ${\tilde{\pi}}_{\beta}^{(\rho)}$ with mixing weight $\beta$ minimizes the expected Bregman divergence $D_{\Psi_f}[\rho({\tilde{\pi}}_a):\rho({\tilde{\pi}}_b)]$ to the endpoints $\{ {\tilde{\pi}}_0, {\tilde{\pi}}_1 \}$, where optimization is performed over the second argument and $\Psi_f[\rho({\tilde{\pi}})] \coloneqq \int f(\rho({\tilde{\pi}}({x}))) d{x}$. The (scaled) value of this objective is called the Bregman Information $\mathcal{I}_{f,\rho}^{(\beta)}$, and associates to each intermediate density ${\tilde{\pi}}_{\beta}^{(\rho)}$ a divergence $D_{f,\rho}^{(\beta)}[{\tilde{\pi}}_0:{\tilde{\pi}}_1] = \frac{1}{\beta(1-\beta)} \mathcal{I}_{f,\rho}^{(\beta)}[\bm{{\tilde{\pi}}}, \bm{\beta}]$ (Zhang, 2004), which compares ${\tilde{\pi}}_0$ and ${\tilde{\pi}}_1$ (\ref{example:zhang_div}). Finally, the quasi-arithmetic path in the $\rho$-representation is a geodesic with respect to the affine connection induced by the Bregman divergence $D_{\Psi_f}[\rho({\tilde{\pi}}_a):\rho({\tilde{\pi}}_b)]$ for any $\Psi_f$ (\ref{thm:geodesic}), which is also true for the $\tau$-representation, dual divergence, and dual connection.
  • Figure : Annealed Importance Sampling
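
The minimization described in the Figure 1 caption can be unpacked in one identity. This is a sketch that assumes the standard Bregman divergence definition $D_{\Psi}[u:v] = \Psi[u] - \Psi[v] - \langle \nabla\Psi[v],\, u - v \rangle$, which the caption does not spell out; the shorthand $u_0 = \rho({\tilde{\pi}}_0)$, $u_1 = \rho({\tilde{\pi}}_1)$, $v = \rho({\tilde{\pi}})$, and $\bar{u} = (1-\beta)\,u_0 + \beta\,u_1$ is introduced here only for illustration.

$$
(1-\beta)\, D_{\Psi_f}[u_0 : v] + \beta\, D_{\Psi_f}[u_1 : v]
= \underbrace{(1-\beta)\,\Psi_f[u_0] + \beta\,\Psi_f[u_1] - \Psi_f[\bar{u}]}_{\text{does not depend on } v}
\;+\; D_{\Psi_f}[\bar{u} : v].
$$

The identity follows because the terms linear in $u_0$ and $u_1$ combine into $\langle \nabla\Psi_f[v],\, \bar{u} - v \rangle$. Since $D_{\Psi_f}[\bar{u} : v] \geq 0$, with equality only at $v = \bar{u}$ when $f$ is strictly convex, the minimizer over the second argument satisfies $\rho({\tilde{\pi}}) = \bar{u}$, that is, ${\tilde{\pi}} = \rho^{-1}\big((1-\beta)\,\rho({\tilde{\pi}}_0) + \beta\,\rho({\tilde{\pi}}_1)\big)$, which is the quasi-arithmetic path. The remaining bracketed term is a Jensen gap for $\Psi_f$; under the assumptions above, it is the minimized value that the caption calls the Bregman Information before the $\frac{1}{\beta(1-\beta)}$ scaling.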

Theorems & Definitions (1)

  • proof