Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond

Wei Guo, Molei Tao, Yongxin Chen

TL;DR

This work provides a non-asymptotic complexity analysis for estimating normalizing constants $Z$ of unnormalized densities $\pi\propto e^{-V}$ using annealing-based methods such as the Jarzynski equality and annealed importance sampling, introducing an action-based framework that avoids strong isoperimetric assumptions. The authors derive a concrete oracle complexity bound $\widetilde{O}\left(\frac{d\beta^2\mathcal{A}^2}{\varepsilon^4}\right)$ tied to the curve’s action $\mathcal{A}$ and show a JE-based time bound $T=\mathcal{O}(\mathcal{A}/\varepsilon^2)$ to achieve $\varepsilon$-relative accuracy with high probability. Building on these insights, they establish a first non-asymptotic AIS complexity bound (with a geometric interpolation) and demonstrate that large actions hinder AIS performance, motivating a diffusion-based alternative via reverse diffusion samplers (RDS) with tractable action bounds. The paper also provides a framework and empirical evidence showing RDS can substantially improve multimodal sampling and normalizing constant estimation over AIS in challenging settings. Overall, the results offer finite-sample guarantees and practical algorithms for estimating partition functions in high-dimensional, multimodal landscapes without strong log-concavity assumptions, with broad implications for Bayesian model evidence, free-energy computations, and energy-based modeling.

Abstract

Given an unnormalized probability density $\pi\propto\mathrm{e}^{-V}$, estimating its normalizing constant $Z=\int_{\mathbb{R}^d}\mathrm{e}^{-V(x)}\mathrm{d}x$ or free energy $F=-\log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. It is especially challenging in high dimensions or when $\pi$ is multimodal. To mitigate the high variance of conventional importance sampling estimators, annealing-based methods such as the Jarzynski equality and annealed importance sampling are commonly adopted, yet their quantitative complexity guarantees remain largely unexplored. We take a first step toward a non-asymptotic analysis of annealed importance sampling. In particular, we derive an oracle complexity of $\widetilde{O}\left(\frac{d\beta^2\mathcal{A}^2}{\varepsilon^4}\right)$ for estimating $Z$ within $\varepsilon$ relative error with high probability, where $\beta$ is the smoothness of $V$ and $\mathcal{A}$ denotes the action of a curve of probability measures interpolating $\pi$ and a tractable reference distribution. Our analysis, leveraging Girsanov's theorem and optimal transport, does not explicitly require isoperimetric assumptions on the target distribution. Finally, to tackle the large action of the widely used geometric interpolation, we propose a new algorithm based on reverse diffusion samplers, establish a framework for analyzing its complexity, and empirically demonstrate its efficiency in tackling multimodality.
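To make the annealing idea concrete, the following is a minimal sketch of annealed importance sampling with a geometric interpolation between a Gaussian reference and a toy one-dimensional bimodal target. It is an illustration under stated assumptions, not the authors' algorithm: the target, the reference width `sigma0`, the linear schedule `lambdas`, and all function names are hypothetical, and a random-walk Metropolis kernel is swapped in for the Langevin-type dynamics analyzed in the paper. The sketch relies on the standard AIS identity that the average of the accumulated weights estimates $Z/Z_0$, where $Z_0$ is the known normalizing constant of the reference.

```python
import numpy as np


def log_f0(x, sigma0):
    # Unnormalized log-density of the reference N(0, sigma0^2).
    # Its normalizing constant is Z_0 = sigma0 * sqrt(2*pi).
    return -0.5 * (x / sigma0) ** 2


def log_f1(x):
    # Toy bimodal target exp(-V): two unit-variance Gaussian bumps at -2 and +2.
    # Its true normalizing constant is Z = 2 * sqrt(2*pi).
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)


def ais_log_z(n_particles=2000, n_levels=200, n_mcmc=2, step=1.0,
              sigma0=3.0, seed=0):
    """Estimate log Z of exp(log_f1) by AIS along a geometric interpolation."""
    rng = np.random.default_rng(seed)
    lambdas = np.linspace(0.0, 1.0, n_levels + 1)  # hypothetical linear schedule

    def log_f(x, lam):
        # Geometric interpolation: f_lam = f0^(1 - lam) * f1^lam.
        return (1.0 - lam) * log_f0(x, sigma0) + lam * log_f1(x)

    x = sigma0 * rng.standard_normal(n_particles)  # exact samples from the reference
    log_w = np.zeros(n_particles)

    for k in range(1, n_levels + 1):
        # Weight update: f_k(x_{k-1}) / f_{k-1}(x_{k-1}), accumulated in log space.
        log_w += log_f(x, lambdas[k]) - log_f(x, lambdas[k - 1])
        # Random-walk Metropolis moves that leave the k-th distribution invariant.
        for _ in range(n_mcmc):
            prop = x + step * rng.standard_normal(n_particles)
            log_acc = log_f(prop, lambdas[k]) - log_f(x, lambdas[k])
            accept = np.log(rng.uniform(size=n_particles)) < log_acc
            x = np.where(accept, prop, x)

    # log of the mean weight estimates log(Z / Z_0); add back log Z_0.
    log_ratio = np.logaddexp.reduce(log_w) - np.log(n_particles)
    return np.log(sigma0 * np.sqrt(2.0 * np.pi)) + log_ratio
```

Calling `ais_log_z()` returns an estimate of $\log Z$ that can be compared with the exact value $\log(2\sqrt{2\pi})\approx 1.612$ for this toy target. The reference here is wide enough to cover both modes, so the example is easy; it does not reproduce the hard multimodal regimes in which the paper argues that the geometric interpolation has large action.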

Paper Structure

This paper contains 49 sections, 26 theorems, 185 equations, 3 figures, 1 table, 2 algorithms.

Key Result

Lemma 1

For any vector field $v=(v_t)_{t\in[a,b]}$ on $\mathbb{R}^d$ that generates an absolutely continuous curve of probability measures $\rho=(\rho_t)_{t\in[a,b]}$ (i.e., the continuity equation $\partial_t\rho_t+\nabla\cdot(\rho_tv_t)=0$ holds for all $t\in[a,b]$), we have $|\dot\rho|_t\le\|v_t\|_{L^2(\rh

Figures (3)

  • Figure 1: Illustration of the proof idea for \ref{thm:ais_complexity}.
  • Figure 2: Visualization of the samples from the modified Müller-Brown distribution. The generated samples are displayed on top of the level curves of the potential energy surface (darker color corresponds to lower potential energy, i.e., higher probability density).
  • Figure 3: Visualization of the samples from the Gaussian mixture distribution. The generated samples are displayed on top of the level curves of the potential (darker color corresponds to lower potential, i.e., higher probability density).

Theorems & Definitions (60)

  • Remark 1
  • Lemma 1: Informal version of \ref{lem:metric}
  • Remark 2
  • Theorem 1: Jarzynski equality \cite{jarzynski1997nonequilibrium}
  • Theorem 2
  • Theorem 3: Annealed importance sampling equality \cite{neal2001annealed}
  • proof
  • Theorem 4
  • Proposition 1
  • Proposition 2
  • ...and 50 more