Conditional diffusions for amortized neural posterior estimation

Tianyu Chen, Vansh Bansal, James G. Scott

TL;DR

This work addresses amortized Bayesian posterior estimation when the likelihood is intractable by introducing diffusion decoders conditioned on learned data summaries. The authors prove a KL-divergence upper bound for jointly trained diffusion decoders and summary networks, and demonstrate via a comprehensive benchmark that diffusion-based decoders achieve higher stability and accuracy with faster training than normalizing flows across diverse problem classes and encoder architectures. They also provide three illustrative examples highlighting diffusion models' ability to recover complex, multimodal posteriors and boundary transitions. The results suggest that conditional diffusion with learned summaries offers a robust, scalable alternative for simulation-based inference (SBI), with practical implications for complex scientific applications where likelihoods are costly or unavailable.

Abstract

Neural posterior estimation (NPE), a simulation-based computational approach for Bayesian inference, has shown great success in approximating complex posterior distributions. Existing NPE methods typically rely on normalizing flows, which approximate a distribution by composing many simple, invertible transformations. But flow-based models, while state of the art for NPE, are known to suffer from several limitations, including training instability and sharp trade-offs between representational power and computational cost. In this work, we demonstrate the effectiveness of conditional diffusions coupled with high-capacity summary networks for amortized NPE. Conditional diffusions address many of the challenges faced by flow-based methods. Our results show that, across a highly varied suite of benchmarking problems for NPE architectures, diffusions offer improved stability, superior accuracy, and faster training times, even with simpler, shallower models. Building on prior work on diffusions for NPE, we show that these gains persist across a variety of different summary network architectures. Code is available at https://github.com/TianyuCodings/cDiff.
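
As a rough illustration of what training a conditional diffusion decoder on simulated pairs involves, the sketch below computes a single-sample denoising loss. It is not taken from the authors' code: it assumes a DDPM-style variance-preserving noise schedule (`alpha_bar`), a fixed hand-written `summary` function standing in for the learned summary network, and a placeholder noise predictor `zero_model`. All of these names are hypothetical, and the parameterization used in the paper may differ.

```python
import math
import random


def summary(y):
    # Hypothetical fixed summary (mean and mean square of the data).
    # In the paper the summary network is learned jointly with the decoder.
    n = len(y)
    return [sum(y) / n, sum(v * v for v in y) / n]


def alpha_bar(t):
    # Assumed toy schedule: fraction of signal kept at time t in [0, 1].
    return math.exp(-5.0 * t)


def noise_sample(theta, t, eps):
    # Forward noising: theta_t = sqrt(a) * theta + sqrt(1 - a) * eps.
    a = alpha_bar(t)
    return [math.sqrt(a) * th + math.sqrt(1.0 - a) * e
            for th, e in zip(theta, eps)]


def denoising_loss(eps_model, theta, y, rng):
    # One Monte Carlo draw of the noise-prediction loss for a pair (theta, y).
    t = rng.random()
    eps = [rng.gauss(0.0, 1.0) for _ in theta]
    theta_t = noise_sample(theta, t, eps)
    pred = eps_model(theta_t, t, summary(y))
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(theta)


def zero_model(theta_t, t, s):
    # Placeholder noise predictor; a real decoder would be a neural network
    # taking the noisy parameter, the time, and the data summary as inputs.
    return [0.0 for _ in theta_t]


rng = random.Random(0)
theta = [0.3, -1.2]                                   # parameter draw (toy values)
y = [theta[0] + theta[1] * rng.gauss(0.0, 1.0) for _ in range(20)]  # toy simulator
loss = denoising_loss(zero_model, theta, y, rng)
```

In practice this loss would be averaged over many simulated `(theta, y)` pairs and minimized with respect to the parameters of both the noise predictor and the summary network.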

Paper Structure

This paper contains 90 sections, 3 theorems, 47 equations, 24 figures, 3 tables, 2 algorithms.

Key Result

Proposition 1

where $C$ is independent of $\phi$.

Figures (24)

  • Figure 1: Cosine example (above). We generate $10^5$ samples from both fitted posterior approximations (normalizing flows and diffusions), choosing $y=0$ for the observed data. The diffusion decoder is visibly better at capturing the undulating, multimodal character of the true posterior, a result confirmed by our experiments in Section \ref{sec:results}.
  • Figure 3: Dirichlet-multinomial example. We show the marginal of each method's estimated posterior for $\theta$ based on $N=315$ and $\hat{p}=(0.0667, 0.1600, 0.2367, 0.2200, 0.3167)$. Since the true posterior is a conjugate Dirichlet distribution whose marginals are beta distributions, the ground truth marginal (red line) is shown for reference.
  • Figure 4: Normalizing flows exhibited diverging ECP (panel A) and SBC (panel C) metrics on the fractional Brownian motion problem, despite apparent convergence of the training error (panel B). The difference between expected coverage and the reference coverage should converge to zero across all credibility levels with more training. This difference visibly diverges for normalizing flows (A), but tends toward zero for diffusions (D). This is confirmed by examining both models' SBC metrics (C) for a single parameter (in this case, the phase of the cosine drift). This metric should also go to 0 with more training. Qualitatively similar results were observed for the Lotka-Volterra model. More details are deferred to Appendix \ref{app:evaluations}.
  • Figure 5: SBC distance versus epochs for the Sum of Cosines problem.
  • Figure 6: SBC distance versus epochs for the Witch's hat problem.
  • ...and 19 more figures
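
The Figure 3 caption relies on a standard conjugacy fact: with a Dirichlet prior and multinomial counts, the posterior is again Dirichlet, and each marginal of a Dirichlet($\alpha$) distribution is Beta($\alpha_i$, $\sum_j \alpha_j - \alpha_i$). The sketch below works through that computation. The uniform prior and the count vector are illustrative assumptions, not the values used in the paper, and the function names are hypothetical.

```python
def dirichlet_posterior(prior_alpha, counts):
    # Conjugate update: posterior concentration = prior concentration + counts.
    return [a + c for a, c in zip(prior_alpha, counts)]


def beta_marginals(alpha):
    # Marginal of component i under Dirichlet(alpha) is Beta(alpha_i, total - alpha_i).
    total = sum(alpha)
    return [(a, total - a) for a in alpha]


def beta_mean(a, b):
    # Mean of a Beta(a, b) distribution.
    return a / (a + b)


prior_alpha = [1, 1, 1, 1, 1]   # assumed uniform Dirichlet prior
counts = [2, 3, 5, 4, 6]        # hypothetical multinomial counts (N = 20)

posterior_alpha = dirichlet_posterior(prior_alpha, counts)
marginals = beta_marginals(posterior_alpha)
means = [beta_mean(a, b) for a, b in marginals]
```

These closed-form beta marginals are what make the example useful as a check: an estimated posterior can be compared against them directly.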

Theorems & Definitions (9)

  • Example 1: Sum of cosines
  • Example 2: Witch's hat
  • Example 3: Dirichlet-multinomial
  • Proposition 1
  • proof : Proof Sketch
  • Proposition 2
  • Lemma 1
  • proof
  • proof : Complete Proof of Proposition \ref{thm:diffusion_upperbound}