
Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods

Xiliang Yang, Yifei Xiong, Zhijian He

TL;DR

This paper tackles simulation-based Bayesian inference with intractable likelihoods by recasting automatic posterior transformation (APT) as a nested estimation problem and introducing multilevel Monte Carlo (MLMC) estimators that efficiently approximate the SNPE loss and its gradients. It develops three MLMC schemes, RU-MLMC (randomized unbiased), GRR-MLMC (generalized Russian roulette), and TGRR-MLMC (truncated GRR), which balance bias, variance, and computational cost, and it provides convergence guarantees for stochastic gradient descent driven by these estimators. The authors validate the approach on multimodal posteriors in moderate dimensions, reporting improved posterior accuracy (measured by MMD, C2ST, LMD, and NLOG) while highlighting trade-offs in runtime and memory. The work offers practical guidance on choosing an MLMC variant under a given compute budget and suggests future enhancements, such as quasi-Monte Carlo techniques, to further reduce variance and improve efficiency in high-dimensional settings.
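For orientation, the nested structure can be written out using the proposal-posterior construction of APT from Greenberg et al. (2019). The notation is assumed here and may not match the paper's: $p(\theta)$ is the prior, $\tilde{p}(\theta)$ the proposal, and $q_\phi(\theta\mid x)$ the neural conditional density estimator.

```latex
\begin{aligned}
\tilde{q}_{x,\phi}(\theta) &= \frac{q_\phi(\theta \mid x)\,\tilde{p}(\theta)/p(\theta)}{Z(x,\phi)},
\qquad
Z(x,\phi) = \int q_\phi(\theta' \mid x)\,\frac{\tilde{p}(\theta')}{p(\theta')}\,d\theta'
= \mathbb{E}_{\theta' \sim \tilde{p}}\!\left[\frac{q_\phi(\theta' \mid x)}{p(\theta')}\right],\\
\mathcal{L}(\phi) &= -\mathbb{E}_{(\theta,x)\sim \tilde{p}(\theta)p(x\mid\theta)}\!\left[\log \tilde{q}_{x,\phi}(\theta)\right]
= \mathbb{E}\!\left[-\log q_\phi(\theta \mid x) + \log Z(x,\phi)\right] + \mathrm{const}.
\end{aligned}
```

The constant is $-\mathbb{E}[\log(\tilde{p}(\theta)/p(\theta))]$, which does not depend on $\phi$. Because the logarithm sits outside the inner expectation, plugging in $M$ draws $\theta'_j\sim\tilde{p}$ gives $\log\big(\tfrac{1}{M}\sum_{j=1}^{M} q_\phi(\theta'_j\mid x)/p(\theta'_j)\big)$, whose mean underestimates $\log Z(x,\phi)$ by Jensen's inequality. This bias shrinks as $M$ grows, and it is the bias that the MLMC estimators are designed to remove or control.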

Abstract

There is growing interest in sequential neural posterior estimation (SNPE) techniques because of their advantages for simulation-based models with intractable likelihoods. These methods aim to learn the posterior from adaptively proposed simulations using neural network-based conditional density estimators. Among SNPE techniques, the automatic posterior transformation (APT) method proposed by Greenberg et al. (2019) performs well and scales to high-dimensional data. However, APT requires computing the expectation of the logarithm of an intractable normalizing constant, i.e., a nested expectation. Although atomic proposals were used to obtain an analytical normalizing constant, the convergence of learning remains challenging to analyze. In this paper, we reformulate APT as a nested estimation problem. Building on this, we construct several multilevel Monte Carlo (MLMC) estimators for the loss function and its gradients to accommodate different scenarios, including two unbiased estimators and a biased estimator that trades a small bias for reduced variance and controlled runtime and memory usage. We also provide convergence results for stochastic gradient descent that quantify the interaction between the bias and the variance of the gradient estimator. Numerical experiments on approximating complex multimodal posteriors in moderate dimensions examine the effectiveness of the proposed methods.
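To make the estimator families concrete, here is a minimal Python sketch on a toy nested expectation rather than the APT loss: $X\sim N(0,1)$, $Y\mid X\sim N(X,1)$, and the target $\mathbb{E}_X[\log\mathbb{E}_{Y\mid X}[e^{Y}]]=\mathbb{E}[X+1/2]=1/2$. Level $\ell$ uses the standard antithetic difference for a log of an inner mean with $M_0 2^{\ell}$ inner samples. The three functions use the textbook single-term and Russian-roulette (coupled-sum) randomizations plus a truncated Russian roulette; they are meant to mirror the roles of RU-MLMC, GRR-MLMC, and TGRR-MLMC, but the paper's exact constructions (for example how GRR generalizes the level probabilities, or how levels share samples) may differ. The toy model, the function names, and the values M0 = 4, alpha = 1.5, and m_bar = 4 are all hypothetical.

```python
import numpy as np

# Hypothetical toy nested expectation (not the APT loss):
#   X ~ N(0, 1), Y | X ~ N(X, 1), target = E_X[log E_{Y|X}[exp(Y)]] = E[X + 1/2] = 0.5
TRUE_VALUE = 0.5
M0 = 4  # hypothetical inner sample size at level 0; level l uses M0 * 2**l


def delta(level, rng):
    """Antithetic level difference Delta_l for one fresh outer draw X."""
    x = rng.normal()
    m = M0 * 2**level
    f = np.exp(rng.normal(loc=x, scale=1.0, size=m))  # inner samples of exp(Y)
    fine = np.log(f.mean())
    if level == 0:
        return fine
    half = m // 2
    # Coarse estimate: average of the log-means of the two halves (antithetic pair).
    coarse = 0.5 * (np.log(f[:half].mean()) + np.log(f[half:].mean()))
    return fine - coarse


def sample_level(alpha, rng):
    """Draw N with P(N = l) = (1 - 2**-alpha) * 2**(-alpha * l) for l = 0, 1, 2, ..."""
    return int(rng.geometric(1.0 - 2.0**-alpha)) - 1


def ru_estimate(alpha, rng):
    """Single-term randomized estimator Delta_N / p_N with p_N = P(N = n): unbiased."""
    n = sample_level(alpha, rng)
    p_n = (1.0 - 2.0**-alpha) * 2.0 ** (-alpha * n)
    return delta(n, rng) / p_n


def rr_estimate(alpha, rng):
    """Russian-roulette (coupled-sum) estimator sum_{l<=N} Delta_l / P(N >= l): unbiased."""
    n = sample_level(alpha, rng)
    # For the geometric law above, P(N >= l) = 2**(-alpha * l).
    return sum(delta(l, rng) / 2.0 ** (-alpha * l) for l in range(n + 1))


def truncated_rr_estimate(alpha, m_bar, rng):
    """Russian roulette with N restricted to {0, ..., m_bar}.

    Unbiased for E[Delta_0 + ... + Delta_{m_bar}] (the level-m_bar nested estimator),
    hence biased for TRUE_VALUE, but the cost per draw is bounded.
    """
    levels = np.arange(m_bar + 1)
    pmf = 2.0 ** (-alpha * levels)
    pmf /= pmf.sum()
    tail = pmf[::-1].cumsum()[::-1]  # tail[l] = P(N >= l)
    n = int(rng.choice(levels, p=pmf))
    return sum(delta(l, rng) / tail[l] for l in range(n + 1))


rng = np.random.default_rng(0)
n_rep = 20_000
ru = np.mean([ru_estimate(1.5, rng) for _ in range(n_rep)])
rr = np.mean([rr_estimate(1.5, rng) for _ in range(n_rep)])
trr = np.mean([truncated_rr_estimate(1.5, 4, rng) for _ in range(n_rep)])
print(f"RU: {ru:.3f}  RR: {rr:.3f}  truncated RR: {trr:.3f}  (target {TRUE_VALUE})")
```

The truncated variant never uses more than $M_0(2^{\bar m+1}-1)$ inner samples per draw (124 here, with $M_0=4$ and $\bar m=4$), which is the kind of runtime and memory control the abstract attributes to the biased estimator; its price is the residual bias of the level-$\bar m$ nested estimator.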

Paper Structure

This paper contains 19 sections, 5 theorems, 101 equations, 6 figures, 3 tables.

Key Result

Theorem 3.1

If there exist $p,q>2$ with $(p-2)(q-2)\geq 4$ such that, for any $\phi\in\Phi$ and $(\theta,x)\sim \tilde{p}(\theta)p(x|\theta)$, … holds, then …, where $r_1 = \min(p(q-2)/(2q),2)\in(1,2]$.
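The condition $(p-2)(q-2)\geq 4$ is exactly what keeps the first argument of the minimum defining $r_1$ at least 1. Since $q>2$, dividing by $2q>0$ preserves the inequality:

```latex
(p-2)(q-2) \ge 4
\iff pq - 2p - 2q \ge 0
\iff p(q-2) \ge 2q
\iff \frac{p(q-2)}{2q} \ge 1 .
```

Hence $r_1=\min\big(p(q-2)/(2q),2\big)\geq 1$, with $r_1>1$ exactly when the inequality is strict. For example, $p=q=6$ gives $(p-2)(q-2)=16$ and $r_1=\min(2,2)=2$.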

Figures (6)

  • Figure 1: Left: Crude RU-MLMC after the third round. Right: The truncated RU-MLMC ($\overline{m}=4$) after the third round.
  • Figure 2: Approximated posteriors for RU-MLMC, GRR-MLMC, and TGRR-MLMC. A. Two-Moon, from left to right: available ground truth, RU-MLMC, GRR-MLMC, and TGRR-MLMC. B. Lotka-Volterra, from left to right: ground truth simulated with SMC-ABC (Beaumont et al., 2009), RU-MLMC, GRR-MLMC, and TGRR-MLMC. C. M/G/1 queue model, with the same layout as Lotka-Volterra. Each panel shows a scatter plot for every 2D subspace; the diagonal shows the histogram of each $\theta_{i},i=1,\dots,4$, with the ground-truth parameter marked by dotted lines.
  • Figure 3: Performance of RU-MLMC, GRR-MLMC, and TGRR-MLMC on A. Two-Moon, B. Lotka-Volterra, and C. the M/G/1 queue model. Blue, green, and red correspond to RU-MLMC, GRR-MLMC, and TGRR-MLMC, respectively.
  • Figure 4: Approximated posteriors for nested APT and atomic APT. A. Two-Moon model, from left to right: available ground truth, atomic APT with $M=100$ inner samples, and nested APT with $M=100$ inner samples. B. Lotka-Volterra model, from left to right: ground truth simulated with SMC-ABC (Beaumont et al., 2009), atomic APT with $M=100$, and nested APT with $M=100$. C. M/G/1 queue model, with the same settings as Lotka-Volterra. Each panel shows a scatter plot for every 2D subspace; the diagonal shows the histogram of each $\theta_{i},i=1,\dots,4$, with the ground-truth parameter marked by dotted lines.
  • Figure 5: Ablation study of $\alpha$ for TGRR-MLMC, where $\alpha=1.673$ is our proposed optimal value and $\alpha=1.8$ is the value that minimizes the expected cost.

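Figure 5 treats $\alpha$ as a tuning knob. Assuming, as is common for randomized MLMC, that the level index $N$ has $P(N=\ell)\propto 2^{-\alpha\ell}$, a standard back-of-the-envelope model sets $\mathbb{E}[\Delta_\ell^2]=2^{-r\ell}$ and cost $2^{\ell}$, then scores the single-term estimator by its second moment times its expected cost. The sketch below evaluates that proxy on a grid; for the untruncated law the minimizer is $\alpha=(1+r)/2$, i.e. $1.5$ for the illustrative $r=2$. This is a generic heuristic, not the paper's derivation of $\alpha=1.673$ or $\alpha=1.8$; the function name, the cost and variance model, and $r=2$ are hypothetical.

```python
import numpy as np


def work_variance_proxy(alpha, r, max_level=None):
    """Second moment times expected cost of a single-term randomized MLMC estimator.

    Hypothetical model: E[Delta_l^2] = 2**(-r*l), cost_l = 2**l, P(N = l) ∝ 2**(-alpha*l).
    max_level=None approximates the untruncated law with 200 levels; an integer
    truncates the law to {0, ..., max_level} and renormalizes it.
    """
    n_levels = 200 if max_level is None else max_level + 1
    levels = np.arange(n_levels)
    pmf = 2.0 ** (-alpha * levels)
    pmf /= pmf.sum()
    second_moment = np.sum(2.0 ** (-r * levels) / pmf)  # E[(Delta_N / p_N)^2]
    expected_cost = np.sum(pmf * 2.0**levels)
    return second_moment * expected_cost


r = 2.0  # illustrative decay rate of E[Delta_l^2]
alphas = np.linspace(1.05, 1.95, 91)
untruncated = [work_variance_proxy(a, r) for a in alphas]
truncated = [work_variance_proxy(a, r, max_level=4) for a in alphas]
best_untruncated = alphas[int(np.argmin(untruncated))]
best_truncated = alphas[int(np.argmin(truncated))]
print(f"untruncated argmin: {best_untruncated:.2f}, theory (1 + r) / 2 = {(1 + r) / 2:.2f}")
print(f"truncated at level 4 argmin: {best_truncated:.2f}")
```

The truncated run illustrates that capping the level range moves the best $\alpha$ away from $(1+r)/2$, which is one reason a truncated scheme can call for its own tuned value.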
Theorems & Definitions (9)

  • Theorem 3.1
  • Proof
  • Theorem 3.2
  • Theorem 3.3
  • Proof
  • Theorem 4.2
  • Proof
  • Lemma A.1
  • Proof of theorem:var_delta_grad_query