Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods
Xiliang Yang, Yifei Xiong, Zhijian He
TL;DR
This paper tackles simulation-based Bayesian inference with intractable likelihoods by recasting automatic posterior transformation (APT) as a nested estimation problem and introducing multilevel Monte Carlo (MLMC) estimators that approximate the SNPE loss and its gradients efficiently. It develops three MLMC schemes, RU-MLMC (randomized), GRR-MLMC (generalized Russian roulette), and TGRR-MLMC (truncated GRR), to balance bias, variance, and computational cost, and it provides convergence guarantees for stochastic gradient descent under these estimators. The authors validate the approach on multimodal posteriors in moderate dimensions, reporting improved posterior accuracy (measured by MMD, C2ST, LMD, and NLOG) while highlighting trade-offs in run time and memory. The work offers practical guidance on selecting an MLMC variant under a given compute budget and suggests future enhancements, such as quasi-Monte Carlo techniques, to further reduce variance and improve efficiency in high-dimensional settings.
Abstract
There is growing interest in sequential neural posterior estimation (SNPE) techniques because of their advantages for simulation-based models with intractable likelihoods. These methods aim to learn the posterior from adaptively proposed simulations using neural network-based conditional density estimators. As an SNPE technique, the automatic posterior transformation (APT) method proposed by Greenberg et al. (2019) performs well and scales to high-dimensional data. However, the APT method requires computing the expectation of the logarithm of an intractable normalizing constant, i.e., a nested expectation. Although atomic proposals were used to render the normalizing constant analytical, it remains challenging to analyze the convergence of learning. In this paper, we reformulate APT as a nested estimation problem. Building on this, we construct several multilevel Monte Carlo (MLMC) estimators for the loss function and its gradients to accommodate different scenarios, including two unbiased estimators and a biased estimator that trades a small bias for reduced variance and controlled runtime and memory usage. We also provide convergence results for stochastic gradient descent that quantify the interaction between the bias and variance of the gradient estimator. Numerical experiments on approximating complex posteriors with multimodality in moderate dimensions are provided to examine the effectiveness of the proposed methods.
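To make the idea of a nested expectation concrete, the following is a minimal sketch of a randomized, unbiased MLMC estimator for a quantity of the form E_x[log E_y[g(x, y)]]. It is not the paper's RU-MLMC construction for the APT loss. It is a generic single-term randomized estimator with antithetic level differences, applied to a toy problem. The integrand g, the uniform outer distribution, the geometric level distribution with decay 2^(-1.5), the base sample size m0, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def g(x, y):
    """Toy positive integrand (assumption): E_y[g(x, y)] = 1 + x for y ~ N(0, 1)."""
    return 1.0 + x * y**2


def antithetic_delta(x, level, rng, m0=2):
    """Level correction for log E_y[g(x, y)] using m0 * 2**level inner draws.

    For level >= 1, the fine estimate uses all draws, and the coarse estimate
    averages the logs of the two half-sample means (antithetic coupling).
    """
    m = m0 * 2**level
    vals = g(x, rng.standard_normal(m))
    fine = np.log(vals.mean())
    if level == 0:
        return fine
    half = m // 2
    coarse = 0.5 * (np.log(vals[:half].mean()) + np.log(vals[half:].mean()))
    return fine - coarse


def randomized_mlmc_draw(x, rng, decay=2.0**-1.5):
    """One unbiased draw of log E_y[g(x, y)]: random level, importance-weighted."""
    # NumPy's geometric distribution is supported on {1, 2, ...}; shift to {0, 1, ...}.
    level = int(rng.geometric(1.0 - decay)) - 1
    prob = (1.0 - decay) * decay**level  # P(level) under the shifted geometric law
    return antithetic_delta(x, level, rng) / prob


def estimate_nested_expectation(n_outer, rng):
    """Average unbiased inner draws over outer samples x ~ Uniform(0, 1)."""
    xs = rng.uniform(0.0, 1.0, size=n_outer)
    return float(np.mean([randomized_mlmc_draw(x, rng) for x in xs]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    exact = 2.0 * np.log(2.0) - 1.0  # E[log(1 + x)] for x ~ Uniform(0, 1)
    print("estimate:", estimate_nested_expectation(20000, rng), "exact:", exact)
```

In this sketch the level probabilities decay more slowly than the variance of the level corrections but faster than their cost grows, which is what keeps both the variance and the expected cost per draw finite. Because the level is unbounded, a rare draw can request a very large inner sample; capping the level, in the spirit of the truncated estimator described above, bounds the cost and memory at the price of a small bias.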
