BayesFlow: Learning complex stochastic models with invertible neural networks

Stefan T. Radev, Ulf K. Mertens, Andreas Voss, Lynton Ardizzone, Ullrich Köthe

TL;DR

BayesFlow introduces a globally amortized Bayesian inference pipeline based on conditional invertible neural networks that learn the mapping from data to parameters from simulations. By coupling a summary network with a chain of affine coupling blocks, it achieves exact density estimation of $p(\boldsymbol{\theta}|\boldsymbol{x})$ without requiring handcrafted statistics and supports variable data sizes through learned summaries. The approach demonstrates strong accuracy, calibration, and posterior contraction across diverse forward models, including Ricker dynamics, Lévy flight decision making, SIR epidemiology, and Lotka–Volterra ecology, often outperforming existing amortized likelihood-free methods. After upfront training, inference is substantially faster, which makes BayesFlow a practical, scalable framework for rapid Bayesian parameter estimation in complex stochastic models. The work highlights the importance of learned summaries and invertible density estimation for robust, domain-agnostic amortized Bayesian inference on simulatable forward models.
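
The combination of a summary network with a chain of conditional affine coupling blocks can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the subnetworks are untrained random MLPs, the summary network is replaced by simple mean pooling, and all names (`make_mlp`, `ConditionalAffineCoupling`, `summary`, `flow_forward`, `flow_inverse`) and dimensions are hypothetical. It only shows that each block is invertible given the summary, that the log-determinant of the Jacobian is cheap to compute, and that mean pooling yields a fixed-size condition for datasets of different sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim, out_dim, hidden=16):
    """Tiny random two-layer MLP; a stand-in for a trained subnetwork."""
    W1 = rng.normal(scale=0.5, size=(in_dim, hidden))
    W2 = rng.normal(scale=0.5, size=(hidden, out_dim))
    return lambda h: np.tanh(h @ W1) @ W2

class ConditionalAffineCoupling:
    """One affine coupling block whose scale/shift nets also see the summary."""

    def __init__(self, theta_dim, cond_dim):
        self.d1 = theta_dim // 2
        self.d2 = theta_dim - self.d1
        self.s1 = make_mlp(self.d2 + cond_dim, self.d1)
        self.t1 = make_mlp(self.d2 + cond_dim, self.d1)
        self.s2 = make_mlp(self.d1 + cond_dim, self.d2)
        self.t2 = make_mlp(self.d1 + cond_dim, self.d2)

    def forward(self, theta, cond):
        u1, u2 = theta[:, :self.d1], theta[:, self.d1:]
        h = np.concatenate([u2, cond], axis=1)
        s1 = self.s1(h)
        v1 = u1 * np.exp(s1) + self.t1(h)      # transform first half
        h = np.concatenate([v1, cond], axis=1)
        s2 = self.s2(h)
        v2 = u2 * np.exp(s2) + self.t2(h)      # transform second half
        log_det = s1.sum(axis=1) + s2.sum(axis=1)
        return np.concatenate([v1, v2], axis=1), log_det

    def inverse(self, z, cond):
        v1, v2 = z[:, :self.d1], z[:, self.d1:]
        h = np.concatenate([v1, cond], axis=1)
        u2 = (v2 - self.t2(h)) * np.exp(-self.s2(h))
        h = np.concatenate([u2, cond], axis=1)
        u1 = (v1 - self.t1(h)) * np.exp(-self.s1(h))
        return np.concatenate([u1, u2], axis=1)

def summary(x):
    """Assumed stand-in for the summary network: mean over observations."""
    return x.mean(axis=1)

# Hypothetical sizes: 4 parameters, 3-dimensional observations, 3 blocks.
blocks = [ConditionalAffineCoupling(theta_dim=4, cond_dim=3) for _ in range(3)]

def flow_forward(theta, cond):
    z, log_det = theta, np.zeros(theta.shape[0])
    for block in blocks:
        z, ld = block.forward(z, cond)
        log_det += ld
    return z, log_det

def flow_inverse(z, cond):
    theta = z
    for block in reversed(blocks):
        theta = block.inverse(theta, cond)
    return theta

# Round trip: parameters -> latent -> parameters, conditioned on the data.
theta = rng.normal(size=(5, 4))
x = rng.normal(size=(5, 20, 3))                # 5 datasets, 20 observations each
z, log_det = flow_forward(theta, summary(x))
theta_rec = flow_inverse(z, summary(x))

# Sampling for one dataset with a different number of observations (50).
x_obs = rng.normal(size=(1, 50, 3))
cond_obs = np.repeat(summary(x_obs), 100, axis=0)
samples = flow_inverse(rng.normal(size=(100, 4)), cond_obs)
```

With trained networks, `samples` would be draws from the approximate posterior for `x_obs`, and the density of any parameter vector could be evaluated as the standard normal log-density of `z` plus `log_det`. Here the weights are random, so only the mechanics are meaningful.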

Abstract

Estimating the parameters of mathematical models is a common problem in almost all branches of science. However, this problem can prove notably difficult when processes and model descriptions become increasingly complex and an explicit likelihood function is not available. With this work, we propose a novel method for globally amortized Bayesian inference based on invertible neural networks which we call BayesFlow. The method uses simulation to learn a global estimator for the probabilistic mapping from observed data to underlying model parameters. A neural network pre-trained in this way can then, without additional training or optimization, infer full posteriors on arbitrarily many real datasets involving the same model family. In addition, our method incorporates a summary network trained to embed the observed data into maximally informative summary statistics. Learning summary statistics from data makes the method applicable to modeling scenarios where standard inference techniques with hand-crafted summary statistics fail. We demonstrate the utility of BayesFlow on challenging intractable models from population dynamics, epidemiology, cognitive science and ecology. We argue that BayesFlow provides a general framework for building amortized Bayesian parameter estimation machines for any forward model from which data can be simulated.

Paper Structure

This paper contains 22 sections, 2 theorems, 34 equations, 12 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Assume that the cINN architecture and domain of $\phi$ are chosen such that $\widehat{\phi}$ is the global minimum of the objective in Eq. 15. Then, the latent output distribution will be statistically independent of the conditioning data, $p_{\boldsymbol{\hat{\phi}}}(\boldsymbol{z} \mid \boldsymb

Figures (12)

  • Figure 1: Graphical illustration of the main differences between case-based (neural) density estimation methods and BayesFlow. (a) Case-based methods require a separate optimization loop for each observed dataset from a given research domain. When case-based methods incorporate a training phase (e.g., APT), it must be repeated for each new dataset. Summary statistics are manually selected and may thus be sub-optimal; (b) BayesFlow incorporates a global upfront training (before any real data are collected) via simulations from the forward model (left panel). The summary and inference networks are trained jointly, resulting in higher accuracy than hand-crafted summary statistics. In the inference phase (right panel), BayesFlow works entirely in a feed-forward manner, that is, no training or optimization happens in this phase. The upfront training effort is therefore amortized over arbitrarily many observed datasets from a research domain working on the same model family. Note that the solid and dashed plates are swapped between case-based Bayesian inference and the training phase of BayesFlow.
  • Figure 2: Inference with pre-trained summary and inference networks. The posterior is approximated given real observed data via independent samples from a learned pushforward distribution. Thus, knowledge about the mapping between data and parameters (the inverse model) is compactly encoded within the weights of the two networks.
  • Figure 3: Results on the GMM toy example with colors indicating cluster assignments. Approximations of the multimodal posterior become closer to the ground truth distribution with increasing depth (number of ACBs) of the conditional invertible network.
  • Figure 4: Results on the Ricker model. (a) Approximate posteriors obtained by all implemented methods on a single Ricker dataset. Note that only BayesFlow and ABC-NN are able to approximate the uniform posterior of $u$; (b) NRMSE and $R^{2}$ performance metrics over all $T$s obtained by the BayesFlow method. We observe that parameter estimation remains good over all $T$s, and becomes progressively better as more data become available (shaded regions indicate bootstrap 95% CIs); (c) Parameter recovery with BayesFlow for the maximum number of generations used during training ($T=500$); (d) Posterior contraction in terms of posterior standard deviation for each parameter across an increasing number of available generations (shaded regions indicate bootstrap 95% CIs).
  • Figure 5: Comparison results on the LFM model. (a) Marginal and bivariate posteriors obtained by BayesFlow and SMC-MMD on the single validation dataset. We observe markedly better sharpness in the BayesFlow posteriors; (b) Marginal posteriors obtained from all methods under comparison.
  • ...and 7 more figures

Theorems & Definitions (4)

  • Proposition 1
  • proof
  • Proposition 2
  • proof