Amortized Bayesian Mixture Models

Šimon Kucharský; Paul Christian Bürkner

TL;DR

This work addresses fast, joint Bayesian inference for finite mixture models in settings where likelihoods are intractable. It extends Amortized Bayesian Inference by factorizing the posterior into $p(\theta \mid y)$ and $p(z \mid y, \theta)$ and training neural surrogates to estimate both continuous parameters and discrete mixture indicators. Using normalizing flows for parameter posteriors and classification networks for mixture memberships, the approach supports both independent and dependent mixtures with filtering and smoothing, trained end-to-end on simulated data. Case studies across synthetic Gaussian mixtures, Gaussian HMMs, and cognitive-switch data demonstrate posterior and classification results that closely match Stan/MCMC while offering substantial speedups, with the BayesFlow implementation publicly available.
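
Written out, the factorization is an application of the product rule to the joint posterior of parameters $\theta$ and mixture indicators $z$. The second identity holds only for exchangeable observations and is the form given in the caption of Figure 1:

$$
p(\theta, z \mid y) = p(\theta \mid y)\, p(z \mid y, \theta), \qquad p(z \mid y, \theta) = \prod_{i=1}^N p(z_i \mid y_i, \theta) \quad \text{(exchangeable observations)}.
$$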

Abstract

Finite mixtures are a broad class of models useful in scenarios where observed data is generated by multiple distinct processes but without explicit information about the responsible process for each data point. Estimating Bayesian mixture models is computationally challenging due to issues such as high-dimensional posterior inference and label switching. Furthermore, traditional methods such as MCMC are applicable only if the likelihoods for each mixture component are analytically tractable. Amortized Bayesian Inference (ABI) is a simulation-based framework for estimating Bayesian models using generative neural networks. This allows the fitting of models without explicit likelihoods, and provides fast inference. ABI is therefore an attractive framework for estimating mixture models. This paper introduces a novel extension of ABI tailored to mixture models. We factorize the posterior into a distribution of the parameters and a distribution of (categorical) mixture indicators, which allows us to use a combination of generative neural networks for parameter inference, and classification networks for mixture membership identification. The proposed framework accommodates both independent and dependent mixture models, enabling filtering and smoothing. We validate and demonstrate our approach through synthetic and real-world datasets.
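
To make the two parts of the factorization concrete, the following is a minimal sketch for a two-component Gaussian mixture with unit variance. The priors, the function names `simulate_mixture` and `membership_probs`, and the ordering constraint on the means are illustrative assumptions and are not taken from the paper. The first function draws one training example $(\theta, z, y)$ from a generative model. The second computes the exact membership probabilities $p(z_i \mid y_i, \theta)$, which a classification network would approximate when the component likelihoods are not available in closed form.

```python
import numpy as np


def simulate_mixture(n_obs, rng):
    """Draw (theta, z, y) from an illustrative two-component Gaussian mixture."""
    # Hypothetical priors, chosen only for this sketch.
    weight = rng.beta(2.0, 2.0)                    # P(z_i = 1)
    # Sorting the means is one common way to avoid label switching (an assumption here).
    means = np.sort(rng.normal(0.0, 3.0, size=2))
    z = (rng.random(n_obs) < weight).astype(int)   # latent mixture indicators in {0, 1}
    y = rng.normal(means[z], 1.0)                  # observations with unit standard deviation
    theta = {"weight": weight, "means": means}
    return theta, z, y


def membership_probs(y, theta, sd=1.0):
    """Exact p(z_i = k | y_i, theta) for exchangeable observations, shape (n_obs, 2)."""
    log_w = np.log(np.array([1.0 - theta["weight"], theta["weight"]]))
    # Gaussian log-likelihood up to a constant that cancels because sd is shared.
    log_lik = -0.5 * ((y[:, None] - theta["means"][None, :]) / sd) ** 2
    log_post = log_w[None, :] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta, z, y = simulate_mixture(5, rng)
    print(membership_probs(y, theta).round(3))
```

In a simulation-based setting, many such draws would form the training set, and the closed-form `membership_probs` would serve only as a reference for checking a learned classifier in the tractable case.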

Paper Structure (14 sections, 30 equations, 17 figures, 1 table)

Introduction
Methods
Bayesian mixture models
Simulation-Based Inference
Amortized Bayesian Inference
Neural estimation of Mixture Models
Alternative factorizations
Case studies
Evaluation
Parameter constraints
Case Study 1: Gaussian mixture model
Case Study 2: Gaussian hidden Markov model
Case Study 3: Latent switches in cognitive processing
Conclusion

Figures (17)

Figure 1: Examples of dependencies between observational units in mixture models. (a) Exchangeable observational units permit factorizing the distribution of mixture indicators as $p(z \mid y, \theta) = \prod_{i=1}^N p(z_i \mid y_i, \theta)$. For non-exchangeable observational units such as in (b) and (c), local decoding is used to factorize the joint distribution; filtering as $p(z \mid y, \theta) = \prod_{i=1}^N p(z_i \mid y_1, \dots, y_i, \theta)$, or smoothing as $p(z \mid y, \theta) = \prod_{i=1}^N p(z_i \mid y_1, \dots, y_N, \theta)$. Figure inspired by burkner_efficient_2021.
Figure 2: Schematic representation of training amortized mixture models. The boxes highlighted in gray represent the inputs (i.e., the training set) sampled from the Bayesian generative model. The observations $y_i$ are individually passed through the local summary network. For parameter posterior training, the complete set of local summaries is further passed through the global summary network. The global summary is passed together with the true parameters $\theta$ to the posterior network to obtain the loss from Eq. \ref{eq:loss_npe}. For classification training, the local summaries are concatenated with the true parameters $\theta$, and together passed with the true mixture indicators $z_i$ to the classification network, to obtain the loss from Eq. \ref{eq:loss_foward} (or in case of separate forward and backward networks, Eq. \ref{eq:loss_foward_backward}). Combining the two losses results in the joint loss in Eq. \ref{eq:loss_combined}. The objective of training is to minimize the total loss by optimizing network weights $\phi, \psi, \omega, \alpha$.
Figure 3: Schematic representation of the use of amortized mixture models for inference. The observations $y_i$ are individually passed through the local summary network. For parameter posterior inference, the complete set of local summaries is further passed through the global summary network. The global summary is passed to the posterior network to generate samples $\theta^{(s)}$ from the approximate posterior distribution. For classification, the local summaries are concatenated with the parameter samples $\theta^{(s)}$ and passed through the classification network to obtain the approximate mixture membership probabilities. If desired, the mixture indicators $z_i$ can be sampled from this approximate distribution.
Figure 4: Case Study 1: Gaussian mixture model. Simulation-based calibration results displayed as the difference between the empirical and expected cumulative distribution functions of the fractional rank statistic. The shaded area corresponds to the 95% confidence bands.
Figure 5: Case Study 1: Gaussian mixture model. Parameter recovery shown as a scatter plot of the true data-generating parameter values against the estimated parameter values. The point estimates are posterior medians, and the lines depict the 95% central credible intervals.
...and 12 more figures
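
The filtering and smoothing distributions named in the caption of Figure 1 can be computed exactly when the model is a Gaussian hidden Markov model with known parameters. The sketch below uses the classical forward-backward recursions for that purpose. It is not the paper's neural classifier, and the function name `filter_and_smooth`, the shared standard deviation, and the example values are assumptions made for illustration. Filtering conditions each $z_i$ on $y_1, \dots, y_i$, while smoothing conditions on all of $y_1, \dots, y_N$.

```python
import numpy as np


def filter_and_smooth(y, init, trans, means, sd=1.0):
    """Exact filtering and smoothing marginals for a Gaussian HMM.

    init[k]     = P(z_1 = k)
    trans[j, k] = P(z_{i+1} = k | z_i = j)
    Returns two arrays of shape (N, K): p(z_i | y_1..y_i) and p(z_i | y_1..y_N).
    """
    # Per-state likelihood up to a constant that cancels because sd is shared.
    lik = np.exp(-0.5 * ((y[:, None] - means[None, :]) / sd) ** 2)
    n, k = lik.shape

    # Forward pass: normalized filtering distributions.
    filt = np.zeros((n, k))
    pred = np.asarray(init, dtype=float)
    for i in range(n):
        unnorm = pred * lik[i]
        filt[i] = unnorm / unnorm.sum()
        pred = filt[i] @ trans          # one-step-ahead state prediction

    # Backward pass: beta[j] is proportional to p(y_{i+1}..y_N | z_i = j).
    smooth = np.zeros((n, k))
    smooth[-1] = filt[-1]               # both coincide at the last time point
    beta = np.ones(k)
    for i in range(n - 2, -1, -1):
        beta = trans @ (lik[i + 1] * beta)
        beta /= beta.sum()              # rescale to avoid underflow
        unnorm = filt[i] * beta
        smooth[i] = unnorm / unnorm.sum()
    return filt, smooth


if __name__ == "__main__":
    y = np.array([-2.0, -1.5, 2.0, 2.5, -2.2])
    init = np.array([0.5, 0.5])
    trans = np.array([[0.9, 0.1], [0.1, 0.9]])
    means = np.array([-2.0, 2.0])
    filt, smooth = filter_and_smooth(y, init, trans, means)
    print(filt.round(3))
    print(smooth.round(3))
```

When every row of `trans` equals `init`, the states are independent and the two outputs coincide, which corresponds to the exchangeable case in panel (a) of Figure 1.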
