
Adaptive Heterogeneous Mixtures of Normalising Flows for Robust Variational Inference

Benjamin Wiriyapong, Oktay Karakuş, Kirill Sidorov

TL;DR

The paper tackles brittle multimodal posterior inference in variational methods by proposing Adaptive Mixture Flow Variational Inference (AMF-VI), a two-stage framework that combines heterogeneous normalising flows (MAF, RealNVP, RBIG) with likelihood-driven moving-average weights estimated on fresh data. AMF-VI trains diverse experts independently, then adapts their mixture weights without per-sample gating, effectively performing data-driven Bayesian model averaging over architectural priors. Across six canonical 2D posterior families, AMF-VI achieves consistently lower negative log-likelihood (NLL) and robust scores on transport and discrepancy metrics (e.g., $W_2$, MMD, and $\mathrm{KL}(p\,\|\,q)$) while maintaining non-collapsed, interpretable weight allocations ($N_{\mathrm{eff}} \in [2.1, 2.99]$). This approach provides a practical, architecture-agnostic path to robust multimodal variational inference that preserves each expert's inductive bias with minimal training overhead.
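
The weight-adaptation stage can be illustrated with a small sketch. It does not reproduce the paper's exact update rule. Instead, it assumes an EM-style step (the average posterior responsibility of each expert on a fresh batch) blended into the current weights by an exponential moving average with a hypothetical rate `beta`. Isotropic Gaussians stand in for the pre-trained MAF, RealNVP and RBIG experts, and the mixture density is $q(x) = \sum_k w_k q_k(x)$. The effective number of experts uses the common inverse-sum-of-squares definition $N_{\mathrm{eff}} = 1/\sum_k w_k^2$. This is also an assumption. Under it, $N_{\mathrm{eff}}$ lies between 1 (collapse onto a single expert) and 3 (uniform weights) for three experts.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Isotropic Gaussian log-density; a hypothetical stand-in for a trained flow's log q_k(x)."""
    d = x.shape[1]
    z = (x - mean) / std
    return -0.5 * np.sum(z**2, axis=1) - d * np.log(std) - 0.5 * d * np.log(2.0 * np.pi)

def mixture_logpdf(log_q, w):
    """log q(x) = log sum_k w_k q_k(x) for log_q of shape (n, K) and weights w of shape (K,)."""
    return np.logaddexp.reduce(log_q + np.log(w), axis=1)

def update_weights(w, log_q, beta=0.1):
    """Assumed update: EM-style target from a fresh batch, smoothed by a moving average."""
    log_joint = log_q + np.log(w)  # log w_k + log q_k(x_i), shape (n, K)
    log_resp = log_joint - np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    target = np.exp(log_resp).mean(axis=0)  # mean responsibility of each expert
    w_new = (1.0 - beta) * w + beta * target
    return w_new / w_new.sum()  # guard against floating-point drift

def effective_experts(w):
    """N_eff = 1 / sum_k w_k^2: 1 means collapse, K means uniform weights."""
    return 1.0 / np.sum(w**2)

# Hypothetical pre-trained experts: two tight ones on the modes, one broad one.
experts = [(np.array([-2.0, 0.0]), 0.7), (np.array([2.0, 0.0]), 0.7), (np.array([0.0, 0.0]), 3.0)]
rng = np.random.default_rng(0)
w = np.full(len(experts), 1.0 / len(experts))
for step in range(200):
    # Fresh batch from a hypothetical symmetric bimodal target.
    centres = rng.choice([-2.0, 2.0], size=256)
    x = np.column_stack([centres, np.zeros(256)]) + 0.7 * rng.standard_normal((256, 2))
    log_q = np.column_stack([gaussian_logpdf(x, m, s) for m, s in experts])
    w = update_weights(w, log_q)

print("weights:", w, "N_eff:", effective_experts(w))
```

The mixture log-likelihood is concave in the weights, and an EM step never decreases it. Any blend of the current weights with the EM target therefore does not decrease the batch log-likelihood either. The moving average trades adaptation speed for stability against noisy fresh batches, and no per-sample gating network is needed.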

Abstract

Normalising-flow variational inference (VI) can approximate complex posteriors, yet single-flow models often behave inconsistently across qualitatively different distributions. We propose Adaptive Mixture Flow Variational Inference (AMF-VI), a heterogeneous mixture of complementary flows (MAF, RealNVP, RBIG) trained in two stages: (i) sequential expert training of individual flows, and (ii) adaptive global weight estimation via likelihood-driven updates, without per-sample gating or architectural changes. Evaluated on six canonical posterior families (banana, X-shape, two-moons, rings, a bimodal mixture, and a five-mode mixture), AMF-VI achieves consistently lower negative log-likelihood than each single-flow baseline and delivers stable gains in transport (Wasserstein-2) and maximum mean discrepancy (MMD) metrics, indicating improved robustness across shapes and modalities. The procedure is efficient and architecture-agnostic, incurring minimal overhead relative to standard flow training, and demonstrates that adaptive mixtures of diverse flows provide a reliable route to robust VI across varied posterior families whilst preserving each expert's inductive bias.
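
The MMD metric named above can be made concrete with a short sketch of the standard unbiased squared-MMD estimator. The Gaussian (RBF) kernel and the bandwidth of 1.0 are assumptions for illustration. The abstract does not state which kernel or bandwidth the evaluation used.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples x ~ p and y ~ q."""
    m, n = len(x), len(y)
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    # Exclude the diagonal k(x_i, x_i) terms so the within-sample means are unbiased.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()

rng = np.random.default_rng(1)
target = rng.standard_normal((500, 2))                          # samples from the "true" posterior
close = rng.standard_normal((500, 2))                           # a well-matched approximation
far = rng.standard_normal((500, 2)) + np.array([1.5, 0.0])      # a shifted, poorer approximation
close_score = mmd2_unbiased(target, close)
far_score = mmd2_unbiased(target, far)
print(close_score, far_score)
```

Samples drawn from the same distribution give an estimate near zero. Because the estimator is unbiased, this value can be slightly negative. A shifted approximation gives a clearly positive value, so lower MMD indicates a closer match to the target.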

Paper Structure

This paper contains 19 sections, 6 equations, 2 figures, 2 tables, and 1 algorithm.

Figures (2)

  • Figure 1: Learned mixture weights per dataset for AMF-VI across the three flow components (RealNVP, MAF, RBIG).
  • Figure 2: Qualitative comparison on three posterior families: (top) X-shaped, (middle) bimodal, and (bottom) rings. Each subfigure shows, from left to right, the true data and samples from RealNVP, MAF, RBIG, and AMF-VI.