Stick-Breaking Mixture Normalizing Flows with Component-Wise Tail Adaptation for Variational Inference

Seungsu Han, Juyoung Hwang, Won Chang

TL;DR

The paper tackles posterior inference for highly multimodal and heavy-tailed distributions by replacing the standard Gaussian base in normalizing-flow variational inference with a stick-breaking mixture base, thereby reducing mode-seeking bias. It introduces a Monte Carlo tail-index estimator to guide per-component tail adaptivity and develops component-wise Tail Transform Flows to calibrate tails while preserving exact density evaluation. Empirical results on synthetic targets and real wind-speed data show that StiCTAF achieves near-MCMC accuracy in both bulk structure and tails, with superior forward KL and tail calibration compared to baselines. This approach enables more faithful and efficient posterior inference in complex Bayesian models where multimodality and heavy tails are prominent.
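To give a feel for what a tail index measures, the sketch below applies the classical Hill estimator to samples from a Pareto distribution. This is a swapped-in, sample-based technique chosen for illustration; it is not the paper's Monte Carlo estimator, which works with unnormalized densities and is not specified in this summary. The function name `hill_tail_index`, the cutoff `k`, and the synthetic data are all assumptions.

```python
import math
import random
from typing import Sequence


def hill_tail_index(samples: Sequence[float], k: int) -> float:
    """Hill estimate of the tail index alpha from the k largest samples.

    Illustrative stand-in only: this is the classical Hill estimator,
    not the estimator proposed in the paper.
    """
    ordered = sorted(samples, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    threshold = ordered[k]  # the (k + 1)-th largest value
    if threshold <= 0.0:
        raise ValueError("the top k + 1 samples must be positive")
    # Average log-excess of the k largest samples over the threshold.
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


# Synthetic check on Pareto(alpha = 2) draws via inverse-CDF sampling.
rng = random.Random(0)
true_alpha = 2.0
pareto_samples = [
    (1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(20000)
]
alpha_hat = hill_tail_index(pareto_samples, k=500)
print(f"estimated tail index: {alpha_hat:.2f} (true value {true_alpha})")
```

A smaller estimated index indicates a heavier tail. In this toy setting the estimate should land close to the true value, with variability governed by the choice of `k`.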

Abstract

Normalizing flows with a Gaussian base provide a computationally efficient way to approximate posterior distributions in Bayesian inference, but they often struggle to capture complex posteriors with multimodality and heavy tails. We propose a stick-breaking mixture base with component-wise tail adaptation (StiCTAF) for posterior approximation. The method first learns a flexible mixture base to mitigate the mode-seeking bias of reverse KL divergence through a weighted average of component-wise ELBOs. It then estimates local tail indices of unnormalized densities and finally refines each mixture component using a shared backbone combined with component-specific tail transforms calibrated by the estimated indices. This design enables accurate mode coverage and anisotropic tail modeling while retaining exact density evaluation and stable optimization. Experiments on synthetic posteriors demonstrate improved tail recovery and better coverage of multiple modes compared to benchmark models. We also present a real-data analysis illustrating the practical benefits of our approach for posterior inference.
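As a minimal sketch of how a stick-breaking construction yields mixture weights, the snippet below uses the standard truncated form: each break fraction takes a share of the stick that remains, and the last component receives whatever is left. The abstract does not give the paper's exact parameterization, so the function name `stick_breaking_weights` and the use of fixed fractions (rather than learned ones) are assumptions.

```python
from typing import List


def stick_breaking_weights(fractions: List[float]) -> List[float]:
    """Map K-1 break fractions in (0, 1) to K mixture weights summing to 1.

    Standard truncated stick-breaking; the paper's own parameterization
    may differ.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for v in fractions:
        if not 0.0 < v < 1.0:
            raise ValueError("each break fraction must lie strictly in (0, 1)")
        weights.append(remaining * v)  # break off a share of what is left
        remaining *= 1.0 - v
    weights.append(remaining)  # the last component takes the remainder
    return weights


weights = stick_breaking_weights([0.5, 0.5, 0.5])
print(weights, sum(weights))
```

Because every weight is a product of fractions in (0, 1), the weights are positive and sum to one by construction, so no explicit normalization or constraint is needed during optimization.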

Paper Structure

This paper contains 51 sections, 10 theorems, 111 equations, 6 figures, and 5 tables.

Key Result

Theorem 2.1

Let $X$ be a random vector and let $f:\mathbb{R}^d\!\to\!\mathbb{R}^d$ be a bi-Lipschitz bijective map (i.e. $f$ and $f^{-1}$ are globally Lipschitz). If $X \in \mathcal{E}^p_\alpha$, then $f(X) \in \mathcal{E}^p_{\tilde{\alpha}}$ for some $\tilde{\alpha}>0$. In addition, if $X \in \mathcal{L}^p_\al

Figures (6)

  • Figure 1: Normal $\times$ Inverse-Gamma Target: Each panel compares the model and target distributions using Monte Carlo samples of size $10^4$. The dotted lines indicate the $0.1\%$ and $99.9\%$ marginal percentiles for $\beta$, and the $99.9\%$ percentile for $\sigma^2$. From left to right: NF (Gaussian), ATAF, and StiCTAF.
  • Figure 2: Complex Multimodal Target: Each panel compares the model and target distributions using Monte Carlo samples of size $2 \times 10^4$. The curves along the top and right margins show the univariate marginal densities. From left to right: NF (Gaussian), NF (Gaussian Mixture), and StiCTAF.
  • Figure 3: Estimated posteriors for two parameters from the real data analysis: Panel (a) shows $\varepsilon^{(\eta)}_{2}$ and panel (b) shows $\alpha^{*}_{4}$. Insets display the left 5% tail density. The black curve represents the MCMC reference, and the red curve corresponds to StiCTAF. Baselines include normalizing flows with Gaussian and Gaussian mixture bases, as well as TAF, gTAF, gTAF mixture, and ATAF.
  • Figure 4: Normal $\times$ Inverse-Gamma Target: Full comparison with benchmark methods using samples of size $10^4$ per model; dotted lines indicate the $0.1\%$/$99.9\%$ marginal percentiles.
  • Figure 5: Complex Multimodal Target: Full comparison with benchmark methods using samples of size $10^4$ per model; curves along the top and right margins show the marginal densities.
  • ...and 1 more figure

Theorems & Definitions (21)

  • Definition 2.1: Tail classes
  • Definition 2.2: Directional tail index
  • Theorem 2.1: liang2022fat
  • Theorem 2.2: Tail dominance
  • Theorem 3.1: Consistency of the Directional Tail-Index Estimator
  • Corollary 3.1
  • Theorem A.1: liang2022fat
  • Proof
  • Theorem A.2: Tail dominance
  • Proof
  • ...and 11 more