Historical Consensus: Preventing Posterior Collapse via Iterative Selection of Gaussian Mixture Priors

Zegu Zhang, Jian Zhang

TL;DR

This paper introduces Historical Consensus Training, an iterative procedure that progressively refines a set of candidate GMM priors by alternating optimization and selection. By leveraging the multiplicity of Gaussian mixture model clusterings, the method aims to eliminate the possibility of posterior collapse altogether.

Abstract

Variational autoencoders (VAEs) frequently suffer from posterior collapse, where latent variables become uninformative and the approximate posterior degenerates to the prior. Recent work has characterized this phenomenon as a phase transition governed by the spectral properties of the data covariance matrix. In this paper, we propose a fundamentally different approach: instead of avoiding collapse through architectural constraints or hyperparameter tuning, we eliminate the possibility of collapse altogether by leveraging the multiplicity of Gaussian mixture model (GMM) clusterings. We introduce Historical Consensus Training, an iterative selection procedure that progressively refines a set of candidate GMM priors through alternating optimization and selection. The key insight is that models trained to satisfy multiple distinct clustering constraints develop a historical barrier -- a region in parameter space that remains stable even when subsequently trained with a single objective. We prove that this barrier excludes the collapsed solution, and demonstrate through extensive experiments on synthetic and real-world datasets that our method achieves non-collapsed representations regardless of decoder variance or regularization strength. Our approach requires no explicit stability conditions (e.g., $\sigma'^2 < \lambda_{\max}$) and works with arbitrary neural architectures. The code is available at https://github.com/tsegoochang/historical-consensus-vae.
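
As a rough illustration of what "the approximate posterior degenerates to the prior" means numerically, the sketch below computes the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. This is the vanilla VAE setting, not the GMM priors used in the paper, and the function name is a hypothetical helper chosen for this example. A KL term close to zero across the dataset is the usual symptom of collapse.

```python
import math
from typing import Sequence


def kl_diag_gaussian_to_standard_normal(
    mu: Sequence[float], log_var: Sequence[float]
) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form.

    Assumption: a standard normal prior, as in a vanilla VAE. The paper's
    GMM priors have no such closed form and are not modelled here.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


# A posterior identical to the prior carries no information about the input.
collapsed = kl_diag_gaussian_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# A posterior whose mean has moved away from the prior has positive KL.
informative = kl_diag_gaussian_to_standard_normal([1.0], [0.0])
```

Tracking this quantity per latent dimension during training is one common way to detect collapse early; which diagnostic the authors use beyond the KL trajectories in their figures is not stated here.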

Paper Structure

This paper contains 52 sections, 3 theorems, 11 equations, 8 figures, 3 tables, 2 algorithms.

Key Result

Lemma 4.3

The feasible regions are nested: $\mathcal{F}_{T} \subset \mathcal{F}_{T-1} \subset \cdots \subset \mathcal{F}_0$.
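
One way to see why nesting is plausible, under an assumed form of Definition 4.2 that this summary does not reproduce: suppose each round $t$ contributes a constraint $\ell_t(\theta) \le \epsilon$, where $\ell_t$ is a hypothetical per-round historical loss and $\epsilon$ a hypothetical threshold, and suppose $\mathcal{F}_t$ collects every constraint up to round $t$. Each region is then the previous one intersected with one more constraint set:

```latex
% Assumed form of the feasible region (not taken from the paper):
\mathcal{F}_t = \{\theta : \ell_s(\theta) \le \epsilon \ \text{for all } s \le t\}

% Adding the round-t constraint can only shrink the set:
\mathcal{F}_t
  = \mathcal{F}_{t-1} \cap \{\theta : \ell_t(\theta) \le \epsilon\}
  \subseteq \mathcal{F}_{t-1}
```

Under this assumption the inclusion is non-strict; a strict inclusion would additionally require each new constraint to exclude at least one point of the previous region.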

Figures (8)

  • Figure 1: Overview of Historical Consensus Training. (1) Run EM multiple times to obtain diverse clusterings. (2) Iteratively train the VAE with all current clusterings and retain the best half. (3) Refine with the final two clusterings to ultra-low loss. (4) Train with a single clustering to verify non-collapse.
  • Figure 2: Verification of historical barrier. Left: Loss on discarded clusterings remains low throughout training, indicating memory. Right: Distance to collapsed solution increases as training progresses.
  • Figure 3: KL divergence trajectories across datasets. Our method maintains high KL throughout training, demonstrating effective prevention of posterior collapse.
  • Figure 4: Ablation studies. Left: Effect of initial number of clusterings $R_0$. Right: Effect of refinement threshold $\epsilon$.
  • Figure 5: KL divergence under violating conditions ($\sigma'^2 > \lambda_{\max}$). Our method remains non-collapsed while vanilla VAE collapses.
  • ...and 3 more figures
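
The selection loop described in the Figure 1 caption (train against all current clusterings, retain the best half, stop at two) can be sketched as follows. This is a minimal skeleton under stated assumptions, not the authors' implementation: `halve_candidates`, `train_step`, and `loss_of` are hypothetical names, the training step is left as a caller-supplied callback, and ranking candidates by a scalar per-clustering loss is an assumption about how "best" is measured.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")  # a candidate clustering / GMM prior (opaque here)


def halve_candidates(
    candidates: Sequence[C],
    train_step: Callable[[List[C]], None],
    loss_of: Callable[[C], float],
    final_count: int = 2,
) -> Tuple[List[C], List[List[C]]]:
    """Train on all current candidates, then keep the lower-loss half.

    Repeats until `final_count` candidates remain. Returns the survivors
    and the candidate set at the start of each round plus the final set.
    """
    current = list(candidates)
    history = [list(current)]
    while len(current) > final_count:
        # Hypothetical hook: one round of VAE training against every
        # clustering in `current`.
        train_step(current)
        ranked = sorted(current, key=loss_of)
        # Never drop below the requested number of survivors.
        keep = max(final_count, len(ranked) // 2)
        current = ranked[:keep]
        history.append(list(current))
    return current, history


# Toy usage: candidates are integers and the "loss" is distance from 3.
survivors, rounds = halve_candidates(
    candidates=list(range(8)),
    train_step=lambda current: None,  # no-op stand-in for training
    loss_of=lambda c: abs(c - 3),
)
```

In the toy run, eight candidates shrink to four and then to two. The refinement and single-clustering verification stages from the caption would follow this loop and are not sketched here.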

Theorems & Definitions (8)

  • Definition 4.1: Historical Loss
  • Definition 4.2: Feasible Region
  • Lemma 4.3: Nested Feasible Regions
  • proof
  • Theorem 4.4: Exclusion of Collapsed Solutions
  • proof
  • Corollary 4.5: Historical Inertia
  • proof: Detailed proof of Theorem 1