Historical Consensus: Preventing Posterior Collapse via Iterative Selection of Gaussian Mixture Priors

Zegu Zhang; Jian Zhang

TL;DR

This paper introduces Historical Consensus Training, an iterative selection procedure that progressively refines a set of candidate Gaussian mixture model (GMM) priors through alternating optimization and selection. By leveraging the multiplicity of GMM clusterings, the procedure is designed to eliminate the possibility of posterior collapse altogether.

Abstract

Variational autoencoders (VAEs) frequently suffer from posterior collapse, where latent variables become uninformative and the approximate posterior degenerates to the prior. Recent work has characterized this phenomenon as a phase transition governed by the spectral properties of the data covariance matrix. In this paper, we propose a fundamentally different approach: instead of avoiding collapse through architectural constraints or hyperparameter tuning, we eliminate the possibility of collapse altogether by leveraging the multiplicity of Gaussian mixture model (GMM) clusterings. We introduce Historical Consensus Training, an iterative selection procedure that progressively refines a set of candidate GMM priors through alternating optimization and selection. The key insight is that models trained to satisfy multiple distinct clustering constraints develop a historical barrier: a region in parameter space that remains stable even when subsequently trained with a single objective. We prove that this barrier excludes the collapsed solution, and demonstrate through extensive experiments on synthetic and real-world datasets that our method achieves non-collapsed representations regardless of decoder variance or regularization strength. Our approach requires no explicit stability conditions (e.g., $\sigma'^2 < \lambda_{\max}$) and works with arbitrary neural architectures. The code is available at https://github.com/tsegoochang/historical-consensus-vae.

Paper Structure (52 sections, 3 theorems, 11 equations, 8 figures, 3 tables, 2 algorithms)

Introduction
Related Work
Posterior Collapse in VAEs
Gaussian Mixture VAEs
Multi-Task and Continual Learning
Preliminaries
Variational Autoencoders
Posterior Collapse as Phase Transition
Gaussian Mixture Models
GMM-Conditioned VAE Training
Method: Historical Consensus Training
Motivation: Multiplicity as a Resource
The Selection Pipeline
Stage 1: Power-of-Two Selection
Stage 2: Consensus Refinement
...and 37 more sections

Key Result

Lemma 4.3

The feasible regions are nested: $\mathcal{F}_{T} \subset \mathcal{F}_{T-1} \subset \cdots \subset \mathcal{F}_0$.
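
The definitions behind this lemma are not reproduced on this page, so the following is only a hedged sketch of one reading under which the nesting is immediate. It assumes that the feasible region after round $t$ accumulates the constraints of every round so far; the per-round loss $\ell_s$ and tolerance $\epsilon_s$ are hypothetical symbols introduced for illustration, not the paper's notation.

$$
\mathcal{F}_t = \bigcap_{s=0}^{t} \{\theta : \ell_s(\theta) \le \epsilon_s\}
\quad\Longrightarrow\quad
\mathcal{F}_t = \mathcal{F}_{t-1} \cap \{\theta : \ell_t(\theta) \le \epsilon_t\} \subseteq \mathcal{F}_{t-1}.
$$

Under this assumption, each additional round can only remove parameters from the feasible set. The strict inclusion stated in the lemma would additionally require every new constraint to exclude at least one previously feasible point.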

Figures (8)

Figure 1: Overview of Historical Consensus Training. (1) Run EM multiple times to obtain diverse clusterings. (2) Iteratively train the VAE with all current clusterings and retain the best half. (3) Refine with the final two clusterings to ultra-low loss. (4) Train with a single clustering to verify non-collapse.
Figure 2: Verification of the historical barrier. Left: Loss on discarded clusterings remains low throughout training, indicating memory. Right: Distance to the collapsed solution increases as training progresses.
Figure 3: KL divergence trajectories across datasets. Our method maintains high KL throughout training, demonstrating effective prevention of posterior collapse.
Figure 4: Ablation studies. Left: Effect of initial number of clusterings $R_0$. Right: Effect of refinement threshold $\epsilon$.
Figure 5: KL divergence when the stability condition is violated ($\sigma'^2 > \lambda_{\max}$). Our method remains non-collapsed while the vanilla VAE collapses.
...and 3 more figures
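
The halving step described in the Figure 1 caption (train with all current clusterings, retain the best half, stop at two) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `halving_selection`, `toy_score`, and the integer clustering IDs are hypothetical names, and the scoring callback is a stand-in for training the VAE and measuring a per-clustering loss.

```python
from typing import Callable, List, Sequence, Tuple


def halving_selection(
    candidates: Sequence[int],
    score: Callable[[int, Sequence[int]], float],
    keep_final: int = 2,
) -> Tuple[List[int], List[List[int]]]:
    """Repeatedly keep the lower-loss half of the candidate clusterings.

    `score(c, active)` is a hypothetical callback returning the loss of
    clustering `c` after training jointly on all `active` clusterings.
    """
    current = list(candidates)
    history = [list(current)]
    while len(current) > keep_final:
        # Stand-in for "train the VAE with all current clusterings".
        losses = {c: score(c, current) for c in current}
        ranked = sorted(current, key=lambda c: losses[c])
        # Retain the best half, but never fewer than `keep_final`.
        current = ranked[: max(keep_final, len(current) // 2)]
        history.append(list(current))
    return current, history


def toy_score(cid: int, active: Sequence[int]) -> float:
    # Assumed toy loss: a fixed pseudo-ranking that ignores `active`.
    return ((cid * 7) % 8) / 8.0


final, history = halving_selection(range(8), toy_score)
print(final, [len(h) for h in history])
```

Starting from a power-of-two number of candidates, as the "Power-of-Two Selection" stage name suggests, makes each round an exact halving. In a real run the callback would retrain the model every round, so the ranking could change as clusterings are discarded.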

Theorems & Definitions (8)

Definition 4.1: Historical Loss
Definition 4.2: Feasible Region
Lemma 4.3: Nested Feasible Regions
proof
Theorem 4.4: Exclusion of Collapsed Solutions
proof
Corollary 4.5: Historical Inertia
proof: Detailed proof of Theorem 1
