Matching aggregate posteriors in the variational autoencoder

Surojit Saha; Sarang Joshi; Ross Whitaker

TL;DR

This work tackles the VAE’s failure to match the aggregate posterior to the prior, which creates holes in latent space and can cause posterior collapse. It introduces the Aggregate Variational Autoencoder (AVAE), which models the aggregate posterior $q_{\boldsymbol\phi}(\mathbf{z})$ with kernel density estimates and preserves the standard ELBO structure, avoiding extra regularization terms. A KDE bandwidth estimation strategy tailored to high-dimensional latent spaces and a data-driven schedule for updating the regularization weight $\beta$ enable effective training without hyperparameter tuning. Empirically, AVAE achieves superior data distribution modeling (lower FID, higher precision/recall) and higher latent-space entropy across MNIST, CelebA, and CIFAR-10, demonstrating robust handling of holes and posterior collapse. The approach suggests a practical, principled path for exact aggregate-posterior matching in high-dimensional latent spaces with strong generative performance.
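For reference, the two quantities named in this summary can be written in standard notation. This is a sketch under stated assumptions, not the paper's exact formulation: the isotropic Gaussian kernel, the bandwidth symbol $h$, and the latent sample count $M$ are illustrative choices that do not come from the text above.

```latex
% Aggregate posterior: the encoder posterior averaged over the data distribution.
q_{\boldsymbol\phi}(\mathbf{z})
  = \mathbb{E}_{p(\mathbf{x})}\!\left[ q_{\boldsymbol\phi}(\mathbf{z} \mid \mathbf{x}) \right]

% Kernel density estimate of the aggregate posterior from latent samples
% z_1, ..., z_M (assumed: isotropic Gaussian kernel with bandwidth h).
\hat{q}_{\boldsymbol\phi}(\mathbf{z})
  = \frac{1}{M} \sum_{j=1}^{M}
    \mathcal{N}\!\left(\mathbf{z};\, \mathbf{z}_j,\, h^{2}\mathbf{I}\right)
```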

Abstract

The variational autoencoder (VAE) is a well-studied deep latent-variable model (DLVM) that efficiently optimizes the variational lower bound of the log marginal data likelihood and has a strong theoretical foundation. However, the VAE's known failure to match the aggregate posterior often results in pockets/holes in the latent distribution (i.e., a failure to match the prior) and/or posterior collapse, which is associated with a loss of information in the latent space. This paper addresses these shortcomings by reformulating the VAE objective function to match the aggregate (marginal) posterior distribution to the prior. We use kernel density estimation (KDE) to model the aggregate posterior in high dimensions. The proposed method, named the aggregate variational autoencoder (AVAE), is built on the theoretical framework of the VAE. Empirical evaluation on multiple benchmark data sets demonstrates the effectiveness of the AVAE relative to state-of-the-art (SOTA) methods.
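To make the idea of a KDE-modeled aggregate posterior concrete, the following is a minimal NumPy sketch, not the authors' implementation. It fits an isotropic Gaussian KDE to a batch of latent codes and forms a Monte Carlo estimate of the KL divergence between that KDE and a standard normal prior. The fixed bandwidth, the split of the batch into fitting and evaluation halves, and all function names are assumptions made for illustration. The paper's own bandwidth estimation strategy and its schedule for $\beta$ are not reproduced here.

```python
import numpy as np


def kde_log_density(queries, samples, bandwidth):
    """Log-density of an isotropic Gaussian KDE built on `samples`, evaluated at `queries`."""
    d = samples.shape[1]
    # Pairwise squared distances, shape (num_queries, num_samples).
    sq_dists = np.sum((queries[:, None, :] - samples[None, :, :]) ** 2, axis=-1)
    log_kernels = (-0.5 * sq_dists / bandwidth**2
                   - 0.5 * d * np.log(2.0 * np.pi * bandwidth**2))
    # Numerically stable log-mean-exp over the KDE samples.
    m = log_kernels.max(axis=1, keepdims=True)
    log_mean = m + np.log(np.mean(np.exp(log_kernels - m), axis=1, keepdims=True))
    return log_mean.ravel()


def standard_normal_log_density(z):
    """Log-density of N(0, I) at each row of `z`."""
    d = z.shape[1]
    return -0.5 * np.sum(z**2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)


def kl_aggregate_to_prior(latents, bandwidth, rng):
    """Monte Carlo estimate of KL(q(z) || N(0, I)), with q(z) modeled by a KDE.

    Assumption: half of the latents build the KDE and the other half are used
    as evaluation points, so no point is scored against its own kernel.
    """
    perm = rng.permutation(len(latents))
    half = len(latents) // 2
    kde_points, eval_points = latents[perm[:half]], latents[perm[half:]]
    log_q = kde_log_density(eval_points, kde_points, bandwidth)
    log_p = standard_normal_log_density(eval_points)
    return float(np.mean(log_q - log_p))


rng = np.random.default_rng(0)
matched = rng.standard_normal((2000, 2))        # latents that already follow the prior
shifted = rng.standard_normal((2000, 2)) + 3.0  # latents concentrated away from the prior

kl_matched = kl_aggregate_to_prior(matched, bandwidth=0.3, rng=rng)
kl_shifted = kl_aggregate_to_prior(shifted, bandwidth=0.3, rng=rng)
print(f"matched latents: {kl_matched:.3f}")
print(f"shifted latents: {kl_shifted:.3f}")
```

In this toy setting the estimate for the matched latents should be close to zero, and the estimate for the shifted latents should be much larger. The example uses a 2-dimensional latent space because a fixed, hand-picked bandwidth becomes unreliable as the dimension grows.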

Paper Structure (16 sections, 8 equations, 1 figure, 4 tables, 1 algorithm)

Introduction
Related Work
Method
Background
Aggregate Variational Autoencoder (AVAE)
Training
Estimation of $\beta$
Properties of the Aggregate Posterior of the AVAE
KDE Bandwidth Estimate
Experiments
Experimental Setup
Results
Evaluation of the Model Data Distribution
Entropy of the Aggregate Posterior Distribution
Ablation Study
...and 1 more sections

Figures (1)

Figure 1: Metric multidimensional scaling (mMDS) [metric_MDS] plot in 2D of the latent representations ($\mathcal{Z} \in \mathbb{R}^{16}$) produced by the VAE [kingma2014auto], the $\beta$-TCVAE [TC_VAE_Neurips_2019], and the AVAE (proposed method) on the MNIST dataset [MNIST]. Samples from the target distribution, $\mathcal{N}\left(\mathbf{0},\mathbf{I}\right)$, serve as the ground truth. Regions of low probability and unwanted aggregation of data points in different parts of the latent space of the VAE and the $\beta$-TCVAE clearly show the mismatch with the ground truth. The AVAE closely matches the target distribution, as corroborated by the empirical evaluations.
