Matching aggregate posteriors in the variational autoencoder
Surojit Saha, Sarang Joshi, Ross Whitaker
TL;DR
This work tackles the VAE’s failure to match the aggregate posterior to the prior, which creates holes in latent space and can cause posterior collapse. It introduces the Aggregate Variational Autoencoder (AVAE), which models the aggregate posterior $q_{\boldsymbol\phi}(\mathbf{z})$ with kernel density estimates and preserves the standard ELBO structure, avoiding extra regularization terms. A KDE bandwidth estimation strategy tailored to high-dimensional latent spaces and a data-driven schedule for updating the regularization weight $\beta$ enable effective training without hyperparameter tuning. Empirically, AVAE achieves superior data distribution modeling (lower FID, higher precision/recall) and higher latent-space entropy across MNIST, CelebA, and CIFAR-10, demonstrating robust handling of holes and posterior collapse. The approach suggests a practical, principled path for exact aggregate-posterior matching in high-dimensional latent spaces with strong generative performance.
Abstract
The variational autoencoder (VAE) is a well-studied, deep, latent-variable model (DLVM) that efficiently optimizes the variational lower bound of the log marginal data likelihood and has a strong theoretical foundation. However, the VAE's known failure to match the aggregate posterior to the prior often results in \emph{pockets/holes} in the latent distribution and/or \emph{posterior collapse}, which is associated with a loss of information in the latent space. This paper addresses these shortcomings by reformulating the VAE objective function to match the aggregate/marginal posterior distribution to the prior. We use kernel density estimation (KDE) to model the aggregate posterior in high dimensions. The proposed method, named the \emph{aggregate variational autoencoder} (AVAE), is built on the theoretical framework of the VAE. Empirical evaluation on multiple benchmark data sets demonstrates the effectiveness of the AVAE relative to state-of-the-art (SOTA) methods.
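To make the idea of modeling the aggregate posterior with a KDE concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes an isotropic Gaussian kernel, a standard normal prior $p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \mathbf{I})$, and a hand-picked bandwidth; the function names (`log_gaussian_kde`, `log_standard_normal`, `kl_aggregate_to_prior`) are hypothetical. Given latent codes produced by an encoder, it forms a Monte Carlo estimate of $\mathrm{KL}\big(q_{\boldsymbol\phi}(\mathbf{z}) \,\|\, p(\mathbf{z})\big)$ by evaluating the KDE at one set of latents using a second set as kernel centers.

```python
import numpy as np


def log_gaussian_kde(queries, centers, bandwidth):
    """Log-density of an isotropic Gaussian KDE at each query point.

    queries: (n, d) points at which to evaluate the density.
    centers: (m, d) latent samples used as kernel centers.
    bandwidth: kernel standard deviation (assumed fixed here; the paper
        proposes its own bandwidth estimation strategy, not reproduced).
    """
    d = centers.shape[1]
    # Pairwise squared distances, shape (n, m).
    sq_dists = ((queries[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    log_kernels = (
        -0.5 * sq_dists / bandwidth**2
        - 0.5 * d * np.log(2.0 * np.pi * bandwidth**2)
    )
    # Numerically stable log-mean-exp over the kernel centers.
    peak = log_kernels.max(axis=1, keepdims=True)
    log_mean = peak + np.log(np.exp(log_kernels - peak).mean(axis=1, keepdims=True))
    return log_mean.ravel()


def log_standard_normal(z):
    """Log-density of N(0, I) at each row of z (the assumed prior)."""
    d = z.shape[1]
    return -0.5 * (z**2).sum(axis=1) - 0.5 * d * np.log(2.0 * np.pi)


def kl_aggregate_to_prior(queries, centers, bandwidth):
    """Monte Carlo estimate of KL(q(z) || p(z)) with q(z) modeled by a KDE."""
    log_q = log_gaussian_kde(queries, centers, bandwidth)
    log_p = log_standard_normal(queries)
    return float(np.mean(log_q - log_p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for encoder outputs: one well-matched set, one shifted set.
    matched = rng.standard_normal((1000, 2))
    shifted = matched + 3.0
    for name, z in [("matched", matched), ("shifted", shifted)]:
        kl = kl_aggregate_to_prior(z[:500], z[500:], bandwidth=0.3)
        print(f"{name}: estimated KL = {kl:.3f}")
```

In this sketch, latents drawn from the prior should yield an estimate near zero, while latents shifted away from the prior should yield a clearly larger value. Using disjoint sets for queries and kernel centers avoids the bias from evaluating a kernel at its own center. The fixed bandwidth is a placeholder only: a poorly chosen value can distort the estimate, especially as the latent dimension grows, which is the issue the paper's bandwidth estimation strategy is meant to address.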
