On the Shape of Latent Variables in a Denoising VAE-MoG: A Posterior Sampling-Based Study
Fernanda Zapata Bascuñán
TL;DR
This paper asks whether a denoising VAE with a mixture-of-Gaussians prior learns a latent space that faithfully reflects the data-generating process when trained on gravitational-wave signals. It combines a CNN-based DVAE with a $D=256$ latent space and posterior sampling via Hamiltonian Monte Carlo (HMC) conditioned on clean inputs, and contrasts those samples with encoder outputs from noisy data. Using a BGMM prior and Kolmogorov-Smirnov (KS) tests across latent dimensions, the paper finds a pronounced mismatch in latent-space structure despite effective denoising, highlighting the need for posterior-based validation when evaluating generative models for scientific data. Reconstruction quality alone is therefore insufficient to ensure that latent representations are interpretable and reliable, which has implications for how such models are validated and deployed in gravitational-wave analysis and other noisy scientific domains.
Abstract
In this work, we explore the latent space of a denoising variational autoencoder with a mixture-of-Gaussians prior (VAE-MoG), trained on gravitational-wave data from event GW150914. To evaluate how well the model captures the underlying structure, we use Hamiltonian Monte Carlo (HMC) to draw posterior samples conditioned on clean inputs, and compare them to the encoder's outputs from noisy data. Although the model reconstructs signals accurately, statistical comparisons reveal a clear mismatch in the latent space. This shows that strong denoising performance does not necessarily mean the latent representations are reliable, highlighting the importance of posterior-based validation when evaluating generative models.
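As a rough illustration of the kind of statistical comparison described here, the following sketch runs a two-sample Kolmogorov-Smirnov test on each latent dimension separately, comparing one set of latent samples (standing in for HMC posterior draws) against another (standing in for encoder outputs). The function name `per_dimension_ks`, the significance level `alpha`, the array shapes, and the synthetic data are all assumptions made for illustration; the exact test variant, threshold, and any multiple-testing correction used in this study are not specified here.

```python
import numpy as np
from scipy.stats import ks_2samp


def per_dimension_ks(posterior_samples, encoder_samples, alpha=0.05):
    """Two-sample KS test applied independently to each latent dimension.

    Both inputs are arrays of shape (n_samples, n_latent_dims); the number
    of samples may differ, but the number of dimensions must match.
    `alpha` is an assumed significance level, not a value from the paper.
    """
    posterior_samples = np.asarray(posterior_samples, dtype=float)
    encoder_samples = np.asarray(encoder_samples, dtype=float)
    if posterior_samples.shape[1] != encoder_samples.shape[1]:
        raise ValueError("both sample sets must share the latent dimension")

    n_dims = posterior_samples.shape[1]
    statistics = np.empty(n_dims)
    p_values = np.empty(n_dims)
    for j in range(n_dims):
        result = ks_2samp(posterior_samples[:, j], encoder_samples[:, j])
        statistics[j] = result.statistic
        p_values[j] = result.pvalue

    # No multiple-testing correction is applied in this sketch.
    rejected = p_values < alpha
    return statistics, p_values, rejected


# Synthetic stand-ins (hypothetical data, small dimension for brevity).
rng = np.random.default_rng(0)
fake_posterior = rng.normal(loc=0.0, scale=1.0, size=(500, 8))
fake_encoder = rng.normal(loc=3.0, scale=1.0, size=(500, 8))

stats, pvals, rejected = per_dimension_ks(fake_posterior, fake_encoder)
print(f"dimensions flagged as mismatched: {int(rejected.sum())} of {rejected.size}")
```

With a $D=256$ latent space, the same function would return 256 statistics and p-values, and the fraction of rejected dimensions gives one simple summary of how far the encoder's outputs sit from the posterior samples.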
