Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network
Yuri Kinoshita, Kenta Oono, Kenji Fukumizu, Yuichi Yoshida, Shin-ichi Maeda
TL;DR
The paper tackles posterior collapse in variational autoencoders (VAEs) by enforcing an inverse Lipschitz constraint on the decoder. The constraint is realized via Brenier maps and Input Convex Neural Networks (ICNNs), which together yield an L-inverse Lipschitz function. The paper proves a theoretical bound showing that the posterior–prior discrepancy, measured by the relative Fisher information divergence, grows with the square of the inverse Lipschitz constant L, so that latent identifiability and the avoidance of collapse can be controlled through L. The authors introduce IL-LIDVAE and its variants, demonstrate that the degree of posterior collapse can be controlled, report improved performance on toy data, images, and text, and offer an annealing strategy to ease tuning of L. Increasing L trades off against model flexibility and computational cost, but the approach provides a simple, general, and theoretically grounded mechanism for mitigating posterior collapse across a broad class of VAEs, with practical benefits for representation learning.
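To make the inverse Lipschitz idea concrete, the following is a minimal sketch and not the authors' implementation. It relies on a standard fact: the gradient of an L-strongly convex function f = ∇B satisfies ||f(x) - f(y)|| >= L ||x - y||. Here B is assumed to be a tiny one-hidden-layer input convex network (non-negative output weights, softplus activation) plus the quadratic term (L/2)||z||^2. All names (`make_potential_gradient`, `min_expansion_ratio`) and the network sizes are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function (the derivative of softplus).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def make_potential_gradient(dim, hidden, L, seed=0):
    """Return f = grad B for B(z) = sum_j a_j*softplus(w_j.z + b_j) + (L/2)*||z||^2."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(hidden)]
    b = [rng.gauss(0, 1) for _ in range(hidden)]
    # Non-negative output weights keep B convex (assumed ICNN-style constraint).
    a = [abs(rng.gauss(0, 1)) for _ in range(hidden)]

    def f(z):
        out = [L * zi for zi in z]  # gradient of the (L/2)*||z||^2 term
        for w_j, b_j, a_j in zip(W, b, a):
            pre = sum(wi * zi for wi, zi in zip(w_j, z)) + b_j
            coeff = a_j * sigmoid(pre)  # chain rule through softplus
            for i in range(dim):
                out[i] += coeff * w_j[i]
        return out

    return f


def dist(x, y):
    # Euclidean distance between two points given as lists.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def min_expansion_ratio(f, dim, n_pairs=1000, seed=1):
    """Smallest observed ||f(x) - f(y)|| / ||x - y|| over random pairs."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(n_pairs):
        x = [rng.uniform(-3, 3) for _ in range(dim)]
        y = [rng.uniform(-3, 3) for _ in range(dim)]
        worst = min(worst, dist(f(x), f(y)) / dist(x, y))
    return worst


if __name__ == "__main__":
    L = 0.5
    f = make_potential_gradient(dim=2, hidden=8, L=L)
    # The empirical ratio should never fall below L.
    print(min_expansion_ratio(f, dim=2) >= L)
```

The random-pair check is only an empirical sanity test of the inequality, not a proof. In this sketch L enters solely through the quadratic term, so changing L changes the guaranteed minimum expansion of the map without touching the convex network's weights.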
Abstract
Variational autoencoders (VAEs) are among the deep generative models that have experienced enormous success over the past decade. In practice, however, they suffer from a problem called posterior collapse, which occurs when the encoder coincides with, or collapses to, the prior and thus takes no information from the latent structure of the input data into account. In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method, equipped with a concrete theoretical guarantee, that controls the degree of posterior collapse in a simple and clear manner for a wide range of VAE models. We also illustrate the effectiveness of our method through several numerical experiments.
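The two notions the abstract relies on can be written down compactly. The notation below (encoder distribution q_phi, prior p, decoder map f, constant L) is an assumption chosen for illustration and may differ from the paper's own definitions.

```latex
% Illustrative notation, not taken from the paper.
% Posterior collapse: the encoder distribution ignores the input x.
\[
  q_\phi(z \mid x) = p(z) \quad \text{for all } x .
\]
% L-inverse Lipschitz map f with constant L > 0:
\[
  \| f(z_1) - f(z_2) \| \;\ge\; L \, \| z_1 - z_2 \|
  \quad \text{for all } z_1, z_2 .
\]
```

For any L > 0 the second condition makes f injective, so distinct latent codes cannot be mapped to the same decoder output.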
