Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network

Yuri Kinoshita; Kenta Oono; Kenji Fukumizu; Yuichi Yoshida; Shin-ichi Maeda

TL;DR

The paper tackles posterior collapse in variational autoencoders by enforcing an inverse Lipschitz constraint on the decoder, realized via Brenier maps and Input Convex Neural Networks to produce an L-inverse Lipschitz function. It proves a theoretical bound showing the posterior–prior discrepancy, measured by the relative Fisher information divergence, grows with the square of the inverse-Lipschitz constant, enabling controllable latent identifiability and collapse avoidance. The authors introduce IL-LIDVAE and its variants, demonstrate controllable posterior collapse and improved performance on toy data, images, and text, and offer an annealing strategy to ease tuning. While increasing L imposes a trade-off with model flexibility and computational cost, the approach provides a simple, general, and theoretically grounded mechanism to mitigate posterior collapse across a broad class of VAEs with practical benefits for representation learning.
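For reference, the two quantities named in the summary above can be written out as follows. This is a sketch using commonly used definitions; the symbols $f$, $p$, $q$, and the choice of Euclidean norm are notational assumptions and may differ from the paper's own conventions (for instance in constant factors).

```latex
% L-inverse Lipschitzness of a map f (L >= 0):
\| f(x) - f(y) \| \;\ge\; L \, \| x - y \| \qquad \text{for all } x, y .

% Relative Fisher information divergence between densities p and q:
F(p \,\|\, q) \;=\; \int p(z) \, \bigl\| \nabla_z \log p(z) - \nabla_z \log q(z) \bigr\|^2 \, dz .
```

With $p$ the posterior and $q$ the prior, $F(p\,\|\,q) = 0$ when the two coincide, so a lower bound on $F$ that increases with $L$ keeps the posterior away from the prior.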

Abstract

Variational autoencoders (VAEs) are among the deep generative models that have enjoyed enormous success over the past decades. In practice, however, they suffer from a problem called posterior collapse, which occurs when the encoder coincides with, or collapses to, the prior, taking no information from the latent structure of the input data into account. In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method, equipped with a concrete theoretical guarantee, that controls the degree of posterior collapse in a simple and clear manner for a wide range of VAE models. We also illustrate the effectiveness of our method through several numerical experiments.
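One way to obtain an inverse Lipschitz map, in the spirit of the Brenier-map and input-convex-network construction mentioned in the TL;DR, is to take the gradient of a convex function plus $\frac{L}{2}\|z\|^2$. The sketch below is illustrative only and is not the authors' implementation: it uses a single hidden layer of softplus units with nonnegative output weights as a stand-in for an Input Convex Neural Network, and the names `make_inverse_lipschitz_map` and `min_expansion_ratio` are hypothetical. Because the potential is $L$-strongly convex, its gradient $g$ satisfies $\|g(x)-g(y)\| \ge L\|x-y\|$.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function (the derivative of softplus).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def make_inverse_lipschitz_map(dim, hidden, L, seed=0):
    """Return g = grad of f(z) = sum_k a_k * softplus(w_k.z + b_k) + (L/2)*||z||^2.

    Hypothetical helper: a one-hidden-layer convex potential, assumed here
    as a minimal stand-in for an input convex neural network.
    """
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(hidden)]
    b = [rng.gauss(0, 1) for _ in range(hidden)]
    # Nonnegative output weights keep the potential convex in z.
    a = [abs(rng.gauss(0, 1)) for _ in range(hidden)]

    def g(z):
        # Gradient of the quadratic term (L/2)*||z||^2 is L*z.
        out = [L * zj for zj in z]
        for w_k, b_k, a_k in zip(W, b, a):
            pre = sum(wj * zj for wj, zj in zip(w_k, z)) + b_k
            s = a_k * sigmoid(pre)  # d/dz of a_k * softplus(pre) is s * w_k
            for j in range(dim):
                out[j] += s * w_k[j]
        return out

    return g


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def min_expansion_ratio(g, dim, trials=200, seed=1):
    """Smallest observed ||g(x)-g(y)|| / ||x-y|| over random pairs."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(dim)]
        y = [rng.uniform(-3, 3) for _ in range(dim)]
        gx, gy = g(x), g(y)
        num = norm([p - q for p, q in zip(gx, gy)])
        den = norm([p - q for p, q in zip(x, y)])
        worst = min(worst, num / den)
    return worst
```

Calling `min_expansion_ratio(make_inverse_lipschitz_map(2, 8, L=2.0), dim=2)` gives an empirical check that the ratio never drops below `L`. In a VAE, a map of this kind would sit inside the decoder with trainable `W`, `b`, and `a`; that wiring is omitted here.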

Paper Structure (42 sections, 12 theorems, 40 equations, 6 figures, 5 tables)

Introduction
Background and Organization
Contributions
Related Works
Organization
Notation
Preliminaries
Variational Autoencoders
Posterior Collapse
Theoretical Analysis
Assumptions and Problem Setting
Generative Model
Construction of inverse Lipschitz functions
Criterion
Theoretical Guarantee
...and 27 more sections

Key Result

Theorem 3.4

Under the generative model, Assumption as1, and $l=t$, the following holds for all $i$ and $\theta\in\Theta_L$:

Figures (6)

Figure 1: Accuracy of the learned posterior (left) and relative Fisher divergence between the posterior and prior (right) for different standard deviations $\sigma$ and inverse Lipschitz constants $L$. $L=0$ corresponds to LIDVAE (WBC2021). Posterior collapse occurs for LIDVAE but can be controlled with IL-LIDVAE. "VAE" in the legend refers to the GMVAE.
Figure 2: Posterior of GMVAE (left) and IL-LIDMVAE with $L_1=L_2=5.0$ (right) for the toy data with $\sigma=7.5$. Black points are the means of $N((0,0)^\top,\sigma^2I_2)$ and $N((10,10)^\top,\sigma^2 I_2)$, and dashed circles delimit the $2\sigma$ area of each distribution. IL-LIDMVAE performs better. See Figure \ref{fig:aptoy} for more data.
Figure 3: Samples of Fashion-MNIST data generated by IL-LIDMVAE with $c=10$ for different inverse Lipschitz parameters; all distributions were Gaussian. Each row corresponds to a different category. With $L_1=L_2=1.5$, we obtain the ten true classes with varied images.
Figure 4: Posterior of VAE (left), LIDVAE (middle) and IL-LIDVAE (right) for the toy data with different standard deviations: $\sigma=0.5$ (top) and $\sigma=7.5$ (bottom). The black points are the means of $N((0,0)^\top,\sigma^2I_2)$ and $N((10,10)^\top,\sigma^2 I_2)$, and the dashed circles delimit the $2\sigma$ area of each distribution.
Figure 5: Reconstruction of randomly chosen data of Fashion-MNIST for different inverse Lipschitz parameters of IL-LIDMVAE. The number of classes was set to $10$, and all distributions were Gaussian. Each row corresponds to a different category.
...and 1 more figures

Theorems & Definitions (31)

Definition 2.1: $\epsilon$-posterior collapse
Definition 2.2: posterior collapse, WBC2021
Definition 2.3: latent variable non-identifiability, WBC2021
Definition 3.1: inverse Lipschitzness
Definition 3.3
Theorem 3.4
Remark 3.5
Corollary 3.6
Theorem 3.7
Theorem 3.8
...and 21 more
