Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network

Yuri Kinoshita, Kenta Oono, Kenji Fukumizu, Yuichi Yoshida, Shin-ichi Maeda

TL;DR

The paper tackles posterior collapse in variational autoencoders by enforcing an inverse Lipschitz constraint on the decoder, realized via Brenier maps and Input Convex Neural Networks to produce an L-inverse Lipschitz function. It proves a theoretical bound showing the posterior–prior discrepancy, measured by the relative Fisher information divergence, grows with the square of the inverse-Lipschitz constant, enabling controllable latent identifiability and collapse avoidance. The authors introduce IL-LIDVAE and its variants, demonstrate controllable posterior collapse and improved performance on toy data, images, and text, and offer an annealing strategy to ease tuning. While increasing L imposes a trade-off with model flexibility and computational cost, the approach provides a simple, general, and theoretically grounded mechanism to mitigate posterior collapse across a broad class of VAEs with practical benefits for representation learning.
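The construction named in the summary can be illustrated with a small numerical sketch. It is not the authors' IL-LIDVAE code. It assumes the standard definition that a map $f$ is $L$-inverse Lipschitz when $\|f(x)-f(y)\| \ge L\|x-y\|$ for all $x, y$, and it uses a hand-written log-sum-exp function as a stand-in for an Input Convex Neural Network. The helper name `make_inverse_lipschitz_map` and all parameter values are hypothetical. The idea is that the gradient of a convex function $g$ plus $Lx$ is the gradient of an $L$-strongly convex function, which is $L$-inverse Lipschitz.

```python
import numpy as np

def make_inverse_lipschitz_map(A, b, L):
    """Return f(x) = grad g(x) + L * x, with g(x) = logsumexp(A x + b).

    g is convex (a stand-in for an ICNN, chosen here for illustration),
    so f is the gradient of an L-strongly convex function.
    """
    def f(x):
        z = A @ x + b
        w = np.exp(z - z.max())   # numerically stable softmax weights
        w /= w.sum()
        return A.T @ w + L * x    # grad of logsumexp is A^T softmax(Ax + b)
    return f

rng = np.random.default_rng(0)
d, k, L = 3, 5, 2.0               # illustrative sizes and constant
A = rng.normal(size=(k, d))
b = rng.normal(size=k)
f = make_inverse_lipschitz_map(A, b, L)

# Empirically check ||f(x) - f(y)|| >= L * ||x - y|| on random pairs.
ratios = []
for _ in range(1000):
    x, y = rng.normal(size=d), rng.normal(size=d)
    ratios.append(np.linalg.norm(f(x) - f(y)) / np.linalg.norm(x - y))
min_ratio = min(ratios)
print("smallest observed ratio:", min_ratio, "lower bound L:", L)
```

The check holds because convexity of $g$ gives $\langle \nabla g(x) - \nabla g(y), x - y\rangle \ge 0$, so $\langle f(x) - f(y), x - y\rangle \ge L\|x-y\|^2$, and the Cauchy-Schwarz inequality yields the bound on the norms.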

Abstract

Variational autoencoders (VAEs) are one of the deep generative models that have experienced enormous success over the past decades. However, in practice, they suffer from a problem called posterior collapse, which occurs when the encoder coincides, or collapses, with the prior taking no information from the latent structure of the input data into consideration. In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method that can control in a simple and clear manner the degree of posterior collapse for a wide range of VAE models equipped with a concrete theoretical guarantee. We also illustrate the effectiveness of our method through several numerical experiments.

Paper Structure

This paper contains 42 sections, 12 theorems, 40 equations, 6 figures, 5 tables.

Key Result

Theorem 3.4

Under model model, Assumption as1 and $l=t$, the following holds for all $i$ and $\theta\in\Theta_L$:

Figures (6)

  • Figure 1: Accuracy of the learned posterior (left) and relative Fisher divergence between the posterior and prior (right) for different standard deviations $\sigma$ and inverse Lipschitz constants $L$. $L=0$ corresponds to LIDVAE (WBC2021). Posterior collapse occurs for LIDVAE but can be controlled with IL-LIDVAE. "VAE" in the legend refers to the GMVAE.
  • Figure 2: Posterior of GMVAE (left) and IL-LIDMVAE with $L_1=L_2=5.0$ (right) for the toy data with $\sigma=7.5$. Black points are the means of $N((0,0)^\top,\sigma^2I_2)$ and $N((10,10)^\top,\sigma^2 I_2)$, and dashed circles delimit the $2\sigma$ area of each distribution. IL-LIDMVAE performs better. See Figure \ref{fig:aptoy} for more data.
  • Figure 3: Samples of Fashion-MNIST data generated with different inverse Lipschitz parameters of IL-LIDMVAE with $c=10$, and all distributions were Gaussian. Each row corresponds to a different category. With $L_1=L_2=1.5$, we obtain the ten true classes with varied images.
  • Figure 4: Posterior of VAE (left), LIDVAE (middle) and IL-LIDVAE (right) for the toy data with different standard deviations: $\sigma=0.5$ (top) and $\sigma=7.5$ (bottom). The black points are the means of $N((0,0)^\top,\sigma^2I_2)$ and $N((10,10)^\top,\sigma^2 I_2)$, and the dashed circles delimit the $2\sigma$ area of each distribution.
  • Figure 5: Reconstruction of randomly chosen data of Fashion-MNIST for different inverse Lipschitz parameters of IL-LIDMVAE. The number of classes was set to $10$, and all distributions were Gaussian. Each row corresponds to a different category.
  • ...and 1 more figure

Theorems & Definitions (31)

  • Definition 2.1: $\epsilon$-posterior collapse
  • Definition 2.2: posterior collapse, WBC2021
  • Definition 2.3: latent variable non-identifiability, WBC2021
  • Definition 3.1: inverse Lipschitzness
  • Definition 3.3
  • Theorem 3.4
  • Remark 3.5
  • Corollary 3.6
  • Theorem 3.7
  • Theorem 3.8
  • ...and 21 more