How to train your VAE

Mariano Rivera

TL;DR

This paper examines how to interpret the Kullback-Leibler (KL) divergence in VAEs, the term of the Evidence Lower Bound (ELBO) that governs the trade-off between reconstruction accuracy and regularization.
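For reference, the textbook form of the ELBO makes this trade-off explicit. It is the standard formulation, not the paper's redefined objective. The first term rewards reconstruction. The second term penalizes the KL divergence between the encoder posterior and the prior, weighted by $\beta$ ($\beta = 1$ for the original VAE). This is the same $\beta$ that appears in the Figure 5 caption. The second line gives the closed-form KL for a diagonal Gaussian posterior and a standard normal prior, where $J$ is the latent dimension.

```latex
\mathcal{L}_{\beta}(\theta, \phi; \mathbf{x})
  = \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\big[\log p_\theta(\mathbf{x} \mid \mathbf{z})\big]
  - \beta \, D_{\mathrm{KL}}\big(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\big),
\qquad
D_{\mathrm{KL}}\big(\mathcal{N}(\boldsymbol{\mu}, \mathrm{diag}(\boldsymbol{\sigma}^2)) \,\|\, \mathcal{N}(\mathbf{0}, \mathbf{I})\big)
  = \frac{1}{2} \sum_{j=1}^{J} \big(\mu_j^2 + \sigma_j^2 - 1 - \log \sigma_j^2\big)
```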

Abstract

Variational Autoencoders (VAEs) have become a cornerstone of generative modeling and representation learning. This paper focuses on interpreting the Kullback-Leibler (KL) divergence, the term of the Evidence Lower Bound (ELBO) that governs the trade-off between reconstruction accuracy and regularization. The KL divergence aligns the distribution of the latent variables with a prior. This imposes structure on the overall latent space, but it leaves the distributions of individual variables unconstrained. The proposed method redefines the ELBO with a mixture of Gaussians for the posterior probability, introduces a regularization term to prevent variance collapse, and employs a PatchGAN discriminator to enhance texture realism. Both the encoder and the decoder are implemented with ResNetV2 architectures. The experiments show that the method generates realistic faces, offering a promising way to improve VAE-based generative models.
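The following NumPy sketch illustrates the distinction between the overall latent space and individual posteriors. It is our own example, not the paper's code, and the function names and synthetic batch are hypothetical. It computes two quantities. The first is the closed-form per-sample KL term for a diagonal Gaussian encoder against a standard normal prior. The second is the mean and variance of the aggregate posterior, taken here to be the equal-weight mixture of the per-sample Gaussians over a batch. In the example, every posterior is very narrow (log-variance fixed at -8), but the means are spread like N(0, 1). As a result, the aggregate is close to N(0, I) in mean and variance, even though no individual component resembles the prior.

```python
import numpy as np

def kl_diag_gaussian_to_std_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over the last axis."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

def aggregate_moments(mu, log_var):
    """Mean and per-dimension variance of the equal-weight mixture
    (1/N) * sum_i N(mu_i, diag(sigma_i^2)) over a batch of N encodings."""
    var = np.exp(log_var)
    agg_mean = mu.mean(axis=0)
    # Law of total variance: mixture variance = mean of variances + variance of means.
    agg_var = var.mean(axis=0) + mu.var(axis=0)
    return agg_mean, agg_var

# Hypothetical batch: means spread like N(0, 1), but every posterior is very narrow.
rng = np.random.default_rng(0)
n, d = 10_000, 8
mu = rng.standard_normal((n, d))
log_var = np.full((n, d), -8.0)  # sigma^2 = exp(-8), about 3.4e-4

agg_mean, agg_var = aggregate_moments(mu, log_var)  # close to 0 and 1, respectively
kl = kl_diag_gaussian_to_std_normal(mu, log_var)    # per-sample KL, summed over d dims
```

The per-sample KL term still penalizes these narrow posteriors, at roughly 4 nats per dimension here. The sketch shows only that matching the aggregate's moments says little about the individual components.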

Paper Structure

This paper contains 9 sections, 20 equations, 6 figures, and 1 algorithm.

Figures (6)

  • Figure 1: General scheme of a VAE
  • Figure 2: Illustration of the global posterior $q_{\phi}$ as a product of individual posteriors $q_{\phi_i}$.
  • Figure 3: Residual-VAE components: (a) residual encoder, (b) Gaussian sampler, and (c) residual decoder. In the presented experiments, we use an architecture with six residual blocks ($R=6$), each with two identity blocks ($r=2$).
  • Figure 4: Generated variants with the VAE using the proposed training scheme, varying $z_0$ and $z_1$ according to $\delta_k \times \sigma^{(i)}$ for $\delta = [-20, 0, 20]$.
  • Figure 5: Transitions generated with VAEs by convex combinations of latent variables: $\hat{\mathbf x} \sim p_\theta(\mathbf x \mid \mathbf z = \alpha\,\mathbf z^{(i)} + (1-\alpha)\,\mathbf z^{(j)})$, for $\alpha \in [0,1]$. First row: VAE trained with the proposed scheme. Second row: the same VAE model trained with the standard strategy, using $\beta = [1, 5000]$ in Eqs. \ref{eq:ELBO_KL} to \ref{eq:ELBO_l} and the $L_1$ norm for the negative log-likelihood.
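The latent-space operations behind Figures 3(b), 4 and 5 can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the paper's implementation, and the function names are hypothetical:

- `reparameterize` is a standard Gaussian sampler, $\mathbf z = \boldsymbol\mu + \boldsymbol\sigma \odot \boldsymbol\epsilon$. We assume the Figure 3(b) sampler works this way.
- `interpolate` forms the convex combinations $\alpha\,\mathbf z^{(i)} + (1-\alpha)\,\mathbf z^{(j)}$ from the Figure 5 caption.
- `traverse` shifts selected latent coordinates by $\delta$ times the posterior standard deviation. This is our reading of $\delta_k \times \sigma^{(i)}$ in Figure 4.

The decoder $p_\theta(\mathbf x \mid \mathbf z)$ that would map each code to an image is not shown.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Gaussian sampler: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def interpolate(z_i, z_j, num_steps):
    """Convex combinations alpha * z_i + (1 - alpha) * z_j for alpha evenly spaced in [0, 1]."""
    alphas = np.linspace(0.0, 1.0, num_steps)[:, None]  # shape (num_steps, 1)
    return alphas * z_i[None, :] + (1.0 - alphas) * z_j[None, :]

def traverse(z, sigma, dims, deltas):
    """For each latent dimension k in dims and each delta in deltas, return a copy of z
    with z[k] shifted by delta * sigma[k] (assumed reading of delta_k x sigma^(i))."""
    codes = []
    for k in dims:
        for delta in deltas:
            z_new = z.copy()
            z_new[k] += delta * sigma[k]
            codes.append(z_new)
    return np.stack(codes)

# Hypothetical usage with a 4-dimensional latent space and sigma = 0.5 in every dimension.
rng = np.random.default_rng(0)
mu, log_var = np.zeros(4), np.log(np.full(4, 0.25))
sigma = np.exp(0.5 * log_var)
z_i = reparameterize(mu, log_var, rng)
z_j = reparameterize(mu, log_var, rng)
path = interpolate(z_i, z_j, num_steps=5)                      # shape (5, 4)
grid = traverse(z_i, sigma, dims=[0, 1], deltas=[-20, 0, 20])  # shape (6, 4)
```

Here `path` holds five codes running from $\mathbf z^{(j)}$ ($\alpha = 0$) to $\mathbf z^{(i)}$ ($\alpha = 1$). `grid` holds six codes: for each of $z_0$ and $z_1$, one code per value of $\delta$.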