Generative Latent Flow

Zhisheng Xiao, Qing Yan, Yali Amit

TL;DR

GLF tackles the challenge of producing high-quality samples from auto-encoder-based generative models by placing a normalizing flow on the latent space that maps the encoded latent distribution to Gaussian noise, which enables end-to-end single-stage training. It combines a deterministic encoder–decoder with a bijective latent-space transform built from affine coupling blocks, and trains with a joint reconstruction and negative log-likelihood (NLL) objective, using a stop-gradient operation to keep the latent representations stable. Empirically, GLF achieves state-of-the-art sample quality among AE-based models on MNIST, Fashion-MNIST, CIFAR-10, and CelebA, converges faster than competing methods, and is competitive with GANs. The work also clarifies the connection to VAEs with flow priors and highlights the practical benefit of the stop-gradient design for reliable, efficient density matching in latent space.

Abstract

In this work, we propose the Generative Latent Flow (GLF), an algorithm for generative modeling of the data distribution. GLF uses an auto-encoder (AE) to learn latent representations of the data, and a normalizing flow to map the distribution of the latent variables to that of simple i.i.d. noise. In contrast to some other auto-encoder-based generative models, which use various regularizers to encourage the encoded latent distribution to match the prior distribution, our model explicitly constructs a mapping between these two distributions, leading to better density matching while avoiding over-regularizing the latent variables. We compare our model with several related techniques and show that it has many relative advantages, including fast convergence, single-stage training, and minimal reconstruction trade-off. We also study the relationship between our model and its stochastic counterpart, and show that our model can be viewed as a vanishing-noise limit of VAEs with a flow prior. Quantitatively, under standardized evaluations, our method achieves state-of-the-art sample quality among AE-based models on commonly used datasets, and is competitive with GAN benchmarks.
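
As a rough illustration of the objective described above, the following NumPy sketch evaluates a reconstruction term plus a negative log-likelihood (NLL) term on the latent codes. Everything concrete in it is an assumption made for illustration: the linear `W_enc` and `W_dec` stand in for the encoder and decoder, an elementwise affine map stands in for the normalizing flow, and the toy sizes are arbitrary. NumPy has no automatic differentiation, so the stop-gradient is only marked by a comment, and its placement between the encoder output and the flow input is likewise an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 8-dimensional data, 2-dimensional latent space.
x = rng.normal(size=(16, 8))
W_enc = rng.normal(size=(8, 2)) * 0.1   # stand-in deterministic encoder
W_dec = rng.normal(size=(2, 8)) * 0.1   # stand-in deterministic decoder
log_scale = np.zeros(2)                 # stand-in flow: elementwise affine map
shift = np.zeros(2)


def glf_loss(x):
    """Return (reconstruction loss, latent NLL, their sum) for a batch x."""
    z = x @ W_enc                       # latent code
    x_hat = z @ W_dec                   # reconstruction
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))

    # Assumption: in an autodiff framework, z would pass through a
    # stop-gradient here so that the NLL term does not update the encoder.
    eps = z * np.exp(log_scale) + shift           # flow: latent -> noise
    log_det = np.sum(log_scale)                   # log |det d(eps)/d(z)|
    dim = z.shape[1]
    log_prior = -0.5 * np.sum(eps ** 2, axis=1) - 0.5 * dim * np.log(2 * np.pi)
    nll = -np.mean(log_prior + log_det)           # change-of-variables NLL
    return recon, nll, recon + nll


recon, nll, total = glf_loss(x)
```

With the flow initialized to the identity, the NLL term reduces to the standard Gaussian NLL of the raw latent codes; training the flow parameters is what lets the latent distribution be matched to the noise distribution without adding a regularizer on the encoder.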

Paper Structure

This paper contains 25 sections, 6 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: (a) Illustration of the GLF model. The red arrow contains a stop-gradient operation. See Section \ref{strategies}. (b) Structure of one flow block. It splits the input into two parts $y = (y_1,y_2)$, passes them through the coupling layer $C$, and applies the random permutation $P$.
  • Figure 2: (a)-(e): Randomly generated samples from our method trained on different datasets. (f): Random noise interpolation on CelebA.
  • Figure 3: (a) Record of FID scores on CIFAR-10 for VAEs+flow prior with different values of $\beta$ and GLF. (b) Record of entropy losses for corresponding models. (c) Record of NLL losses for corresponding models.
  • Figure 4: (a) Record of FID scores on CIFAR-10 for regularized GLF with different values of $\beta$ and GLF. $\beta=1$ and $10$ are omitted because they lead to divergence in reconstruction loss. (b) Record of reconstruction losses for corresponding models. (c) Record of NLL losses for corresponding models.
  • Figure 5: (a)-(d) Randomly generated samples from our method with MSE loss. (e)-(h) Randomly generated samples from our method with perceptual loss.
  • ...and 2 more figures
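
The flow block described in the caption of Figure 1(b) (split the input, apply a coupling layer $C$, then a permutation $P$) can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear maps `W_s` and `W_t` are hypothetical stand-ins for the learned scale and shift networks, the `tanh` bound on the log-scale is an illustrative choice, and the latent dimension is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4                       # hypothetical latent dimension
d = D // 2                  # size of the first half y1
# Hypothetical stand-ins for the learned networks inside the coupling layer C.
W_s = rng.normal(size=(d, D - d)) * 0.1
W_t = rng.normal(size=(d, D - d)) * 0.1
perm = rng.permutation(D)   # fixed random permutation P
inv_perm = np.argsort(perm) # indices that undo P


def block_forward(y):
    """Split y = (y1, y2), apply the affine coupling, then permute."""
    y1, y2 = y[:, :d], y[:, d:]
    s = np.tanh(y1 @ W_s)               # log-scale, bounded (illustrative choice)
    t = y1 @ W_t                        # shift
    out = np.concatenate([y1, y2 * np.exp(s) + t], axis=1)
    log_det = np.sum(s, axis=1)         # the permutation adds 0 to the log-det
    return out[:, perm], log_det


def block_inverse(v):
    """Undo the permutation, then invert the affine coupling."""
    u = v[:, inv_perm]
    y1, u2 = u[:, :d], u[:, d:]
    s = np.tanh(y1 @ W_s)               # recomputable because y1 is unchanged
    t = y1 @ W_t
    y2 = (u2 - t) * np.exp(-s)
    return np.concatenate([y1, y2], axis=1)


y = rng.normal(size=(5, D))
v, log_det = block_forward(y)
y_back = block_inverse(v)               # recovers y up to floating-point error
```

Because `y1` passes through the coupling unchanged, the scale and shift can be recomputed exactly during inversion, and the Jacobian is triangular, so its log-determinant is just the sum of the log-scales. The permutation between blocks lets different coordinates take the role of `y1` in successive blocks.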