Residual Flows for Invertible Generative Modeling

Ricky T. Q. Chen, Jens Behrmann, David Duvenaud, Jörn-Henrik Jacobsen

TL;DR

This work tackles reliable density estimation with invertible flow models by addressing bias in density evaluation that plagued prior i-ResNets. It introduces Residual Flows, which combine an unbiased log-density estimator via a Russian roulette scheme with memory-efficient backprop through a Neumann-series-based gradient, enabling expressive Lipschitz-constrained networks to be trained with maximum likelihood. The approach is augmented with LipSwish activations and generalized induced mixed-norm Lipschitz constraints, and is demonstrated to achieve competitive density scores, high-quality samples, and strong hybrid modeling performance. Overall, the paper expands the design space of flow-based models by offering a principled, unbiased, and memory-efficient framework for learning complex distributions in high dimensions.

Abstract

Flow-based generative models parameterize probability distributions through an invertible transformation and can be trained by maximum likelihood. Invertible residual networks provide a flexible family of transformations where only Lipschitz conditions rather than strict architectural constraints are needed for enforcing invertibility. However, prior work trained invertible residual networks for density estimation by relying on biased log-density estimates whose bias increased with the network's expressiveness. We give a tractable unbiased estimate of the log density using a "Russian roulette" estimator, and reduce the memory required during training by using an alternative infinite series for the gradient. Furthermore, we improve invertible residual blocks by proposing the use of activation functions that avoid derivative saturation and generalizing the Lipschitz condition to induced mixed norms. The resulting approach, called Residual Flows, achieves state-of-the-art performance on density estimation amongst flow-based models, and outperforms networks that use coupling blocks at joint generative and discriminative modeling.
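
The claim that a Lipschitz condition alone is enough for invertibility can be illustrated with a toy block. This is a minimal sketch, not the authors' architecture. It assumes a dense two-layer residual function $g$ whose weight matrices are rescaled to spectral norm 0.9, so that $\mathrm{Lip}(g) \le 0.81 < 1$. The names `normalize`, `g`, `forward`, `inverse` and the coefficient 0.9 are hypothetical. For simplicity the exact spectral norm is computed with `np.linalg.norm(W, 2)`; practical implementations typically estimate it with power iteration. Because $g$ is a contraction, $f(x) = x + g(x)$ is inverted by the fixed-point iteration $x \leftarrow y - g(x)$.

```python
import numpy as np

def normalize(W, coeff):
    # Rescale W so its spectral norm (largest singular value) is at most coeff.
    sigma = np.linalg.norm(W, 2)
    return W * min(1.0, coeff / sigma)

rng = np.random.default_rng(0)
d = 5
W1 = normalize(rng.standard_normal((d, d)), 0.9)
W2 = normalize(rng.standard_normal((d, d)), 0.9)

def g(x):
    # tanh is 1-Lipschitz, so Lip(g) <= ||W2||_2 * ||W1||_2 <= 0.81 < 1.
    return W2 @ np.tanh(W1 @ x)

def forward(x):
    # Invertible residual block f(x) = x + g(x).
    return x + g(x)

def inverse(y, n_iter=100):
    # Banach fixed-point iteration x <- y - g(x); converges because g is a contraction.
    x = y.copy()
    for _ in range(n_iter):
        x = y - g(x)
    return x

x = rng.standard_normal(d)
x_rec = inverse(forward(x))
print("max reconstruction error:", np.max(np.abs(x - x_rec)))
```

The error shrinks by at least a factor of $\mathrm{Lip}(g)$ per iteration, so the closer the Lipschitz constant is to 1, the more iterations inversion needs.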

Paper Structure

This paper contains 37 sections, 3 theorems, 29 equations, 19 figures, 4 tables.

Key Result

Theorem 1

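Let $f(x) = x + g(x)$ with $\mathrm{Lip}(g) < 1$, and let $N$ be a random variable with support over the positive integers. Then

$$\log p(x) = \log p(f(x)) + \mathbb{E}_{n,v}\left[\sum_{k=1}^{n} \frac{(-1)^{k+1}}{k} \frac{v^\top \left[J_g(x)^k\right] v}{\mathbb{P}(N \geq k)}\right],$$

where $J_g(x)$ is the Jacobian of $g$ at $x$, $n \sim p(N)$, and $v \sim \mathcal{N}(0, I)$.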
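
The estimator in Theorem 1 rests on the power series $\log\det(I + J_g) = \sum_{k \ge 1} \frac{(-1)^{k+1}}{k}\,\mathrm{tr}(J_g^k)$, which converges when $\mathrm{Lip}(g) < 1$. Truncating it at a random $n$ and reweighting term $k$ by $1/\mathbb{P}(N \ge k)$ keeps it unbiased (the "Russian roulette"), and each trace is replaced by a Hutchinson estimate $v^\top J^k v$. A minimal NumPy sketch, under these assumptions: an explicit small matrix `J` stands in for $J_g(x)$ (a real model would form $v^\top J^k$ with repeated vector-Jacobian products), $N \sim \mathrm{Geometric}(p)$ with a hypothetical $p = 0.5$, and one probe vector per sample. The function name `russian_roulette_logdet` is hypothetical.

```python
import numpy as np

def russian_roulette_logdet(J, rng, p=0.5):
    """One unbiased sample of log det(I + J), assuming ||J||_2 < 1.

    Truncates sum_{k>=1} (-1)^(k+1) tr(J^k) / k at n ~ Geometric(p) on {1, 2, ...},
    reweights term k by 1 / P(N >= k) = 1 / (1 - p)^(k - 1), and estimates each
    trace with a single Hutchinson probe v ~ N(0, I).
    """
    n = rng.geometric(p)                  # random truncation level, n >= 1
    v = rng.standard_normal(J.shape[0])   # Hutchinson probe vector
    w = v
    total = 0.0
    for k in range(1, n + 1):
        w = J @ w                         # w = J^k v, using matrix-vector products only
        prob_ge_k = (1.0 - p) ** (k - 1)  # P(N >= k) for the geometric distribution
        total += (-1) ** (k + 1) / k * (v @ w) / prob_ge_k
    return total

rng = np.random.default_rng(0)
d = 4
J = rng.standard_normal((d, d))
J *= 0.5 / np.linalg.norm(J, 2)           # enforce ||J||_2 = 0.5 < 1

samples = [russian_roulette_logdet(J, rng) for _ in range(50_000)]
estimate = np.mean(samples)
exact = np.linalg.slogdet(np.eye(d) + J)[1]
print(f"estimate={estimate:.3f}  exact={exact:.3f}")
```

Averaging the samples should approach the exact $\log\det(I + J)$, which is the correction $\log p(x) - \log p(f(x))$ in the theorem. The truncation distribution trades compute (expected number of terms, here $1/p = 2$) against variance.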

Figures (19)

  • Figure 1: Pathways to designing scalable normalizing flows and their enforced Jacobian structure. Residual Flows fall under unbiased estimation with free-form Jacobian.
  • Figure 2: i-ResNets suffer from substantial bias when using expressive networks, whereas Residual Flows perform maximum likelihood in a principled way, with unbiased stochastic gradients.
  • Figure 3: Memory usage (GB) per minibatch of 64 samples when computing $n=10$ terms in the corresponding power series. CIFAR10-small uses immediate downsampling before any residual blocks.
  • Figure 4: Common smooth Lipschitz activation functions $\phi$ usually have vanishing $\phi''$ when $\phi'$ is maximal. LipSwish has a non-vanishing $\phi''$ in the region where $\phi'$ is close to one.
  • Figure 5: Qualitative samples. Real samples (left) and random samples (right) from a model trained on 5-bit 64$\times$64 CelebA. The most visually appealing samples were picked out of 5 random batches.
  • ...and 14 more figures
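
Figure 4 names LipSwish without stating its formula. As defined in the full paper, it is $\mathrm{LipSwish}(x) = \mathrm{Swish}(x)/1.1$ with $\mathrm{Swish}(x) = x\,\sigma(\beta x)$. The slope of Swish is $s + \beta x\, s(1 - s)$ with $s = \sigma(\beta x)$; it depends on $x$ only through $t = \beta x$ and peaks just below 1.1 (near $t \approx 2.4$) for every $\beta > 0$, so dividing by 1.1 makes the activation 1-Lipschitz. The sketch below checks this numerically on a grid; the function names and the $\beta$ values tried are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def lipswish(x, beta=1.0):
    # Swish(x) = x * sigmoid(beta * x), rescaled by 1 / 1.1 to be 1-Lipschitz.
    return x * sigmoid(beta * x) / 1.1

def lipswish_grad(x, beta=1.0):
    # d/dx [x * s(beta x)] = s + beta * x * s * (1 - s), with s = sigmoid(beta x).
    s = sigmoid(beta * x)
    return (s + beta * x * s * (1.0 - s)) / 1.1

x = np.linspace(-10.0, 10.0, 100_001)
for beta in (0.5, 1.0, 2.0, 5.0):
    print(f"beta={beta}: max slope {lipswish_grad(x, beta).max():.5f}")
```

The maximum slope comes out just under 1 for each $\beta$, and the slope is still changing where it is close to its maximum, which is the non-vanishing $\phi''$ property the caption describes.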

Theorems & Definitions (6)

  • Theorem 1: Unbiased log density estimator
  • Theorem 2: Unbiased log-determinant gradient estimator
  • Lemma 3: Unbiased randomized truncated series
  • proof
  • proof
  • proof
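
Theorem 2 avoids backpropagating through the power series of Theorem 1. It uses $\frac{\partial}{\partial\theta}\log\det(I + J) = \mathrm{tr}\big((I + J)^{-1}\frac{\partial J}{\partial\theta}\big)$ together with the Neumann series $(I + J)^{-1} = \sum_{k \ge 0} (-J)^k$, randomly truncated as in Lemma 3, giving the estimator $\mathbb{E}_{n,v}\big[\big(\sum_{k=0}^{n} \frac{(-1)^k}{\mathbb{P}(N \ge k)} v^\top J^k\big)\frac{\partial J}{\partial\theta} v\big]$ in the form assumed here. A minimal sketch, assuming a hypothetical toy Jacobian $J(\theta) = \theta A$ with a scalar $\theta$, so $\partial J/\partial\theta = A$ is explicit, and the same geometric truncation as before; `neumann_grad_estimate` is a hypothetical name. In an autodiff implementation the bracketed row vector would be computed without recording a graph and a single vector-Jacobian product would yield the gradient, which is the source of the memory savings described in the abstract.

```python
import numpy as np

def neumann_grad_estimate(J, dJ, rng, p=0.5):
    """One unbiased sample of d/dtheta log det(I + J(theta)), assuming ||J||_2 < 1.

    Estimates tr((I + J)^{-1} dJ) as u^T dJ v, where
    u^T = sum_{k=0}^{n} (-1)^k / P(N >= k) * v^T J^k and n ~ Geometric(p).
    """
    n = rng.geometric(p)
    v = rng.standard_normal(J.shape[0])
    w = v.copy()                 # holds (J^T)^k v, i.e. v^T J^k as a column vector
    u = v.copy()                 # k = 0 term, with P(N >= 0) = 1
    for k in range(1, n + 1):
        w = J.T @ w
        u += (-1) ** k / (1.0 - p) ** (k - 1) * w
    return u @ (dJ @ v)

rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d))
A *= 0.5 / np.linalg.norm(A, 2)  # ||A||_2 = 0.5
theta = 0.8                      # toy J(theta) = theta * A, so ||J||_2 = 0.4
J, dJ = theta * A, A             # dJ is dJ/dtheta

samples = [neumann_grad_estimate(J, dJ, rng) for _ in range(50_000)]
estimate = np.mean(samples)
exact = np.trace(np.linalg.solve(np.eye(d) + J, dJ))
print(f"estimate={estimate:.3f}  exact={exact:.3f}")
```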