AdaGAN: Boosting Generative Models

Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf

TL;DR

AdaGAN introduces a boosting-style meta-algorithm that builds a strong mixture of generative models by iteratively reweighting data to focus on uncovered regions, addressing the missing modes problem in GANs. The paper develops a general f-divergence framework for additive mixtures, derives optimal update components, and proves exponential or finite convergence under various conditions, even with approximate learners. The approach is instantiated for GANs, including practical methods to compute the necessary weighting factors, and is validated with toy and MNIST experiments showing improved mode coverage and reduced variance compared to baselines. Overall, AdaGAN offers a theoretically grounded, practically effective strategy to construct diverse generative models via successive reweighted training of components.

Abstract

Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train and can suffer from the problem of missing modes where the model is not able to produce examples in certain regions of the space. We propose an iterative procedure, called AdaGAN, where at every step we add a new component into a mixture model by running a GAN algorithm on a reweighted sample. This is inspired by boosting algorithms, where many potentially weak individual predictors are greedily aggregated to form a strong composite predictor. We prove that such an incremental procedure leads to convergence to the true distribution in a finite number of steps if each step is optimal, and convergence at an exponential rate otherwise. We also illustrate experimentally that this procedure addresses the problem of missing modes.
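The iterative procedure described in the abstract can be sketched on a one-dimensional toy problem. This is only an illustration under stated assumptions, not the authors' implementation: `fit_weak_generator` is a hypothetical stand-in for one GAN run (a fixed-width Gaussian placed at the weighted-densest training point, so it covers a single mode), the reweighting rule (inverse density under the current mixture) is a simple heuristic swapped in for the weights derived in the paper, and the schedule $\beta_t = 1/t$ is the one quoted in the Figure 2 caption.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def fit_weak_generator(data, weights, std=0.5):
    """Hypothetical stand-in for one GAN run (an assumption, not the paper's learner).

    Returns a fixed-width Gaussian centred on the training point with the
    highest weighted kernel density, so it can only cover one mode.
    """
    kernel = gaussian_pdf(data[:, None], data[None, :], std)  # shape (n, n)
    weighted_density = kernel @ weights
    return float(data[np.argmax(weighted_density)]), std


def mixture_pdf(x, components, mix_weights):
    """Density of the current mixture of Gaussian components at x."""
    return sum(a * gaussian_pdf(x, m, s) for a, (m, s) in zip(mix_weights, components))


def boosted_mixture(data, steps, floor=1e-6):
    """Greedily add one component per step, trained on a reweighted sample."""
    n = len(data)
    weights = np.full(n, 1.0 / n)  # step 1 uses the unweighted sample
    components, mix_weights = [], []
    for t in range(1, steps + 1):
        beta = 1.0 / t  # schedule quoted in the Figure 2 caption
        components.append(fit_weak_generator(data, weights))
        # Additive mixture update: shrink old weights, give beta to the new component.
        mix_weights = [a * (1.0 - beta) for a in mix_weights] + [beta]
        # Heuristic reweighting (an assumption): emphasise points the mixture misses.
        inverse_density = 1.0 / (mixture_pdf(data, components, mix_weights) + floor)
        weights = inverse_density / inverse_density.sum()
    return components, mix_weights


# Two well-separated modes; a single weak generator can only cover one of them.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-4.0, 0.5, 200), rng.normal(4.0, 0.5, 200)])
components, mix_weights = boosted_mixture(data, steps=2)
print(components, mix_weights)
```

In this toy setting the first component settles on one mode, the reweighting shifts almost all mass to the other mode, and the second component is placed there. A real GAN learner and the paper's reweighting scheme would replace the two stand-in pieces.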

Paper Structure

This paper contains 36 sections, 14 theorems, 73 equations, 5 figures, 5 tables, 3 algorithms.

Key Result

Lemma 1

Let $f\in\mathcal{F}$. Given two distributions $P_d,P_g$ and some $\beta\in[0,1]$, for any distribution $Q$ and any distribution $R$ such that $\beta dR\le dP_d$, we have […]. If furthermore $f\in\mathcal{F}_H$, then, for any $R$, we have […].

Figures (5)

  • Figure 1: A toy illustration of the missing mode problem and the effect of sample reweighting, following the discussion in Section \ref{sec:intro-our-approach}. In the left images, the red dots are the training (true data) points and the blue dots are points sampled from the model mixture of generators $G_t$. In the right images, the color corresponds to the weights of the training points, following the reweighting scheme proposed in this work. The top row corresponds to the first iteration of AdaGAN, and the bottom row to the second iteration.
  • Figure 2: Coverage $C$ of the true data by the model distribution $P_{model}^T$, as a function of iterations $T$. Experiments correspond to the data distribution with 5 modes. Each blue point is the median over 35 runs. Green intervals are defined by the 5th and 95th percentiles (see Section \ref{sec:metrics}). Iteration 0 is equivalent to one vanilla GAN. The left plot corresponds to taking the best generator out of $T$ runs. The middle plot corresponds to the "ensemble GAN", which simply takes a uniform mixture of $T$ independently trained GAN generators. The right plot corresponds to our boosting approach (AdaGAN), which reweights the examples based on the previous generators, with $\beta_t=1/t$. Both the ensemble and boosting approaches significantly outperform the vanilla approach after only a few additional iterations. They also outperform taking the best out of $T$ runs. Boosting outperforms all other approaches. For AdaGAN the variance of the performance is also significantly decreased.
  • Figure 3: Digits from the MNIST dataset corresponding to the smallest (left) and largest (right) weights, obtained by the AdaGAN procedure (see Section \ref{sec:adaGAN}) in one of the runs. Bold digits (left) are already covered, and the next GAN will concentrate on thin (right) digits.
  • Figure 4: Comparison of AdaGAN run with a GAN (top row) and with an unrolled GAN (Metz et al., 2017) (bottom row). Coverage $C$ of the true data by the model distribution $P_{model}^T$, as a function of iterations $T$. Experiments are similar to those of Figure 2, but with 10 modes. Top and bottom rows correspond to the usual and the unrolled GAN (with 5 unrolling steps) respectively, trained with the Jensen-Shannon objective (\ref{eq:gan}) on the left, and with the modified objective originally proposed by Goodfellow et al. (2014) on the right. In terms of computation time, one step of AdaGAN with an unrolled GAN corresponds to roughly 3 steps of AdaGAN with a usual GAN. On all images $T=1$ corresponds to vanilla unrolled GAN.
  • Figure 5: AdaGAN on MNIST. The bottom row shows true MNIST digits with the smallest (left) and highest (right) weights after reweighting at the end of the first AdaGAN step. Those with small weights are thick and resemble those generated by the GAN after the first AdaGAN step (top left). After training with the reweighted dataset during the second iteration of AdaGAN, the new mixture produces more thin digits (top right).
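The schedule $\beta_t = 1/t$ quoted in the Figure 2 caption has a simple consequence that a few lines of arithmetic make concrete. Assuming the additive update in which the mixture after step $t$ is $(1-\beta_t)$ times the previous mixture plus $\beta_t$ times the new component, every component ends up with weight $1/T$ after $T$ steps. The helper `mixture_weights` below is a hypothetical name introduced only for this check; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def mixture_weights(betas):
    """Per-component weights after applying the additive update for each beta in turn."""
    weights = []
    for beta in betas:
        # Existing components are scaled by (1 - beta); the new one receives beta.
        weights = [w * (1 - beta) for w in weights] + [beta]
    return weights


T = 5
weights = mixture_weights([Fraction(1, t) for t in range(1, T + 1)])
print(weights)
```

Under this assumption the boosted mixture and the uniform "ensemble GAN" of the same caption place identical weights on their components, so the difference between them would come from training each new component on a reweighted sample rather than from the mixing proportions.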

Theorems & Definitions (21)

  • Lemma 1
  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Lemma 2
  • Lemma 3
  • ...and 11 more