Multi-Agent Diverse Generative Adversarial Networks

Arnab Ghosh, Viveka Kulharia, Vinay Namboodiri, Philip H. S. Torr, Puneet K. Dokania

TL;DR

MAD-GAN introduces a multi-generator, single-discriminator framework to mitigate mode collapse by making the discriminator identify which generator produced each fake sample, effectively enforcing diversity. The approach yields a mixture-model interpretation with a proven optimality condition and demonstrates superior mode coverage and sample diversity across synthetic benchmarks, image-to-image translation, diverse-class data, and unsupervised representation learning. Extensions such as MAD-GAN-Sim further promote diversity via similarity-based objectives, broadening applicability to high-dimensional generation tasks. Overall, the work provides both theoretical insights and strong empirical evidence for using multiple generators with a generator-identification discriminator to capture rich, multimodal data distributions.

Abstract

We propose MAD-GAN, an intuitive generalization of Generative Adversarial Networks (GANs) and their conditional variants that addresses the well-known problem of mode collapse. First, MAD-GAN is a multi-agent GAN architecture incorporating multiple generators and one discriminator. Second, to enforce that different generators capture diverse high-probability modes, the discriminator of MAD-GAN is designed such that, along with distinguishing real from fake samples, it is also required to identify the generator that produced a given fake sample. Intuitively, to succeed in this task, the discriminator must learn to push different generators towards different identifiable modes. We perform extensive experiments on synthetic and real datasets and compare MAD-GAN with different variants of GAN. We show high-quality, diverse sample generations for challenging tasks such as image-to-image translation and face generation. In addition, we show that MAD-GAN is able to disentangle different modalities when trained on a highly challenging diverse-class dataset (e.g., a dataset with images of forests, icebergs, and bedrooms). Finally, we show its efficacy on the unsupervised feature representation task. In the Appendix, we introduce a similarity-based competing objective (MAD-GAN-Sim), which encourages different generators to generate diverse samples based on a user-defined similarity metric. We show its performance on image-to-image translation and its effectiveness on the unsupervised feature representation task.

Paper Structure

This paper contains 53 sections, 3 theorems, 15 equations, 13 figures, 13 tables, 1 algorithm.

Key Result

Theorem 1

Given the optimal discriminator, the objective for training the generators boils down to minimizing $$\mathrm{KL}\big(p_d(x) \,\|\, p_{avg}(x)\big) + k\, \mathrm{KL}\Big(\frac{1}{k}\sum_{i=1}^k p_{g_i}(x) \,\Big\|\, p_{avg}(x)\Big) - (k+1) \log (k+1) + k \log k,$$ where $p_{avg} (x) = \frac{p_d(x) + \sum_{i=1}^k p_{g_i}(x)}{k+1}$. This objective attains its global minimum if $p_d = \frac{1}{k} \sum_{i=1}^k p_{g_i}$, with the objective value of $-(k+1) \log (k+1) + k \log k$.
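The global-minimum claim in Theorem 1 can be checked numerically on small discrete distributions. The sketch below is illustrative only: it assumes the generator objective has the form $\mathrm{KL}(p_d \,\|\, p_{avg}) + k\,\mathrm{KL}(\frac{1}{k}\sum_i p_{g_i} \,\|\, p_{avg}) - (k+1)\log(k+1) + k\log k$, which is what the usual optimal-discriminator derivation gives for $k$ generators, and the function names (`kl`, `generator_objective`, `global_minimum`) are hypothetical rather than taken from the paper.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists (0 * log 0 treated as 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def generator_objective(p_d, p_gs):
    """Assumed form of the generator objective under the optimal discriminator."""
    k = len(p_gs)
    # Uniform mixture of the k generator distributions.
    mix = [sum(col) / k for col in zip(*p_gs)]
    # p_avg = (p_d + sum_i p_g_i) / (k + 1), as defined in Theorem 1.
    p_avg = [(d + k * m) / (k + 1) for d, m in zip(p_d, mix)]
    constant = -(k + 1) * math.log(k + 1) + k * math.log(k)
    return kl(p_d, p_avg) + k * kl(mix, p_avg) + constant

def global_minimum(k):
    """Objective value stated in Theorem 1 for k generators."""
    return -(k + 1) * math.log(k + 1) + k * math.log(k)

# Toy data distribution with two equally likely modes.
p_d = [0.5, 0.5]

# Each generator covers a different mode; their mixture equals p_d.
diverse = generator_objective(p_d, [[1.0, 0.0], [0.0, 1.0]])

# Both generators collapse onto the same mode.
collapsed = generator_objective(p_d, [[1.0, 0.0], [1.0, 0.0]])

print(diverse, collapsed, global_minimum(2))
```

In this toy case no single generator matches $p_d$, yet the diverse configuration reaches the stated minimum because only the mixture of generators has to equal the data distribution. The collapsed configuration leaves both KL terms positive and so scores strictly higher.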

Figures (13)

  • Figure 1: Diverse-class data generation using MAD-GAN. The diverse-class dataset contains images from different classes/modalities (in this case, forests, icebergs, and bedrooms). Each row represents generations by a particular generator and each column represents generations for a given random noise input $z$. As shown, once trained on this dataset, the generators of MAD-GAN are able to disentangle different modalities, so each generator produces images from a particular modality.
  • Figure 2: Multi-Agent Diverse GAN (MAD-GAN). The discriminator outputs $k+1$ softmax scores signifying the probability of its input sample being from either one of the $k$ generators or the real distribution.
  • Figure 3: Visualization of different generators getting pushed towards different modes. Here, $M_1$ and $M_2$ could be a cluster of modes where each cluster itself contains many modes. The arrows abstractly represent generator specific gradients for the purpose of building intuition.
  • Figure 4: A toy example to understand the behavior of different GAN variants in order to compare with MAD-GAN (each method was trained for 198,000 iterations). The orange bars show the density estimate of the training data and the blue ones that of the generated data points. After careful cross-validation, we chose a bin size of $0.1$.
  • Figure 5: A toy example to understand the behavior of MAD-GAN with different numbers of generators (each method was trained for 198,000 iterations). The orange bars show the density estimate of the training data and the blue ones that of the generated data points. After careful cross-validation, we chose a bin size of $0.1$.
  • ...and 8 more figures
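
The discriminator output described in the Figure 2 caption ($k+1$ softmax scores, one per generator plus one for the real distribution) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain cross-entropy loss against the index of the true source, and the names `softmax`, `discriminator_loss`, and `REAL` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discriminator_loss(logits, source, k):
    """Cross-entropy between the k+1 softmax scores and the true source.

    source is in 0..k-1 for a sample from generator `source`,
    and equal to k for a real sample (assumed index convention).
    """
    assert len(logits) == k + 1 and 0 <= source <= k
    probs = softmax(logits)
    return -math.log(probs[source])

k = 3       # number of generators (illustrative value)
REAL = k    # index of the "real data" class under the assumed convention

# An uninformed discriminator spreads mass evenly: loss is log(k + 1).
uniform = discriminator_loss([0.0] * (k + 1), REAL, k)

# A discriminator that confidently attributes a fake to generator 1.
confident = discriminator_loss([0.0, 5.0, 0.0, 0.0], 1, k)

print(uniform, confident)
```

Under this sketch the discriminator lowers its loss only when samples from different generators are distinguishable from one another, which matches the intuition that it pushes the generators towards different identifiable modes.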

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Corollary 1
  • proof
  • Proposition 1
  • proof