MCGAN: Enhancing GAN Training with Regression-Based Generator Loss

Baoren Xiao, Hao Ni, Weixin Yang

TL;DR

MCGAN addresses GAN training instability by introducing a regression-based generator loss that supervises the generator through the discriminator's outputs. It replaces the standard generator objective with a regression loss $\mathcal{L}_{R}$ and uses a Monte Carlo estimator for the expected discriminator output on fake samples, which improves stability and convergence without requiring an optimal discriminator. The paper provides theoretical foundations, including discriminability and connections to $f$-divergence, and reports consistent improvements across image, video, and time-series tasks on multiple datasets and backbones. The approach is a flexible, broadly applicable enhancement to GAN training, with practical impact for high-fidelity generation and robust latent representations.

Abstract

Generative adversarial networks (GANs) have emerged as a powerful tool for generating high-fidelity data. However, the main bottleneck of existing approaches is the lack of supervision in generator training, which often results in undamped oscillation and unsatisfactory performance. To address this issue, we propose an algorithm called Monte Carlo GAN (MCGAN). This approach uses an innovative generative loss function, termed the regression loss, which reformulates generator training as a regression task: the generator is trained by minimizing the mean squared error between the discriminator's output on real data and the expected discriminator output on fake data. We demonstrate the desirable analytic properties of the regression loss, including discriminability and optimality, and show that our method requires a weaker condition on the discriminator for effective generator training. These properties justify the strength of this approach in improving training stability while retaining the optimality of GANs by leveraging the strong supervision of the regression loss. Extensive experiments on diverse datasets, including image data (CIFAR-10/100, FFHQ256, ImageNet, and LSUN Bedroom), time series data (VAR and stock data), and video data, are conducted to demonstrate the flexibility and effectiveness of our proposed MCGAN. Numerical results show that the proposed MCGAN is versatile in enhancing a variety of backbone GAN models and achieves consistent and significant improvement in terms of quality, accuracy, training stability, and learned latent space.
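
As a rough illustration of the regression loss described in the abstract, the sketch below evaluates a loss of the form $\mathbb{E}_{x}\big[(D(x) - \mathbb{E}_{z}[D(G(z))])^2\big]$, with the inner expectation replaced by a Monte Carlo average over latent samples. This formula is one reading of the abstract's wording, not the paper's stated definition. The linear discriminator, the shift generator, and every function and parameter name here are hypothetical choices made for the example, and no gradient step is shown.

```python
import random


def discriminator(x, phi):
    """Toy linear discriminator D(x) = phi * x (assumed for illustration only)."""
    return phi * x


def generator(z, theta):
    """Toy generator G(z) = theta + z: shifts latent noise by theta (assumed)."""
    return theta + z


def mc_expected_fake_score(theta, phi, n_mc, rng):
    """Monte Carlo estimate of E_z[D(G(z))] using n_mc standard normal latents."""
    total = 0.0
    for _ in range(n_mc):
        z = rng.gauss(0.0, 1.0)
        total += discriminator(generator(z, theta), phi)
    return total / n_mc


def regression_loss(real_batch, theta, phi, n_mc, rng):
    """Mean squared error between D(x) on real samples and the MC fake-score mean."""
    fake_mean = mc_expected_fake_score(theta, phi, n_mc, rng)
    squared_errors = [(discriminator(x, phi) - fake_mean) ** 2 for x in real_batch]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "real" data: a Gaussian centred at 2.
    real_batch = [rng.gauss(2.0, 1.0) for _ in range(500)]
    for theta in (0.0, 1.0, 2.0):
        loss = regression_loss(real_batch, theta, phi=0.5, n_mc=2000, rng=rng)
        print(f"theta={theta:.1f}  regression loss={loss:.4f}")
```

In this toy setting the loss is smallest when the generator's shift matches the mean of the real data, which is the kind of direct supervision signal the abstract attributes to the regression loss. A real implementation would use neural networks for both players and differentiate the loss with respect to the generator parameters.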

Paper Structure

This paper contains 67 sections, 3 theorems, 38 equations, 15 figures, 13 tables, 1 algorithm.

Key Result

Theorem 1

Suppose the assumption with label assumption:H holds, and let $\phi'_{\cdot,\cdot}:\mathcal{P}(\mathcal{X})\times\mathcal{P}(\mathcal{X})\rightarrow\Phi$ be a parameterization map such that $D^{\phi'_{\cdot,\cdot}}:\mathcal{P}(\mathcal{X}) \times \mathcal{P}(\mathcal{X}) \times \mathcal{X}\rightarrow \mathbb{R}$ has discriminability,

Figures (15)

  • Figure 1: Dirac-GAN example
  • Figure 2: Learning curves in terms of (a) Fréchet Inception Distance and (b) Inception Score over the course of training on the CIFAR-10 dataset using BigGAN with different loss combinations.
  • Figure 3: CIFAR-10 samples generated by the BigGAN backbone trained via Hinge + DiffAug + MC. Images in each row belong to one of the 10 classes. Images misclassified by ResNet-50 are in red boxes.
  • Figure 4: Latent space interpolation based on the cStyleGAN2 backbone trained via Hinge loss without and with our MC method. Red and yellow boxes highlight two types of undesirable transitions between generated images.
  • Figure 5: Results of predicting the next frame given the past 5 frames using ConvLSTM without and with our MC method.
  • ...and 10 more figures

Theorems & Definitions (8)

  • Example 1
  • Definition 1: Discriminability
  • Theorem 1
  • Lemma 1
  • Theorem 2
  • proof
  • proof
  • proof