MCGAN: Enhancing GAN Training with Regression-Based Generator Loss
Baoren Xiao, Hao Ni, Weixin Yang
TL;DR
MCGAN addresses GAN training instability by introducing a regression-based generator loss that supervises the generator through the discriminator's outputs. It replaces the standard generator objective with a regression loss $\mathcal{L}_{R}$ and uses a Monte Carlo estimator for the expected discriminator output on fake data, which improves stability and convergence without requiring an optimal discriminator. The paper provides theoretical foundations, including discriminability and connections to $f$-divergence, and reports consistent improvements across image, video, and time-series tasks on multiple datasets and backbones. The approach is a flexible, broadly applicable enhancement to GAN training, with practical impact for high-fidelity generation and robust latent representations.
Abstract
Generative adversarial networks (GANs) have emerged as a powerful tool for generating high-fidelity data. However, the main bottleneck of existing approaches is the lack of supervision on generator training, which often results in undamped oscillation and unsatisfactory performance. To address this issue, we propose an algorithm called Monte Carlo GAN (MCGAN). This approach uses a novel generator loss function, termed the regression loss, which reformulates generator training as a regression task: the generator is trained by minimizing the mean squared error between the discriminator's output on real data and the expected discriminator output on fake data. We establish desirable analytic properties of the regression loss, including discriminability and optimality, and show that our method requires a weaker condition on the discriminator for effective generator training. These properties explain how the strong supervision of the regression loss improves training stability while retaining the optimality of GAN. Extensive experiments on diverse datasets, including image data (CIFAR-10/100, FFHQ256, ImageNet, and LSUN Bedroom), time series data (VAR and stock data), and video data, demonstrate the flexibility and effectiveness of MCGAN. Numerical results show that MCGAN is versatile in enhancing a variety of backbone GAN models and achieves consistent and significant improvement in terms of quality, accuracy, training stability, and learned latent space.
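As a rough illustration of the regression loss described above, the following NumPy sketch computes the mean squared error between discriminator outputs on real samples and a Monte Carlo estimate of the expected discriminator output on generated samples. It is one reading of the abstract, not the paper's implementation: the function `regression_loss`, the toy `toy_discriminator` and `toy_generator`, and the unconditional form of the expectation are all assumptions made for this example, and the exact loss in the paper may differ (for instance, in conditional settings).

```python
import numpy as np


def regression_loss(d_real, d_fake):
    """MSE between D(x) on real samples and the sample mean of D(G(z)).

    Assumed form: mean over real x of (D(x) - mean_z D(G(z)))^2.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    # Monte Carlo estimate of the expected discriminator output on fake data.
    fake_mean = d_fake.mean()
    return float(np.mean((d_real - fake_mean) ** 2))


# Hypothetical stand-ins for a discriminator and a generator.
def toy_discriminator(x):
    return np.tanh(x)


def toy_generator(z, shift):
    return z + shift


rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, scale=1.0, size=1000)  # "real" data ~ N(2, 1)
z = rng.normal(loc=0.0, scale=1.0, size=1000)     # latent noise ~ N(0, 1)

# A generator far from the data distribution versus one that matches it.
loss_far = regression_loss(
    toy_discriminator(real), toy_discriminator(toy_generator(z, shift=0.0))
)
loss_near = regression_loss(
    toy_discriminator(real), toy_discriminator(toy_generator(z, shift=2.0))
)
```

In this toy setup the loss is smaller when the generated samples match the real distribution (`loss_near`) than when they do not (`loss_far`). In actual training the generator parameters would be updated by gradient descent on this quantity with the discriminator held fixed, which requires an automatic differentiation framework rather than plain NumPy.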
