The GAN is dead; long live the GAN! A Modern GAN Baseline

Yiwen Huang, Aaron Gokaslan, Volodymyr Kuleshov, James Tompkin

TL;DR

The paper argues that GAN training need not be brittle when paired with a well-behaved loss. It combines the relativistic RpGAN objective with the zero-centered gradient penalties $R_1$ and $R_2$ and proves that the resulting regularized loss is locally convergent, addressing mode dropping and instability. Building on this, the authors present a roadmap to a minimalist baseline, R3GAN, by stripping StyleGAN2 tricks and adopting modern backbones (ConvNeXt/ResNet) under Configs B–E, achieving substantial FID improvements. Empirically, R3GAN in its final configuration (Config E) surpasses StyleGAN2 and competes with diffusion models across FFHQ, CIFAR-10, ImageNet, and Stacked MNIST while maintaining a lean parameter budget. The work advocates a simpler, principled GAN foundation capable of scaling with modern architectures, while noting limitations and directions for future improvements.

Abstract

There is a widely-spread claim that GANs are difficult to train, and GAN architectures in the literature are littered with empirical tricks. We provide evidence against this claim and build a modern GAN baseline in a more principled manner. First, we derive a well-behaved regularized relativistic GAN loss that addresses issues of mode dropping and non-convergence that were previously tackled via a bag of ad-hoc tricks. We analyze our loss mathematically and prove that it admits local convergence guarantees, unlike most existing relativistic losses. Second, our new loss allows us to discard all ad-hoc tricks and replace outdated backbones used in common GANs with modern architectures. Using StyleGAN2 as an example, we present a roadmap of simplification and modernization that results in a new minimalist baseline -- R3GAN. Despite being simple, our approach surpasses StyleGAN2 on FFHQ, ImageNet, CIFAR, and Stacked MNIST datasets, and compares favorably against state-of-the-art GANs and diffusion models.
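
To make the regularized relativistic loss described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical one-dimensional toy critic $D(x) = a x^2 + b x$ so that the input gradient is available in closed form, uses the softplus form of the relativistic pairing term, and uses the common $\frac{\gamma}{2}\,\mathbb{E}\|\nabla_x D\|^2$ form for the zero-centered penalties. The function names, the toy critic, and the sign convention (the discriminator minimizes its loss) are illustrative assumptions; the paper's own notation may differ.

```python
import math

def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def d_logit(x, a, b):
    # Hypothetical toy critic D(x) = a*x^2 + b*x on scalar inputs (assumption).
    return a * x * x + b * x

def d_grad(x, a, b):
    # Analytic dD/dx for the toy critic; a real model would use autodiff.
    return 2.0 * a * x + b

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def rpgan_d_loss(real, fake, a, b, gamma):
    # Relativistic pairing: each real sample is scored against a paired fake.
    adv = mean(softplus(d_logit(f, a, b) - d_logit(r, a, b))
               for r, f in zip(real, fake))
    # R1: zero-centered gradient penalty evaluated on real samples.
    r1 = 0.5 * gamma * mean(d_grad(r, a, b) ** 2 for r in real)
    # R2: the same penalty evaluated on generated samples.
    r2 = 0.5 * gamma * mean(d_grad(f, a, b) ** 2 for f in fake)
    return adv + r1 + r2

def rpgan_g_loss(real, fake, a, b):
    # The generator sees the same pairing with the roles swapped.
    return mean(softplus(d_logit(r, a, b) - d_logit(f, a, b))
                for r, f in zip(real, fake))

if __name__ == "__main__":
    real, fake = [1.0, 2.0], [0.5, 2.5]
    print(rpgan_d_loss(real, fake, a=0.0, b=1.0, gamma=2.0))
    print(rpgan_g_loss(real, fake, a=0.0, b=1.0))
```

In this sketch both penalties pull the critic's input gradient toward zero, on real data for $R_1$ and on generated data for $R_2$. A practical implementation would replace `d_grad` with automatic differentiation through the discriminator network.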

Paper Structure (59 sections, 32 equations, 20 figures, 4 tables)

Figures (20)

  • Figure 1: Generator $G$ loss for different objectives over training. Regardless of which objective is used, training diverges with only $R_1$ and succeeds with both $R_1$ and $R_2$. Convergence failure with only $R_1$ was noted by Lee et al. [vitgan].
  • Figure 2: StackedMNIST [pacgan] result for each loss function. The maximum possible mode coverage is 1000. "Fail" indicates that training diverged early on.
  • Figure 3: Architecture comparison. For image generation, $G$ and $D$ are often both deep ConvNets with either partially or fully symmetric architectures. (a) StyleGAN2 [sg2] $G$ uses a network to map noise vector $z$ to an intermediate style space $\mathcal{W}$. We use a traditional generator as style mapping is not necessary for a minimal working model. (b) StyleGAN2's building blocks have intricate layers but are themselves simple, with a ConvNet architecture from 2015 [alexnet, vgg, resnet]. ResNet's identity mapping principle is also violated in the discriminator. (c) We remove tricks and modernize the architecture. Our design has clean layers with a more powerful ConvNet architecture.
  • Figure 4: FFHQ-256. * denotes models that leak ImageNet features.
  • Figure 5: FFHQ-64.
  • ...and 15 more figures