UCD: Unconditional Discriminator Promotes Nash Equilibrium in GANs

Mengfei Xia, Nan Xue, Jiapeng Zhu, Yujun Shen

TL;DR

The paper addresses instability and mode collapse in GANs by examining Nash equilibrium, revealing that conditional discrimination introduces shortcuts that impair learning. It proposes an Unconditional Discriminator (UCD) in Config B and enhances its robustness with a DINO-inspired loss in Config C. It provides a theoretical guarantee that convergence yields $p_g(\mathbf x|c)=q(\mathbf x|c)$, together with a practical, plug-in method for improved synthesis. Empirically, UCD achieves substantial gains on ImageNet-64, including an FID of 1.47 that surpasses StyleGAN-XL, with improved precision and recall and minimal computational overhead. The approach offers a new lens on discriminator design to stabilize adversarial training and bolster one-step generation, with potential extensions to text-conditioned and diffusion-based distillation settings.

Abstract

Adversarial training turns out to be the key to one-step generation, especially for Generative Adversarial Networks (GANs) and diffusion model distillation. Yet in practice, GAN training rarely converges properly and suffers from mode collapse. In this work, we quantitatively analyze the extent of Nash equilibrium in GAN training, and conclude that the redundant shortcuts created by feeding the condition into $D$ disable meaningful knowledge extraction. We therefore propose an unconditional discriminator (UCD), in which $D$ is forced to extract more comprehensive and robust features with no condition injection. In this way, $D$ can leverage better knowledge to supervise $G$, which promotes Nash equilibrium in GAN training. A theoretical guarantee of compatibility with vanilla GAN theory indicates that UCD can be implemented in a plug-in manner. Extensive experiments confirm significant performance improvements with high efficiency. For instance, we achieve 1.47 FID on the ImageNet-64 dataset, surpassing StyleGAN-XL and several state-of-the-art one-step diffusion models. The code will be made publicly available.

Paper Structure

This paper contains 19 sections, 1 theorem, 12 equations, 5 figures, 4 tables, 6 algorithms.

Key Result

Theorem 1

Let $c$ be the corresponding condition of $\mathbf x$. Then the $c$-th component of the optimal $d$ trained with the UCD losses (eq:ucd_loss, eq:ucd_g_loss, eq:ucd_d_loss) equals $D^*(\mathbf x,c)$ in eq:nash_equilibrium_d. Therefore, when training converges, we have $p_g(\mathbf x|c)=q(\mathbf x|c)$, i.e., Nash equilibrium …
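
For context, the vanilla GAN argument that this theorem connects to can be sketched as below. This is the textbook result from the original GAN analysis, rewritten in this page's notation ($q$ for the data distribution, $p_g$ for the generator distribution). It is an assumption, not confirmed by the text available here, that eq:nash_equilibrium_d takes exactly this form.

```latex
% Conditional vanilla GAN value function for a fixed condition c
V(G, D) = \mathbb{E}_{\mathbf x \sim q(\mathbf x|c)}\big[\log D(\mathbf x, c)\big]
        + \mathbb{E}_{\mathbf x \sim p_g(\mathbf x|c)}\big[\log\big(1 - D(\mathbf x, c)\big)\big]

% Pointwise maximizer over D for a fixed G
D^*(\mathbf x, c) = \frac{q(\mathbf x|c)}{q(\mathbf x|c) + p_g(\mathbf x|c)}

% Substituting D^* back into V
V(G, D^*) = -\log 4 + 2\,\mathrm{JSD}\big(q(\cdot|c)\,\|\,p_g(\cdot|c)\big)
```

Since the Jensen-Shannon divergence is non-negative and vanishes only when its two arguments coincide, $V(G, D^*)$ is minimized exactly when $p_g(\mathbf x|c)=q(\mathbf x|c)$, at which point $D^*(\mathbf x,c)=1/2$ everywhere.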

Figures (5)

  • Figure 1: Visualization of Nash equilibrium across different models. Every several iterations, we plug $D$ into a classification task without further fine-tuning, following eq:classification. A higher classification accuracy suggests a better Nash equilibrium. We use Config A (see sec:exp.1) as the baseline (red curve). In comparison, our UCD consistently improves Nash equilibrium (blue and green curves for Config B and Config C, respectively). Going a step further, UCD under Config C (sec:method.4) yields a more robust $D$ and thus a better Nash equilibrium. We report both the smoothed values (darker curves) and the original values (lighter curves) for clearer demonstration; the horizontal axis indicates training progress.
  • Figure 2: Diverse results generated by UCD on the ImageNet-64 dataset (Deng et al., 2009). We randomly sample eight global latent codes $\mathbf z$ for each label condition $c$; each row shows one condition.
  • Figure 3: Comparison of the $D$ backbone between Config B and Config C during training. Every several iterations, we freeze the $D$ backbone and train a linear classification head on top of it. A higher classification accuracy suggests a better and more robust $D$ backbone. Compared to Config B (blue curve), the DINO-like loss in Config C (green curve) enables a more robust $D$ backbone. We report both the smoothed values (darker curves) and the original values (lighter curves) for clearer demonstration; the horizontal axis indicates training progress.
  • Figure S1: Diverse results generated by UCD on the ImageNet-64 dataset (Deng et al., 2009). We randomly sample six global latent codes $\mathbf z$ for each label condition $c$; each row shows one condition.
  • Figure S2: Diverse results generated by UCD on the ImageNet-64 dataset (Deng et al., 2009). We randomly sample six global latent codes $\mathbf z$ for each label condition $c$; each row shows one condition.
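
The linear-probe protocol described in the Figure 3 caption (freeze the $D$ backbone, train only a linear classification head, report accuracy) can be illustrated with a minimal, dependency-free sketch. Everything here is hypothetical: `backbone` stands in for the frozen discriminator backbone, and the plain gradient-descent softmax head, learning rate, and step count are illustrative choices, not the authors' setup.

```python
import math

def frozen_features(backbone, images):
    """Extract features once; the backbone is only called, never updated."""
    return [backbone(x) for x in images]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def head_logits(f, w, b):
    """Linear head: one weight row and one bias per class."""
    return [sum(wj * fj for wj, fj in zip(wk, f)) + bk for wk, bk in zip(w, b)]

def train_linear_probe(feats, labels, num_classes, lr=0.5, steps=100):
    """Fit only the linear head with full-batch gradient descent on cross-entropy."""
    dim, n = len(feats[0]), len(feats)
    w = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(steps):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for f, y in zip(feats, labels):
            probs = softmax(head_logits(f, w, b))
            for k in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit k, averaged over the batch
                err = (probs[k] - (1.0 if k == y else 0.0)) / n
                grad_b[k] += err
                for j in range(dim):
                    grad_w[k][j] += err * f[j]
        for k in range(num_classes):
            b[k] -= lr * grad_b[k]
            for j in range(dim):
                w[k][j] -= lr * grad_w[k][j]
    return w, b

def probe_accuracy(feats, labels, w, b):
    """Fraction of samples whose highest-scoring class matches the label."""
    correct = 0
    for f, y in zip(feats, labels):
        logits = head_logits(f, w, b)
        correct += int(logits.index(max(logits)) == y)
    return correct / len(feats)
```

In this reading, a curve like the one in Figure 3 would come from repeating `frozen_features`, `train_linear_probe`, and `probe_accuracy` at successive training checkpoints of $D$, ideally scoring on held-out data rather than the probe's training set.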

Theorems & Definitions (2)

  • Theorem 1
  • proof